Computer memory and performance
Computer memory is an essential component of modern computing systems, and it plays a crucial role in the performance of a computer. The memory subsystem is responsible for storing and retrieving data quickly, and its design can significantly impact the overall performance of the system. In this blog post, we will explore how computer memory works and how it affects the performance of a computer. We will take a closer look at the design of computer memory, including the different types of memory and their advantages and disadvantages. By the end of this post, you will have a better understanding of computer memory and how its design can impact the performance of a computer. Whether you’re a computer enthusiast, a software developer, or just someone who wants to optimize their computer’s performance, this post will provide you with valuable insights into the workings of computer memory.
There are five central components of a computer: input, output, memory, datapath, and control unit. Each part does a different task.
Input and output devices
Input devices are mechanisms through which the computer is fed information, such as a keyboard. Output devices, on the other hand, are mechanisms that convey the result of a computation to a user, such as a display, or to another computer. Together, input and output devices are known as “I/O devices.”
There are two common I/O implementations for receiving and reading data: interrupt-driven I/O and programmed I/O.
With interrupt-driven I/O, an interrupt occurs when input is entered. An interrupt is a type of exception where program execution is suspended. During an interrupt, the processor saves the current program counter (PC) into the exception program counter (EPC), so that the PC can be modified by the interrupt. Next, the computer jumps to the address of the interrupt handler and executes the interrupt service routine (ISR). The ISR saves and restores (via the program stack) the registers it uses, to avoid overwriting values needed by the program that was interrupted. Lastly, the exception return [eret] instruction moves the EPC address into the PC, so that the computer can return to running the program that was interrupted.
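The interrupt sequence described above can be sketched as a toy simulation. This is only an illustration, not how real hardware or an operating system is written: the `ToyCPU` class, the register names, and the handler address are all hypothetical.

```python
class ToyCPU:
    """Toy model of the interrupt sequence; every name here is hypothetical."""

    ISR_ADDRESS = 0x80000180  # assumed handler address, chosen for illustration

    def __init__(self):
        self.pc = 0x00400000                  # program counter
        self.epc = None                       # exception program counter
        self.registers = {"t0": 7, "t1": 42}  # values the running program needs
        self.stack = []
        self.log = []

    def interrupt(self, input_value):
        # 1. Save the interrupted program's PC so we can return to it later.
        self.epc = self.pc
        # 2. Jump to the interrupt service routine and run it.
        self.pc = self.ISR_ADDRESS
        self.service_routine(input_value)
        # 3. eret: copy the EPC back into the PC.
        self.pc = self.epc

    def service_routine(self, input_value):
        # Save the register the ISR is about to overwrite.
        self.stack.append(self.registers["t0"])
        self.registers["t0"] = input_value     # the ISR uses t0 as scratch space
        self.log.append(self.registers["t0"])  # "handle" the input
        # Restore the saved register before returning.
        self.registers["t0"] = self.stack.pop()


cpu = ToyCPU()
pc_before = cpu.pc
cpu.interrupt(input_value=ord("a"))
```

After the call, `cpu.pc` equals `pc_before` and `t0` still holds its original value, which is the point of saving the PC and the registers.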
With programmed I/O, the computer repeatedly reads the input status to see if data has been entered. Programmed I/O can be inefficient because “the program must repeatedly check the device for new data” (Patterson and Hennessy).
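To see why repeated checking is wasteful, here is a minimal sketch of programmed I/O (polling). The `ToyDevice` class and its `ready_after` parameter are assumptions made up for this example; a real device would expose a status register instead.

```python
from collections import deque


class ToyDevice:
    """Hypothetical input device whose data becomes ready after a few status checks."""

    def __init__(self, data, ready_after):
        self._data = deque(data)
        self._ready_after = ready_after
        self._checks = 0

    def status_ready(self):
        # Each status read costs CPU time, whether or not data has arrived.
        self._checks += 1
        return self._checks > self._ready_after and bool(self._data)

    def read(self):
        self._checks = 0
        return self._data.popleft()


def poll_for_input(device, max_polls=1000):
    """Busy-wait on the status flag; return (value, number_of_status_reads)."""
    polls = 0
    while polls < max_polls:
        polls += 1
        if device.status_ready():
            return device.read(), polls
    raise TimeoutError("device never became ready")


value, polls = poll_for_input(ToyDevice("k", ready_after=5))
```

Here the program spends six status reads to receive a single character. With interrupt-driven I/O, the processor could have been doing other work during that time.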
Memory is how computers access and store information; similarly, it is also where programs live and run. For example, before executing a program, you need to read the program from memory.
Many people view memory as a mystery box that programmers request information from and receive information back. Courses and textbooks often adopt this outlook because it keeps things simple when learning the basics of computer architecture and hardware. For the most part, memory will continue to be a mystery box, but I will cover some fun memory information in this post.
Memory, along with the central processing unit and I/O devices, is usually contained on a logic printed circuit board. Below is an image of a circuit board.
Image source: 8051 Development System Circuit Board
Although the diagram above is great, many people may not regularly use this type of circuit board. Thus, let’s examine the iPhone.
Figure 1.4.2: Components of the Apple iPhone Xs Max cell phone (Patterson and Hennessy)(Courtesy TechInsights, www.techInsights.com).
On the iPhone, the logic printed circuit board contains the processor and the memory. This can be seen in the photo above to the left of the metal casing. The photo below shows a close-up of the board.
Figure 1.4.3: Logic board of Apple iPhone Xs Max (Patterson and Hennessy)(Courtesy TechInsights, www.techInsights.com).
Each of the small rectangles is known as an integrated circuit or “chip,” and these can contain millions of transistors.
The chip labeled “A12” holds two large ARM processor cores and four little ARM processor cores that run at 2.5 GHz, as well as 2 GiB of main memory. It contains the iPhone’s central processing unit (CPU).
You can see the insides of the chip in the image below.
Computer memory hierarchy
Ideally, memory should be fast, large, and cheap, as its speed often “shapes performance, capacity limits the size of problems that can be solved, and the cost of memory today is often the majority of computer cost” (Patterson and Hennessy). We can understand the conflicting demands of memory with the “memory hierarchy.” You can think of the memory hierarchy as a triangle with multiple layers.
“The [triangular] shape indicates speed, cost, and size. The fastest, smallest, and most expensive memory per bit [is] at the top of the hierarchy and the slowest, largest, and cheapest per bit [is] at the bottom.” (Patterson and Hennessy)
Diagram source: Difference between Register and Memory – GeeksforGeeks; styled by Olivia Gallucci.
Primary computer memory
Primary memory is volatile, meaning it only retains data while it is receiving power. It holds programs and the data they need while they are running. Memory can also be classified by how it is accessed: randomly or sequentially.
Random access memory (RAM) can be accessed directly. This means that memory can be accessed in any order or “randomly” rather than sequentially. That makes RAM fast, but also more expensive per bit than secondary storage. In general, the amount of RAM affects the speed of the computer, because a system with too little RAM has to move data to and from slower secondary storage.
Dynamic random access memory (DRAM) provides random access to any location and is often in the form of an integrated circuit. This is where the program instructions and the data needed by the program are stored while the program is running.
Static / Cache
Inside the processor there is cache memory. Cache memory is a “small, fast memory that acts as a buffer for a slower, larger memory,” like DRAM (Patterson and Hennessy).
“Caches give the programmer the illusion that main memory is nearly as fast as the top of the hierarchy and nearly as big and cheap as the bottom of the hierarchy.” (Patterson and Hennessy)
Cache is built using static random access memory (SRAM). SRAM is faster, less dense, and more expensive than DRAM. SRAM is also built as an integrated circuit.
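The idea of a small, fast buffer in front of a larger, slower memory can be sketched in a few lines. This is a simplified model under stated assumptions: `SlowMemory` and `ToyCache` are hypothetical names, the cache holds only two entries, and it evicts the oldest entry when full, which is far simpler than what real hardware caches do.

```python
class SlowMemory:
    """Stand-in for DRAM: counts how many times it is accessed."""

    def __init__(self, contents):
        self.contents = dict(contents)
        self.reads = 0

    def read(self, address):
        self.reads += 1
        return self.contents[address]


class ToyCache:
    """Small buffer in front of SlowMemory; evicts the oldest entry when full."""

    def __init__(self, memory, capacity=2):
        self.memory = memory
        self.capacity = capacity
        self.lines = {}   # address -> value; dicts keep insertion order
        self.hits = 0
        self.misses = 0

    def read(self, address):
        if address in self.lines:
            self.hits += 1            # served from the fast buffer
            return self.lines[address]
        self.misses += 1              # fall back to the slow memory
        value = self.memory.read(address)
        if len(self.lines) >= self.capacity:
            oldest = next(iter(self.lines))
            del self.lines[oldest]
        self.lines[address] = value
        return value


memory = SlowMemory({addr: addr * 10 for addr in range(8)})
cache = ToyCache(memory, capacity=2)
values = [cache.read(addr) for addr in [0, 1, 0, 0, 1, 2, 0]]
```

For this access pattern, three of the seven reads are served by the cache, so the slow memory is only touched four times. Programs that reuse the same addresses benefit the most.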
Unlike RAM, sequential access memory (SAM) has to be accessed in a specific order. For this reason, SAM is often used as secondary storage via magnetic tapes.
Secondary computer memory
Secondary memory is nonvolatile, which means it stores programs and data between runs. In other words, it is a form of memory, such as a DVD, that “retains data even in the absence of a power source” (Patterson and Hennessy). In servers, secondary memory often consists of magnetic disks. Phones also use secondary memory, specifically flash memory (internal storage) and SD cards (external storage).
CPU and processors: datapath and control
The datapath performs arithmetic operations, while the control unit (CU) commands the datapath, memory, and I/O devices according to the instructions of the program (Patterson and Hennessy). The datapath and CU are often combined, and referred to as the “processor” or “central processing unit” (CPU).
The stored-program concept is the “idea that instructions and data of many types can be stored in memory as numbers, leading to the stored-program computer” (Patterson and Hennessy). Since the goal of computers is to perform arithmetic and other numerical operations, we need to understand how the computer can access and modify stored data.
Let’s look at addition in MIPS. The goal is to perform the following: [a = b + c]. The datapath stores the numerical values of operands a, b, and c. Furthermore, the datapath stores the operation of “+” or “add”.
Before you know what the numerical values of b and c are, can you do addition? No. You need the CU to organize the sequence of events of the datapath.
The CU knows how to organize the sequence of data based on the operation stored, the number of operands, and the order of the operands. Together, the operation and operands create a standardized format that the computer can understand.
In MIPS, the instruction to perform addition is formatted like: [add destination, source_one, source_two]. Now, we know that we want our line of pseudo-code to look like [add a, b, c].
Real MIPS code would look like [add $t0, $t1, $t2], but we are not covering registers in this blog post.
Here, this instruction commands the computer to add the two variables b and c and to put their sum in a.
Note that you can also do [add a, a, b], which adds the current value of a with the value of b and overwrites a to contain the sum. For example, if a is initially 5 and b is 4, this instruction overwrites a with 9.
Looking back at the diagram, the “when to do it” inside the CU is answered by the “what to do.” For example, in order to add, the CU knows to do an “add destination, source_one, source_two” by using information stored in the datapath. Once the instruction has been read (this is how you tell the CU “what to do”), the CU commands the datapath to do the following:
- Get the values of [source_one] and [source_two]
- Do the addition
- Save the result back in [destination]
Here, you can see that the “add” instruction told the CU the order to do the steps using the information stored in the datapath.
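The three steps above can be sketched as a tiny interpreter for the pseudo-code form of the instruction. This is not real MIPS and it ignores registers, just as this post does; the `execute_add` function and the `variables` dictionary are hypothetical stand-ins for the CU and the values held in the datapath.

```python
def execute_add(instruction, variables):
    """Run one 'add destination, source_one, source_two' pseudo-instruction."""
    operation, operands = instruction.split(maxsplit=1)
    if operation != "add":
        raise ValueError(f"unsupported operation: {operation}")
    destination, source_one, source_two = [name.strip() for name in operands.split(",")]

    # Step 1: get the values of source_one and source_two.
    first = variables[source_one]
    second = variables[source_two]
    # Step 2: do the addition.
    total = first + second
    # Step 3: save the result back in destination.
    variables[destination] = total
    return variables


variables = {"a": 5, "b": 4, "c": 10}
execute_add("add a, a, b", variables)  # a becomes 5 + 4
execute_add("add c, a, b", variables)  # c becomes 9 + 4
```

The first call reproduces the example from earlier: a starts at 5, b is 4, and a is overwritten with 9.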
Similar to what is discussed in Instruction Set Architectures and Performance, program performance depends on the software and hardware (including input/output (I/O) operations) used to create, translate, and execute machine instructions. However, it also depends on the algorithms used in the program. In this section, I will give a brief overview of some of the components that affect performance.
This table summarizes how the hardware and software affect performance.
| Hardware or software component | What the component changes | How this component affects performance |
| Algorithm | Instruction count, possibly CPI | Determines both the number of source-level statements and the number of I/O operations executed |
| (Together) Programming language, compiler, and architecture | Various | Determines the number of computer instructions for each source-level statement |
| Programming language | Instruction count, CPI | The programming language certainly affects the instruction count, since statements in the language are translated to processor instructions, which determine instruction count. The language may also affect the CPI because of its features; for example, a language with heavy support for data abstraction (e.g., Java) will require indirect calls, which will use higher CPI instructions. |
| Compiler | Instruction count, CPI | The efficiency of the compiler affects both the instruction count and average cycles per instruction, since the compiler determines the translation of the source language instructions into computer instructions. The compiler’s role can be very complex and affect the CPI in complex ways. |
| Instruction set architecture | Instruction count, clock rate, CPI | The instruction set architecture affects all three aspects of CPU performance, since it affects the instructions needed for a function, the cost in cycles of each instruction, and the overall clock rate of the processor. |
Understanding program performance section of 1.6 (Patterson and Hennessy).
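The three quantities in the table (instruction count, CPI, and clock rate) combine into CPU time as instruction count × CPI ÷ clock rate. The sketch below applies that relationship; the numbers are made up for illustration and do not describe any real program or processor.

```python
def cpu_time_seconds(instruction_count, cycles_per_instruction, clock_rate_hz):
    """CPU time = instruction count x CPI / clock rate."""
    return instruction_count * cycles_per_instruction / clock_rate_hz


# Hypothetical numbers: the same program built by two compilers for a 2 GHz processor.
baseline = cpu_time_seconds(10_000_000_000, 2.0, 2_000_000_000)
fewer_instructions = cpu_time_seconds(8_000_000_000, 2.2, 2_000_000_000)
```

In this made-up case, the second build runs in roughly 8.8 seconds instead of 10, even though its CPI is higher, because it executes fewer instructions. This is why no single column of the table tells you which option is faster.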
I strongly recommend reading Instruction Set Architectures and Performance before continuing this section because it covers the basics needed to understand the rest of this article.
Performance is measured by the amount of work done in a given time. Thus, when comparing two computers, the computer that can do more work in less time is considered faster.
When it comes to measuring how long a given program takes to run (program execution time), we measure the time it takes to complete the program (seconds per program). However, the speed of the program differs depending on what factors you count when you’re timing. For example, are you including CPU execution time, user CPU time, or system CPU time?
| CPU execution time | Also called CPU time. The actual time the CPU spends computing for a specific task. |
| User CPU time | The CPU time spent in a program itself. |
| System CPU time | The CPU time spent in the operating system performing tasks on behalf of the program. |
The most common definition of time is referred to as “elapsed time,” “wall clock time,” or “response time.” Elapsed time is the “total time to complete a task, including disk accesses, memory accesses, input/output (I/O) activities, operating system overhead—everything” (Patterson and Hennessy).
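To make the difference between these measurements concrete, here is a minimal sketch that uses Python's standard `time.perf_counter()` for elapsed time and `os.times()` for user and system CPU time. The `measure` and `busy_then_idle` functions are hypothetical helpers written for this example, and the exact numbers will vary from machine to machine.

```python
import os
import time


def measure(task):
    """Return elapsed, user CPU, and system CPU seconds for one call to task()."""
    wall_start = time.perf_counter()
    cpu_start = os.times()
    task()
    cpu_end = os.times()
    wall_end = time.perf_counter()
    return {
        "elapsed": wall_end - wall_start,
        "user_cpu": cpu_end.user - cpu_start.user,
        "system_cpu": cpu_end.system - cpu_start.system,
    }


def busy_then_idle():
    sum(i * i for i in range(200_000))  # CPU work counts toward user CPU time
    time.sleep(0.05)                    # sleeping adds elapsed time but almost no CPU time


timings = measure(busy_then_idle)
# CPU execution time is the sum of user and system CPU time.
timings["cpu_execution"] = timings["user_cpu"] + timings["system_cpu"]
```

Because the task sleeps, the elapsed time should be noticeably larger than the CPU execution time. That gap is the waiting that elapsed time includes and CPU time leaves out.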
Design concepts in computer memory
Now that we understand how memory affects performance, let’s examine how engineers use design concepts to develop performance-enhancing systems.
Engineers use abstraction to simplify design. Here, an abstraction “represents the design at different levels of representation.”
“Abstraction is the purposeful suppression, or hiding, of some details of a process or artifact, in order to bring out more clearly other aspects, details, or structure.” (Abstraction by Timothy A. Budd)
Let’s take a car for example. If drivers needed to know the intimate details of how the engine worked in order to drive a car, very few people would be able to drive cars. Thus, engineers build cars in a manner that allows people to work them without knowing the intimate details of the engine.
In the technical space, “the interface of a hardware or software component should be independent of its implementation” (Pearce). In the car example, the interface consists of parts of the car such as the steering wheel and pedals, while the implementation consists of the engine and transmission.
For software, you can think of dependencies: you use a library through its interface without needing to know how it is implemented.
Making the “common case” fast means that developers focus on optimizing the common uses of a program rather than the rare cases.
Performance via parallelism is when a design attempts to increase performance by performing operations in parallel.
Pipelining is a type of parallelism and it is similar to water flowing through a pipe. According to Mohamed Ibrahim, “the basic idea is to split the processor instructions into a series of small independent stages. Each stage is designed to perform a certain part of the instruction. At a very basic level, these stages can be broken down into [four steps].”
- Fetch Unit – Fetch an instruction from memory
- Decode Unit – Decode the instruction to be executed
- Execute Unit – Execute the instruction
- Write Unit – Write the result back to register or memory
Mohamed Ibrahim: How Pipelining Improves CPU Performance – Stack Pointer
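The benefit of overlapping the four stages can be estimated with a small sketch. It assumes an idealized pipeline where every stage takes exactly one cycle and nothing ever stalls, which real processors do not guarantee; the function names are hypothetical.

```python
STAGES = ["fetch", "decode", "execute", "write"]


def unpipelined_cycles(instruction_count, stage_count=len(STAGES)):
    # Each instruction finishes all stages before the next one starts.
    return instruction_count * stage_count


def pipelined_cycles(instruction_count, stage_count=len(STAGES)):
    # The first instruction takes stage_count cycles; each later one finishes one cycle after.
    if instruction_count == 0:
        return 0
    return stage_count + (instruction_count - 1)


def pipeline_schedule(instruction_count):
    """Map each cycle to the (instruction, stage) pairs active in that cycle."""
    schedule = {}
    for instruction in range(instruction_count):
        for offset, stage in enumerate(STAGES):
            schedule.setdefault(instruction + offset, []).append((instruction, stage))
    return schedule


schedule = pipeline_schedule(3)
```

Under these assumptions, 100 instructions take 400 cycles without pipelining and 103 cycles with it. In the schedule for three instructions, cycle 2 has instruction 0 executing, instruction 1 decoding, and instruction 2 being fetched, all at the same time.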
Increasing performance through prediction follows the belief that it is “faster on average to guess and start working rather than wait until you know for sure, assuming that the mechanism to recover from a misprediction is not too expensive and your prediction is relatively accurate” (Patterson and Hennessy).
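One place this idea shows up is in guessing the outcome of a branch before it is known. The sketch below is a deliberately simple model and not a description of any real processor: it assumes a predictor that guesses each branch will repeat its previous outcome, and the cycle costs (`work_cycles`, `misprediction_penalty`, `wait_cycles`) are invented for illustration.

```python
def cycles_with_prediction(outcomes, work_cycles=1, misprediction_penalty=3):
    """Guess that each branch repeats the previous outcome; pay a penalty when wrong."""
    prediction = True  # initial guess: branch taken
    cycles = 0
    mispredictions = 0
    for actual in outcomes:
        cycles += work_cycles
        if prediction != actual:
            mispredictions += 1
            cycles += misprediction_penalty  # recover from the wrong guess
        prediction = actual  # next time, guess whatever just happened
    return cycles, mispredictions


def cycles_without_prediction(outcomes, work_cycles=1, wait_cycles=2):
    """Always wait until the branch outcome is known before continuing."""
    return len(outcomes) * (work_cycles + wait_cycles)


# A loop branch: taken nine times, then not taken once on exit.
loop_branch = [True] * 9 + [False]
predicted_cycles, misses = cycles_with_prediction(loop_branch)
waiting_cycles = cycles_without_prediction(loop_branch)

# A branch that alternates every time defeats this simple predictor.
alternating_cycles, alternating_misses = cycles_with_prediction([True, False] * 5)
```

For the loop branch, guessing costs 13 cycles against 30 for always waiting. For the alternating branch, the same predictor is wrong nine times and costs 37 cycles, which is worse than waiting. This matches the condition in the quote: prediction only pays off when the guesses are relatively accurate and recovery is not too expensive.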
Computer memory hierarchies
I covered computer memory hierarchies earlier in this article.
Increasing dependability through redundancy follows the belief that “since any physical device can fail, we make systems dependable by including redundant components that can take over when a failure occurs and to help detect failures” (Patterson and Hennessy).
Computer memory and performance
I hope you enjoyed this post on computer memory and its effects on performance. If you want to read more about hardware and low-level software, consider reading Instruction Set Architectures and Performance or some of my assembly posts.
Citation: Patterson, David A., and John L. Hennessy. Computer Organization and Design MIPS Edition: The Hardware/Software Interface. Morgan Kaufmann, 2020.