Instruction Set Architectures and Performance

Instruction set architectures

An instruction set architecture (ISA), sometimes called a “computer architecture,” is the common interface and boundary between software and hardware when you interact with a machine. More formally, an ISA serves as a bridge between hardware and software, documenting hardware capabilities to the operating system. However, I think of an ISA as a definition of what a given set of hardware can understand, so that system software, specifically the compilation process, can accommodate the hardware it is running on. Additionally, ISAs allow programmers to see how high-level languages, such as C and Java, are implemented.

ISAs work by providing a collection of rules that define a machine language; in other words, different ISAs result in different machine languages.

The machine languages that ISAs define are used by “implementations,” which are hardware components that execute instructions, such as the central processing unit (CPU). The “set” in instruction set architecture refers to this collection of rules: the predefined opcodes that are valid for a given CPU architecture.

Common instruction set architectures

You can tell what ISA a computer uses by looking at its assembly language. Popular ISAs include ARM, MIPS, and 80x86.

Generally, ISAs can affect program performance, which is often measured by the execution formula. If you are familiar with the execution formula, continue to the ISA Spectrum section.

Execution formula

The execution formula calculates how long the CPU spends running a program, which is known as the CPU execution time (or CPU time). This is different from the response time, which is the total time to complete a task and includes disk accesses, memory accesses, I/O activities, operating system overhead, CPU execution time, and so on (Patterson and Hennessy).

Execution time = (IC) * (AT)
               = (IC) * (CCT) * (CPI)

The instruction count (IC) represents the number of instructions that will be executed by the CPU, while the average time (AT) represents the mean time it will take each instruction to execute. The AT is calculated by multiplying the clock cycle time (CCT) by cycles per instruction (CPI).

Components of performance	Units of measure
CPU execution time for a program	Seconds for the program
Instruction count (IC)	Instructions executed for the program
Clock cycles per instruction (CPI)	Average number of clock cycles per instruction
Clock cycle time (CCT)	Seconds per clock cycle

Patterson and Hennessy
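
To make the formula concrete, here is a minimal Python sketch. The function name cpu_execution_time and the sample numbers are my own assumptions for illustration; they do not describe a real machine or program.

```python
def cpu_execution_time(instruction_count, cpi, clock_cycle_time):
    """Return CPU execution time in seconds: IC * CPI * CCT."""
    return instruction_count * cpi * clock_cycle_time


# Hypothetical program: 2 billion instructions, an average of 1.5 cycles
# per instruction, on a 2 GHz clock (0.5 ns per cycle).
ic = 2_000_000_000
cpi = 1.5
cct = 0.5e-9

# AT: average seconds per instruction (CCT * CPI).
average_time = cct * cpi

print(cpu_execution_time(ic, cpi, cct))  # seconds, via IC * CCT * CPI
print(ic * average_time)                 # seconds, via IC * AT
```

Both lines print roughly 1.5 seconds, which shows that the two forms of the formula are the same calculation grouped differently.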

Clock cycle time

A clock cycle is the time for one clock period, usually of the processor clock, which runs at a constant rate. It is also known as a tick, clock tick, clock period, clock, or cycle (Hennessy and Patterson).

The clock cycle time (CCT) is usually a fixed number for any given machine and is equal to the inverse of a computer’s frequency (the processor’s operational clock cycles per second). I need to specify “usually” because of turbo capabilities.
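
As a quick sketch of the inverse relationship, the hypothetical helper below converts a clock frequency into a clock cycle time. The frequencies are made-up examples, not measurements from any particular processor.

```python
def clock_cycle_time(frequency_hz):
    """Return seconds per clock cycle for a clock running at frequency_hz."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return 1.0 / frequency_hz


# Two hypothetical clock frequencies: a higher frequency gives a shorter cycle.
for frequency in (3.0e9, 4.0e9):
    cct_ns = clock_cycle_time(frequency) * 1e9  # convert seconds to nanoseconds
    print(f"{frequency / 1e9:.1f} GHz -> {cct_ns:.3f} ns per cycle")
```

The 3 GHz clock works out to about 0.333 ns per cycle and the 4 GHz clock to 0.250 ns per cycle.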

Turbo

Computers rely on electricity, wires, and resistance to communicate using electric signals. Wires carry the electricity that powers the computer. However, electricity creates heat, warming the computer’s wires. As a wire’s temperature increases, so does its resistance, which slows down the signals, the clock frequency, and, as a result, the computer. The computer monitors its temperature so that it does not overheat.

However, if a computer has turbo capabilities, it can monitor its overall temperature and the temperature of each core. If only a few cores are running, turbo capabilities allow the computer to raise the temperature and clock speed, which shortens the CCT (a higher frequency means less time per cycle).

For example, if you are only using two cores on an eight-core computer, the computer allows the overall internal temperature to increase because it knows you are not maxing out all of the cores.

Cycles per instruction

Cycles per instruction (CPI) is a measure of how many clock cycles an instruction will generally take to execute. I need to specify “generally” because we are referring to an average: different instructions take different numbers of cycles. Cycles also depend on the data you are using. For example, are you manipulating a 4-char string or a 1000-char string? For this blog post, I treat CPI as an overall average and do not perform mathematical calculations. Instead, I use generalizations and tables.
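
Although this post sticks to generalizations, a small sketch may help show what “average” means here. The instruction classes, counts, and cycle costs below are invented for illustration, and average_cpi is a hypothetical helper name.

```python
def average_cpi(instruction_mix):
    """Return the weighted-average CPI.

    instruction_mix maps an instruction class to a
    (number executed, cycles per instruction) pair.
    """
    total_instructions = sum(count for count, _ in instruction_mix.values())
    if total_instructions == 0:
        raise ValueError("instruction mix is empty")
    total_cycles = sum(count * cycles for count, cycles in instruction_mix.values())
    return total_cycles / total_instructions


# Hypothetical mix: 1000 instructions spread over three classes.
mix = {
    "arithmetic": (500, 1),
    "load_store": (300, 2),
    "branch": (200, 3),
}

print(average_cpi(mix))  # (500*1 + 300*2 + 200*3) / 1000 = 1.7
```

No single instruction in this mix takes 1.7 cycles; the figure only describes the program as a whole.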


ISA spectrum

The ISA spectrum classifies ISAs by architectural complexity. The two extremes are Complex Instruction Set Computers (CISC) and Reduced Instruction Set Computers (RISC); note that in this instance ISA and Instruction Set Computer (ISC) are synonymous. In other words, the two extremes are complex ISA versus reduced ISA.

Compatibility

It is important to note that if you use the wrong ISA, the software won’t work. The software literally can’t run because the hardware is only compatible with a given ISA. To give a somewhat unrelated example, think of running a DS game cartridge on a PSP; it just won’t work. Thus, comparing CISC and RISC on a specific machine is a matter of compatibility, not a matter of efficiency.

We will compare CISC and RISC at a broad and generalized level.

Comparing CISC and RISC

Extremes on the instruction set architecture spectrum

Comparing CISC and RISC from a technical standpoint is difficult if you do not know how to read multiple ISAs. Thus, I’ll explain using people building a house. I also include a comparison table near the end.

CISC

In CISC architectures (e.g., VAX), some instructions do a lot of work. The instruction count (IC) from the execution formula will be low because each instruction (step) does more work, so you need fewer big steps to complete a task. As a result, CISC architectures have a higher CPI: every instruction does more work and takes more cycles.

I like to compare CISC architectures to workers building a home with four prefabs and a roof. A significant part of the house is complete when each prefab is attached. However, the workers require more muscle to lift the prefabs when compared with building a house out of many small components like wooden planks and nails.

In this example, the prefabs represent how CISC architectures have fewer instructions, but the instructions are complex (i.e., heavy) and complete more of a task (i.e., adding a prefab to build the house). The workers’ increased strength represents how it takes more cycles (a higher CPI) to execute each instruction (i.e., attach each part of the house).

Figure: prefab GIF. Source: Architectural Details: The Perfect Prefabricated Home

RISC

In RISC architectures (e.g., MIPS), all instructions do less work (or “a little” work). The IC will be high because each step only does a little work, so RISC programs require many small instructions to complete a task. Since each instruction accomplishes less work, the CPI will be lower.

I like to compare RISC architectures to workers building a home with wooden planks and nails. Only a small portion of the house is complete when each plank and nail is attached. However, the workers require less muscle to attach the planks when compared to building a house with larger components, like prefabs.

In this example, the planks and nails represent how RISC architectures have more instructions, but the instructions are simpler (i.e., lighter) and complete a small task (i.e., adding a plank or nail to build the house). The workers’ decreased strength represents how it takes fewer cycles (a lower CPI) to execute each instruction (i.e., attach each part of the house).

Figure: barn raising GIF. Source: Ohio Amish Barn Raising on 13 May 2014

When comparing the same program compiled for RISC and CISC, the RISC version often (not always) contains more instructions than the CISC version, so “reduced” does not describe the number of instructions in a program. Thus, substituting the “reduced” in RISC with “simple” makes more sense logically. However, it would be inconvenient if “simple” replaced “reduced” because many people pronounce “RISC” and “CISC” as words even though they are acronyms. It would be hard to understand what people are talking about if the terms were “SISC” and “CISC” because they are phonetically identical.

Comparison

Table

CISC	RISC
Emphasis on hardware	Emphasis on software
Complex instructions	Reduced instructions only
Memory-to-memory: “LOAD” and “STORE” are incorporated in the instructions. This means it can perform REG to REG or REG to MEM or MEM to MEM.	Register to register: “LOAD” and “STORE” are independent instructions. This means it can only perform REG to REG arithmetic operations.
Small code sizes, high cycles per instruction	Large code sizes, low cycles per instruction
Includes multi-clock: Instruction may take more than one clock cycle to execute.	Single-clock: Instructions are executed in a single clock cycle.
Transistors are primarily used for storing complex instructions.	Spends more transistors on memory registers. In other words, transistors are primarily used for more registers.
Uses both hardwired and microprogrammed control unit	Uses only hardwired control unit
Variable sized instructions	Fixed sized instructions
Instructions are larger than the size of one word	An instruction fits in one word

Combination of information found in Computer Organization of RISC and CISC on GeeksforGeeks and RISC vs. CISC by Stanford University
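
The memory-to-memory versus register-to-register row is easier to see with a toy model. The sketch below is not real assembly for any ISA: memory and registers are plain Python dictionaries, and the names cisc_mult, load, mul, and store are hypothetical stand-ins for instructions.

```python
# Toy model: memory and registers are plain dictionaries.
memory = {"x": 6, "y": 7}
registers = {}


def cisc_mult(mem, dst, src):
    """CISC-style: one instruction reads memory, multiplies, and writes back."""
    mem[dst] = mem[dst] * mem[src]


# RISC-style primitives: only load and store touch memory.
def load(mem, regs, reg, addr):
    regs[reg] = mem[addr]


def mul(regs, dst, a, b):
    regs[dst] = regs[a] * regs[b]


def store(mem, regs, reg, addr):
    mem[addr] = regs[reg]


# CISC-style: 1 instruction.
cisc_memory = dict(memory)
cisc_mult(cisc_memory, "x", "y")

# RISC-style: 4 instructions for the same result.
risc_memory = dict(memory)
load(risc_memory, registers, "r1", "x")
load(risc_memory, registers, "r2", "y")
mul(registers, "r1", "r1", "r2")
store(risc_memory, registers, "r1", "x")

print(cisc_memory["x"], risc_memory["x"])  # both hold 42
```

The result is identical, but the CISC-style version hides the loads and the store inside one step, while the RISC-style version spells each of them out.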

Program performance

Generally, instruction set architectures can affect program performance; CISC and RISC illustrate this concept well. On one hand, CISC minimizes the number of instructions per program at the cost of more cycles per instruction. On the other hand, RISC reduces the cycles per instruction at the cost of more instructions per program.

Execution	IC	CPI	CCT
CISC	lower	higher	same
RISC	higher	lower	same

Comparing program performance of CISC and RISC
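
To see how the trade-off in the table plays out, here is a sketch that plugs invented numbers into the execution formula. None of these values come from real CISC or RISC hardware; they only show that the winner depends on how far IC and CPI move in opposite directions.

```python
def execution_time(ic, cpi, cct):
    """Return execution time in seconds: IC * CPI * CCT."""
    return ic * cpi * cct


cct = 1e-9  # same 1 ns clock cycle time for both machines (hypothetical)

# Hypothetical CISC build: fewer instructions, more cycles each.
cisc_time = execution_time(ic=1_000_000, cpi=4.0, cct=cct)

# Hypothetical RISC build: more instructions, fewer cycles each.
risc_time = execution_time(ic=3_000_000, cpi=1.2, cct=cct)

# Same RISC instruction count, but with a slightly worse CPI.
risc_time_slower = execution_time(ic=3_000_000, cpi=1.5, cct=cct)

print(f"CISC: {cisc_time * 1e3:.2f} ms")
print(f"RISC: {risc_time * 1e3:.2f} ms")
print(f"RISC with CPI 1.5: {risc_time_slower * 1e3:.2f} ms")
```

With these made-up numbers, the first RISC build finishes sooner than the CISC build (3.60 ms versus 4.00 ms), but the second one does not (4.50 ms), which is why measuring the real program matters.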

Again, these comparisons are generalizations. If you want to compare a specific program’s performance on CISC and RISC architectures, it is better to run the program on the architectures you want to compare to see which best suits your needs.

Have two instruction set architectures or CPUs?

If you want to compare two CPUs, you can use the ratio of their performance or execution time:

Performance _A / Performance _B = Execution time _B / Execution time _A = n

There are two things to remember with this formula.

First, remember that performance is the inverse of execution time, which is why subscript B is in the denominator for performance and in the numerator for execution.

Second, please note that the subscript represents a computer. For instance, Performance _A / Performance _B = n calculates how many times faster Computer _A is than Computer _B.

Here, you will know Computer _A is faster than Computer _B if n is above 1. Computer _B is faster if n is below 1. They have equal performance if n is equal to 1 (Patterson and Hennessy).
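
As a small worked example, the sketch below computes n from two execution times. The helper name relative_performance and the 10-second and 15-second timings are assumptions for illustration.

```python
def relative_performance(execution_time_a, execution_time_b):
    """Return n = Performance_A / Performance_B = Time_B / Time_A."""
    if execution_time_a <= 0 or execution_time_b <= 0:
        raise ValueError("execution times must be positive")
    return execution_time_b / execution_time_a


# Hypothetical timings: Computer A runs the program in 10 s, Computer B in 15 s.
n = relative_performance(execution_time_a=10.0, execution_time_b=15.0)
print(n)  # 1.5
```

Here n is 1.5, so Computer _A is 1.5 times faster than Computer _B for this program.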

Instruction set architecture articles

If you enjoyed this post on instruction set architectures, check out my other assembly posts, like Understanding Computer Architecture and Development Objectives. Additionally, I recommend Converting High Level Languages to Machine Language.

Citations