
CFI with Clang, macOS, and Clang on macOS

Olivia with Kees Cook and Alice Ryhl

It’s hard to believe there was a time when I didn’t know what control flow integrity (CFI) was; now, understanding it is part of my job as an Apple security researcher. Admittedly, I still struggle with it. If you’re reading my posts as a way to get into security research (or at least the type of research I often write about), learning about CFI is where you want to start.

CFI ensures that program execution follows legitimate flows as expected by the developer. It aims to prevent me from altering the execution path of an application, which could lead to exploits (e.g., code injection, buffer overflows, or return-oriented programming (ROP) attacks).

Let’s consider a scenario where I try to exploit a buffer overflow to hijack the control flow of a program:

  • Without CFI, I can overwrite a return address on the stack and divert execution to a ROP chain, executing malicious code.
  • With CFI, the runtime checks will ensure that any indirect control transfers (e.g., function pointers or return addresses) are valid according to the control flow graph (more on this later). When I try to redirect execution outside of valid control paths, the runtime check will fail, and the program will terminate or trigger an exception.

In short, CFI helps prevent me from hijacking a program’s control flow. This blog post explores CFI and its implementation in Clang, macOS, and Clang on macOS. 

Read more: How to use ROP to bypass security mechanisms – Olivia A. Gallucci 


🌸👋🏻 Join 10,000+ followers! Let’s take this to your inbox. You’ll receive occasional emails about whatever’s on my mind—offensive security, open source, boats, reversing, software freedom, you get the idea.


CFI and CFG overview 

CFI works by enforcing that the control flow of a program’s execution (during runtime) adheres to a precomputed control flow graph (CFG). This graph defines all of the valid execution paths of the program, and it helps prevent me from redirecting the program’s execution to unintended parts of the code that allow me to carry out attacks. 

Let’s explore CFG diagrams. Their components include:

  • Nodes – represent blocks of sequentially executed code.
  • Edges (arrows) – indicate possible jumps, branches, or function calls between code blocks.
    • Direct edges – show simple control transfers like moving to the next line of code or a fixed function call.
    • Indirect edges – show control transfers that depend on dynamic values like function pointers or return addresses.

CFG Examples

A CFG diagram depicts the possible control flow paths a program can take during its execution. Each node in the diagram typically represents a basic block of code (a sequence of instructions that execute sequentially without any jumps or branches). The edges (arrows) between nodes represent control transfers such as jumps, function calls, and returns.

Of course, the computer does not use visual diagrams for CFI. Instead, it relies on an implementation of the logic displayed in these diagrams, usually an adjacency list, a matrix, or a custom graph-based representation.

The CFG helps to map out all the legitimate ways the program can move from one instruction to another. This structure is precomputed during the intermediate code generation phase of compilation (before the program runs), and is used as a reference for CFI checks at runtime.
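To make this concrete, here is a minimal toy sketch (my own example, not actual compiler output) of a CFG stored as an adjacency list, together with the kind of edge-membership check CFI performs against it at runtime. The node IDs and edges are invented for illustration.

```c
#include <stddef.h>

enum { N_NODES = 4 };

/* successors[i] lists the node IDs reachable from basic block i;
 * -1 terminates each list. */
static const int successors[N_NODES][N_NODES] = {
    { 1, 2, -1, -1 },   /* block 0 branches to block 1 or block 2 */
    { 3, -1, -1, -1 },  /* block 1 falls through to block 3 */
    { 3, -1, -1, -1 },  /* block 2 falls through to block 3 */
    { -1, -1, -1, -1 }, /* block 3 is the exit block */
};

/* A CFI-style check: is the transfer from 'from' to 'to' an edge in the CFG? */
int edge_is_valid(int from, int to) {
    for (int i = 0; i < N_NODES && successors[from][i] != -1; i++)
        if (successors[from][i] == to)
            return 1;
    return 0;
}
```

A transfer like block 0 to block 3 would fail this check, even though both blocks exist, because there is no edge between them.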

Read more: Compilation Phases Explained, Analysis and Synthesis and Converting High Level Languages to Machine Language – Olivia A. Gallucci 

What are all the steps of CFI protection? 

TLDR steps 

  1. Static analysis or precomputing control flow: During intermediate code generation in compilation, a CFG is built to represent all potential execution paths based on control structures (e.g., loops, conditionals, function calls). CFI will use the CFG’s outline of valid control flows later to monitor execution and ensure control transfers follow predefined paths.
  2. Code instrumentation (aka instrumenting code): The compiler inserts runtime checks at critical points (e.g., indirect function calls or returns) to ensure execution follows the valid paths defined in the CFG.
  3. Runtime enforcement: At runtime, CFI ensures the program follows only the precomputed paths. Any deviation triggers alerts or halts execution to prevent potential attacks.
  4. Security enforcement: CFI detects and blocks deviations from expected control flow using the CFG, protecting against attacks like ROP or code injection.

Long steps

Earlier, I stated that: 

“CFI works by enforcing that the control flow of a program adheres to a precomputed CFG, which defines all the valid execution paths of the program.” 

So, how does this work? 

The phrase “precomputed control flow” refers to the CFG being computed in advance, before the program’s execution. This precomputation is needed to enforce security.

The goal of CFI is to restrict the set of possible control-flow transfers to those that are strictly required for correct program execution. This prevents code-reuse techniques (e.g., ROP) from working, because they would require control-flow transfers that are not in the CFG and are therefore illegal under CFI. 

Conceptually, most CFI mechanisms follow a two-phase process. An analysis phase constructs the CFG which approximates the set of legitimate control-flow transfers. This CFG is then used at runtime (“runtime phase”) by an enforcement component to ensure that all executed branches correspond to edges in the CFG.

Step 1. Analysis (CFG construction) phase

During compilation (or a dedicated static/preprocessing step), the compiler, or an offline analyzer, builds an approximate CFG for each module or for the whole program.

That CFG encodes all legal targets for every indirect control‐flow transfer (i.e., indirect calls, indirect jumps, and often, return sites). Because it’s built via static analysis, it tends to be conservative: any edge that might be taken under some combination of inputs ends up in the graph.

The result is a mapping from each indirect‐call site (or return site) to a set of legal targets: function entry points, jump labels, or return-address checks.

Step 2. Instrumentation + runtime enforcement phase

Next comes code instrumentation. 

Here, the compiler inserts checks around every indirect control transfer. For example, just before an indirect call, I might see something like:

// pseudocode inserted by the compiler
// ∉ is set-theory notation for "is not an element of"
if (target_address ∉ valid_targets_for_site_X) {
    halt_or_report_violation();
}
goto *target_address;

Likewise, return‐address checks (or a shadow stack) ensure that the function return is going back to a valid callsite. 

A shadow stack is a separate, protected stack that mirrors return addresses so the program can verify that actual returns match the expected call sites; I’ll explain more about these later. 

When the program runs, each time it’s about to do an indirect call/jump (or a return), it looks up the precomputed list of allowed targets and verifies that the actual target is in that list. If not, it raises an exception, terminates the process, or otherwise alerts the system that a control-flow hijack was attempted. This process is known as “runtime enforcement,” and I like to think of it as a table of outcomes with corresponding actions in x86 terms: 

  • Indirect call (call *rsi) – inserted check: is_valid_target_for_site_X(target_address); data structure: per-site whitelist (array, map); on failure: abort, or log & continue.
  • Indirect jump (jmp *rax) – inserted check: is_valid_target_for_site_Y(target_address); data structure: per-site whitelist; on failure: abort, or log & continue.
  • Function return – inserted check: compare pop(shadow_stack) against the CPU’s return-address register; data structure: shadow stack (a separate stack); on failure: abort, or log & continue.
  • Virtual-call dispatch (C++) – inserted check: same as an indirect call, validating the vtable pointer against the allowed set; data structure: per-call-site vtable whitelist; on failure: abort, or log & continue.
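Putting the indirect-call row into C, here is a hypothetical sketch of what an instrumented call site boils down to. The function names and the per-site whitelist are made up for illustration; real implementations use faster membership tests than a linear scan.

```c
#include <stdlib.h>

void fast_handler(void) { /* legitimate target */ }
void slow_handler(void) { /* legitimate target */ }
void evil_gadget(void)  { /* never a valid target for this site */ }

/* Per-site whitelist for one hypothetical call site "X". */
static void (*const valid_targets_for_site_X[])(void) = {
    fast_handler, slow_handler,
};

int is_valid_target_for_site_X(void (*target)(void)) {
    size_t n = sizeof valid_targets_for_site_X / sizeof valid_targets_for_site_X[0];
    for (size_t i = 0; i < n; i++)
        if (valid_targets_for_site_X[i] == target)
            return 1;
    return 0;
}

/* The instrumented indirect call: check membership, then transfer. */
void checked_call(void (*target)(void)) {
    if (!is_valid_target_for_site_X(target))
        abort(); /* halt_or_report_violation() */
    target();
}
```

An attacker who overwrites the pointer with evil_gadget’s address hits the abort() path instead of executing the gadget.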

In practice, this process gets more complicated, and thus, there are nuances to these steps. 

Nuances

The first is precision vs. performance. 

If the CFG is too coarse (e.g., grouping all functions of the same type into one bucket), I’ll get lower runtime overhead but weaker security, because an attacker can still jump between any two functions in the same bucket. This is called coarse-grained CFI.

If the CFG is very fine-grained (e.g., each indirect callsite has its own whitelist of a handful of function entry points), I’ll get stronger protection but more instrumentation overhead and a more complicated static analysis. This is called fine-grained CFI.
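To illustrate the trade-off, here is a toy sketch (the function names and sets are hypothetical) contrasting a coarse, per-prototype bucket with a fine-grained, per-call-site target set:

```c
#include <stddef.h>

int open_file(int fd)  { return fd; }
int close_file(int fd) { return -fd; }

/* Coarse-grained: one bucket per prototype. Both functions above have the
 * type int(int), so a coarse scheme puts them in the same bucket, and a
 * call site that only ever calls open_file would still accept close_file. */
static int (*const bucket_int_int[])(int) = { open_file, close_file };

/* Fine-grained: this particular call site "A" only ever calls open_file. */
static int (*const site_A_targets[])(int) = { open_file };

/* Membership test shared by both schemes. */
int in_set(int (*const set[])(int), size_t n, int (*target)(int)) {
    for (size_t i = 0; i < n; i++)
        if (set[i] == target)
            return 1;
    return 0;
}
```

Under the coarse policy, redirecting the call to close_file passes the check; under the fine-grained policy, it fails. That gap is exactly the security difference between the two.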

Another nuance is return-address protection. 

Some CFI schemes instrument only indirect calls/jumps and do not explicitly check returns, instead relying on a separate shadow stack for return addresses. Others embed return‐address checks directly into the CFG (so “return” is just another kind of indirect branch that must point back to a valid callsite).

On that note, there is also… 

CFG over-approximation

The static analysis usually “over-approximates” all possible targets because it must remain sound in the face of dynamic languages, function pointers in C/C++, virtual calls in C++, etc.

Put simply, static analysis tends to cast a “wide net,” listing every function that might possibly be called, because features like function pointers, virtual methods, and other dynamic behaviors make it impossible to know exactly which target will actually be used at runtime.

In practice this means:

  • If you have a function pointer fp of type void(*)(int, int), the analysis might include every function with the same signature as a possible target, unless you do whole-program devirtualization or link-time type inference.
  • For virtual methods in C++, the analysis often builds a v-table-based CFG, but if you use multiple inheritance or dynamic loading, you still end up adding more edges to be safe.

The last nuance is runtime overhead.

Because every indirect call and/or return now has a check, there’s a small but measurable performance cost (often in the single-digit percentage range, depending on how aggressive the CFI is).

In many toolchains (e.g., LLVM’s CFI, Microsoft’s Control Flow Guard, or other commercial/hardened toolchains), the checks are optimized into a read-only table lookup or are held in hardware-accelerated metadata if the platform supports it.

Outside of all of these nuances though, the two-phase description is generally true. Conceptually, every CFI implementation has to (1) build or approximate a CFG of legal edges, and (2) insert runtime checks so that, at execution time, every indirect branch or return is validated against that precomputed CFG.

In regards to how “real implementations” of CFI work, there are some variations in how “fine” the CFG is, how returns are handled (shadow stack vs. explicit return checks), and how checks are implemented (software versus hardware-assisted).

But these nuances are only half the battle. That said, you might be wondering… 

How can I tell when programs deviate during runtime enforcement? 

At runtime, when a CFI check “fails” (i.e., an indirect branch goes to an unapproved target), I typically see one of two broad behaviors. 

The first is halting the process. 

On the console or stderr, Clang’s CFI sanitizer (enabled with -fsanitize=cfi) by default will immediately abort the program when a bad indirect call is detected. If I compile with diagnostics enabled (e.g. -fno-sanitize-trap=cfi), I’ll even get a message like this before termination:

foo.cpp:42:15: runtime error: control flow integrity check for type 'A*' failed during indirect call
0x7fff12345678: note: vtable is of type 'B'
// It indicates that at runtime, an indirect call via a pointer declared as type A*
// actually targeted an object whose vtable is for type B, violating the CFI
// policy and triggering a runtime abort.

Then the process raises SIGABRT (or on some platforms an illegal-instruction trap) and crashes.

An illegal-instruction trap is a hardware-generated exception that occurs when the CPU tries to execute an undefined or privileged opcode, causing the operating system to intercept and usually terminate or handle the offending process.

Steps 
  1. Run under a debugger (GDB, LLDB) and catch the SIGABRT/exception. You’ll see the backtrace land in the CFI trap routine. 
  2. Inspect the core dump or crash dump for the violation site.
  3. Watch stderr for the short CFI error diagnostic if it’s available on the OS. 

There is also another broad behavior. 

Alerting or logging (instead of, or in addition to, crash)

Some deployments prefer to log a violation and continue (or shut down more gracefully). 

Let’s look at sanitizer “recover” mode. By adding -fsanitize-recover=cfi, I tell Clang to print the error and keep running (at my own risk). The runtime emits the same diagnostic but hands control back to the program. I can then hook a custom handler to collect metrics.

There are also custom violation handlers. Some implementations let me replace the default trap with a user function, e.g.: 

void __cfi_violation_callback(uintptr_t bad_target) {
    // send an alert-level message to the system log reporting the CFI violation
    // and the bad target address
    syslog(LOG_ALERT, "CFI violation: jump to %#lx", bad_target);
    // maybe unwind or dump the stack
}

Then, I’ll see a clear log entry in /var/log/syslog or wherever the app writes its audit trail.

~/Library/Logs/DiagnosticReports/
/Library/Logs/DiagnosticReports/
log show   # filter for CFI-related entries
log stream # filter for CFI-related entries
The steps for checking this are as follows:
  1. Log files or syslog: look for “CFI violation” or your custom tag.
  2. Metrics: if you increment a counter in your handler, you’ll see spikes.
  3. Performance tools: under high-attack volume you’ll observe bursts of aborts or exception-dispatch calls in profilers like perf or similar tracing tools.

Profilers are tools that measure and analyze a program’s runtime performance (e.g., CPU usage, cache misses). Linux’s perf leverages hardware counters to profile system and application events, while Windows provides Event Tracing for Windows (ETW), a kernel- and user-mode event-logging facility. Also, see DTrace and Instruments in Xcode for macOS. 

In practice, development builds often crash fast with a clear message so you notice CFI misconfigurations immediately. Production builds may log and continue (or eject the offending module) so you can remediate without denying service outright.

I recommend combining stderr/console diagnostics, OS-level crash dumps, and structured logs/events. This way, you get immediate visibility (a crash you can debug) and long-term metrics (audit logs or SIEM alerts) of CFI enforcement in action.

Ok… that was a lot. BUT. THERE. IS. MORE!

CFI implementation RAHHH

Now that we understand how CFI works and how to detect deviations, let’s explore how CFI is implemented. 

Control flow 

CFI mitigations are primarily concerned with two types of control flow:

  1. Direct control transfers – straightforward jumps and calls to fixed addresses (e.g., function calls and returns).
  2. Indirect control transfers – involve jumps or calls to dynamic addresses (e.g., function pointers, virtual method calls, or return addresses). 

Here is an example of direct vs indirect control transfers (in this case, function calls): 

// direct call
void func(void) { /* do something */ }
func(); // direct call: target fixed at compile time

// indirect call through a function pointer
void (*funcPtr)(void) = &func;
funcPtr(); // indirect call: target resolved at runtime

Control flow edges 

Related to CF transfers is a concept known as “edges.” There are two types of edges.  

Forward-edge

Forward-edge control flow means the flow is moving forward along the intended program path (like calling a function). Forward edge control flow can be direct or indirect. 

Forward-edge CFI usually deals with indirect control transfers such as function pointers, virtual calls (in object-oriented programming), or indirect jumps/calls that point to arbitrary code locations. Here, if I can corrupt or overwrite the address used by an indirect jump or call, I can redirect execution to malicious code or ROP gadgets, thereby hijacking the program’s control flow for arbitrary code execution.

Backward edge 

On the other hand, backward edge refers to a return address that is used to return from a function to its caller. When a function finishes execution, the program’s control flow typically goes back to the instruction following the call to that function. 

In this context, backward edges can be a target for exploitation, especially via stack-based buffer overflows. I can try to overwrite the return address (i.e., the backward edge).
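As a toy model (not real exploit code), the frame layout below shows how a contiguous overflow of a local buffer can reach the saved return address. I model the frame as a struct so the overwrite is observable; memcpy stands in for an unchecked strcpy/gets.

```c
#include <string.h>

/* A toy model of a stack frame: a local buffer immediately followed by
 * the saved return address, laid out contiguously as on a real stack. */
struct frame {
    char buffer[8];
    unsigned long return_address;
};

/* Copies attacker-controlled input with no bounds check; any len > 8
 * spills past buffer and clobbers return_address. */
void vulnerable_copy(struct frame *f, const char *input, size_t len) {
    memcpy(f->buffer, input, len);
}
```

Feed this 16 bytes of 'A' and return_address becomes 0x4141414141414141, which is exactly the classic crash signature of a stack-smashing bug. Backward-edge CFI exists to catch this corrupted return address before it is used.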

How CFI works in Clang!

“Control Flow Integrity in the Linux Kernel” – Kees Cook (LCA 2020)  

Note: In this section, all examples were created by Kees Cook. The image in this blog post shows me with Kees Cook and Alice Ryhl, who helped implement CFI in the Linux kernel and the Rust compiler. 

Forward edge protection

Before we examine forward edge protection, we need to understand where it is implemented: Clang. 

As we all know and love, Clang is a front-end compiler for C, C++, and Objective-C, which is built on top of Low-Level Virtual Machine (LLVM). 

LLVM is the underlying framework that provides a set of reusable compilers and toolchains. It performs optimizations, generates machine code, and handles backend processes for multiple architectures. This allows companies to support different hardware targets, such as x86_64 and ARM64. 

Clang compiles source code written in these languages into LLVM’s intermediate representation (IR), which can be further optimized and translated into machine code for the target platform.

Ok, so… We understand where forward edge protection was implemented, right? (LLVM’s Clang compiler). Let’s examine the implementation details. 

Read more: Converting High Level Languages to Machine Language – Olivia A. Gallucci 




How does CFI implement forward edge protection in these runtime checks? 

CFI validates indirect function pointers at “call time” (i.e., when a function is called during runtime). 

Here, CFI groups functions into classes via a function prototype using its return type and argument types, creating a “uniqueness” key. 

If functions have the same prototype, the call site can choose any matching function: 

int do_fast_path(unsigned long addr, struct file *file);

int do_slow_path(unsigned long addr, struct file *file);

If functions have different prototypes, calls cannot be mixed: 

void foo(unsigned long addr);

int bar(unsigned long addr);

f(x) signature checks 

This (the section above) highlights how software CFI uses function-signature checks to whitelist call targets, whereas hardware-assisted CFI (e.g., Intel’s IBT) only enforces that indirect branches land at any valid boundary without distinguishing between different function prototypes.

Hardware-assisted CFI is very lazy: it only tells me “you’re at an indirect-branch boundary.” It can’t distinguish which targets are valid for a given call site, so it can’t prevent a jump to any location marked as a valid landing point.

Software-based, signature- or prototype-based CFI (as in LLVM) improves on this by using each function’s return-and-parameter types as a signature class. At compile time, the compiler groups all functions with the same prototype into a small jump table. Then, at each indirect call, it checks that the target’s signature matches the call site’s expected prototype.

A jump table is a compile-time-generated array of code addresses (or function pointers) that lets an indirect call or switch statement dispatch or validate its target by indexing into the table.
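Here is a minimal sketch of that idea. The functions and the table are invented for illustration, and real Clang CFI validates targets with address-range and alignment tricks on the emitted jump table rather than a linear scan.

```c
#include <stddef.h>

typedef int (*ulong_fn)(unsigned long);

/* Two functions sharing the prototype int(unsigned long). */
int do_fast_path(unsigned long x) { return (int)(x & 1); }
int do_slow_path(unsigned long x) { return (int)(x >> 1); }

/* The per-prototype jump table: every function with this signature
 * that the program might call indirectly. */
static const ulong_fn jump_table[] = { do_fast_path, do_slow_path };

/* The check a call site performs: is the target one of the table's
 * entries? Only then is the indirect call allowed to proceed. */
int in_jump_table(ulong_fn target) {
    size_t n = sizeof jump_table / sizeof jump_table[0];
    for (size_t i = 0; i < n; i++)
        if (jump_table[i] == target)
            return 1;
    return 0;
}
```

A function pointer pointing at any code outside this table, including a ROP gadget or a function with a different prototype, fails the check.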

This approach relies on these per-site jump tables for fast membership tests, but its granularity suffers when prototypes are generic—especially void(void) or functions taking no arguments—because they clump hundreds of functions into a single class. This bloats allowed-target sets and weakens protection.

Granularity refers to how precisely CFI enforcement can distinguish and restrict valid indirect‐call targets. Fine granularity means small, per‐call‐site target sets, while coarse granularity lumps many functions into broad categories.

Forward edge protection in Clang

Global site visibility and link time optimization (LTO)

“In Clang, forward edge protection needs global site visibility, so link time optimization (LTO) becomes a prerequisite.” – Kees Cook

Again, CFI helps protect the “forward edges” of a program’s control flow: when the program calls a function through a pointer, CFI ensures it is calling a legitimate target. 

For forward edge protection to work, Clang needs to see the entire code at once to analyze it properly. This is called “global site visibility.”

To get this global view of the code, Clang needs to use link time optimization (LTO), which allows the compiler to do optimization and analysis during the linking stage after all the code is compiled.

Here, LTO is required for CFI because without it, the compiler can’t fully verify control flow across different parts of the program.

SO, how does LTO change the build process?

TLDR: “This needs a fair bit of build script changes because .o files aren’t actually object files anymore, they are LLVM IR, so standard (bfd) binutils don’t work on them anymore (need LLVM tools instead). Some symbols get weird due to LTO’s aggressive inlining and other optimizations. Functions with the same prototype, collected into jump tables and checks added at each call site.” – Kees Cook

Normally, when I compile code, the compiler generates .o files (object files). These files are ready for linking and include machine code. Here, bfd (binary file descriptor) and binutils (GNU’s suite of binary utilities for handling object and executable files) work with these binaries and their linking and assembling processes. 

For example, bfd allows programs to handle multiple file formats (e.g., ELF, PE, or Mach-O) uniformly.

However, with LTO the .o files aren’t regular .o object files anymore. Instead, they contain LLVM intermediate representation (IR), which is a kind of high-level version of the code that Clang can further optimize.

Because of this change, standard build tools (like bfd binutils) can’t work with these new LLVM IR files. I need to use LLVM tools (e.g., llvm-objdump, llvm-as, and llvm-link) to handle them instead.

Regarding LTO’s impact on the build process, it can make build scripts more complicated: I need to adjust the tools I am using to account for LLVM IR files instead of regular object files. LTO can also change how programs behave; functions may be inlined (moved directly into other code for optimization), and performance characteristics can shift. 

Function tables and checks

After all of this, Clang collects functions that have the same function signature (meaning they take the same type of inputs and return the same type of output) into jump tables.

At each call site (where a function is called in the code), checks are added to make sure that the function being called matches the expected one. This is part of the security feature in jump tables to prevent malicious or incorrect function calls.

That summarizes the process and the effects of enabling CFI with LTO.

Given we talked about forward-edge protection, we should probably consider…

Backward edge protection

Backward-edge protection focuses on ensuring the return addresses are protected. Clang has a method to keep track of trusted return addresses using a separate stack (i.e., a stack separated from the main stack) called the “shadow call stack,” which is managed by a dedicated register.

Specifically, it’s held in the dedicated SCSReg register: on AArch64, this is X18; on RISC-V, the software shadow stack uses X3 (gp), and the hardware shadow stack uses the ssp register. 

Shadow stack 

ShadowStack – Aaron Yoo

Remember, the stack is a runtime data structure that holds local variables, return addresses, and other data.

The shadow stack is a separate stack designed to hold copies of the return addresses from the original stack. Its purpose is to prevent buffer overflow attacks by checking the return address on the original stack against its copy before proceeding with program execution. 

Put simply, the shadow stack is a defense mechanism that validates return addresses to maintain CFI.

How a shadow stack works

When a function is called, a copy of the return address is stored on both the original stack and the shadow stack. 

In normal execution, the return address on the original stack matches the one on the shadow stack, and the program proceeds.

In the case of a buffer overflow, where the return address is overwritten, the addresses on the original and shadow stacks won’t match; the program detects the mismatch, forces a crash, and stops the attack before further damage. 
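The mechanism can be sketched in a few lines of C. This toy version uses a plain global array, whereas a real shadow call stack lives in protected memory reached through a dedicated register (as described above).

```c
/* Toy shadow stack: on call, push the return address onto a separate
 * stack; on return, pop it and compare against the address the main
 * stack claims to return to. A mismatch means tampering. */
enum { SHADOW_DEPTH = 64 };
static unsigned long shadow_stack[SHADOW_DEPTH];
static int shadow_top = 0;

/* Called in the function prologue (when the return address is saved). */
void shadow_push(unsigned long ret_addr) {
    shadow_stack[shadow_top++] = ret_addr;
}

/* Called in the epilogue: returns 1 if the return address on the main
 * stack still matches the protected copy, 0 if it was corrupted. */
int shadow_check(unsigned long ret_addr_on_main_stack) {
    return shadow_stack[--shadow_top] == ret_addr_on_main_stack;
}
```

In a real implementation, shadow_check failing would abort the process rather than return a flag.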

In short, shadow stacks show how mismatched return addresses can expose buffer overflows. Because return edges dictate where execution resumes, they are a prime target for attackers. 

Why are return edges vulnerable? 

As I discussed previously, backward-edges are usually the target of attacks, and many return-edges are backward-edges. 

Return edges are more vulnerable because they rely on return addresses that are dynamically stored on the stack. This is unlike forward edges that are predetermined in the program code.* Buffer overflow attacks target these return addresses to redirect execution.

*As a reminder, forward edges (e.g., function calls) are usually not vulnerable since they are hard-coded into the program’s execution.

However, there is an additional type of protection here.

The BIRDZ (aka stack canaries)

Let’s talk about birds. Stack canaries are a simple yet effective backward-edge defense. The compiler reserves a random “canary” value just before the saved return address on the stack, then instruments each function’s prologue to load that canary and its epilogue to verify it hasn’t been altered before returning.

Aaron Yoo’s Stack Canary Video.

When you compile with -fstack-protector, GCC and Clang automatically inject these checks around every buffer-vulnerable function, aborting the program if the canary’s integrity fails. 

The main advantage of canaries is their minimal performance impact and broad availability across platforms. However, they only protect against contiguous overflows (limited depth) and can be undermined if an attacker can leak memory values. Once the canary is known, it can be overwritten as easily as the return address it was meant to guard. Ooops. 
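A toy model of the canary check is below. The guard value is fixed here so the example is testable; real canaries are randomized per process, and the real check calls __stack_chk_fail() on mismatch instead of returning a flag.

```c
/* Model of -fstack-protector: the prologue places a secret canary
 * between the local buffer and the saved return address; the epilogue
 * re-checks it before returning. */
static const unsigned long stack_chk_guard_model = 0xdeadbeefcafef00dUL;

struct guarded_frame {
    char buffer[8];
    unsigned long canary;          /* sits between locals and return addr */
    unsigned long return_address;
};

/* Prologue: stash the canary in the frame. */
void prologue(struct guarded_frame *f) {
    f->canary = stack_chk_guard_model;
}

/* Epilogue: returns 1 if the canary survived (safe to return), 0 if a
 * contiguous overflow trampled it on the way to the return address. */
int epilogue_check(const struct guarded_frame *f) {
    return f->canary == stack_chk_guard_model;
}
```

Because a contiguous overflow must cross the canary to reach the return address, a clobbered canary is caught in the epilogue before the corrupted return address is ever used.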

Yet, stack canaries and shadow stacks are just two types of CFI protection. So far, we have only covered the CFI implementation in Clang (especially around areas like the shadow stack). Let’s examine an alternate approach to implementing CFI: pointer authentication (PA). 

CFI with pointer authentication 

PA is a technique used in systems (e.g., ARM processors) to ensure the integrity of pointers: a variable that stores the memory address of another variable, enabling indirect access and manipulation of that data.

It works by cryptographically signing pointers, including return addresses and function pointers. 

Cryptographic signing is the process of using a private key to generate a digital signature on data. It allows anyone with the corresponding public key to verify the data’s authenticity and integrity.

Within the PA process, pointer authentication codes (PACs) are the specific cryptographic codes generated and embedded in the pointer. 

PAC is a security feature provided by ARMv8.3-A architecture and above. 

When a pointer is used, the system verifies its authenticity before dereferencing it. In this context, the “system” means the combination of your CPU’s pointer‐authentication hardware (for example, ARMv8.3-A’s Pointer Authentication Extension) together with the software (compiler/OS runtime) that automatically signs pointers when they’re created and verifies those signatures before dereferencing them.

If the signature doesn’t match, the control transfer is halted, preventing redirection of control flow. 

Put simply, PAC provides hardware-level enforcement of CFI by signing return addresses and function pointers when they are saved to memory and verifying them upon their usage. This helps mitigate attacks like ROP and JOP by ensuring the integrity of control flow data.
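Here is a conceptual C model of signing and authenticating a pointer. Everything here is a stand-in: the “MAC” is a toy multiply-and-shift, whereas real PAC uses the QARMA block cipher with per-process keys in hardware, and a failed authentication poisons the pointer so a later dereference faults rather than returning an error value.

```c
typedef unsigned long uptr;

/* Toy keyed hash: xor the inputs, multiply by an odd constant, keep the
 * top 16 bits as the "PAC". NOT cryptographic; illustration only. */
static uptr toy_mac(uptr ptr, uptr context, uptr key) {
    return ((ptr ^ context ^ key) * 0xff51afd7ed558ccdUL) >> 48;
}

/* Sign: pack the PAC into the pointer's unused high bits
 * (assumes a 48-bit virtual address space). */
uptr pac_sign(uptr ptr, uptr context, uptr key) {
    return ptr | (toy_mac(ptr, context, key) << 48);
}

/* Authenticate: strip the PAC, recompute, and compare.
 * Returns the original pointer on success, 0 on failure. */
uptr pac_auth(uptr signed_ptr, uptr context, uptr key) {
    uptr ptr = signed_ptr & ((1UL << 48) - 1);
    return pac_sign(ptr, context, key) == signed_ptr ? ptr : 0;
}
```

The context argument models PAC’s modifier (e.g., the stack pointer for return addresses), which ties a signature to where the pointer is used: even an unmodified signed pointer fails authentication under a different context.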

See macOS and iOS Security Internals Advent Calendar – Day 3 by xpl0n1c. Also, I realize I have cited this video in at least three blog posts, but I have learned so much from it, and it was my starting point for everything CFI and PAC related. I have watched it at least three times!

CFI in Clang on Apple Silicon: PA vs. shadow stack

In systems where PA is implemented (e.g., some BSD-based systems), the shadow stack is not used because ARM’s Pointer Authentication Extension already secures return addresses in hardware with minimal performance overhead. Thus, Clang on ARM-based systems omits a separate software shadow stack.

In other words, Clang on macOS does not natively implement the shadow stack for CFI protection. 

Instead, Clang/LLVM on macOS has built-in support for PAC, like when compiling for Apple Silicon. It automatically inserts PAC instructions (signing and verification) as part of the compilation process.




Why does macOS’ Clang omit the shadow stack?

Implementing a shadow stack comes with performance costs. It requires additional memory operations and synchronization between the regular stack and the shadow stack.

PAC and CFI (without shadow stack) offer a lower-overhead alternative that is integrated into the existing ARM architecture. 

Here, ARM implements backward-edge CFI via Pointer Authentication (PA/PAC-RET) and forward-edge CFI via Branch Target Identification (BTI), rather than using a software shadow stack. 

Indirect Branch Tracking (IBT) is Intel’s forward-edge mechanism equivalent.

Hardware mechanisms, such as Intel’s Control-flow Enforcement Technology (CET) and ARM’s PA, often provide better backward-edge protection than software shadow stacks. 

So, how can Clang offer CFI with shadow stacks, yet run on macOS, which does not use shadow stacks? 

Clang implements forward-edge, software-based CFI (enabled via flags like -fsanitize=cfi) by inserting runtime type-checks that validate indirect calls against expected signatures, and includes an optional ShadowCallStack pass for backward-edge protection. However, on macOS this pass isn’t enabled by default because compiler-rt lacks the necessary runtime support.

The compiler-rt project is LLVM’s runtime-library subproject that provides support routines (builtins), sanitizer and profiling runtimes, and target-specific code-generation hooks for Clang/LLVM-generated binaries. 

On Apple Silicon, Clang/LLVM instead relies on ARM’s PA (PAC-RET) to sign and verify return addresses in hardware, eliminating the need for a software shadow stack. 

Here, Clang/LLVM leverage BTI for coarse-grained forward-edge enforcement.

Using PAC and CFI together on macOS

On macOS systems, PAC and CFI are used together for control flow protection.

Clang inserts PAC instructions into the compiled code (e.g., for functions that use return addresses or function pointers). CFI is applied at the software level to enforce that indirect branches only go to valid targets.

This combination protects against many control-flow attacks without needing a shadow stack.

How to enable PAC and CFI in Clang on macOS (this is usually automatic!)

From my understanding (email me if I am wrong), Clang’s driver enables the PA intrinsics by default on the Darwin/arm64e triple (via -fptrauth-intrinsics), so you don’t need any extra flags beyond targeting the right architecture; explicitly enabling it is redundant.

The Darwin/arm64e triple is Clang’s target specification for generating binaries for Apple’s Darwin OS on ARMv8.3-A “arm64e” hardware with built-in PA support.

To explicitly enable CFI, I can pass these flags to Clang:

clang -fsanitize=cfi -flto -fvisibility=hidden -o my_program my_program.c

This enables CFI during the build. LTO (-flto) is required because CFI’s type checks depend on whole-program analysis, and Clang’s CFI schemes also expect a visibility setting such as -fvisibility=hidden.

Going back to the big picture though, if you are compiling for Apple Silicon or another ARMv8.3+ target, PAC is automatically used in conjunction with CFI without the overhead of a shadow stack.

PAC vs shadow stack comparison  

| Feature | Pointer authentication (PAC) | Shadow stack |
|---|---|---|
| Main protection / scope | Protects both forward (function pointers) and backward edges (return addresses) | Primarily protects backward edges (return addresses) |
| Mechanism | Uses cryptographic signatures (PACs) to authenticate pointers | Stores a protected copy of return addresses on a separate stack |
| Hardware requirement | Requires hardware support (ARMv8.3-A or newer) | Can be implemented with or without hardware support |
| Performance impact | Minimal; only involves generating and verifying PACs | Higher; involves maintaining an additional stack and comparing return addresses |
| Attack protections (generalized) | Mitigates ROP, JOP, and other pointer-based attacks | Primarily protects against ROP attacks |

Conclusion

In short, CFI defends against attacks that hijack a program’s execution, most notably memory corruption and code-reuse attacks like ROP and JOP. By ensuring that every branch and call in a running program follows legitimate paths defined at compile time, CFI blocks me from chaining together gadgets or diverting execution to malicious code.

Implementations of CFI vary in precision, performance, and coverage, but share the goal of enforcing a program’s control-flow graph. On Apple’s Darwin-based platforms (including iOS and macOS), LLVM-based CFI instruments binaries at compile time to embed runtime checks that validate control transfers against the original CFG.

PAC strengthens this by signing pointers in hardware, detecting corruption before it can be exploited. With ARMv8.3-A and later, much of this enforcement is offloaded to the processor itself, enabling CFI with good performance. These compiler and hardware techniques form a barrier that preserves the intended flow of execution.

While CFI is great, it isn’t a silver bullet. First, the added runtime checks can have performance overhead. Second, the efficacy of CFI hinges on the precision of the control-flow graph: broad CFGs leave “legal” attack paths, whereas strict CFGs risk breaking valid functionality. Finally, you can still probe for weaknesses (e.g., gaps in CFG construction or in the CFI implementation itself) to avoid these defenses.

If you enjoyed this blog post on CFI, consider reading The Anatomy of a Mach-O: Structure, Code Signing, and PAC.

Read more, links, sources, and thanks! 


Written by Olivia Gallucci

Olivia is a senior security engineer, certified personal trainer, and free software advocate. She writes about offensive security, open source software, and professional development.
