Signature-based Analysis for Reversing

Woah, we are gonna talk about signature-based analysis. Everyone’s favorite pastime! Signatures are distinctive byte or instruction sequences derived from reference code. Signature-based analysis scans a target binary (or data stream) for known signatures to automatically identify and label matching functions or data segments.

Note this post was separated from “macOS Reversing: Bridging Source and Binary with Open Source as a Guide” due to how long this section was. This post covers tips 4 and 5:

Tip 4 – Matching binary code to source functions: Identify how compiled functions correlate with their related source code to help understand the binary’s behavior and context.
Tip 5 – Analyzing syscalls and inter-process communication (IPC) messages: During signature-based analysis, understanding system calls and IPC patterns can reveal how a binary interacts with its environment.

Reversing Apple’s OS components often involves a hybrid approach: using whatever source code or headers are available, alongside disassembly and decompilation of the binaries.

Disassembly translates machine code into assembly instructions that closely mirror the original binary, whereas decompilation attempts to reconstruct high-level source code (e.g., C, Objective-C, C++, or Java) from that binary, abstracting away hardware details and recovering structures like loops and functions.

Tools like IDA, Ghidra, Hopper, and Binary Ninja are excellent for analyzing macOS/iOS binaries. There’s always debate about which tool is best for a given task. While that debate has merit, I think it’s a bit like choosing a programming language: once you’ve learned a few, that initial question of which one to start with stops being so important. You can switch tools with a bit of practice. The important part is just starting.

Hopper, Ghidra, Binary Ninja, and IDA Pro app icons.

The goal is to connect functions in disassembly to known open source code where possible, and infer behavior where source is absent.

Matching binary code to source functions (tip 4)

If I have access to the source from Apple’s releases or headers, I can try to correlate it with the disassembled binary.

Headers

# command used to open the IOKitLib.h header in Xcode
open -a Xcode /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk/System/Library/Frameworks/IOKit.framework/Headers/IOKitLib.h

Command to open the header in Xcode.

I recognize I mentioned “headers” in my previous post, but those were a different type of headers. In this context, “headers” are the C (or Objective-C) header files that Apple publishes alongside, or as part of, its open-source releases of system libraries and frameworks.

A header file (usually with a .h extension) contains the declarations of:

Functions and methods: their names, parameter types, return type
Data types and structures: (e.g., structs, enums, typedefs)
Constants and macros (#defines, enum values)
Class and protocol interfaces in Objective-C

By loading those headers into my disassembly environment (or just reading them), I can get names, types, and API boundaries that I can match up against machine instructions. That makes it easier to say “Ah, at address 0x1000 the code is calling IOServiceOpen(io_service_t, task_port_t, uint32_t, io_connect_t *),” instead of just seeing an opaque call to an unnamed address.

A symbol is a named identifier (such as a function, variable, or object) used by a compiler or linker to reference a memory address or code location in a program.

Look for unique strings or constants

Back to matching binary code to source functions…

One technique I use is to look for unique strings or constants in the source and search for them in the binary. Some Apple OSS projects include debug messages, assertion texts, or constant values that will appear in the compiled binary.

Finding these in a disassembler helps pinpoint the corresponding function. For instance, if an OSS daemon prints "Error: could not open %s" on failure, searching the binary for that substring takes me right to the error-handling code in assembly.
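As a minimal sketch of that string search outside a disassembler, the snippet below scans raw bytes for a known message. The function name find_string_offsets and the synthetic byte blob are made up for illustration; for a real binary you would read the file's bytes instead.

```python
def find_string_offsets(blob, needle):
    """Return every byte offset where needle (UTF-8 encoded) occurs in blob."""
    pattern = needle.encode("utf-8")
    offsets = []
    start = blob.find(pattern)
    while start != -1:
        offsets.append(start)
        start = blob.find(pattern, start + 1)
    return offsets


# Synthetic stand-in for a binary's contents; for a real file, use
# open(path, "rb").read() to get the bytes.
blob = b"\x00\x01Error: could not open %s\x00padding\x00"
print(find_string_offsets(blob, "Error: could not open"))
```

The offsets are file offsets, not virtual addresses. In a disassembler, the next step would be to follow cross-references from the string to the code that uses it.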

Use code structure as a map

I can also use the structure of the code as a map.

If I see in the source that function A calls B then C in sequence, I can look at the assembly for similar call patterns. The OSS can reveal function names and algorithm details; even if symbol names are stripped in the binary, the sequence of syscalls or library calls might match the source. This way, I can label sections of the disassembly with the likely function names from the source.
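The call-pattern idea can be sketched in a few lines, assuming I have already exported each function's outgoing calls from the disassembler in order. The function addresses and call lists below are hypothetical; only the matching logic is the point.

```python
def contains_in_order(calls, expected):
    """True if every name in expected appears in calls, in the same order."""
    remaining = iter(calls)
    # `name in remaining` advances the iterator until it finds name,
    # so later names must appear after earlier ones.
    return all(name in remaining for name in expected)


def candidates(call_lists, expected):
    """Return the functions whose call list contains expected as a subsequence."""
    return sorted(
        func for func, calls in call_lists.items()
        if contains_in_order(calls, expected)
    )


# Hypothetical export: stripped function label -> ordered list of callees.
call_lists = {
    "sub_100003f20": ["malloc", "open", "read", "close"],
    "sub_100004a10": ["read", "open"],
}
# From the source, the function of interest calls open, then read, then close.
print(candidates(call_lists, ["open", "read", "close"]))
```

This ignores branches and loops, so it only narrows the list of candidates; I would still confirm each hit by reading the disassembly.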

Tools

Where available, I use tools and/or plugins that can help with matching OSS code to binary. These tools use signature matching to automatically recognize functions in a binary that correspond to known open-source implementations. The specific tool I use depends heavily on context; sometimes I only know how to use a particular tool on one platform, even if a similar tool exists on another.
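To make “signature matching” concrete, here is a rough sketch of a byte signature with wildcards, which is the basic idea behind many of these tools. It is not how any particular tool is implemented. The signature and the code bytes are made up for illustration: an x86-64 prologue (push rbp; mov rbp, rsp) followed by a call whose 4-byte target is wildcarded because it changes between builds.

```python
def match_at(code, offset, signature):
    """True if signature matches code at offset; None entries match any byte."""
    if offset + len(signature) > len(code):
        return False
    return all(
        want is None or code[offset + i] == want
        for i, want in enumerate(signature)
    )


def scan(code, signature):
    """Return every offset in code where the signature matches."""
    return [
        off for off in range(len(code) - len(signature) + 1)
        if match_at(code, off, signature)
    ]


# 55          push rbp
# 48 89 e5    mov  rbp, rsp
# e8 ?? ?? ?? ??  call rel32 (target bytes wildcarded)
signature = [0x55, 0x48, 0x89, 0xE5, 0xE8, None, None, None, None]

# Synthetic code: a nop, the pattern above, then ret.
code = b"\x90\x55\x48\x89\xe5\xe8\x10\x00\x00\x00\xc3"
print(scan(code, signature))
```

A signature this short would match many unrelated functions. Real signatures are longer and are usually combined with other evidence, such as strings or call graphs.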

Regardless, signature matching tools are particularly useful for well-known libraries or drivers that haven’t diverged much from the published code. Let’s look at some examples of how researchers have discovered vulnerabilities through this technique.

Example 1: BinDiff and Diaphora with fG!

A great example of this is how fG! uses Diaphora in IDA to diff the macOS syslogd daemon across updates.

Apple publishes the syslogd source after the fact, but to find the bug in syslogd (run man syslogd on your Mac if you need more info), fG! used “binary diffing to find differences” between the vulnerable and patched syslogd binaries. fG! notes that “the usual tool is BinDiff but there is also a free alternative called Diaphora.” Diffing the 10.11.2 and 10.11.3 syslogd executables in IDA quickly identified the only changed function.

Once the altered function was found, grepping the Apple open‑source syslogd code for a distinctive string ("add_lockdown_session: realloc failed…") confirmed the match.
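For intuition only, the core idea of a binary diff can be sketched as comparing per-function hashes between two versions. This is a deliberately naive stand-in, not how BinDiff or Diaphora work: those tools compare control-flow graphs and tolerate shifted addresses, while this sketch assumes function boundaries and names are already known and only flags exact byte changes. The function bytes below are placeholders, not real syslogd code.

```python
import hashlib


def digest(code):
    """Hash a function's raw bytes so versions can be compared cheaply."""
    return hashlib.sha256(code).hexdigest()


def changed_functions(old, new):
    """Return names present in both versions whose bytes differ."""
    shared = old.keys() & new.keys()
    return sorted(
        name for name in shared
        if digest(old[name]) != digest(new[name])
    )


# Placeholder bytes standing in for functions extracted from two builds.
old_build = {
    "add_lockdown_session": b"\x01\x02\x03",
    "main": b"\xaa\xbb",
}
new_build = {
    "add_lockdown_session": b"\x01\x02\x03\x04",
    "main": b"\xaa\xbb",
}
print(changed_functions(old_build, new_build))
```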

Example 2: BinDiff with Wu and Li

This approach has been documented in academic papers and conferences too. Example 2 fuzzes kernel extensions (kexts). A kext on macOS is a loadable kernel module, often encapsulating a device driver. When those drivers are closed-source they are distributed as proprietary KEXT binaries that must be reversed for diff-based analysis.

Wu and Li fuzz Apple kernels by identifying new or changed interfaces since Apple “open-sourced its XNU project for both macOS and iOS.” New kernel entry points can be found by diffing XNU source!

A kernel entry point is any interface exposed by the OS kernel (e.g., a system call, IOCTL command, or driver callback) through which user‐space or other components invoke kernel functionality. Each exposed entry point expands the kernel’s attack surface by having more areas for potential vulnerabilities.

For closed-source drivers, they recommend using an IDA script with BinDiff to diff the same driver across versions. For iOS, all drivers live in the kernelcache, so they suggest splitting them out and then diffing.

Visual by Wu and Li.

One figure even shows BinDiff results for changed driver functions (see above). In their talk (“Fresh Apples,” HITB 2019), Li and Wu used an IDA Pro script and the BinDiff plugin to diff multiple iOS/macOS KEXT binaries, finding “entry functions” added or altered between OS versions (their slide caption: “Using IDA pro script with plugin ‘BinDiff’” on KEXTs).

Example 3: Miller and Iozzo

Finally, Miller and Iozzo’s Black Hat Europe 2009 slides (“Fun and Games with Mac OS X and iPhone Payloads”) state that “by using the last version of BinDiff we were able to verify that OS X kernel and iPhone kernel share a similarity of 0.65 and 0.74 on the code signing code.” Here, BinDiff was used to match code between Apple’s Mac and iPhone XNU kernels, identifying common functions.

All of these cases show BinDiff and/or Diaphora effectively matching Apple kernel components (XNU, KEXTs, daemons) across binaries, and thus, annotating patched or known functions.

Ghidra plugins and custom frameworks

If IDA isn’t your thing, there are Ghidra options.

ghidra_kernelcache provides a Python toolkit for iOS kernelcaches, and allows for loading XNU function signatures into Ghidra. You paste C++ function prototypes from Apple’s source headers into a signatures/ directory, and run load_signatures.py to import and apply all defined signatures (e.g., from an xnu.json or kernel.txt list).

This means Ghidra will rename and prototype any function that matches a known signature.

Other tools like bazad’s ida_kernelcache (IDA Pro) insert class and vtable names using iOS metadata (e.g., OSMetaClass or IOKit info) too. While not all of these are signature-based, these scripts use the known structure of XNU or IOKit classes to annotate the binary.

Example components

Studies, researchers, etc. have covered many Apple components via diffing:

The XNU kernel and its Mach interfaces by diffing MIG/syscall tables via XNU source or using “Anchor-string signatures.” Anchor string signatures are unique strings left in a binary (like error messages or class names) that you can use as fixed reference points to locate and analyze nearby code.
The kernelcache on iOS (e.g., using ida_kernelcache to map C++ classes and vtables, or ghidra_kernelcache to import XNU functions).
IOKit drivers (KEXTs) by using BinDiff to diff and/or identify new external methods.

Userland system libraries like libsystem and dyld are less often documented, but the same binary-matching tools apply. Researchers have diffed /usr/lib/system/libsystem*.dylib across OS versions and/or used FLIRT / Lumina in IDA against Apple’s open source libc/libdispatch. I am not that familiar with these technologies themselves, but I am mentioning them since they are components of these researchers’ workflows.

This is a cool post: Preserving Your Digital Sandcastles with an IDA Plugin by Cra0.

The BH2009 work on dyld injection implies diffing or scanning dyld logic via symbol lookups. In all these cases, the idea is to leverage Apple’s published sources (or repeatable builds) to recognize functions.

Ok… enough said. Time to move on to another technique.

Analyzing syscalls and IPC (tip #5)

When dealing with OS code, understanding system call usage and inter-process communication (IPC) is a good starting point (tip #5).

I learn a lot by observing how a binary interacts with the kernel and other processes.

Syscalls

For example, macOS/iOS binaries often invoke system calls (like open, mmap, etc.) or Mach traps. In assembly, these might appear as an svc #0x80 (on ARM64) with the syscall number in register x16. By cross-referencing the syscall number with an OSS syscall table, I can try to determine which calls are made.

If I see a binary executing syscall number 0x5, I can check XNU’s source (bsd/kern/syscalls.master) to find that it corresponds to open(). This helps me map binary code to higher-level operations.
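Here is a rough sketch of that lookup for ARM64 code. It assumes the common pattern of a mov x16, #imm shortly before svc #0x80, and uses a hand-copied excerpt of BSD syscall numbers from syscalls.master. The function name find_syscalls and the sample bytes are made up for illustration. Mach traps use negative numbers in x16, which are encoded differently and are not handled here.

```python
import struct

# Small excerpt of BSD syscall numbers from XNU's bsd/kern/syscalls.master.
BSD_SYSCALLS = {1: "exit", 3: "read", 4: "write", 5: "open", 6: "close"}

SVC_0X80 = 0xD4001001       # encoding of `svc #0x80`
MOVZ_X16_MASK = 0xFFE0001F  # keep opcode, shift, and destination register bits
MOVZ_X16_BITS = 0xD2800010  # `movz x16, #imm16` with no shift


def find_syscalls(code, lookback=4):
    """Return (offset, number, name) for each svc #0x80 with a nearby mov x16."""
    count = len(code) // 4
    words = struct.unpack("<%dI" % count, code[: count * 4])
    hits = []
    for i, word in enumerate(words):
        if word != SVC_0X80:
            continue
        # Walk backwards a few instructions looking for the x16 load.
        for j in range(i - 1, max(i - lookback, 0) - 1, -1):
            if words[j] & MOVZ_X16_MASK == MOVZ_X16_BITS:
                number = (words[j] >> 5) & 0xFFFF
                hits.append((i * 4, number, BSD_SYSCALLS.get(number, "unknown")))
                break
    return hits


# Synthetic code: mov x16, #5 ; nop ; svc #0x80
code = struct.pack("<3I", 0xD28000B0, 0xD503201F, 0xD4001001)
print(find_syscalls(code))
```

In practice most binaries reach the kernel through libsystem wrappers rather than inline svc instructions, so a scan like this is mainly useful on the system libraries themselves or on code that issues syscalls directly.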

Mach IPC and XPC

Another example is how Apple uses Mach messages (often through Mach Interface Generator (MIG) interfaces) and cross-process communication (XPC) for daemon communication.

XPC is a higher-level IPC that simplifies secure and structured messaging between apps, daemons, and services.

Visual by Geeks for Geeks.

For reference, MIG interfaces are autogenerated C interfaces that allow user-space code to communicate with Mach kernel services through Mach messages using defined RPC-like syntax. Here, Mach messages are the low-level transport mechanism used by MIG interfaces to perform IPC.

This can help me identify Mach port usage or XPC service names by looking at strings or constructor calls.

For instance, if a daemon uses XPC, I might find a string that looks like a service name (e.g., com.apple.some.service) or calls to functions like xpc_connection_create. Similarly, Mach interfaces can have associated constant identifiers or selector names in MIG-generated code, which might be partially available via headers. By matching these patterns to documentation or header files, I can try to figure out what inter-process channels the binary is using. For more explicit examples, see:
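A quick way to surface candidate service names is to scan the binary's bytes for reverse-DNS strings. The sketch below assumes names start with com.apple. and uses a made-up blob. The pattern is my own guess at a reasonable filter, and hits can also be bundle identifiers or preference domains, so each one still needs checking.

```python
import re

# Reverse-DNS style identifiers beginning with "com.apple."
SERVICE_NAME = re.compile(rb"com\.apple\.[A-Za-z0-9_.\-]{2,}")


def find_service_names(blob):
    """Return the unique com.apple.* strings found in blob, sorted."""
    return sorted({m.group().decode("ascii") for m in SERVICE_NAME.finditer(blob)})


# Synthetic stand-in for a binary's string data.
blob = (
    b"\x00com.apple.some.service\x00"
    b"xpc_connection_create\x00"
    b"com.apple.other\x00"
)
print(find_service_names(blob))
```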

A Look into XPC Internals: Reverse Engineering the XPC Objects by Kai Lu
Baby’s first Rust with extra steps (XPC, launchd, and FFI) by David Stancu
macOS XPC Exploitation – Sandbox Share case study by Eloi Benoist-Vanderbeken

Vtables and class methods

Many macOS/iOS components are written in C++ or Objective-C. C++ classes (e.g., IOKit drivers, system C++ libraries) use virtual function tables (vtables), and Objective-C uses runtime method dispatch.

Base vtable (Pokemon) and derived vtable (Caterpie) examples from Sean Deaton.

For reference, vtables are data structures used to support dynamic method resolution. This allows the dynamic dispatch runtime mechanism to select the appropriate method implementation during program execution based on the object’s runtime type. Put simply, runtime method dispatch is the process of selecting which method implementation to use.

I can try to bridge the gap between “source-level declarations” and “in-memory object layouts” by reconstructing class layouts from the binary.

In practice, this means something kinda like:

Dump the vtable pointers (e.g., with IDA, Hopper, or MachOView) to find each class’s vtable.
Parse out each virtual table to enumerate method slots in order.
Follow the RTTI/typeinfo structures (in C++ binaries) or the Objective-C metadata sections (e.g., __objc_classlist) to recover superclass chains and ivar layouts.
Rebuild each class’s field offsets by correlating access patterns (e.g., mov rax, [rdi + 0x10] hints at a data member at offset 0x10 on x86-64).

That reconstructed layout lets you map from the incomplete public headers to a runtime representation, so you know where each method and data member lives, even when Apple’s open-source headers don’t provide much info.
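The last step in that list can be roughed out with a regular expression over a text disassembly. This sketch assumes Intel syntax on x86-64 and that rdi still holds the object pointer (the first argument), which only holds until the register is reused. The function name field_offsets and the listing are made up.

```python
import re
from collections import Counter

# Matches memory operands such as [rdi + 0x10] in Intel-syntax disassembly.
ACCESS = re.compile(r"\[\s*rdi\s*\+\s*(0x[0-9a-fA-F]+)\s*\]")


def field_offsets(disassembly):
    """Count how often each rdi-relative offset is accessed."""
    counts = Counter(int(m.group(1), 16) for m in ACCESS.finditer(disassembly))
    return dict(sorted(counts.items()))


# Hypothetical method body.
listing = """
mov rax, [rdi + 0x10]
mov ecx, [rdi + 0x18]
mov [rdi + 0x10], rdx
"""
print(field_offsets(listing))
```

Offsets that show up across many methods of the same class are good candidates for data members. The operand size (rax vs. ecx) hints at each field's width.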

See: Gotta RE ’em All: Reversing C++ Virtual Function Tables with Binary Ninja by Sean Deaton.

C++

For C++ classes, the binary’s data section will contain vtables. Each vtable is an array of function pointers for an object’s virtual methods.

By identifying a vtable (for example, a sequence of addresses pointing into the __TEXT segment, sometimes preceded by an RTTI name or other marker), I can label those addresses as methods.
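That heuristic can be sketched as a scan for runs of consecutive pointers that land inside the code range. The sketch assumes plain little-endian 64-bit pointers at 8-byte alignment. Binaries that store encoded pointers (for example, with pointer authentication or chained fixups) would need those values decoded first. The address range and data below are made up.

```python
import struct


def find_pointer_runs(data, text_start, text_end, min_run=3):
    """Return (offset, pointers) for each run of 8-byte values inside the text range."""
    count = len(data) // 8
    values = struct.unpack("<%dQ" % count, data[: count * 8])
    runs, current, start = [], [], 0
    for i, value in enumerate(values):
        if text_start <= value < text_end:
            if not current:
                start = i * 8
            current.append(value)
        else:
            if len(current) >= min_run:
                runs.append((start, current))
            current = []
    if len(current) >= min_run:
        runs.append((start, current))
    return runs


# Hypothetical code range and data section contents.
TEXT_START, TEXT_END = 0x100001000, 0x100002000
data = struct.pack(
    "<6Q", 0, 0x100001000, 0x100001040, 0x100001080, 0, 0x100001100
)
for offset, pointers in find_pointer_runs(data, TEXT_START, TEXT_END):
    print(hex(offset), [hex(p) for p in pointers])
```

Runs found this way are only candidates, since jump tables and callback arrays look similar. An RTTI pointer just before the run, or a constructor that stores the run's address into an object, makes the vtable interpretation more likely.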

Disassemblers can sometimes automatically detect C++ vtables. I refer to any available headers or documentation to guess the method names.

Image from Writing an iOS Kernel Exploit from Scratch by K³

In kernel drivers (IOKit), Apple’s OSMetaClass metadata provides class names and relationships that can be extracted to assist in mapping out vtables and class hierarchies.

Objective-C

For Objective-C code, the binary contains metadata for classes, selectors (method names), and protocols. I can leverage this through my disassembler’s built-in analyzer to parse Objective-C metadata.

For example, class names and method names can often be listed with a tool like class-dump. By dumping the Objective-C classes in a binary, I’ll obtain a header-style listing of methods which helps in understanding the disassembled code.

Even if I don’t have the implementation source, knowing the names and signatures of methods is incredibly useful to guide reverse engineering. Many disassemblers will automatically label Objective-C methods with these names, but if not, running class-dump manually can fill the gap.

Some great examples of this process can be found here:

iOS App Security: An Introduction to Objective-C Metadata & Symbols in Swift & Objective-C Apps (Part 1 of 2) by André Jacobs
Reverse Engineering iOS Apps – iOS 11 Edition (Part 2) by Ivan Rodriguez
Reverse Engineering Objective-C by Jeff Hui

Conclusion

In practice, bridging source and binary is an iterative process: I identify known pieces from source in the binary, label them, then use those anchors to make educated guesses about the unknown parts around them. Over time, a picture of the binary’s functionality emerges, augmented by insights from whatever source Apple has provided.

If you enjoyed this post on signature-based analysis for reversing, consider reading binary extraction with visual tooling.

Sources, thanks, and where to learn more

Research and blogs

Reversing Apple’s syslogd bug – fG!
Fresh Apples: Researching New Attack Interfaces on iOS and OSX (talk and slides) and Play fuzzing machine: Hunting iOS/macOS kernel vulnerabilities automatically and smartly – Lilang Wu and Moony Li
Fun and Games with Mac OS X and iPhone Payloads, Black Hat Europe 2009 – Charlie Miller and Vincenzo Iozzo
Preserving Your Digital Sandcastles with an IDA Plugin – Cra0
Kernel Symbolication – Blacktop
SusanRTTI: an IDAPython plugin for viewing run-time type information – Tyler Colgan

Tools and repos

ghidra_kernelcache: a Ghidra framework for iOS kernelcache reverse engineering – 0x36
OSS Distributions – Apple
ida_kernelcache: An IDA Toolkit for analyzing iOS kernelcaches – bazad
MachOView – gdbinit

Olivia A. Gallucci