DECAF (short for Dynamic Executable Code Analysis Framework) is a binary analysis platform based on QEMU.
DECAF++, the new version of DECAF, taint analysis is around 2X faster making it the fastest, to the best of our knowledge, whole-system dynamic taint analysis framework. This results in a much better usability imposing only 4% overhead (SPEC CPU2006) when no suspicious (tainted) input exists. Even under heavy taint analysis workloads, DECAF++ has a much better performance, around 25% faster on nbench, because of its elasticity. DECAF++ elasticity makes it a very suitable case for security analysis tasks that would selectively analyze the input e.g. Intrusion Detection Systems (IDS) that can filter out benign traffic. For further technical details, see our RAID 2019 paper. To activate the optimizations, see our DECAF++ wiki page.
DECAF (Dynamic Executable Code Analysis Framework) is the successor to the binary analysis techniques developed for TEMU (dynamic analysis component of BitBlaze ) as part of Heng Yin‘s work on BitBlaze project headed up by Dawn Song. DECAF builds upon TEMU. We appreciate all that worked with us on that project.
Fig 1 the overall architecture of DECAF
Fig 1 illustrates the overall architecture of DECAF. DECAF is a platform-agnostic whole-system dynamic binary analysis framework. It provides the following key features.
It provides the following key features.
Just-in-Time Virtual Machine Introspection
Different from TEMU, DECAF doesn’t use a guest driver to retrieve os-level semantics. The VMI component of DECAF is able to reconstruct a fresh OS-level view of the virtual machine, including processes, threads, code modules, and symbols to support binary analysis. Further, in order to support multiple architectures and operating systems, it follows a platform-agnostic design. The workflow for extracting OS-level semantic information is common across multiple architectures and operating systems. The only platform-specific handling lies in what kernel data structures and what fields to extract information from.
Support for Multiple Platforms
Ideally, we would like to have the same analysis code (with minimum platform-specific code) to work for different CPU architectures (e.g, x86 and ARM) and different operating systems (e.g., Windows and Linux). It requires that the analysis framework hide the architecture and operating system specific details from the analysis plugins. Further, to make the analysis framework itself maintainable and extensible to new architectures and operating systems, the platform-specific code within the framework should also be minimized. DECAF can provide support for both multiple architectures and multiple operating systems. Currently, DECAF supports 32 bit Windows XP/Windows 7/linux and X86/arm.
Precise and Lossless Tainting
DECAF ensures precise tainting by maintaining bit-level precision for CPU registers and memory and inlining precise tainting rules in the translated code blocks. Thus, the taint status for each CPU register and the memory location is processed and updated synchronously during the code execution of the virtual machine. The propagation of taint labels is done in an asynchronous manner. By implementing such a tainting logic mainly in the intermediate representation level (more concretely, TCG IR level), it becomes easy to extend tainting support to a new CPU architecture.
Event-driven programming interfaces
DECAF provides an event-driven programming interface. It means that the paradigm of ”instrument” in the translation phase and then analyze in the execution phase” is invisible to the analysis plugins. The analysis plugins only need to register for interesting events and implement corresponding event handling functions. The details of code instrumentation are taken care of by the framework.
Dynamic instrumentation management
To reduce runtime overhead, the instrumentation code is inserted into the translated code only when necessary. For example, when a plugin registers a function hook at a function’s entry point, the instrumentation code for this hook is only placed at the function entry point. When the plugin unregisters this function hook, the instrumentation code will also be removed from the translated code accordingly. To ease the development of plugins, the management of dynamic code instrumentation is completely taken care of in the framework, and thus invisible to the plugins.