
Static Devirtualization of Themida

Introduction

Before reading this article, I highly recommend studying the following community research on binary deobfuscation.

This article demonstrates devirtualization of code protected by CodeVirtualizer/Themida. However, the techniques described here apply to nearly every virtual-machine-based obfuscator and require only minor modifications to support each one. The following is a non-exhaustive list of obfuscators that can be reduced using the technique described in this article.

Themida Architecture Analysis

Themida’s virtual machine architecture differs from VMProtect primarily in its support for nested virtualization. This is made possible by the fact that the VM context and virtual stack live inside the binary itself rather than on the native stack, as they do in VMProtect. This article does not go deep into the architecture, since it is largely irrelevant to the devirtualization approach. The only VM-specific components that matter here are virtual branching and VMEXIT behavior, both of which are covered in their own sections. For a thorough breakdown of the Themida architecture, see this research.

Warning To The Wise

Pattern matching VM handlers back to x86 instructions is not an approach I recommend. I have tried it, and it does not scale. Any small change the protector vendor makes to handler layout, opcode tables, or dispatch logic can silently break your tooling across an entire version range. The more your implementation depends on VM-specific behavior, the more fragile it becomes.

The approach presented in this article deliberately minimizes VM-specific knowledge. That is what makes it work across a wide range of Themida versions. That said, studying the VM architecture is still worthwhile, not to pattern match against it, but to orient yourself within it and make informed decisions about how to guide the symbolic evaluation engine.

The vast majority of devirtualization work is done by a handful of general optimizations. VM-specific knowledge only becomes necessary when dealing with control flow, specifically virtual branching and virtualized calls.
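
To make the idea of "general optimizations" concrete, here is a minimal sketch of symbolic evaluation with constant folding and reassociation over a toy stack machine. It is an illustration under stated assumptions, not the engine used in this article and not a model of Themida's real handlers: the micro-op names (push_imm, push_reg, add, xor, pop_reg), the Sym and BinOp types, and the trace itself are all hypothetical. The point is only that nothing in the simplifier knows which VM produced the trace.

```python
from dataclasses import dataclass
from typing import Union

MASK = 0xFFFFFFFF  # assume 32-bit arithmetic for this toy example


@dataclass(frozen=True)
class Sym:
    """An unknown input value, such as a native register at VM entry."""
    name: str


@dataclass(frozen=True)
class BinOp:
    """A symbolic binary expression that could not be folded to a constant."""
    op: str
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[int, Sym, BinOp]


def fold(op: str, a: int, b: int) -> int:
    """Evaluate an operation on two concrete values."""
    return (a + b) & MASK if op == "add" else (a ^ b) & MASK


def simplify(op: str, lhs: Expr, rhs: Expr) -> Expr:
    """Apply generic rewrites; none of them depend on a specific VM."""
    if isinstance(lhs, int) and isinstance(rhs, int):
        return fold(op, lhs, rhs)  # constant folding
    # (x op c1) op c2 -> x op (c1 op c2); add and xor are both associative
    if (isinstance(rhs, int) and isinstance(lhs, BinOp)
            and lhs.op == op and isinstance(lhs.rhs, int)):
        return simplify(op, lhs.lhs, fold(op, lhs.rhs, rhs))
    if isinstance(rhs, int) and rhs == 0:
        return lhs  # x + 0 == x and x ^ 0 == x
    if op == "xor" and lhs == rhs:
        return 0  # x ^ x == 0
    return BinOp(op, lhs, rhs)


def evaluate(trace, regs):
    """Symbolically execute a list of toy micro-ops and return the registers."""
    stack = []
    regs = dict(regs)  # do not mutate the caller's mapping
    for ins in trace:
        kind = ins[0]
        if kind == "push_imm":
            stack.append(ins[1] & MASK)
        elif kind == "push_reg":
            stack.append(regs[ins[1]])
        elif kind in ("add", "xor"):
            rhs, lhs = stack.pop(), stack.pop()
            stack.append(simplify(kind, lhs, rhs))
        elif kind == "pop_reg":
            regs[ins[1]] = stack.pop()
        else:
            raise ValueError(f"unknown micro-op: {kind}")
    return regs


# Hypothetical obfuscated trace: ten micro-ops that only compute eax + 5.
TRACE = [
    ("push_reg", "eax"), ("push_imm", 0x1234), ("xor",),
    ("push_imm", 0x1234), ("xor",),
    ("push_imm", 2), ("add",), ("push_imm", 3), ("add",),
    ("pop_reg", "eax"),
]
result = evaluate(TRACE, {"eax": Sym("eax")})
```

In this sketch the two xor operations cancel and the two additions merge, so result["eax"] ends up as a single add of the symbolic eax and the constant 5. A real engine works on lifted native instructions and a much larger rule set, but the rewrites stay equally VM-agnostic, which is why they tend to survive changes to handler layout.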

Guided Symbolic Evaluation
