Confusing .NET Decompilers: The CallVirt OpCode

In a previous post we dove deep into the inner workings of the call opcode and used it to confuse decompilers and deobfuscators. We continue this story by giving the callvirt opcode some attention as well. We will exploit some interesting implementation details of the CLR that allow us to call functions we are not supposed to be able to call in the first place.

Since when can we not trust simple assignments anymore?

Similar to before, a lot of the things discussed here are still very much in the realm of undefined behavior. It is still not recommended for production use, as implementations of the runtime can (and will) change all the time.

Objects and Virtual Dispatch

Before we can proceed with our hacking efforts, we need to step back a bit and get to know more about the internal workings of callvirt. The callvirt opcode makes use of a process called virtual dispatch. In the previous post, we briefly looked at how this works at a high level. We are going to dive a bit deeper now and see how objects work under the hood, and what the process of overriding virtual methods really entails.

Virtual function tables

Let us focus for now on a single class that overrides the ToString method:

public class MyClass
{
    public override string ToString() => "My custom output";
}

To understand how virtual functions are implemented, we need to know about Method Tables and Virtual Function Tables. A Method Table is the runtime’s internal representation of a type. It contains all kinds of information, such as its name, the type it extends, its total size in bytes, as well as its metadata token. One of the things it also includes is a Virtual Function Table, or VFTable for short. A VFTable is a list of function pointers that records the code address used for each virtual method defined in the class. Every time a call is made to a virtual method on an object, the runtime looks up the corresponding entry in its VFTable and transfers control to the address that is stored there. This means that if we override, for example, the ToString() method in a class, all that needs to happen on the runtime’s side of things is to put a new address into the corresponding entry of the type’s VFTable. Below is an example of how that may look in memory:

The VFTables of System.Object and MyClass.

On the left side of the figure, we see the VFTable of the System.Object type. It contains pointers to implementations of all its virtual methods. On the right side of the figure, we see the VFTable of MyClass instead. Here, we see that all function pointers are the same as the ones found in the VFTable of System.Object, except for the entry of ToString(). This new code address points to a different implementation of the function, namely the one that eventually returns the string "My custom output". This means that when the runtime ends up calling the ToString() method, it will find the address of the new implementation in the table rather than that of the original code.

Below is a (simplified) visualization of what the entire virtual dispatch process may look like, from start to finish:

Resolution of the ToString() method of an object of type MyClass.

We start by having the this object reference stored in some register (typically RCX on x86-64). We can follow this pointer to get to the actual contents of the object. The first field of every object in .NET is the address to the object’s Method Table. We dereference this pointer, and look up the entry within its VFTable that stores the address of the latest version of the ToString() method. We then make a call to this address, effectively transferring execution to the ToString() method that was reimplemented by the MyClass type.
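To make these steps a bit more tangible, here is a minimal sketch that models them in plain C#. The types SimObject and SimMethodTable, and the assumption that ToString() lives in slot 0, are hypothetical stand-ins for illustration only. The real CLR structures are native, laid out differently, and not accessible like this.

```csharp
using System;

// Hypothetical stand-in for the runtime's Method Table (the real layout differs).
sealed class SimMethodTable
{
    public readonly string TypeName;
    public readonly Func<string>[] VFTable;   // one code "address" per virtual slot

    public SimMethodTable(string typeName, Func<string>[] vfTable)
    {
        TypeName = typeName;
        VFTable = vfTable;
    }
}

// Hypothetical stand-in for an object: its first field refers to the Method Table.
sealed class SimObject
{
    public readonly SimMethodTable MethodTable;

    public SimObject(SimMethodTable methodTable)
    {
        MethodTable = methodTable;
    }
}

static class DispatchSketch
{
    const int ToStringSlot = 0;   // assumed slot index of ToString()

    // Models a callvirt to ToString(): object -> Method Table -> VFTable entry -> call.
    public static string CallVirtToString(SimObject self)
    {
        SimMethodTable methodTable = self.MethodTable;           // 1. read the Method Table
        Func<string> target = methodTable.VFTable[ToStringSlot]; // 2. read the VFTable entry
        return target();                                         // 3. call the stored address
    }

    public static string Run()
    {
        var objectTable = new SimMethodTable(
            "System.Object",
            new Func<string>[] { () => "Object::ToString()" });

        // Overriding: copy the parent's table, then replace the overridden slot.
        var myClassSlots = (Func<string>[])objectTable.VFTable.Clone();
        myClassSlots[ToStringSlot] = () => "My custom output";
        var myClassTable = new SimMethodTable("MyClass", myClassSlots);

        return CallVirtToString(new SimObject(myClassTable));
    }

    static void Main() => Console.WriteLine(Run());   // prints "My custom output"
}
```

Note that CallVirtToString never looks at TypeName. It only follows references and calls whatever it finds in the slot.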

The JIT-generated code

This mechanism is nicely reflected in the native code that the JIT compiler generates. For example, consider the following snippet:

public static void MyMethod(object x)
{
    Console.WriteLine(x.ToString());
}

The corresponding CIL code sequence may look something like the following.

.method public static hidebysig void MyMethod(object x)
{
    ldarg.0                                                         // x
    callvirt instance string [mscorlib] System.Object::ToString()   // .ToString()
    call void [mscorlib] System.Console::WriteLine(string)          // Console.WriteLine(...)
    ret
}

We can use tools like SharpLab or a debugger like WinDbgX and its SOS extensions for .NET to see what kind of native code the runtime generates for us. For those not too familiar with reading x86-64 code, I annotated the code as best I could:

// void MyMethod(object x)
//   x is stored in register rcx

L0000:  sub rsp, 0x28

// @rax = x.ToString();
L0004:  mov rax, [rcx]               // Dereference @rcx to get the object's Method Table
L0007:  mov rax, [rax+0x40]          // Dereference +0x40 to get to the VFTable
L000b:  call qword [rax+0x0]         // Invoke ToString() entry stored at offset 0x0.

// Console.WriteLine(@rax);
L000d:  mov rcx, rax                 // Arguments[0] := @rax (return value)
L0010:  call System.Console.WriteLine(System.String)

L0015:  nop
L0016:  add rsp, 0x28
L001a:  ret

Offsets L0004-L000b implement the virtual call to Object::ToString(). We see it following the diagram to the letter, by first reading the Method Table of the object, then moving to the VFTable, and finally reading the address of the latest ToString() implementation (which happens to be at offset 0x0 in the VFTable) and calling it. Note that for Console::WriteLine(string) we don’t need virtual dispatch, and a normal direct call is emitted by the runtime instead.

Breaking Type Safety

A lot of theory… What can we do with it?

The astute reader will have noticed that the native code implementing the virtual call really only consists of a couple of dereferences followed by a call. With this, the runtime makes one big assumption. It assumes that the compiler that wrote the original CIL code was well-behaved and has ensured that the this-object that was pushed is of the correct type. No checks are done to verify this at runtime. The runtime silently assumes everything is in order as presented in the original CIL code, and happily starts dereferencing, following the same path as we did in the previous section.

For our intents and purposes, this also means that the runtime made one big mistake.

We are not a well-behaved compiler writing proper CIL code :).

Although, one could argue that sometimes the C# compiler isn’t well-behaved either…

Let us see how we can exploit this intricate detail.

Assigning variables with the wrong type

In the example code of the previous post, we introduced two classes A and B, where class B inherited from class A. We are going to reuse this same setup but change our code slightly. We start by decoupling our two classes such that they do not inherit from each other anymore. We keep the methods marked virtual, however:

class A
{
    public virtual string Foo() => "This is A::Foo()";
}

class B
{
    public virtual string Bar() => "This is B::Bar()";
}

class Program
{
    public static void Main()
    {
        A a = new A();
        Console.WriteLine(a.Foo());
    }
}

The Main method could be implemented as follows in CIL:

.method public static void Main()
{
    .entrypoint
    .locals init (
        [0] class A
    )

    // A a = new A();
    newobj void A::.ctor()
    stloc.0

    // a.Foo()
    ldloc.0
    callvirt instance string A::Foo()

    // Console.WriteLine(...)
    call void [mscorlib] System.Console::WriteLine(string)
    ret
}

Note how both the classes A and B inherit from System.Object, and as such both of their VFTables start with the virtual methods defined in System.Object. This also means that any new virtual method defined in the class will be placed right after the virtual methods of System.Object in the VFTable.

Two VFTables for classes A and B.

Since both A::Foo() and B::Bar() are the first virtual methods defined in their declaring type after the ones defined by System.Object, the index of the VFTable entry of A::Foo() in A will be the same as the VFTable entry of B::Bar() in B.
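The slot arithmetic can be sketched in a few lines of C#. The helper below is purely illustrative: the SlotLayoutSketch class and the assumed order of System.Object's virtual methods are not taken from the runtime. It only encodes the rule described above, where inherited slots come first and new virtual methods follow in declaration order.

```csharp
using System;
using System.Collections.Generic;

static class SlotLayoutSketch
{
    // Assumed order of System.Object's virtual methods (illustrative only).
    static readonly string[] ObjectVirtuals = { "ToString", "Equals", "GetHashCode", "Finalize" };

    // Builds a simplified VFTable layout: the slots inherited from System.Object
    // first, followed by the type's new virtual methods in declaration order.
    public static List<string> Layout(string typeName, params string[] newVirtuals)
    {
        var slots = new List<string>();
        foreach (string method in ObjectVirtuals)
            slots.Add("Object::" + method);
        foreach (string method in newVirtuals)
            slots.Add(typeName + "::" + method);
        return slots;
    }

    // Returns the slot index of a method within a layout.
    public static int SlotOf(List<string> layout, string method) => layout.IndexOf(method);

    static void Main()
    {
        List<string> a = Layout("A", "Foo");
        List<string> b = Layout("B", "Bar");

        // Both methods are the first new virtual method of their type,
        // so they end up in the same slot.
        Console.WriteLine(SlotOf(a, "A::Foo") == SlotOf(b, "B::Bar"));   // True
    }
}
```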

What would happen if we push an object reference of type B, but we keep calling A::Foo() using callvirt?

.method public static void Main()
{
    .entrypoint
    .locals init (
        [0] class A
    )

    // A a = new B();       // <-- Create an instance of `B` instead of `A`.
    newobj void B::.ctor()
    stloc.0

    // a.Foo()
    ldloc.0
    callvirt instance string A::Foo()

    // Console.WriteLine(...)
    call void [mscorlib] System.Console::WriteLine(string)
    ret
}

ILSpy tries to “fix” the malformed CIL code by adding a bunch of casts:

Casting to System.Object seems necessary… I guess?

dnSpy, a decompiler that promises to be more robust against obfuscated binaries, doesn’t even bother with the casts and just happily assigns the value of type B to the variable of type A. Needless to say, this decompiled code does not compile:

dnSpy is a very progressive decompiler, every object should be treated equally no matter its type.

However, when we run the program, we get the output of B::Bar():

Z:\> test.exe
This is B::Bar()

From the runtime’s perspective, a reference to an object of type A is indistinguishable from a reference to an object of type B. While the method B::Bar() does not exist in type A, it does share a VFTable index with A::Foo(). Since A::Foo() is referenced in the operand of the callvirt instruction, the runtime emits code that follows the virtual dispatch procedure for class A that we have seen before. Because no type checks are done, this code ends up reading the VFTable of type B and happily calls B::Bar() instead.

We can make this more confusing for the reverse engineer by introducing both Foo and Bar to both classes A and B:

class A
{
    public virtual string Foo() => "A::Foo()";

    public virtual string Bar() => "A::Bar()";
}

class B
{
    public virtual string Bar() => "B::Bar()";

    public virtual string Foo() => "B::Foo()";
}

Note how the order in which the methods are defined is flipped for B, and as such, A::Foo() shares the VFTable index with B::Bar() and A::Bar() has the same index as B::Foo(). Now let us try calling these two functions with callvirt one after the other:

.method public static void Main()
{
    .entrypoint
    .locals init (
        [0] class A
    )

    // Create a new object of type `B` instead of `A`:
    newobj void B::.ctor()
    stloc.0

    // Call method `A::Foo()` and print result.
    ldloc.0
    callvirt instance string A::Foo()
    call void [mscorlib] System.Console::WriteLine(string)

    // Call method `A::Bar()` and print result.
    ldloc.0
    callvirt instance string A::Bar()
    call void [mscorlib] System.Console::WriteLine(string)

    ret
}

From the decompiled code you would expect Foo then Bar to be printed to the standard output:

Calling Foo then Bar, or are we?

But in reality, we get exactly the opposite:

Z:\> test.exe
B::Bar()
B::Foo()

By cleverly aligning methods in each VFTable, we can make it seem like the program is calling one method, while in fact it calls another.
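The effect can be imitated in ordinary, type-safe C# with a small model. Everything in this sketch is hypothetical (the TypeConfusionSketch class, the delegate arrays standing in for VFTables, and the slot constants). It only mimics the idea that the slot index is fixed by the method named at the call site, while the table is taken from whatever object happens to be passed in.

```csharp
using System;

static class TypeConfusionSketch
{
    // Simplified VFTables containing only the newly declared virtual slots.
    // Slot order follows declaration order: A declares Foo, Bar; B declares Bar, Foo.
    static readonly Func<string>[] VFTableA = { () => "A::Foo()", () => "A::Bar()" };
    static readonly Func<string>[] VFTableB = { () => "B::Bar()", () => "B::Foo()" };

    // Slot indices derived from the method references in the callvirt operands.
    const int SlotOfAFoo = 0;
    const int SlotOfABar = 1;

    // Models callvirt: the index comes from the call site, the table from the
    // actual object. No type check is performed.
    static string CallVirt(Func<string>[] tableOfActualObject, int slot)
        => tableOfActualObject[slot]();

    public static string[] Run()
    {
        // The local is declared as A in the CIL, but it really holds a B.
        Func<string>[] a = VFTableB;

        return new[]
        {
            CallVirt(a, SlotOfAFoo),   // looks like a.Foo()
            CallVirt(a, SlotOfABar),   // looks like a.Bar()
        };
    }

    static void Main()
    {
        // Prints "B::Bar()" followed by "B::Foo()".
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```

Swapping VFTableB for VFTableA in Run() gives the ordering a reader of the decompiled code would expect.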

Cool stuff!

Hiding our tracks

This is interesting, but against someone who exports the code to a C# project with ILSpy, this will not be too effective. They will quickly notice that the decompiled code does not compile back to the same executable, as the program throws an exception after recompiling:

Z:\> cd decompiled

Z:\decompiled> dotnet run 

Unhandled Exception: System.InvalidCastException: Unable to cast object of type 'A' to type 'B'.
    at Program.Main()

If we had exported it with dnSpy instead, we would not even get to run the decompiled program, as it does not compile in the first place:

Z:\decompiled> dotnet run

Z:\decompiled\Program.cs(9,9): error CS0029: Cannot implicitly convert type 'B' to 'A' [Z:\decompiled\test.csproj]

The build failed. Fix the build errors and run again. 

And that makes sense. In both cases, this is because B does not inherit from A, which means the first assignment A a = new B(); cannot be done. A reverse engineer will see this and immediately know where to look for any discrepancies with the original and be able to fix it quickly. Thus, our next goal is to somehow let the decompiler still generate code in such a way that it is compilable and executable, while still not revealing the true behavior of the original program.

It turns out that this does not require many extra creative steps. In fact, we can use the exact same trick that we used before, with only a small difference. Previously, we passed a this object with the wrong type to a virtual method. This time around, we are going to pass the right object of the right type, but call the wrong method with the wrong declaring type instead.

In simple terms, we restore our first assignment statement to the original and instantiate an object of type A, but we are going to call methods from type B on it:

.method public static void Main()
{
    .entrypoint
    .locals init (
        [0] class A
    )

    // Create a new object of type `A`.
    newobj void A::.ctor()
    stloc.0

    // Call method `B::Foo()` and print result.
    ldloc.0
    callvirt instance string B::Foo()
    call void [mscorlib] System.Console::WriteLine(string)

    // Call method `B::Bar()` and print result.
    ldloc.0
    callvirt instance string B::Bar()
    call void [mscorlib] System.Console::WriteLine(string)

    ret
}

ILSpy still produces non-runnable code, by inserting many weird-looking casts:

ILSpy likes to add casts.

However, dnSpy doesn’t mind this at all, and just displays the calls as any other call, without giving us any indication of something weird going on under the hood:

In dnSpy the code seems very normal.

The program happily runs, dispatching into the methods of A, and prints the expected “unexpected” result:

Z:\> original.exe
A::Bar()
A::Foo()

Since both Foo and Bar are defined in types A and B, dnSpy will now produce compilable and runnable code. However, after recompiling the decompiled code, we will get a different result:

Z:\decompiled> dotnet run
A::Foo()
A::Bar()
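For reference, here is a sketch of what the recompiled source presumably boils down to. The exact shape of the decompiled Main is an assumption based on the description above, and the Run helper is added only to make the result easy to inspect. Because the C# compiler emits callvirt instructions that reference A::Foo() and A::Bar() for these calls, the swapped dispatch of the original binary is lost.

```csharp
using System;

class A
{
    public virtual string Foo() => "A::Foo()";

    public virtual string Bar() => "A::Bar()";
}

class B
{
    public virtual string Bar() => "B::Bar()";

    public virtual string Foo() => "B::Foo()";
}

class Program
{
    // Hypothetical helper that returns what Main prints.
    public static string[] Run()
    {
        A a = new A();

        // The compiler binds these calls to A::Foo() and A::Bar(),
        // not to B::Foo() and B::Bar() as in the hand-written CIL.
        return new[] { a.Foo(), a.Bar() };
    }

    public static void Main()
    {
        // Prints "A::Foo()" followed by "A::Bar()".
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```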

Other variations

The VFTable trick also works for properties and events, since they are simply virtual methods with some extra metadata attached to them. This way, you could for example make properties return unexpected values, by aligning the getter method with another virtual method that returns something else:

Assigning a value to MyProperty and getting something different back.
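That a property is backed by ordinary virtual methods is easy to check with reflection. In the sketch below, the Widget class and its MyProperty are hypothetical names chosen for illustration. The getter shows up as a regular method called get_MyProperty that is marked virtual, which is why it takes part in the same VFTable layout as any other virtual method.

```csharp
using System;
using System.Reflection;

// Hypothetical class with a virtual auto-property.
class Widget
{
    public virtual int MyProperty { get; set; }
}

static class PropertySketch
{
    // Looks up the accessor method behind the property's getter.
    public static MethodInfo Getter()
    {
        PropertyInfo property = typeof(Widget).GetProperty("MyProperty")
            ?? throw new InvalidOperationException("Property not found.");

        return property.GetGetMethod()
            ?? throw new InvalidOperationException("Getter not found.");
    }

    static void Main()
    {
        MethodInfo getter = Getter();
        Console.WriteLine(getter.Name);        // get_MyProperty
        Console.WriteLine(getter.IsVirtual);   // True
    }
}
```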

Another thing to have fun with is giving these methods names of well-known BCL methods like ToString(). In such a case, it looks like the program is just calling yet another normal virtual method on the object, but in fact is doing something completely different:

This ToString() method call is not what it seems!

We can also call methods that you should not have access to from outside their declaring types, such as protected methods:

Calling protected methods from outside the class

As long as you align the VFTable indices properly, you can pretty much do anything you want.

So much for type safety in .NET!

Lessons learned and final words

Virtual dispatch is one of the fundamental operations that make object orientation possible. It allows us to define virtual methods that can be overridden by sub-classes, in such a way that classes can be extended in functionality with ease. However, we have seen that the current implementation of the runtime makes some big assumptions on what kind of objects are being passed around, and is reluctant to insert type checks in the native code of a managed method. This allowed us to craft some specific CIL code, perform type-confusion, and call virtual methods that we are not supposed to be calling. This confuses decompilers a lot, since it is an unusual construction never emitted by any compiler itself, which allowed us to hide the true behavior of a program.

Note that many of these examples are probably still relatively easy to defeat. By renaming all the metadata in the assembly to unique names, it is possible to spot our CIL code-golfing efforts and quickly realize that something is going on under the hood. However, given a sufficiently large program with many different classes and virtual methods, it would not be obvious that such a trick was put in place. Furthermore, deobfuscators like de4dot do not generate names that are unique across all methods in the entire assembly by default, let alone rename methods with normal printable names. Thus, it is still quite an effective method to hide the true behavior of a program and prevent decompilers from outputting semantically equivalent code.

I would like to emphasize once more that this is probably not something that you should be doing in production code. While the idea is interesting and works with the current implementation of the CLR, it may not work in the future, as it is not written in the specification and would thus be considered undefined behavior. Protections that rely on undefined behavior are bound to break in a future update of the runtime, rendering all your programs processed with this method of obfuscation useless. Therefore, consider this write-up more as an educational article rather than something you should implement in your next obfuscator.

This post is licensed under CC BY 4.0 by the author.