VM Time A - Autorské řešení úlohy

Table of Contents

Warning! This post contains a writeup for the challenge. You can find the assignment in our CTFd.

Solution #

We have an ELF which implements a custom VM. Fortunately, the author decided NOT to strip this binary, so we can have a look at these wonderful named functions.

It looks like there is an object which represents the VM and it has a function called execute_code (usually, the object is first argument of its method in Ghidra, because it can’t decompile back to OOP).

execute_code seems to be a switch which chooses from various functions based on the opcode of the instruction. We can see some interesting names, such as cmp or syscall.

Since we can see the opcodes in the decompiled code, we can use them to make a basic disassembler.

I put the raw bytes of the VM code to compiled_code, so that we can disassemble it using ./disassembler.py compiled_code. (Otherwise, they can be found also in the decompilation or disassembly)

#!/usr/bin/env python3
import sys

if len(sys.argv) < 2:
    print("Usage: ./disassembler.py filename.a3")
    exit(0)

def convert(arg, memi, imm = False):
    if memi or imm:
        return hex(int(arg))
    lookup_table = {0x11 : "A", 0x12 : "B", 0x13 : "C", 0x14 : "D", 0x15 : "E", 0x0 : "N"}
    try:
        return lookup_table[arg]
    except KeyError:
        return hex(int(arg))

code = b""
with open(sys.argv[1], "rb") as inp:
    result = ''
    code = inp.read()

for instruction in range(0, len(code), 3):
    # print(f"{line = }")
    cmd = code[instruction]
    arg1 = code[instruction + 1]
    arg2 = code[instruction + 2]

    cmd_lookup_table = {
        0x42: "mov",
        0x24: "stack",
        0x12: "syscall",
        0x06: "movi",
        0x03: "memov",
        0x04: "memovi",
        0x13: "memld",
        0x14: "memldi",
        0x10: "add",
        0x11: "addi",
        0x45: "sub",
        0x25: "xor",
        0x26: "xori",
        0x62: "cmp",
        0x27: "nop"
    }

    command = ""
    try:
        command = cmd_lookup_table[cmd]
    except KeyError:
        print(f"Invalid Instruction {instruction // 3}: {cmd} {arg1} {arg2}")
        exit(0)
    imm = command[-1] == "i"
    memi = command == "memovi" or command == "memldi"
    if command == "syscall": # I like to differentiate syscall from other instructions since it is the most important thing
        result += f"\n{command} {convert(arg1, memi)}, {convert(arg2, memi, imm)}\n\n\n"
    else:
        result += f"{command} {convert(arg1, memi)}, {convert(arg2, memi, imm)}\n"
    # print(f"{result[-3:]}")
print(result.strip())

Length of the flag can be found either using strace or by looking at the code.

strace ./chall

...
write(1, "This challenge uses vm obfuscati"..., 90This challenge uses vm obfuscation. You need to reverse the architecture to get the flag.
) = 90
write(1, "Flag:\n", 6Flag:
)                  = 6
read(0, AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
"AAAAAAAAAAAAAAAAAAAAAAAAAAAAAA", 30) = 30
write(1, "Your flag is incorrect!\n", 24Your flag is incorrect!
) = 24
...

In this case, it seems like the flag length is 30 characters. If we dive deeper into the code, however, we will see that the code really operates only with the first 29 characters (the last one is reserved for \n).

When we go through the code, we can see some memovi instructions and syscalls after them, those are presumably write syscalls. We can skip them and look for something different. First code that somehow differs from just IO operations is the following:

movi A, 0x0
movi D, 0x1c
memld B, A # B = memory[0]
memld C, D # C = memory[28]
memov D, B # memory[28] = B = memory[0]
memov A, C # memory[0] = C = memory[28]
addi A, 0x1
sub D, 0x1
memld B, A # B = memory[1]
memld C, D # C = memory[27]
memov D, B # memory[27] = B = memory[1]
memov A, C # memory[1] = C = memory[27]
addi A, 0x1
sub D, 0x1
memld B, A
memld C, D
memov D, B
memov A, C
addi A, 0x1
sub D, 0x1
...

It seems like our input is being reversed.

There is another thing that mangles the input and by if we look at the decompilation (or simply the symbols), we see that there are functions xor and xori.

Now, let’s look for xor in the VM code and find what’s being xored there.

movi B, 0xa
xor A, B # xor register A with value in B = 0xa
movi B, 0x0
memov B, A # memory[0] = A ^ 0xa
movi A, 0x0
movi B, 0x1
memld C, A
memld D, B # first and second characters are loaded into A, B (but because the flag is already reversed, they are actually the last and the one before the last)
xor C, D
memov B, C # memory[1] = memory[0] ^ memory[1]
addi A, 0x1
addi B, 0x1
memld C, A
memld D, B
xor C, D
memov B, C # memory[2] = memory[1] ^ memory[2]
addi A, 0x1
addi B, 0x1
memld C, A
memld D, B
xor C, D

This means that the input is after reversing mangled by something like the following formula:

reversed_input[0] ^= 0xa
mangled_input = reversed_input[0] + [reversed_input[i] ^ reversed_input[i + 1] for i in range(28)]

Now, we know that the flag is reversed and xored, we can find out the correct value for the input and solve the challenge. The only thing we need now is the mangled flag.

If we look at the start of the code, the flag is probably being loaded at the start of the program into memory at position 0x60-0x7c (it’s 0x7c - 0x60 = 28 => 29 characters):

memovi 0x60, 0x77
memovi 0x61, 0x19
memovi 0x62, 0x29
memovi 0x63, 0x18
memovi 0x64, 0x6c
memovi 0x65, 0x58
memovi 0x66, 0x3b
memovi 0x67, 0xc
memovi 0x68, 0x79
memovi 0x69, 0x1f
memovi 0x6a, 0x7d
memovi 0x6b, 0x4d
memovi 0x6c, 0x12
memovi 0x6d, 0x7f
memovi 0x6e, 0x9
memovi 0x6f, 0x56
memovi 0x70, 0x22
memovi 0x71, 0x43
memovi 0x72, 0x70
memovi 0x73, 0x12
memovi 0x74, 0x4d
memovi 0x75, 0x38
memovi 0x76, 0x8
memovi 0x77, 0x71
memovi 0x78, 0xa
memovi 0x79, 0x6d
memovi 0x7a, 0xc
memovi 0x7b, 0x60
memovi 0x7c, 0x6
memovi 0x0, 0x54 # moving the values for the write
memovi 0x1, 0x68
memovi 0x2, 0x69
memovi 0x3, 0x73
...

Therefore the following code should produce the flag:

mflag = [0x77, 0x19, 0x29, 0x18, 0x6c, 0x58, 0x3b, 0xc, 0x79, 0x1f, 0x7d, 0x4d, 0x12, 0x7f, 0x9, 0x56, 0x22, 0x43, 0x70, 0x12, 0x4d, 0x38, 0x8, 0x71, 0xa, 0x6d, 0xc, 0x60, 0x6]

flag = [mflag[0] ^ 0xa]
flag += [mflag[i] ^ mflag[i + 1] for i in range(28)]

print(f'{"".join([chr(ch) for ch in flag[::-1]])}')

flag{y0u_b3at_vm_0bfu7c4t10n}