Using Ghidra to attack crackme

To try out Ghidra, the new reverse-engineering tool created within the walls of the NSA, I decided to crack a remarkable yet uncomplicated MalwareTech crackme with it.
A practical example of using Ghidra to attack a crackme
First, download the crackme from the MalwareTech site. The password to the archive is also MalwareTech.

So, let's see what is in the archive: the executable file vm1.exe and the dump file ram.bin. The explanation on the site says that we are dealing with an eight-bit virtual machine. The dump file is nothing more than a chunk of memory in which random-looking data and the flag we need to find are interspersed. Let's leave the dump file alone for a while and take a look at vm1.exe in the DiE (Detect It Easy) program.
Analyzing the crackme in Detect It Easy
DiE does not show anything interesting, and the entropy looks normal. This means there is no packer or protector layered on top of the binary, but it was still worth checking. Let's load the file into Ghidra and see what it gives us. Here is the complete listing of the entry point, without the functions it calls (it is quite small), so that you understand what we are dealing with.
PUSH   EBP
MOV    EBP,ESP
SUB    ESP,0x94
LEA    ECX=>local_94,[0xffffff70 + EBP]
CALL   MD5::MD5
PUSH   0x1fb
PUSH   0x0
CALL   dword ptr [->KERNEL32.DLL::GetProcessHeap]
PUSH   EAX
CALL   dword ptr [->KERNEL32.DLL::HeapAlloc]
MOV    [DAT_0040423c],EAX
PUSH   0x1fb
PUSH   DAT_00404040
MOV    EAX,[DAT_0040423c]
PUSH   EAX
CALL   memcpy
ADD    ESP,0xc
CALL   FUN_004022e0
MOV    ECX,dword ptr [DAT_0040423c]
PUSH   ECX
LEA    ECX=>local_94,[0xffffff70 + EBP]
CALL   MD5::digestString
MOV    dword ptr [local_98 + EBP],EAX
PUSH   0x30
PUSH   s_We've_been_compromised!_0040302c
MOV    EDX,dword ptr [local_98 + EBP]
PUSH   EDX
PUSH   0x0
CALL   dword ptr [->USER32.DLL::MessageBoxA]
PUSH   0x0
CALL   dword ptr [->KERNEL32.DLL::ExitProcess]
XOR    EAX,EAX
MOV    ESP,EBP
POP    EBP
RET
As you can see, the code is simple and easy to read. Let's use the Ghidra decompiler and see what it produces.
undefined4 entry(void)
{
    HANDLE hHeap;
    char *lpText;
    DWORD dwFlags;
    SIZE_T dwBytes;
    MD5 local_94 [144];

    MD5(local_94);

    dwBytes = 0x1fb;
    dwFlags = 0;

    hHeap = GetProcessHeap();
    DAT_0040423c = (char *)HeapAlloc(hHeap,dwFlags,dwBytes);
    memcpy(DAT_0040423c,&DAT_00404040,0x1fb);

    FUN_004022e0();

    lpText = digestString(local_94,DAT_0040423c);
    MessageBoxA((HWND)0x0,lpText,"We\'ve been compromised!",0x30);

    ExitProcess(0);
    return 0;
}
I added some spacing for readability, separating the variable declarations from the rest of the code. The code is very simple: first, memory is allocated on the heap (GetProcessHeap ... HeapAlloc), then 0x1fb (507) bytes are copied into it from DAT_00404040. But there is nothing interesting at 00404040! Recall that the crackme description said that ram.bin is a piece of memory. Sure enough, if you look at the file size, it is exactly 507 bytes.
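The size check is easy to automate. Below is a minimal C++ sketch that reads a dump into memory and rejects anything that is not exactly 0x1fb bytes long. The names `load_ram` and `kRamSize` are hypothetical helpers introduced here for illustration; they do not appear in the crackme.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Size of the VM memory image, taken from the 0x1fb constant in the listing.
constexpr std::size_t kRamSize = 0x1fb;  // 507 bytes

// Hypothetical helper: reads the whole file into a byte vector.
// Returns an empty vector if the file cannot be opened or has the wrong size.
std::vector<unsigned char> load_ram(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        return {};
    }
    std::vector<unsigned char> data((std::istreambuf_iterator<char>(in)),
                                    std::istreambuf_iterator<char>());
    if (data.size() != kRamSize) {
        return {};
    }
    return data;
}
```

Calling `load_ram("ram.bin")` from the directory with the unpacked archive should return 507 bytes; an empty result means the file is missing or is not the expected dump.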
Open ram.bin in HxD or any other hex editor and take a look.
File ram.bin in HxD Hex Editor

Alas, there is nothing intelligible there. But the purpose of DAT_0040423c becomes a bit clearer: it points to the contents of ram.bin (our 507 bytes allocated on the heap). Let's rename DAT_0040423c to RAM to make the code easier to navigate. Next, go to the function FUN_004022e0.

Graphic representation of the function FUN_004022e0
Here is the decompiled function code:
void FUN_004022e0(void)
{
    byte bVar1;
    uint uVar2;
    byte bVar3;
    byte local_5;

    local_5 = 0;
    do {
        uVar2 = (uint)local_5;
        bVar1 = local_5 + 1;
        bVar3 = local_5 + 2;
        local_5 = local_5 + 3;
        uVar2 = FUN_00402270((byte *)(uint)*(byte *)(RAM + 0xff + uVar2),
                             (uint)*(byte *)(RAM + 0xff + (uint)bVar1),
                             *(undefined *)(RAM + 0xff + (uint)bVar3));
    } while ((uVar2 & 0xff) != 0);
    return;
}
Since we already know that we are dealing with a virtual machine, everything becomes more or less clear. But to truly understand the pseudocode, you should always check it against the disassembly; otherwise the pseudocode can be confusing.
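The fetch part of this loop can be sketched in C++ as follows. This is only an illustration of what the decompiled code appears to do: the names `Instruction`, `fetch` and `kCodeBase` are hypothetical, and the bounds check is an assumption added for safety, since the original performs none.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Offset of the first instruction inside the memory image (from the listing).
constexpr std::size_t kCodeBase = 0xff;

struct Instruction {
    std::uint8_t opcode;
    std::uint8_t arg1;
    std::uint8_t arg2;
};

// Fetches the instruction at instruction pointer `ip` and advances `ip` by 3.
// `ip` is a byte, so it wraps at 256 just like local_5 in the original.
// Assumption: out-of-range reads yield 0; the original has no such check.
Instruction fetch(const std::vector<std::uint8_t>& ram, std::uint8_t& ip) {
    auto read = [&ram](std::size_t index) -> std::uint8_t {
        return index < ram.size() ? ram[index] : 0;
    };
    Instruction ins{};
    ins.opcode = read(kCodeBase + ip++);
    ins.arg1 = read(kCodeBase + ip++);
    ins.arg2 = read(kCodeBase + ip++);
    return ins;
}
```

Keeping the instruction pointer as a single byte mirrors the `byte local_5` variable in the pseudocode, which is also what makes this an eight-bit machine.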

Ghidra pseudocode and disassembler
I outlined the instructions that increment the variables by one. Remember that the function FUN_00402270 is called with three arguments. Let's look at how the first argument is prepared.
MOVZX  ECX,byte ptr [EBP + local_5]
MOV    EDX,dword ptr [RAM]
MOVZX  EAX,byte ptr [0xff + EDX + ECX*0x1]
MOV    dword ptr [EBP + local_14],EAX
MOV    CL,byte ptr [EBP + local_5]

ADD    CL,0x1   ; Variable increment
Obviously, a byte is read from [RAM] (at offset 0xff plus the counter local_5) and stored in a local variable. The same code prepares each of the function's arguments; the only difference is which registers and local variables are used. As a result, the call to FUN_00402270 looks like this:
MOV    ECX,dword ptr [EBP + local_c]
PUSH   ECX
MOV    EDX,dword ptr [EBP + local_10]
PUSH   EDX
MOV    EAX,dword ptr [EBP + local_14]
PUSH   EAX
CALL   FUN_00402270
So, FUN_00402270 receives three arguments: three consecutive bytes from [RAM]. Go to the function FUN_00402270; here is its pseudocode:
uint FUN_00402270(byte *param_1,int param_2,undefined param_3)
{
    if (param_1 == (byte *)0x1) {
        *(undefined *)(RAM + param_2) = param_3;
    }
    else {
        if (param_1 == (byte *)0x2) {
            param_1 = (byte *)(RAM + param_2);
            DAT_00404240 = *param_1;
        }
        else {
            if (param_1 != (byte *)0x3) {
                return (uint)param_1 & 0xffffff00;
            }
            param_1 = (byte *)(RAM + param_2);
            *(byte *)(RAM + param_2) = *param_1 ^ DAT_00404240;
        }
    }
    return CONCAT31((int3)((uint)param_1 >> 8),1);
}
Here the first byte passed to the function is checked, and if it equals 0x1, 0x2 or 0x3, the next two arguments are processed. The parsing of the first parameter is especially easy to read in the disassembly listing. Apparently, this is the command interpreter of a virtual machine that has only three commands.
Graphic representation of the interpreter in Ghidra

PUSH   EBP
MOV    EBP,ESP
PUSH   ECX
MOV    EAX,dword ptr [EBP + param_1]
MOV    dword ptr [EBP + local_8],EAX
CMP    dword ptr [EBP + local_8],0x1
JZ     LAB_0040228e
CMP    dword ptr [EBP + local_8],0x2
JZ     LAB_0040229e
CMP    dword ptr [EBP + local_8],0x3
JZ     LAB_004022b0
JMP    LAB_004022d1
At this stage, let me summarize. We have an application that works with 507 bytes of memory, a dump of which we have as ram.bin. Inside this dump, the data that interests us is mixed with other data that we do not need. Starting at offset 0xff, vm1.exe reads the memory three bytes at a time: the first byte is an instruction (0x1, 0x2 or 0x3), and the next two bytes are its operands.
In other words, we have mnemonic commands (p-code) that work with two arguments each, and the 507-byte memory area is nothing more than a p-code tape combined with the data it operates on. Do not be afraid of the garbage-looking bytes: judging by the decompiled loop, execution simply proceeds instruction by instruction from offset 0xff and stops at the first opcode that is not 0x1, 0x2 or 0x3, while the one-byte operands address the first 256 bytes of the dump.
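The dispatch logic of FUN_00402270 can be sketched in C++ as follows. This is a minimal sketch based only on the pseudocode above; the names `Vm` and `step` are hypothetical, and the register is called `r1` here purely for convenience (in the binary it is DAT_00404240).

```cpp
#include <cstdint>
#include <vector>

struct Vm {
    std::vector<std::uint8_t> ram;  // 507-byte image in the real crackme
    std::uint8_t r1 = 0;            // the VM register, DAT_00404240
};

// Executes one instruction. Returns false when the opcode is unknown,
// which is the condition that ends the loop in FUN_004022e0.
// Assumption: `addr` is always inside `ram`; a byte address cannot exceed
// 255, so this holds for any image of at least 256 bytes.
bool step(Vm& vm, std::uint8_t opcode, std::uint8_t addr, std::uint8_t value) {
    switch (opcode) {
        case 0x1: vm.ram[addr] = value;  return true;  // mov [addr], value
        case 0x2: vm.r1 = vm.ram[addr];  return true;  // mov r1, [addr]
        case 0x3: vm.ram[addr] ^= vm.r1; return true;  // xor [addr], r1
        default:  return false;                        // stop the machine
    }
}
```

The boolean result plays the role of the low byte of the original return value: 1 for a handled instruction, 0 for anything else.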

INFO
P-code is a set of instructions (mnemonics) for a custom command interpreter. It is also called code for a "hypothetical processor", since the processor that executes the p-code is written in software by the author of the program.

Now let's analyze the handlers of the opcodes that are parsed by the code shown above. For each one, I will give C code equivalent to the disassembly listing.
LAB_0040228e:
MOV    ECX,dword ptr [RAM]
ADD    ECX,dword ptr [EBP + param_2]
MOV    DL,byte ptr [EBP + param_3]
MOV    byte ptr [ECX],DL
JMP    LAB_004022d5
Let's start restoring the logic of the virtual machine. Declare a byte array ram of 507 elements: it will be the memory of the virtual machine. Using fopen and fread, read the contents of the ram.bin file into this array. The handler is four lines of assembly and a jump, and it is simple: the value param_3 is written to the ram array at index param_2 ([EBP + param_2]). In code, it looks like this:
ram[val_01] = val_02;
We start analyzing the following subroutine:
LAB_0040229e:
MOV    EAX,[RAM]
ADD    EAX,dword ptr [EBP + param_2]
MOV    CL,byte ptr [EAX]
MOV    byte ptr [r1],CL    ; DAT_00404240
JMP    LAB_004022d5
It is very similar to the previous one and is also an analogue of the MOV operation, but here a register of the virtual machine (DAT_00404240 in the listing) is used: the value from the VM memory is loaded into it. From our point of view, the value comes from the ram array, indexed by param_2 in the disassembly and by val_01 in our code. In other words, this is the operation MOV reg,[mem].
int r1 = 0, r2 = 0; // We declare VM registers
r1 = ram[val_01];
The last handler is twice as complex: instead of four lines of code, there are eight! We load the virtual machine register into EDX, read the byte from memory at the address given by the first operand into ECX (remember our ram array, into which we read the contents of ram.bin?), and XOR the two. The result is written back to the same memory cell.
LAB_004022b0:
MOVZX  EDX,byte ptr [r1]  ; DAT_00404240
MOV    EAX,[RAM]
ADD    EAX,dword ptr [EBP + param_2]
MOVZX  ECX,byte ptr [EAX]
XOR    ECX,EDX
MOV    EDX,dword ptr [RAM]
ADD    EDX,dword ptr [EBP + param_2]
MOV    byte ptr [EDX],CL
JMP    LAB_004022d5
In C, it will look like this:
r2 = ram[val_01];
ram[val_01] = r2 ^ r1;
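A useful property of this handler is that XOR with the same key is its own inverse, which is presumably why the dump looks unintelligible until the VM has run. The following C++ sketch demonstrates the property on a plain string; `xor_bytes` is a hypothetical helper and the key value is arbitrary, neither comes from the crackme.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: XORs every byte of `text` with a single-byte key.
// Applying it twice with the same key restores the original text.
std::string xor_bytes(std::string text, std::uint8_t key) {
    for (char& c : text) {
        c = static_cast<char>(static_cast<std::uint8_t>(c) ^ key);
    }
    return text;
}
```

For example, `xor_bytes(xor_bytes("FLAG", 0x5a), 0x5a)` yields "FLAG" again, so a VM program that loads a key into the register and XORs a memory cell with it can both hide and reveal a byte.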
That's all. The three-instruction virtual machine has been restored; it remains to apply the results of our work to the ram.bin file to get the flag. As I said, for this we read the file into the ram array and run our reimplementation of the VM on it. As a bonus, the loop prints the virtual machine's instructions in readable form, and at the end prints the flag. I added clarifying comments to the code.
#include <cstdio>
#include <iostream>
using namespace std;

int main()
{
    unsigned char ram[507 + 1] = {0};   // VM memory (ram.bin) plus a terminating zero
    int r1 = 0, r2 = 0;                 // VM register and a temporary
    int x = 0xff;                       // The VM code starts at offset 0xff

    // Read ram.bin into the VM memory
    FILE *f = fopen("ram.bin", "rb");
    if (f == NULL) return 1;
    size_t n = fread(ram, 1, 507, f);
    fclose(f);
    if (n != 507) return 1;

    while (x + 2 < 507)
    {
        int command = ram[x];       // Command opcode
        int val_01 = ram[x + 1];    // First operand of the command
        int val_02 = ram[x + 2];    // Second operand of the command

        // Decode and execute the command
        if (command == 0x1)
        {
            ram[val_01] = val_02;
            cout << "mov [" << val_01 << "]," << val_02 << endl;
        }
        if (command == 0x2)
        {
            r1 = ram[val_01];
            cout << "mov r1,[" << val_01 << "]" << endl;
        }
        if (command == 0x3)
        {
            r2 = ram[val_01];
            ram[val_01] = r2 ^ r1;
            cout << "xor [" << val_01 << "],r1" << endl;
        }
        if (command > 3 || command < 1) break;   // Unknown opcode: stop the VM
        x += 3;
    }

    printf("\n%s\n", (char *)ram);  // Print the result
    return 0;
}
After running this code, we get the disassembled VM program and the flag.

Result of the restored virtual machine
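If you only want the listing without executing anything, the printing can be separated from the emulation. Below is a small C++ sketch of such a formatter. The function name `disassemble` and the "halt" label for unknown opcodes are hypothetical, and the mnemonics are simply the ones chosen in this article, not an official syntax.

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: renders one VM instruction as text without executing it.
std::string disassemble(std::uint8_t opcode, std::uint8_t arg1, std::uint8_t arg2) {
    std::ostringstream out;
    out << std::hex;  // print operands in hexadecimal
    switch (opcode) {
        case 0x1:
            out << "mov [0x" << int(arg1) << "], 0x" << int(arg2);
            break;
        case 0x2:
            out << "mov r1, [0x" << int(arg1) << "]";
            break;
        case 0x3:
            out << "xor [0x" << int(arg1) << "], r1";
            break;
        default:
            out << "halt";  // any other opcode stops the interpreter loop
            break;
    }
    return out.str();
}
```

Feeding it the byte triples starting at offset 0xff of the dump, until the first "halt", gives a static listing of the VM program.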

Conclusion
I hope that after reading this article you will no longer be afraid of the words "virtual machine" or "p-code". Of course, in real commercial protectors like VMProtect or Themida everything is much more complicated: the virtual machine can have many commands, their opcodes can change constantly, there can be several virtual machines as well as various anti-debugging and anti-dump techniques written in p-code, and much more. But now you have a first idea of how it works.

Along the way, we got better acquainted with the Ghidra toolbox and performed our first hack with it, even if it was only a crackme!