Basic Anti-Cheat Evasion

So it’s been a while since I posted a blog. I was so busy with other things, especially adjusting the schedule with my work and my studies.

This short article I’ll discuss some very basic techniques on evading anti-cheat. Of course, you would still need to adjust the evasion mechanism depending on the anti-cheat you are trying to defeat.

On this blog, we will focus on Internal anti-cheat evasion techniques.

Part 1: The injector

First part of making your “cheat” is creating an executable that would inject your .dll into the process, A.K.A the game.

There are lot of injection mechanisms (copied from cynet). Below is the list but not limited to:

Classic DLL injection 

Classic DLL injection is one of the most popular techniques in use. First, the malicious process injects the path to the malicious DLL in the legitimate process’ address space. The Injector process then invokes the DLL via a remote thread execution. It is a fairly easy method, but with some downsides: 

Reflective DLL injection

Reflective DLL injection, unlike the previous method mentioned above, refers to loading a DLL from memory rather than from disk. Windows does not have a LoadLibrary function that supports this. To achieve the functionality, adversaries must write their own function, omitting some of the things Windows normally does, such as registering the DLL as a loaded module in the process, potentially bypassing DLL load monitoring. 

Thread execution hijacking

Thread Hijacking is an operation in which a malicious shellcode is injected into a legitimate thread. Like Process Hollowing, the thread must be suspended before injection.

PE Injection / Manual Mapping

Like Reflective DLL injection, PE injection does not require the executable to be on the disk. This is the most often used technique seen in the wild. PE injection works by copying its malicious code into an existing open process and causing it to execute. To understand how PE injection works, we must first understand shellcode. 

Shellcode is a sequence of machine code, or executable instructions, that is injected into a computer’s memory with the intent of taking control of a running program.  Most shellcodes are written in assembly language. 

Manual Mapping + Thread execution hijacking = Best Combo

Above all of this, I think the very stealthy technique is the manual mapping with thread hijacking.
This is because when you manual map a DLL into a memory, you wouldn’t need to call DLL related WinAPI as you are emulating the whole process itself. Windows isn’t aware that a DLL has been loaded, therefore it wouldn’t link the DLL to the PEB, and it would not create structs nor thread local storage.
Aside from these, since you would be having thread hijacking to execute the DLL, then you are not creating a new thread, therefore you are safe from anti-cheat that checks for suspicious threads that are spawned. After the DLL sets up all initialization and hooks, it would return the control of the hijacked thread its original state, therefore, like nothing happened.

POC

https://github.com/mlesterdampios/manual_map_dll-imgui-d3d11/blob/main/injector/injection.cpp

This repository demonstrate a very simple injector. The following are the steps to achieve the DLL injection:

  • Elevate injector’s process to allow to get handle with PROCESS_ALL_ACCESS permission
  • VirtualAllocEx the dll image to the memory
  • Resolve Imports
  • Resolve Relocations
  • Initialize Cookie
  • VirtualAllocEx the shellcode
  • Fix the shellcode accordingly
  • Stop the thread and adjust it’s RIP pointing to the EntryPoint
  • Resume the thread

The shellcode

byte thread_hijack_shell[] = {
	0x51, // push rcx
	0x50, // push rax
	0x52, // push rdx
	0x48, 0x83, 0xEC, 0x20, // sub rsp, 0x20
	0x48, 0xB9, // movabs rcx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0xBA, // movabs rdx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0xB8, // movabs rax, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xFF, 0xD0, // call rax
	0x48, 0xBA, // movabs rdx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0x89, 0x54, 0x24, 0x18, // mov qword ptr [rsp + 0x18], rdx
	0x48, 0x83, 0xC4, 0x20, // add rsp, 0x20
	0x5A, // pop rdx
	0x58, // pop rax
	0x59, // pop rcx
	0xFF, 0x64, 0x24, 0xE0 // jmp qword ptr [rsp - 0x20]
};

The line 7 is where you put the image base address, the line 9 is for dwReason, the line 11 is for DLL’s entrypoint and the line 14 is for the original thread RIP that it would jump back after finishing the DLL’s execution.

This injection mechanism is prone to lot of crashes. Approximately around 1 out of 5 injection succeeds. You need to load the game until on the lobby screen, then open the injector, if it crashes, just reboot the game and repeat the process until successful injection.

Part 2: The DLL

Of course, in the dll itself, you still need to do some cleanups. The injection part is done but the “main event of the evening” is just getting started.

POC

https://github.com/mlesterdampios/manual_map_dll-imgui-d3d11/blob/main/example_dll/dllmain.cpp

In the DLL main, we can see cleanups.

UnlinkModuleFromPEB

This one is unlinking the DLL from PEB. But since we are doing Manual Map, it wouldn’t have an effect at all, because windows didn’t even know that a DLL is loaded at all. This is useful tho, if we injected the DLL using classic injection method.

FakePeHeader

This one is replacing the PE header of DLL with a fakeone. Most memory scanner, tries to find suspicious memory location by checking if a PE exists. An MS-DOS header begins with the magic code 0x5A4D, so if an opcodes begin with that magic bytes, chances are, a PE is occupying that space. After that, the memory scanner might read that header for more information on what is really loaded with that memory location.

No Thread Creation

THIS IS IMPORTANT! Since we are hooking the IDXGISwapChain::Present, then we don’t see any reason to keep another thread running, so after our DLL finishes the setup, we then return the control of the thread to its original state. We can use the PresentHook to continue our “dirty business” inside the programs memory. Besides, as mentioned earlier, having threads can lead to anti-cheat flagging.

Obfuscation thru Polymorphism and Instantiation

This technique is already discussed on another blog: Obfuscation thru Polymorphism and Instantiation.

CALLBACKS_INSTANCE = new CALLBACKS();
MAINMENU_INSTANCE = new MAINMENU();

XORSTR

Ah, yes, the XORSTR. We can use this to hide the real string and will only be calculated upon usage.
To demonstrate the XORSTR, here is a sample usage. Focus on the line with “##overlay” string.

xorstr

And this is what it looks like after compiling and putting it under decompiler.

IDA Decompile

Other methodologies

There are some few more basic methodologies that wasn’t applied in the project. Below are following but not limited to:

  • Anti-debugging
  • Anti-VM
  • Polymorphism and Code mutation (to avoid heuristic patten scanners)
  • Syscall hooks
  • Hypervisor-assisted hooking
  • Scatter Manual Mapper (https://github.com/btbd/smap)
  • and etc…

This blog is not meant to teach reversing a game, but if you would like to deep dive more on reverse engineering, checkout: https://www.unknowncheats.me/ and https://guidedhacking.com/

Other resources:

POC and Conclusion

So, with the basic knowledge we have here, we tried to inject this on one of a common game that is still on ring3 (because ring0 AC’s are much more harder to defeat 😭).

BEWARE THAT THE ABOVE SCREENSHOTS ARE ONLY DONE IN A NON-COMPETITIVE MODE, AND ONLY STANDS FOR EDUCATIONAL PURPOSES ONLY. I AM NOT RESPONSIBLE FOR ANY ACTION YOU MAKE WITH THE KNOWLEDGE THAT I SHARED WITH YOU.

And now, we reached the end of this blog, but before I finished this article, I want to say thank you for reading this entire blog, also, I just want to say that I also passed the CISSP last October 2023, but wasn’t able to update here due to lot of workloads.

Again, I am really grateful for your time. Until next time!

Win11 22H2: Heaven’s Gate Hook

This won’t get too long. Just a quick fix for heavens gate hook (https://mark.rxmsolutions.com/through-the-heavens-gate/) as Microsoft updates the wow64cpu.dll that manages the translation from 32bit to 64bit syscalls of WoW64 applications.

To better visualize the change, here is the comparison of before and after.

Prior to 22h2, down until win10.
win11 22h2

With that being said, you cannot place a hook on 0x3010 as it would take a size of 8 bytes replacement. And would destroy the call mechanism even if you fix the displacement of call.

The solution

The solution is pretty simple. As in very very simple. Copy all the bytes from 0x3010 down until 0x302D. Fix the displacement only for the copied jmp at 0x3028. Then place the hook at 0x3010.
Basically, the copied gate (via VirtualAlloc or Codecave) will continue execution from original 0x3010. And so, the original 0x3015 and onwards will not be executed ever again.

Pretty easy right?

Notes

In the past, Microsoft tends to use far jump to set the CS:33. CS:33 signify that the execution will be a long 64 bit mode in order to translate from 32bit to 64bit. Now, they managed to create bridge without the need for far jmp. Lot of readings need to be cited in order to understand these new mechanism but please do let me know!

Conquering Userland (1/3): DKOM Rootkit

I am now close at finishing the HTB Junior Pentester role course but decided to take a quick brake and focus on one of my favorite fields: reversing games and evading anti-cheat.

The goal

The end goal is simple, to bypass the Cheat Engine for usermode anti-cheats and allow us to debug a game using type-1 hypervisor.

This writeup will be divided into 3 parts.

  • First will be the concept of Direct Kernel Object Manipulation to make a process unlink from eprocess struct.
  • Second, the concept of hypervisor for debugging.
  • And lastly, is the concept of Patchguard, Driver Signature Enforcement and how to disable those.

So without further ado, let’s get our hands dirty!

Difference Between Kernel mode and User mode

https://mark.rxmsolutions.com/wp-content/uploads/2023/09/Difference-Between-User-Mode-and-Kernel-Mode-fig-1.png
Kernel-mode vs User modeIn kernel mode, the program has direct and unrestricted access to system resources.In user mode, the application program executes and starts.
InterruptionsIn Kernel mode, the whole operating system might go down if an interrupt occursIn user mode, a single process fails if an interrupt occurs.  
ModesKernel mode is also known as the master mode, privileged mode, or system mode.User mode is also known as the unprivileged mode, restricted mode, or slave mode.
Virtual address spaceIn kernel mode, all processes share a single virtual address space.In user mode, all processes get separate virtual address space.
Level of privilegeIn kernel mode, the applications have more privileges as compared to user mode.While in user mode the applications have fewer privileges.
RestrictionsAs kernel mode can access both the user programs as well as the kernel programs there are no restrictions.While user mode needs to access kernel programs as it cannot directly access them.
Mode bit valueThe mode bit of kernel-mode is 0.While; the mode bit of user-mode is 3.
Memory ReferencesIt is capable of referencing both memory areas.It can only make references to memory allocated for user mode. 
System CrashA system crash in kernel mode is severe and makes things more complicated.
 
In user mode, a system crash can be recovered by simply resuming the session.
AccessOnly essential functionality is permitted to operate in this mode.User programs can access and execute in this mode for a given system.
FunctionalityThe kernel mode can refer to any memory block in the system and can also direct the CPU for the execution of an instruction, making it a very potent and significant mode.The user mode is a standard and typical viewing mode, which implies that information cannot be executed on its own or reference any memory block; it needs an Application Protocol Interface (API) to achieve these things.
https://www.geeksforgeeks.org/difference-between-user-mode-and-kernel-mode/

Basically, if the anti-cheat resides only in usermode, then the anti-cheat doesn’t have the total control of the system. If you manage to get into the kernelmode, then you can easily manipulate all objects and events in the usermode. However, it is not advised to do the whole cheat in the kernel alone. One single mistake can cause Blue Screen Of Death, but we do need the kernel to allow us for easy read and write on processes.

EPROCESS

The EPROCESS structure is an opaque structure that serves as the process object for a process.

Some routines, such as PsGetProcessCreateTimeQuadPart, use EPROCESS to identify the process to operate on. Drivers can use the PsGetCurrentProcess routine to obtain a pointer to the process object for the current process and can use the ObReferenceObjectByHandle routine to obtain a pointer to the process object that is associated with the specified handle. The PsInitialSystemProcess global variable points to the process object for the system process.

Note that a process object is an Object Manager object. Drivers should use Object Manager routines such as ObReferenceObject and ObDereferenceObject to maintain the object’s reference count.

https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/eprocess

Interestingly, the EPROCESS contains an important handle that can enumerate the running process.
This is where the magic comes in.

typedef struct _EPROCESS
{
     KPROCESS Pcb;
     EX_PUSH_LOCK ProcessLock;
     LARGE_INTEGER CreateTime;
     LARGE_INTEGER ExitTime;
     EX_RUNDOWN_REF RundownProtect;
     PVOID UniqueProcessId;
     LIST_ENTRY ActiveProcessLinks;
     ULONG QuotaUsage[3];
     ULONG QuotaPeak[3];
     ULONG CommitCharge;
     ULONG PeakVirtualSize;
     ULONG VirtualSize;
     LIST_ENTRY SessionProcessLinks;
     PVOID DebugPort;
     union
     {
          PVOID ExceptionPortData;
          ULONG ExceptionPortValue;
          ULONG ExceptionPortState: 3;
     };
     PHANDLE_TABLE ObjectTable;
     EX_FAST_REF Token;
     ULONG WorkingSetPage;
     EX_PUSH_LOCK AddressCreationLock;
...
https://mark.rxmsolutions.com/wp-content/uploads/2023/09/0cb07-capture.jpg

Each list element in LIST_ENTRY is linked towards the next application pointer (flink) and also backwards (blink) which then from a circular list pattern. Each application opened is added to the list, and removed also when closed.

Now here comes the juicy part!

Unlinking the process

Basically, removing the pointer of an application in the ActiveProcessLinks, means the application will now be invisible from other process enumeration. But don’t get me wrong. This is still detectable especially when an anti-cheat have kernel driver because they can easily scan for unlinked patterns and/or perform memory pattern scanning.

A lot of rootkits use this method to hide their process.

adios

Visualization

Before / Original State
After Modification

Checkout this link for image credits and for also a different perspective of the attack.

Kernel Driver

NTSTATUS processHiderDeviceControl(PDEVICE_OBJECT, PIRP irp) {
	auto stack = IoGetCurrentIrpStackLocation(irp);
	auto status = STATUS_SUCCESS;

	switch (stack->Parameters.DeviceIoControl.IoControlCode) {
	case IOCTL_PROCESS_HIDE_BY_PID:
	{
		const auto size = stack->Parameters.DeviceIoControl.InputBufferLength;
		if (size != sizeof(HANDLE)) {
			status = STATUS_INVALID_BUFFER_SIZE;
		}
		const auto pid = *reinterpret_cast<HANDLE*>(stack->Parameters.DeviceIoControl.Type3InputBuffer);
		PEPROCESS eprocessAddress = nullptr;
		status = PsLookupProcessByProcessId(pid, &eprocessAddress);
		if (!NT_SUCCESS(status)) {
			KdPrint(("Failed to look for process by id (0x%08X)\n", status));
			break;
		}

Here, we can see that we are finding the eprocessAddress by using PsLookupProcessByProcessId.
We will also get the offset by finding the pid in the struct. We know that ActiveProcessLinks is just below the UniqueProcessId. This might not be the best possible way because it may break on the future patches when a new element is inserted below UniqueProcessId.

Here is a table of offsets used by different windows versions if you want to use manual offsets rather than the method above.

Win7Sp00x188
Win7Sp10x188
Win8p10x2e8
Win10v16070x2f0
Win10v17030x2e8
Win10v17090x2e8
Win10v18030x2e8
Win10v18090x2e8
Win10v19030x2f0
Win10v19090x2f0
Win10v20040x448
Win10v20H10x448
Win10v20090x448
Win10v20H20x448
Win10v21H10x448
Win10v21H20x448
ActiveProcessLinks offsets
		auto addr = reinterpret_cast<HANDLE*>(eprocessAddress);
		LIST_ENTRY* activeProcessList = 0;
		for (SIZE_T offset = 0; offset < consts::MAX_EPROCESS_SIZE / sizeof(SIZE_T*); offset++) {
			if (addr[offset] == pid) {
				activeProcessList = reinterpret_cast<LIST_ENTRY*>(addr + offset + 1);
				break;
			}
		}

		if (!activeProcessList) {
			ObDereferenceObject(eprocessAddress);
			status = STATUS_UNSUCCESSFUL;
			break;
		}

		KdPrint(("Found address for ActiveProcessList! (0x%08X)\n", activeProcessList));

		if (activeProcessList->Flink == activeProcessList && activeProcessList->Blink == activeProcessList) {
			ObDereferenceObject(eprocessAddress);
			status = STATUS_ALREADY_COMPLETE;
			break;
		}

		LIST_ENTRY* prevProcess = activeProcessList->Blink;
		LIST_ENTRY* nextProcess = activeProcessList->Flink;

		prevProcess->Flink = nextProcess;
		nextProcess->Blink = prevProcess;

We also want the process-to-be-hidden to link on its own because the pointer might not exists anymore if the linked process dies.

		activeProcessList->Blink = activeProcessList;
		activeProcessList->Flink = activeProcessList;

		ObDereferenceObject(eprocessAddress);
	}
		break;
	default:
		status = STATUS_INVALID_DEVICE_REQUEST;
		break;
	}

	irp->IoStatus.Status = status;
	irp->IoStatus.Information = 0;
	IoCompleteRequest(irp, IO_NO_INCREMENT);
	return status;
}

POC

Before
After

Warnings

There are 2 problems that you need to solve first before being able to do this method.

First: You need to disable Driver Signature Enforcement

You need to load your driver to be able to execute kernel functions. You either buy a certificate to sign your own driver so you do not need to disable DSE or you can just disable DSE from windows itself. The only problem of disabling DSE is that some games requires you to have enabled DSE before playing.

Second: Bypass Patchguard

Manually messing with DKOM will result you to BSOD. They got a tons of checks. But luckily we have some ways to bypass patchguard.

These 2 will be tackled on the 3rd part of the writeup. Stay tuned!

Abusing Windows Data Executing Privacy (DEP)

Data Execution Prevention (DEP) is a system-level memory protection feature that is built into the operating system starting with Windows XP and Windows Server 2003. DEP enables the system to mark one or more pages of memory as non-executable. Marking memory regions as non-executable means that code cannot be run from that region of memory, which makes it harder for the exploitation of buffer overruns.

DEP prevents code from being run from data pages such as the default heap, stacks, and memory pools. If an application attempts to run code from a data page that is protected, a memory access violation exception occurs, and if the exception is not handled, the calling process is terminated.

DEP is not intended to be a comprehensive defense against all exploits; it is intended to be another tool that you can use to secure your application.

https://docs.microsoft.com/en-us/windows/win32/memory/data-execution-prevention

How Data Execution Prevention Works

If an application attempts to run code from a protected page, the application receives an exception with the status code STATUS_ACCESS_VIOLATION. If your application must run code from a memory page, it must allocate and set the proper virtual memory protection attributes. The allocated memory must be marked PAGE_EXECUTEPAGE_EXECUTE_READPAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY when allocating memory. Heap allocations made by calling the malloc and HeapAlloc functions are non-executable.

Applications cannot run code from the default process heap or the stack.

DEP is configured at system boot according to the no-execute page protection policy setting in the boot configuration data. An application can get the current policy setting by calling the GetSystemDEPPolicy function. Depending on the policy setting, an application can change the DEP setting for the current process by calling the SetProcessDEPPolicy function.

https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-exception_record

EXCEPTION_RECORD

typedef struct _EXCEPTION_RECORD {
  DWORD                    ExceptionCode;
  DWORD                    ExceptionFlags;
  struct _EXCEPTION_RECORD *ExceptionRecord;
  PVOID                    ExceptionAddress;
  DWORD                    NumberParameters;
  ULONG_PTR                ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD;

ExceptionInformation

An array of additional arguments that describe the exception. The RaiseException function can specify this array of arguments. For most exception codes, the array elements are undefined. The following table describes the exception codes whose array elements are defined.

Exception codeMeaning
EXCEPTION_ACCESS_VIOLATIONThe first element of the array contains a read-write flag that indicates the type of operation that caused the access violation. If this value is zero, the thread attempted to read the inaccessible data. If this value is 1, the thread attempted to write to an inaccessible address.If this value is 8, the thread causes a user-mode data execution prevention (DEP) violation.
The second array element specifies the virtual address of the inaccessible data.
EXCEPTION_IN_PAGE_ERRORThe first element of the array contains a read-write flag that indicates the type of operation that caused the access violation. If this value is zero, the thread attempted to read the inaccessible data. If this value is 1, the thread attempted to write to an inaccessible address.If this value is 8, the thread causes a user-mode data execution prevention (DEP) violation.
The second array element specifies the virtual address of the inaccessible data.
The third array element specifies the underlying NTSTATUS code that resulted in the exception.
ExceptionInformation table

The abuse!

VirtualProtect(&addr, &size, PAGE_READONLY, &hs.addressToHookOldProtect);

Set the target address into PAGE_READONLY so that if the address tries to execute/write, then it would result to an exception where we can catch the exception using VEH handler.

LONG WINAPI UltimateHooks::LeoHandler(EXCEPTION_POINTERS* pExceptionInfo)
{
	if (pExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
	{
		for (HookEntries hs : hookEntries)
		{
			if ((hs.addressToHook == pExceptionInfo->ContextRecord->XIP) &&
				(pExceptionInfo->ExceptionRecord->ExceptionInformation[0] == 8)) {
				//do your dark rituals here
			}
			return EXCEPTION_CONTINUE_EXECUTION;
		}

	}
	return EXCEPTION_CONTINUE_SEARCH;
}

As you can see, you just have to compare the ExceptionInformation[0] if it is 8 to verify if the exception is caused by DEP.

Simple AF!

What can I do with this?

Change the execution flow, modify the stack, modify values, mutate, and anything your imagination can think of! Just use your creativity!

POC

VEH Debugger
VEH Debugger
VEH Debugger via DEP

Conclusion

Thanks for viewing this, I hope you enjoyed this small writeup. Its been a while since I posted writeups, and may post again on some quite time. I am now currently shifting to Linux environment, should you expect that I will be having writeups on Linux, Web, Network, and Pentesting!

I am also planning to get some certifications such as CEH and OSCP, but I am not quite sure yet. But who knows? Ill just update it here whenever I came to a finalization.

Thanks and have a good day!~

Through the Heaven’s Gate

Really, the title does not literary means it. This writeup is about a research but not mine. And you will see why this writeup is called “Through the Heaven’s Gate” later on.

Background

My interest in this topic started from reversing a game. This game hooks many userland functions including the ones I’m interested in, VirtualProtect and NtVirtualProtectMemory. Without this, I am unable to change protection on pages and such.

This pushes me to resolve my need via kernel driver. I map my own kernel and execute a ZwVirtualprotectmemory from there, sure, it worked. But I want to make everything stay in usermode as their Anti-cheat just stays too in ring3.

The path to solution

Luckily, I have some several contacts that helps me to resolve me problem.

Me: How can I use VirtualProtect or NtVirtualProtectMemory when it's hooked at all.
az: use syscall
Me: *after some quite time* I can't find decent articles about syscall.
az: You can syscall, and since league is wow64, you can do heaven's gate on it
Me: ???

After that conversation I was like, “WHAAAATT???”. So I then proceed to read some articles regarding this. I’m thankful to this person because he does not give the solution directly, but he did point me to the process on how I can formulate the solution. So let’s break it down!

Syscall

In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

For example, the x86 instruction set contains the instructions SYSCALL/SYSRET and SYSENTER/SYSEXIT (these two mechanisms were independently created by AMD and Intel, respectively, but in essence they do the same thing). These are “fast” control transfer instructions that are designed to quickly transfer control to the kernel for a system call without the overhead of an interrupt.[8] Linux 2.5 began using this on the x86, where available; formerly it used the INT instruction, where the system call number was placed in the EAX register before interrupt 0x80 was executed.[9][10]

https://en.wikipedia.org/wiki/System_call

But there were problem regarding this, syscall cannot be manually called from 32bit application running in a 64bit environment.

Wow64

In computing on Microsoft platforms, WoW64 (Windows 32-bit oWindows 64-bit) is a subsystem of the Windows operating system capable of running 32-bit applications on 64-bit Windows. It is included in all 64-bit versions of Windows—including Windows XP Professional x64 EditionIA-64 and x64 versions of Windows Server 2003, as well as 64-bit versions of Windows VistaWindows Server 2008Windows 7Windows 8Windows Server 2012Windows 8.1 and Windows 10. In Windows Server 2008 R2 Server Core, it is an optional component, but not in Nano Server[clarification needed]. WoW64 aims to take care of many of the differences between 32-bit Windows and 64-bit Windows, particularly involving structural changes to Windows itself.

https://en.wikipedia.org/wiki/WoW64

Let’s start reversing!

Okay, so first, I will be using Cheat Engine because it has a powerful tool that helps to enumerate dll’s. Second, I will be dissecting discord app as an example.

We’ll open up discord.
Enumerate the Dll’s
And look at that!

Look at that! Faker what was that?. We have seen two ntdll.dll, wow64.dll, wow64win.dll and wow64cpu.dll. Also, if you noticed, 3 dll’s are in 64bit address space. Remember that we cannot execute 64bit codes directly in 32bit application. So what’s happening?

Answer: WOW64

We’ll follow the traces from 32bit ntdll. Let’s trace the NtVirtualProtectMemory on it.

ZwProtectVirtualMemory in 32bit ntdll

It’s not a surprise that we might not found syscall here. But we’ll follow the call.

ntdll.RtlInterlockedCompareExchange64+170 in 32bit ntdll
wow64cpu.dll + 7000

Look at that! RAX?!! 64bit code! What is this?

In fact, on 64-bit Windows, the first piece of code to execute in *any* process, is always the 64-bit NTDLL, which takes care of initializing the process in user-mode (as a 64-bit process!). It’s only later that the Windows-on-Windows (WoW64) interface takes over, loads a 32-bit NTDLL, and execution begins in 32-bit mode through a far jump to a compatibility code segment. The 64-bit world is never entered again, except whenever the 32-bit code attempts to issue a system call. The 32-bit NTDLL that was loaded, instead of containing the expected SYSENTER instruction, actually contains a series of instructions to jump back into 64-bit mode, so that the system call can be issued with the SYSCALL instruction, and so that parameters can be sent using the x64 ABI, sign-extending as needed.

In Alex Lonescu’ blog, he said.

So, whenever you are trying to syscall a function on 32bit ntdll, it will then traverse from 32bit ntdll to 64bit ntdll via wow64 layer dll’s.

Finally! The syscall in 64bit ntdll!

To summarize,

32-bit ntdll.dll -> wow64cpu.dll’s Heaven’s Gate -> 64-bit ntdll.dll syscall-> kernel-land

The solution

We just need to copy the opcode from ZwProtectVirtualMemory in 32bit ntdll. As I said, it was already hooked so we cannot use it. Meanwhile, we can imitate the original opcodes of it before it was hooked.

template<typename T>
void makesyscall<T>::CreateShellSysCall(byte sysindex1, byte sysindex2, byte sysindex3, byte sysindex4, LPCSTR lpFuncName, DWORD offsetToFunc, byte retCode, byte ret1, byte ret2)
{
	if (!sysindex1 && !sysindex2 && !sysindex3 && !sysindex4)
		return;

#ifdef _WIN64
	byte ShellCode[]
	{
		0x4C, 0x8B, 0xD1,					//mov r10, rcx 
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0x0F, 0x05,					        //syscall
		0xC3								//ret				
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(byte*)(m_pShellCode + 4) = sysindex1;
	*(byte*)(m_pShellCode + 5) = sysindex2;
	*(byte*)(m_pShellCode + 6) = sysindex3;
	*(byte*)(m_pShellCode + 7) = sysindex4;

#elif _WIN32
	byte ShellCode[]
	{
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0xBA, 0x00, 0x00, 0x00, 0x00,		        //mov edx, [function]
		0xFF, 0xD2,						//call edx
		0xC2, 0x14, 0x00								//ret
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(uintptr_t*)(m_pShellCode + 6) = (uintptr_t)((DWORD)GetProcAddress(GetModuleHandleA("ntdll.dll"), lpFuncName) + offsetToFunc);

	*(byte*)(m_pShellCode + 1) = sysindex1;
	*(byte*)(m_pShellCode + 2) = sysindex2;
	*(byte*)(m_pShellCode + 3) = sysindex3;
	*(byte*)(m_pShellCode + 4) = sysindex4;

	*(byte*)(m_pShellCode + 12) = retCode;
	*(byte*)(m_pShellCode + 13) = ret1;
	*(byte*)(m_pShellCode + 14) = ret2;
#endif
}
makesyscall<NTSTATUS>(0x50, 0x00, 0x00, 0x00, "RtlInterlockedCompareExchange64", 0x170, 0xC2, 0x14, 0x00)(GetCurrentProcess(), &addr, &size, PAGE_EXECUTE_READ | PAGE_GUARD, &oldProtection)

POC

Okay, so here is it. We’ve injected the dll and got some print for debugging.

Printing base ntdll
Printing the location of Rtl…

We dumped the running executable to check the print results. And, hell yeah!

Result of dump
Check the location of Rtl…

We then therefore conclude that we have successfully bypassed basic usermode hook.

Extended usage

With all this knowledge, we can also implement heaven’s gate hook! All syscalls will then be caught, and have the option to do actions based on the syscalls as your will. But we will not cover this topics as it can be cited from another writeup: WOW64!Hooks: WOW64 Subsystem Internals and Hooking Techniques

Fig 14: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html
NtResumeThread inline hook before transitioning through the WOW64 layer
Fig 15: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html

Conclusion

We therefore conclude that wow64 application are able to execute 64bit syscalls via Heaven’s Gate.
A big thanks to admiralzero@UC for pointing me on the right direction. When I figured out that they hook usermode functions, I feel that was locked out and pushed to do kernel usage, but no, there was a way. And here it is, going through the heaven’s gate!