Hooking via Vectored Exception Handling

In computer programming, the term hooking covers a range of techniques used to alter or augment the behaviour of an operating system, of applications, or of other software components by intercepting function calls or messages or events passed between software components. Code that handles such intercepted function calls, events or messages is called a hook.

Hooking is used for many purposes, including debugging and extending functionality. Examples might include intercepting keyboard or mouse event messages before they reach an application, or intercepting operating system calls in order to monitor behavior or modify the function of an application or other component. It is also widely used in benchmarking programs, for example frame rate measuring in 3D games, where the output and input is done through hooking.

Hooking can also be used by malicious code. For example, rootkits, pieces of software that try to make themselves invisible by faking the output of API calls that would otherwise reveal their existence, often use hooking techniques.

https://en.wikipedia.org/wiki/Hooking

Hooking Methods

The content of this section came from UC and is not my own words. Kindly visit the page for more detailed and complete info.

Byte patching (.text section)

Execute-Speed: 10
Skill-Level: 2
Detectionrate: 5 – 7

Byte patching in the .text section is the easiest and most common way to place a hook.
Hooking libraries like Microsoft Detours (Download) are used alot.
Some anticheats are still retarded and dont even scan the .text section, but most of them figured out that one finally.
There are various ways to redirect the code flow. You can place a normal JMP instruction (5 bytes in size) or try some hotpatching using a short JMP (2 bytes in size) to some location where is more space for a 5 byte JMP.
You can place a CALL instruction which works same as a JMP but pushes the returnaddress on the stack before jumping. You can also just push the address on the stack and then call RETN which jumps to the last adddress on stack and therefore behaves like a JMP.
Most anticheats figured that out and scan for those byte sequences.

IAT/EAT Hooking

Execute-Speed: 10
Skill-Level: 3
Detectionrate: 5

This hooking method is based on how the PE files are working on windows.
It means “Import/Export Address Table”. This address table contains the pointer to the APIs and is adjusted by the PE loader when the file is executed.
You can either loop the whole table and search for a function and redirect it or you can find it manually using OllyDbg or IDA.
The basic idea is that you replace a certain API with your hooked function.
Thats not only good for simple API hooking but it can also be used for a DirectX hook: http://www.unknowncheats.me/forum/d3…ok-any-os.html

VMT Hooking / Pointer redirections

Execute-Speed: 10
Skill-Level: 3 – 5
Detectionrate: 3

One of the best hooking methods because there is no API or basic way to detect those hooks.
Most anticheats detect VMT hooks on the D3D-Device of the engine but thats not what we want to do anyways.
Nearly every engine has an internal rendering class which can be hooked. You can for example just hook Endscene using detours and log the returnaddress.
When you check the code at the returnaddress you will find the function which calls Endscene. Now search for references to this function and reverse a bit, you will mostlikely get a pointer in the .data section which represents a virtual table.
Those tables just contain addresses of functions and can be easily replaced even without the usage of VirtualProtect because .data has normally Read/Write flags.

HWBP Hooking

Execute-Speed: 6
Skill-Level: 6
Detectionrate: 4

We already talked earlier about hardware breakpoints but this time we wont change any bytes in the .text section.
Like I said earlier you also have to place an exception handler to catch the exception!
They can be placed for each thread individually but that also means we NEED the handle of the thread.
Some anticheats hide all threads using rootkit techniques, but that doesnt mean we cant get into the thread!

PageGuard Hooking

Execute-Speed: 1
Skill-Level: 8
Detectionrate: 1

PageGuard hooks are really stealthy, nearly no AntiCheat detects them. This was detected for GameGuard but only in the game, it worked perfectly on the GameGuard file itself.
Undetected for HackShield, XignCode, Punkbuster, and more. This method can be compared to a HWBP hook. First you have to register an exception handler.
Then you have to trigger the exception, this time by marking the complete memory page with PAGE_GUARD using VirtualProtect, which will result in an exception.
When you read about PAGE_GUARD on msdn you will find out that its removed automaticly after the first exception occured.
In our exception handler we now set the single step flag and single step all instructions until we hit the address we looked for.
We can change the EIP again like we did earlier, but now we have to mark the page as PAGE_GUARD again otherwise the hook wont be triggered again!
This hooking method is slow as hell due to the usage of the single step flag and should only be used for functions which get called very rarely.

Forced Exception hooking

Execute-Speed: 5
Skill-Level: 8
Detectionrate: 2

You can force exceptions in a program by manipulating pointers and stored values.
For example you can grab the device pointer of a game and set it to null, then wait in your exception handler until the program throws an exception.
The exception itself should be a null-pointer dereference, just do your stuff in the redirected EIP hook and then reset the original values and continue the execution.
Since the pointer is now fine again it will execute until you set the pointer to null again. There are many more ways to use this but since I used that method before I know this works forreal.
You might need alot of work to fix all the exceptions which requires some skills.
Heres an example on forcing an exception: http://www.unknowncheats.me/forum/c-…struction.html

VEH Hooking (Let’s get our hands dirty!)

But why VEH? It’s slow AF. Yes it’s slow but I would not take risk byte-patching because it is prone for integrity check which may result to your account being banned. Also, other methods are not applicable such as IAT and VMT. And my last resort is VEH hooking.

Well, your choice will be dependent to situation, every methods has pros and cons. Its up to you on how you would utilize the information.

Implementation

Implementation is quite easy! Thanks to many samples out there!

LONG WINAPI Handler(EXCEPTION_POINTERS* pExceptionInfo)
{
	
	if (pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_GUARD_PAGE_VIOLATION) //We will catch PAGE_GUARD Violation
	{
		if (pExceptionInfo->ContextRecord->XIP == (DWORD)og_fun) //Make sure we are at the address we want within the page
		{
			pExceptionInfo->ContextRecord->XIP = (DWORD)hk_fun; //Modify EIP/RIP to where we want to jump to instead of the original function
		}

		pExceptionInfo->ContextRecord->EFlags |= 0x100; //Will trigger an STATUS_SINGLE_STEP exception right after the next instruction get executed. In short, we come right back into this exception handler 1 instruction later
		return EXCEPTION_CONTINUE_EXECUTION; //Continue to next instruction
	}

	if (pExceptionInfo->ExceptionRecord->ExceptionCode == STATUS_SINGLE_STEP) //We will also catch STATUS_SINGLE_STEP, meaning we just had a PAGE_GUARD violation
	{
		//uint32_t dwOld;
		//dwOld = Controller->VirtualProtect((DWORD)og_fun, 1, PAGE_EXECUTE_READ | PAGE_GUARD); //Reapply the PAGE_GUARD flag because everytime it is triggered, it get removes

		DWORD dwOld;
		auto addr = (PVOID)og_fun;
		auto size = (SIZE_T)((int)1);
		NTSTATUS res = makesyscall<NTSTATUS>(0x50, 0x00, 0x00, 0x00, "RtlInterlockedCompareExchange64", 0x170, 0xC2, 0x14, 0x00)(GetCurrentProcess(), &addr, &size, PAGE_EXECUTE_READ | PAGE_GUARD, &dwOld);

		return EXCEPTION_CONTINUE_EXECUTION; //Continue the next instruction
	}

	return EXCEPTION_CONTINUE_SEARCH; //Keep going down the exception handling list to find the right handler IF it is not PAGE_GUARD nor SINGLE_STEP
}
bool AreInSamePage(const DWORD* Addr1, const DWORD* Addr2)
{
	MEMORY_BASIC_INFORMATION mbi1;
	if (!VirtualQuery(Addr1, &mbi1, sizeof(mbi1))) //Get Page information for Addr1
		return true;

	MEMORY_BASIC_INFORMATION mbi2;
	if (!VirtualQuery(Addr2, &mbi2, sizeof(mbi2))) //Get Page information for Addr1
		return true;

	if (mbi1.BaseAddress == mbi2.BaseAddress) //See if the two pages start at the same Base Address
		return true; //Both addresses are in the same page, abort hooking!

	return false;
}
bool Hook(DWORD original_fun, DWORD hooked_fun)
{
	og_fun = original_fun;
	hk_fun = hooked_fun;

	//We cannot hook two functions in the same page, because we will cause an infinite callback
	if (AreInSamePage((const DWORD*)og_fun, (const DWORD*)hk_fun))
		return false;

	//Register the Custom Exception Handler
	VEH_Handle = AddVectoredExceptionHandler(true, (PVECTORED_EXCEPTION_HANDLER)LeoHandler);

	//Toggle PAGE_GUARD flag on the page
	if (VEH_Handle) {
		auto addr = (PVOID)og_fun;
		auto size = (SIZE_T)((int)1);

		if (NT_SUCCESS(makesyscall<NTSTATUS>(0x50, 0x00, 0x00, 0x00, "RtlInterlockedCompareExchange64", 0x170, 0xC2, 0x14, 0x00)(GetCurrentProcess(), &addr, &size, PAGE_EXECUTE_READ | PAGE_GUARD, &oldProtection))) {
			return true;
		}

	}
	return false;
}

POC

I hooked a function in a game that is executed every character’s action.

Conclusion

VEH is quite simple to implement, but again, it might depend on the situation you are working on. Besides, you will feel the impact on decreased performance because this is quite slow unlike other methods.

Thank you so much for reading this. I hope you enjoyed this writeup!

Through the Heaven’s Gate

Really, the title does not literary means it. This writeup is about a research but not mine. And you will see why this writeup is called “Through the Heaven’s Gate” later on.

Background

My interest in this topic started from reversing a game. This game hooks many userland functions including the ones I’m interested in, VirtualProtect and NtVirtualProtectMemory. Without this, I am unable to change protection on pages and such.

This pushes me to resolve my need via kernel driver. I map my own kernel and execute a ZwVirtualprotectmemory from there, sure, it worked. But I want to make everything stay in usermode as their Anti-cheat just stays too in ring3.

The path to solution

Luckily, I have some several contacts that helps me to resolve me problem.

Me: How can I use VirtualProtect or NtVirtualProtectMemory when it's hooked at all.
az: use syscall
Me: *after some quite time* I can't find decent articles about syscall.
az: You can syscall, and since league is wow64, you can do heaven's gate on it
Me: ???

After that conversation I was like, “WHAAAATT???”. So I then proceed to read some articles regarding this. I’m thankful to this person because he does not give the solution directly, but he did point me to the process on how I can formulate the solution. So let’s break it down!

Syscall

In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

For example, the x86 instruction set contains the instructions SYSCALL/SYSRET and SYSENTER/SYSEXIT (these two mechanisms were independently created by AMD and Intel, respectively, but in essence they do the same thing). These are “fast” control transfer instructions that are designed to quickly transfer control to the kernel for a system call without the overhead of an interrupt.[8] Linux 2.5 began using this on the x86, where available; formerly it used the INT instruction, where the system call number was placed in the EAX register before interrupt 0x80 was executed.[9][10]

https://en.wikipedia.org/wiki/System_call

But there were problem regarding this, syscall cannot be manually called from 32bit application running in a 64bit environment.

Wow64

In computing on Microsoft platforms, WoW64 (Windows 32-bit oWindows 64-bit) is a subsystem of the Windows operating system capable of running 32-bit applications on 64-bit Windows. It is included in all 64-bit versions of Windows—including Windows XP Professional x64 EditionIA-64 and x64 versions of Windows Server 2003, as well as 64-bit versions of Windows VistaWindows Server 2008Windows 7Windows 8Windows Server 2012Windows 8.1 and Windows 10. In Windows Server 2008 R2 Server Core, it is an optional component, but not in Nano Server[clarification needed]. WoW64 aims to take care of many of the differences between 32-bit Windows and 64-bit Windows, particularly involving structural changes to Windows itself.

https://en.wikipedia.org/wiki/WoW64

Let’s start reversing!

Okay, so first, I will be using Cheat Engine because it has a powerful tool that helps to enumerate dll’s. Second, I will be dissecting discord app as an example.

We’ll open up discord.
Enumerate the Dll’s
And look at that!

Look at that! Faker what was that?. We have seen two ntdll.dll, wow64.dll, wow64win.dll and wow64cpu.dll. Also, if you noticed, 3 dll’s are in 64bit address space. Remember that we cannot execute 64bit codes directly in 32bit application. So what’s happening?

Answer: WOW64

We’ll follow the traces from 32bit ntdll. Let’s trace the NtVirtualProtectMemory on it.

ZwProtectVirtualMemory in 32bit ntdll

It’s not a surprise that we might not found syscall here. But we’ll follow the call.

ntdll.RtlInterlockedCompareExchange64+170 in 32bit ntdll
wow64cpu.dll + 7000

Look at that! RAX?!! 64bit code! What is this?

In fact, on 64-bit Windows, the first piece of code to execute in *any* process, is always the 64-bit NTDLL, which takes care of initializing the process in user-mode (as a 64-bit process!). It’s only later that the Windows-on-Windows (WoW64) interface takes over, loads a 32-bit NTDLL, and execution begins in 32-bit mode through a far jump to a compatibility code segment. The 64-bit world is never entered again, except whenever the 32-bit code attempts to issue a system call. The 32-bit NTDLL that was loaded, instead of containing the expected SYSENTER instruction, actually contains a series of instructions to jump back into 64-bit mode, so that the system call can be issued with the SYSCALL instruction, and so that parameters can be sent using the x64 ABI, sign-extending as needed.

In Alex Lonescu’ blog, he said.

So, whenever you are trying to syscall a function on 32bit ntdll, it will then traverse from 32bit ntdll to 64bit ntdll via wow64 layer dll’s.

Finally! The syscall in 64bit ntdll!

To summarize,

32-bit ntdll.dll -> wow64cpu.dll’s Heaven’s Gate -> 64-bit ntdll.dll syscall-> kernel-land

The solution

We just need to copy the opcode from ZwProtectVirtualMemory in 32bit ntdll. As I said, it was already hooked so we cannot use it. Meanwhile, we can imitate the original opcodes of it before it was hooked.

template<typename T>
void makesyscall<T>::CreateShellSysCall(byte sysindex1, byte sysindex2, byte sysindex3, byte sysindex4, LPCSTR lpFuncName, DWORD offsetToFunc, byte retCode, byte ret1, byte ret2)
{
	if (!sysindex1 && !sysindex2 && !sysindex3 && !sysindex4)
		return;

#ifdef _WIN64
	byte ShellCode[]
	{
		0x4C, 0x8B, 0xD1,					//mov r10, rcx 
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0x0F, 0x05,					        //syscall
		0xC3								//ret				
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(byte*)(m_pShellCode + 4) = sysindex1;
	*(byte*)(m_pShellCode + 5) = sysindex2;
	*(byte*)(m_pShellCode + 6) = sysindex3;
	*(byte*)(m_pShellCode + 7) = sysindex4;

#elif _WIN32
	byte ShellCode[]
	{
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0xBA, 0x00, 0x00, 0x00, 0x00,		        //mov edx, [function]
		0xFF, 0xD2,						//call edx
		0xC2, 0x14, 0x00								//ret
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(uintptr_t*)(m_pShellCode + 6) = (uintptr_t)((DWORD)GetProcAddress(GetModuleHandleA("ntdll.dll"), lpFuncName) + offsetToFunc);

	*(byte*)(m_pShellCode + 1) = sysindex1;
	*(byte*)(m_pShellCode + 2) = sysindex2;
	*(byte*)(m_pShellCode + 3) = sysindex3;
	*(byte*)(m_pShellCode + 4) = sysindex4;

	*(byte*)(m_pShellCode + 12) = retCode;
	*(byte*)(m_pShellCode + 13) = ret1;
	*(byte*)(m_pShellCode + 14) = ret2;
#endif
}
makesyscall<NTSTATUS>(0x50, 0x00, 0x00, 0x00, "RtlInterlockedCompareExchange64", 0x170, 0xC2, 0x14, 0x00)(GetCurrentProcess(), &addr, &size, PAGE_EXECUTE_READ | PAGE_GUARD, &oldProtection)

POC

Okay, so here is it. We’ve injected the dll and got some print for debugging.

Printing base ntdll
Printing the location of Rtl…

We dumped the running executable to check the print results. And, hell yeah!

Result of dump
Check the location of Rtl…

We then therefore conclude that we have successfully bypassed basic usermode hook.

Extended usage

With all this knowledge, we can also implement heaven’s gate hook! All syscalls will then be caught, and have the option to do actions based on the syscalls as your will. But we will not cover this topics as it can be cited from another writeup: WOW64!Hooks: WOW64 Subsystem Internals and Hooking Techniques

Fig 14: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html
NtResumeThread inline hook before transitioning through the WOW64 layer
Fig 15: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html

Conclusion

We therefore conclude that wow64 application are able to execute 64bit syscalls via Heaven’s Gate.
A big thanks to admiralzero@UC for pointing me on the right direction. When I figured out that they hook usermode functions, I feel that was locked out and pushed to do kernel usage, but no, there was a way. And here it is, going through the heaven’s gate!

Basic ROP Chaining Attack on x86

There are lot of games that catches “cheaters” by checking the return address of a function call. After executing the function, it will return to the location of call. Anti-cheat checks the return address if it’s within the module range and whitelists ranges, else, if it’s not, you will get flagged and will result to ban.

Assembly Macros

Call

Saves procedure linking information on the stack and branches to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general purpose register, or a memory location.

This instruction can be used to execute four different types of calls:

Near call
A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.

Far call
A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call. Inter-privilege-level far call. A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.

Task switch
A call to a procedure located in a different task.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 6, Task Management, in the IA-32 Intel Architecture Software Developer’s Manual, Volume 3, for information on performing task switches with the CALL instruction.
https://c9x.me/x86/html/file_module_x86_id_26.html

push ReturnAddress — The address of the next instruction after the call
jmp SomeFunc — Change the EIP/RIP to the address of SomeFunc

Ret

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.

The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.

The RET instruction can be used to execute three different types of returns:

Near return
A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.

Far return
A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.

Inter-privilege-level far return
A far return to a different privilege level than that of the currently executing program or procedure.

The inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege- level returns.

When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.

When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.

The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to determine if the control transfer is allowed to be made. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege level return, the ESP and SS registers are loaded from the stack.

If parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return.

Here, the parameters are released both from the called procedure’s stack and the calling procedure’s stack (that is, the stack being returned to).
https://c9x.me/x86/html/file_module_x86_id_280.html

add esp, 18h — Increase the stack pointer, decreasing the stack size, usually by the amount of arguments the function takes (that actually got pushed onto the stack and the callee is responsible for cleaning the stack). This is due to the stack “grows” downward.
pop eip — Practically pop the top of the stack into the instruction pointer, effectively “jmp” there.

Push

Decrements the stack pointer and then stores the source operand on the top of the stack. The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits), and the operand-size attribute of the current code segment determines the amount the stack pointer is decremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is decremented by 4 and, if they are 16, the 16-bit SP register is decremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the source operand.) Pushing a 16-bit operand when the stack addresssize attribute is 32 can result in a misaligned the stack pointer (that is, the stack pointer is not aligned on a doubleword boundary).

The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. Thus, if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address, the effective address of the operand is computed before the ESP register is decremented.

In the real-address mode, if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition.
https://c9x.me/x86/html/file_module_x86_id_269.html

sub esp, 4 — Subtracting 4 bytes in case of 32 bits from the stack pointer, effectively increasing the stack size.
mov [esp], eax — Moving the item being pushed to where the current stack pointer is located.

Pop

Loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits-the source address size), and the operand-size attribute of the current code segment determines the amount the stack pointer is incremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is incremented by 4 and, if they are 16, the 16-bit SP register is incremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operandsize attribute and also the address-size attribute of the destination operand.) If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the “Operation” section below).

A null value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is null.

The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt1. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers.
https://c9x.me/x86/html/file_module_x86_id_248.html

mov eax, [esp] — Move the value on top of the stack into whatever is being pop into.
add esp, 4 — To increase the esp, reducing the size of the stack.

Gadget/ROP Chaining

The idea
The technicality

Let’s get our hands dirty!

WARNING: ALL DETAILS BELOW ARE FOR EDUCATIONAL PURPOSES ONLY.

Now, our goal is to spoof the return address so we will not be having troubles with the return checks, thus, we will not get our account banned.

Normal call

As you can see in the example image, we have our application module that ranges from 0x500000 until 0x600000. The only valid return address should be in this range, otherwise the application will know that we are calling the function from different module.

Now to get things complicated, what if our function call is outside of the application module? Say, it was from an injected DLL?

Call outside of main module

As you can see above, we are calling the function somewhere from 0x700000 ~ 0x800000 which is not a valid range for return check, and would result our account to being banned.

Hands-on: Our target application (Game)

As we check the function we want to call, there is a return check inside of it.

Return check

The Solution

	static void Engine::CastSpellSelf(int SlotID) {
		if (me->IsAlive()) {
			DWORD spellbook = (DWORD)me + (DWORD)oObjSpellBook;
			auto spellslot = me->GetSpellSlotByID(SlotID);
			Vector* objPos = &me->GetPos();
			Vector* mePos = &me->GetPos();
			DWORD objNetworkID = 0;
			DWORD SpoofAddress = (DWORD)GetModuleHandle(NULL) + (DWORD)oRetAddr; //retn instruction
			DWORD CastSpellAddr = (DWORD)GetModuleHandle(NULL) + (DWORD)oCastSpell;//CastSpell


			if (((*(DWORD*)SpoofAddress) & 0xFF) != 0xC3)
				return; //This isn't the instruction we're looking for

			__asm
			{
				push retnHere //address of our function,  
					mov ecx, spellbook //If the function is a __thiscall don't forget to set ECX
					push objNetworkID
					push mePos
					push objPos
					push SlotID
					push spellslot
					push SpoofAddress
					jmp CastSpellAddr
				retnHere :
			}
		}
	}

As you can see above, from the line 18 to line 23, that is our original function parameters. In line 24, I also pushed the SpoofAddress, which is our gadget.

Our gadget

When the function has finished executing, it will pop to our gadget, then it will hit the return instruction back where we originally called the function (outside of the application). The return address will be our gadget, which is inside the application module, thus successfully bypassing the return check.

Additional Note (Another example)

The function above is a __thiscall function. As per microsoft documentation, the function will clean the passed parameters itself that’s why our gadget has only retn instruction. On other case, if it does not clean the passed parameters, then you might want to find a gadget inside the application module that does pop the passed parameters before the retn.

Target function

The above function will be our target and we want to spoof the return address when we call it. Since its __cdecl, we want to clean our own parameters after executing the function. Just find a gadget inside the module that has the ff instructions:

add esp, 28 
ret

We need to clean the esp stack by size of 28, which comes from the parameters. We have 7 parameters so the formula will be 7 x 4bytes = 28, then return.

Thankfully, there is a site where you can easily transform instructions to opcodes so you can easily search the module.

Instruction to opcode
IDA: Binary Search
A lot of results you can choose from

Testing if our spoof works

It’s easy to tell if your spoof works. Just run the application and see if you will get banned after a few days ???

BAN

Or just write a code that gets the value of variable where the flag is being stored.

If you are lucky enough in bypassing the check, then you are now safe from bans (or you just think so).

Sample

Conclusion

First of all, I want to say thank you to the people of UC for giving some quite good materials and resources. Second, I want to thank PITSF for inspiring a lot of people who’s interested in ethical hacking and security. Mabuhay po kayo. And last but not the least, I want to thank the readers who finished reading this post. I am sorry if there are grammatical/terminology errors, English is not my mother tongue.

ROP Chaining Attack is easy to execute but having additional layer of security is enough to catch intruders to the system. Some anti-cheat enumerate the modules, some implements whitelist of modules, some hook the system functions for them to have advantage on control of system, and etc.

Once again, thank you so much!