This won’t take long. It is just a quick fix for the Heaven’s Gate hook (http://mark.rxmsolutions.com/through-the-heavens-gate/), needed because Microsoft updated wow64cpu.dll, the module that translates 32-bit syscalls to 64-bit ones for WoW64 applications.
To better visualize the change, here is a before-and-after comparison.
Before 22H2, all the way back to Windows 10.
Windows 11 22H2
As a result, you cannot simply place a hook at 0x3010 anymore: the hook needs 8 bytes, and overwriting that many bytes destroys the call mechanism even if you fix the displacement of the call.
The solution
The solution is very simple. Copy all the bytes from 0x3010 through 0x302D into your own memory (allocated with VirtualAlloc, or a code cave). In the copy, fix the displacement of the jmp that originally sat at 0x3028; it is the only instruction that needs fixing. Then place the hook at 0x3010. The copied gate takes over from the original 0x3010, so the original code at 0x3015 and onwards is never executed again.
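To make the displacement fix concrete, here is a minimal sketch of the arithmetic. It assumes the jmp is a relative one with a 32-bit displacement (5 bytes for an E9 rel32 jmp; pass 6 for a RIP-relative form). I have not confirmed the exact encoding used in wow64cpu.dll, and RelocateRel32 is a name I made up for illustration.

```cpp
#include <cstdint>

// Recompute the 32-bit displacement of a relative jmp/call that was copied
// from oldAddr to newAddr so that it still reaches the original target.
// instrLen is the full instruction length (assumed 5, as for an E9 rel32 jmp),
// because the displacement is measured from the end of the instruction.
inline std::int32_t RelocateRel32(std::uint64_t oldAddr, std::uint64_t newAddr,
                                  std::int32_t oldRel, std::uint32_t instrLen = 5) {
    const std::int64_t target =
        static_cast<std::int64_t>(oldAddr) + instrLen + oldRel;
    const std::int64_t newRel =
        target - (static_cast<std::int64_t>(newAddr) + instrLen);
    // The caller must keep the copy within +/-2 GiB of the target,
    // otherwise the result does not fit in 32 bits.
    return static_cast<std::int32_t>(newRel);
}
```

For example, a jmp copied from 0x3028 to 0x5028 with an old displacement of 0x100 gets the new displacement -0x1F00.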
Pretty easy, right?
Notes
In the past, Microsoft used a far jump to set CS to 0x33. A CS value of 0x33 signifies that execution runs in 64-bit long mode, which is what makes the translation from 32-bit to 64-bit possible. Now they have managed to build the bridge without the far jmp. I still need to do a lot of reading to understand this new mechanism, so if you know more, please do let me know!
I am now close to finishing the HTB Junior Pentester role course but decided to take a quick break and focus on one of my favorite fields: reversing games and evading anti-cheat.
The goal
The end goal is simple: get Cheat Engine past usermode anti-cheats and debug a game using a type-1 hypervisor.
This writeup will be divided into 3 parts.
The first covers the concept of Direct Kernel Object Manipulation (DKOM), used to unlink a process from the EPROCESS list.
The second covers the concept of a hypervisor for debugging.
The last covers PatchGuard and Driver Signature Enforcement, and how to disable them.
So without further ado, let’s get our hands dirty!
In kernel mode, a program has direct and unrestricted access to system resources.
In user mode, an application program runs without direct access to system resources.
Interruptions
In kernel mode, a fault can bring down the whole operating system.
In user mode, a fault only makes a single process fail.
Modes
Kernel mode is also known as the master mode, privileged mode, or system mode.
User mode is also known as the unprivileged mode, restricted mode, or slave mode.
Virtual address space
In kernel mode, all code shares a single virtual address space.
In user mode, each process gets its own separate virtual address space.
Level of privilege
In kernel mode, code has more privileges than in user mode.
In user mode, applications have fewer privileges.
Restrictions
Kernel mode can access both user programs and kernel programs, so there are no restrictions.
User mode cannot access kernel programs directly; it has to go through the kernel (for example, via system calls) to reach them.
Mode bit value
Kernel mode corresponds to mode bit 0 (ring 0 on x86).
User mode corresponds to ring 3 on x86 (with a single mode bit, the value would be 1).
Memory References
Kernel mode can reference both kernel and user memory areas.
User mode can only reference memory allocated for user mode.
System Crash
A crash in kernel mode is severe and takes the whole system down.
In user mode, a crash only affects the process, and you can recover by simply restarting it.
Access
Only essential, trusted functionality is permitted to operate in kernel mode.
User programs run in user mode.
Functionality
Kernel mode can refer to any memory block in the system and can also direct the CPU to execute any instruction, making it a very potent and significant mode.
User mode is the standard mode for ordinary programs: code there cannot execute privileged instructions or reference arbitrary memory blocks on its own; it needs an Application Programming Interface (API) to achieve these things.
Basically, if the anti-cheat lives only in usermode, it does not have total control of the system. If you manage to get into kernelmode, you can easily manipulate all usermode objects and events. However, it is not advisable to implement the whole cheat in the kernel: one single mistake can cause a Blue Screen of Death. We do need the kernel, though, to give us easy read and write access to processes.
EPROCESS
The EPROCESS structure is an opaque structure that serves as the process object for a process.
Some routines, such as PsGetProcessCreateTimeQuadPart, use EPROCESS to identify the process to operate on. Drivers can use the PsGetCurrentProcess routine to obtain a pointer to the process object for the current process and can use the ObReferenceObjectByHandle routine to obtain a pointer to the process object that is associated with the specified handle. The PsInitialSystemProcess global variable points to the process object for the system process.
Note that a process object is an Object Manager object. Drivers should use Object Manager routines such as ObReferenceObject and ObDereferenceObject to maintain the object’s reference count.
Each element of the process list (ActiveProcessLinks in EPROCESS) is a LIST_ENTRY that points forward to the next process (Flink) and backward to the previous one (Blink), which together form a circular doubly linked list. Every process is added to the list when it starts and removed when it exits.
Now here comes the juicy part!
Unlinking the process
Basically, removing an application’s entry from ActiveProcessLinks makes the application invisible to anything that enumerates processes by walking this list. But don’t get me wrong: this is still detectable, especially when an anti-cheat has a kernel driver, because it can scan for unlinked entries and/or perform memory pattern scanning.
A lot of rootkits use this method to hide their processes.
Visualization
Before / Original State
After Modification
Check out this link for image credits and for a different perspective on the attack.
Kernel Driver
NTSTATUS processHiderDeviceControl(PDEVICE_OBJECT, PIRP irp) {
    auto stack = IoGetCurrentIrpStackLocation(irp);
    auto status = STATUS_SUCCESS;
    switch (stack->Parameters.DeviceIoControl.IoControlCode) {
    case IOCTL_PROCESS_HIDE_BY_PID:
    {
        const auto size = stack->Parameters.DeviceIoControl.InputBufferLength;
        if (size != sizeof(HANDLE)) {
            // Bail out here; otherwise we would still read a malformed buffer below.
            status = STATUS_INVALID_BUFFER_SIZE;
            break;
        }
        // The input buffer carries the pid of the process to hide.
        const auto pid = *reinterpret_cast<HANDLE*>(stack->Parameters.DeviceIoControl.Type3InputBuffer);
        PEPROCESS eprocessAddress = nullptr;
        status = PsLookupProcessByProcessId(pid, &eprocessAddress);
        if (!NT_SUCCESS(status)) {
            KdPrint(("Failed to look for process by id (0x%08X)\n", status));
            break;
        }
Here we look up the EPROCESS address with PsLookupProcessByProcessId. We then find the offset by scanning the structure for the pid, relying on the fact that ActiveProcessLinks sits right after UniqueProcessId. This might not be the most robust approach, because it will break if a future patch inserts a new field right after UniqueProcessId.
Here is a table of ActiveProcessLinks offsets for different Windows versions, in case you prefer hard-coded offsets over the method above.
Version       Offset
Win7Sp0       0x188
Win7Sp1       0x188
Win8p1        0x2e8
Win10v1607    0x2f0
Win10v1703    0x2e8
Win10v1709    0x2e8
Win10v1803    0x2e8
Win10v1809    0x2e8
Win10v1903    0x2f0
Win10v1909    0x2f0
Win10v2004    0x448
Win10v20H1    0x448
Win10v2009    0x448
Win10v20H2    0x448
Win10v21H1    0x448
Win10v21H2    0x448
ActiveProcessLinks offsets
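If you go with manual offsets, the table can be turned into a small lookup. This is only a sketch: the version strings are the labels from the table, while OffsetEntry and ActiveProcessLinksOffset are names I made up, and detecting which version you are actually running on is left out.

```cpp
#include <cstdint>
#include <cstring>

struct OffsetEntry {
    const char* version;               // label as used in the table
    std::uint32_t activeProcessLinks;  // offset of ActiveProcessLinks in EPROCESS
};

static const OffsetEntry kOffsets[] = {
    {"Win7Sp0", 0x188},    {"Win7Sp1", 0x188},    {"Win8p1", 0x2e8},
    {"Win10v1607", 0x2f0}, {"Win10v1703", 0x2e8}, {"Win10v1709", 0x2e8},
    {"Win10v1803", 0x2e8}, {"Win10v1809", 0x2e8}, {"Win10v1903", 0x2f0},
    {"Win10v1909", 0x2f0}, {"Win10v2004", 0x448}, {"Win10v20H1", 0x448},
    {"Win10v2009", 0x448}, {"Win10v20H2", 0x448}, {"Win10v21H1", 0x448},
    {"Win10v21H2", 0x448},
};

// Returns the offset for a known version label, or 0 if it is not in the table.
inline std::uint32_t ActiveProcessLinksOffset(const char* version) {
    for (const OffsetEntry& entry : kOffsets) {
        if (std::strcmp(entry.version, version) == 0) {
            return entry.activeProcessLinks;
        }
    }
    return 0;
}
```

Returning 0 for an unknown version lets the caller refuse to touch anything instead of guessing an offset.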
        auto addr = reinterpret_cast<HANDLE*>(eprocessAddress);
        LIST_ENTRY* activeProcessList = 0;
        // Scan the EPROCESS for the slot holding the pid (UniqueProcessId);
        // ActiveProcessLinks is the LIST_ENTRY right after it.
        for (SIZE_T offset = 0; offset < consts::MAX_EPROCESS_SIZE / sizeof(SIZE_T*); offset++) {
            if (addr[offset] == pid) {
                activeProcessList = reinterpret_cast<LIST_ENTRY*>(addr + offset + 1);
                break;
            }
        }
        if (!activeProcessList) {
            ObDereferenceObject(eprocessAddress);
            status = STATUS_UNSUCCESSFUL;
            break;
        }
        // %p prints the full pointer; %08X would truncate it on x64.
        KdPrint(("Found address for ActiveProcessList! (0x%p)\n", activeProcessList));
        // An entry that points to itself has already been unlinked.
        if (activeProcessList->Flink == activeProcessList && activeProcessList->Blink == activeProcessList) {
            ObDereferenceObject(eprocessAddress);
            status = STATUS_ALREADY_COMPLETE;
            break;
        }
        // Make the neighbours skip over our entry.
        LIST_ENTRY* prevProcess = activeProcessList->Blink;
        LIST_ENTRY* nextProcess = activeProcessList->Flink;
        prevProcess->Flink = nextProcess;
        nextProcess->Blink = prevProcess;
We also want the entry of the process-to-be-hidden to link to itself (its Flink and Blink both pointing at its own LIST_ENTRY), because the neighbouring entries it used to point to might no longer exist once those processes die.
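The whole unlink-then-self-link idea can be tried safely in usermode on a toy list. The sketch below uses its own ListEntry struct that mirrors the Flink/Blink layout of LIST_ENTRY; the helper names are made up and nothing here touches real kernel structures.

```cpp
#include <cstddef>

struct ListEntry {
    ListEntry* Flink;  // next entry
    ListEntry* Blink;  // previous entry
};

inline void InitHead(ListEntry* head) {
    head->Flink = head;
    head->Blink = head;
}

inline void InsertTail(ListEntry* head, ListEntry* entry) {
    entry->Flink = head;
    entry->Blink = head->Blink;
    head->Blink->Flink = entry;
    head->Blink = entry;
}

// Walks the circular list the way a process enumerator would.
inline std::size_t CountEntries(const ListEntry* head) {
    std::size_t count = 0;
    for (const ListEntry* e = head->Flink; e != head; e = e->Flink) {
        ++count;
    }
    return count;
}

// Unlinks the entry from its neighbours and makes it point to itself.
// Returns false if the entry is already self-linked (already hidden).
inline bool HideEntry(ListEntry* entry) {
    if (entry->Flink == entry && entry->Blink == entry) {
        return false;
    }
    ListEntry* prev = entry->Blink;
    ListEntry* next = entry->Flink;
    prev->Flink = next;
    next->Blink = prev;
    entry->Flink = entry;  // self-link so no stale neighbour pointers remain
    entry->Blink = entry;
    return true;
}
```

After HideEntry, walking the list from the head no longer reaches the hidden entry, and the hidden entry no longer references neighbours that may be freed later.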
There are 2 problems you need to solve before you can use this method.
First: You need to disable Driver Signature Enforcement
You need to load your driver to be able to execute kernel functions. Either you buy a certificate and sign your own driver, so you do not need to disable DSE, or you disable DSE in Windows itself. The only problem with disabling DSE is that some games require DSE to be enabled before you can play.
Second: Bypass Patchguard
Manually messing with kernel objects through DKOM will get you a BSOD, because PatchGuard has tons of checks. Luckily, there are some ways to bypass PatchGuard.
These 2 will be tackled in the 3rd part of the writeup. Stay tuned!
I just finished the Bug Bounty Hunter job role path from HTB. At this point, I am eligible to take the HTB Certified Bug Bounty Hunter (HTB CBBH) certification, but I still do not feel confident enough to take it. The exam costs $210 as of this writing and allows 2 attempts. It runs for 7 days, is not proctored, and is open-note, so the sky is the limit. Check this out for more info: https://academy.hackthebox.com/preview/certifications/htb-certified-bug-bounty-hunter/
Interestingly, HTB released a new certification called HTB Certified Penetration Testing Specialist (HTB CPTS), which is tied to completing the Junior Penetration Tester job role path.
I am thinking of completing that path first and then taking HTB CPTS before going for OSCP, as people rate HTB as much harder than OSCP.
Ironically, OSCP is more recognized in the industry and has a much higher employment value. Who knows? HTB may be ramping up to compete with OSCP and other similar certifications.
My CCNA expires next year, so I have to take a higher-level certification to renew it automatically. My target is CCNP Security.
With that being said, here are the certifications I’ve been dreaming of:
This writeup is just a PoC for getting the VEH handler list on Win10. The PoC was done on Win10 build 19041.
VEH (Vectored Exception Handling) is used to catch exceptions happening in an application. When an exception is caught, you get a chance to resolve it and avoid an application crash.
Credits
Almost all of this writeup was written by Dimitri Fourny; it is not my original work, though some parts are modified to match my Win10 build. Please kindly visit his blog to see the original writeup.
VEH usage example
#include <windows.h>
#include <stdio.h>

// Vectored handler: it runs before any SEH handler gets a chance.
LONG NTAPI MyVEHHandler(PEXCEPTION_POINTERS ExceptionInfo) {
    printf("MyVEHHandler (0x%x)\n", ExceptionInfo->ExceptionRecord->ExceptionCode);
    if (ExceptionInfo->ExceptionRecord->ExceptionCode == EXCEPTION_INT_DIVIDE_BY_ZERO) {
        printf(" Divide by zero at 0x%p\n", ExceptionInfo->ExceptionRecord->ExceptionAddress);
        // Skip the faulting instruction (x86 only; assumes a 2-byte idiv).
        ExceptionInfo->ContextRecord->Eip += 2;
        return EXCEPTION_CONTINUE_EXECUTION;
    }
    return EXCEPTION_CONTINUE_SEARCH;
}

int main() {
    // 1 = put this handler at the front of the VEH list.
    AddVectoredExceptionHandler(1, MyVEHHandler);
    int except = 5;
    except /= 0;
    return 0;
}
Some applications also use this method for other purposes; Cheat Engine, for example, uses it to bypass basic debugger checks.
Cheat Engine VEH Debugger
Exception Path
When a CPU exception occurs, the kernel calls KiDispatchException (ring 0), which forwards the exception to the ntdll function KiUserExceptionDispatcher (ring 3). That function calls RtlDispatchException, which first tries to handle the exception via the VEH. To do so, it walks the VEH chained list via RtlCallVectoredHandlers, calling each handler until one returns EXCEPTION_CONTINUE_EXECUTION. If a handler returns EXCEPTION_CONTINUE_EXECUTION, RtlCallVectoredContinueHandlers is called, which in turn calls all the continue handlers.
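As a rough mental model of the VEH part of that path, the walk can be simulated with a plain vector of handlers. This is a simplified sketch, not how ntdll is implemented: the constants carry the values of EXCEPTION_CONTINUE_EXECUTION (-1) and EXCEPTION_CONTINUE_SEARCH (0), and DispatchVectored is a made-up name.

```cpp
#include <cstddef>
#include <vector>

constexpr long kContinueExecution = -1;  // value of EXCEPTION_CONTINUE_EXECUTION
constexpr long kContinueSearch = 0;      // value of EXCEPTION_CONTINUE_SEARCH

using Handler = long (*)(unsigned long exceptionCode);

// Calls each handler in list order until one claims the exception.
// Returns the index of that handler, or -1 if none did, in which case
// the real dispatcher would move on to the SEH handlers.
inline int DispatchVectored(const std::vector<Handler>& handlers,
                            unsigned long exceptionCode) {
    for (std::size_t i = 0; i < handlers.size(); ++i) {
        if (handlers[i](exceptionCode) == kContinueExecution) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The order of the vector matters in the same way the First parameter of AddVectoredExceptionHandler matters: a handler placed at the front sees the exception before the others.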
Exception Path
The VEH handlers are important because the SEH handlers are called only if no VEH handler has caught the exception, so VEH could be the best way to catch all exceptions if you don’t want to hook KiUserExceptionDispatcher. If you want more information about the exception dispatcher, 0vercl0ck has written a good paper about it.
The chained list
The VEH list is a circular linked list in which the handler function pointers are stored encoded:
Chained List
The exception handlers are encoded with a process cookie, but you can decode them easily. If you are dumping the VEH list of your own process, you can just use DecodePointer and you don’t have to care about the process cookie. If it’s a remote process, you can use DecodeRemotePointer, but you will need to create your own function pointer with GetModuleHandle("kernel32.dll") and GetProcAddress("DecodeRemotePointer").
The solution I have chosen is to imitate DecodePointer: get the process cookie with ZwQueryInformationProcess and apply the same algorithm.
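A minimal sketch of that algorithm for a 32-bit process follows. The decode step (rotate right by 0x20 minus the low five bits of the cookie, then XOR with the cookie) follows the original writeup as I understand it; the encode function is my own inverse, added only so the round trip can be checked, and fetching the real cookie from the target process is left out.

```cpp
#include <cstdint>

inline std::uint32_t RotateRight32(std::uint32_t value, unsigned count) {
    count &= 31;  // a rotation by 32 is a rotation by 0
    return count ? (value >> count) | (value << (32 - count)) : value;
}

// Assumed inverse of the decode step below (for testing the round trip).
inline std::uint32_t EncodePointer32(std::uint32_t pointer, std::uint32_t cookie) {
    return RotateRight32(pointer ^ cookie, cookie & 0x1F);
}

// Imitation of DecodePointer for a 32-bit process, given its process cookie.
inline std::uint32_t DecodePointer32(std::uint32_t encoded, std::uint32_t cookie) {
    return RotateRight32(encoded, 0x20 - (cookie & 0x1F)) ^ cookie;
}
```

Because the cookie is the only secret involved, once you have read it from the target process you can decode every handler pointer in the list offline.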
Even if you can find the symbol LdrpVectorHandlerList in the ntdll pdb, there is no official API to get it easily. My solution is to begin by getting a pointer to RtlpAddVectoredHandler:
RtlAddVectoredExceptionHandler
You can disassemble RtlAddVectoredExceptionHandler until you find the call instruction, or you can just assume that the call always sits 0x16 bytes after the start of the function:
With this, I can now walk through the VEH list and reverse what the handlers do. Again, this is not my original writeup; all credit goes to Dimitri Fourny.
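Resolving the call target from the function bytes could look like the sketch below. It assumes the instruction at the given offset is a near call (opcode E8 followed by a rel32) and a little-endian host; ResolveCallTarget is a made-up helper, and the 0x16 offset is build specific, so the opcode check is there to fail loudly when it does not match.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// code:       bytes read from the start of RtlAddVectoredExceptionHandler
// funcAddr:   address those bytes were read from
// callOffset: where the call is expected (0x16 on the build discussed here)
// Returns false if there is no E8 call at that offset.
inline bool ResolveCallTarget(const std::uint8_t* code, std::size_t size,
                              std::uint64_t funcAddr, std::size_t callOffset,
                              std::uint64_t* target) {
    if (callOffset + 5 > size || code[callOffset] != 0xE8) {
        return false;
    }
    std::int32_t rel = 0;
    std::memcpy(&rel, code + callOffset + 1, sizeof(rel));
    // rel32 is relative to the address of the instruction after the call.
    *target = funcAddr + callOffset + 5 +
              static_cast<std::uint64_t>(static_cast<std::int64_t>(rel));
    return true;
}
```

If the check fails on another build, falling back to a proper disassembler is safer than adjusting the offset by hand.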
A lot of games catch “cheaters” by checking the return address of a function call. After a function finishes executing, it returns to the location right after the call. The anti-cheat checks whether that return address falls within the module range or a whitelisted range; if it does not, you get flagged, which results in a ban.
Assembly Macros
Call
Saves procedure linking information on the stack and branches to the procedure (called procedure) specified with the destination (target) operand. The target operand specifies the address of the first instruction in the called procedure. This operand can be an immediate value, a general purpose register, or a memory location.
This instruction can be used to execute four different types of calls:
Near call A call to a procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment call.
Far call A call to a procedure located in a different segment than the current code segment, sometimes referred to as an intersegment call.
Inter-privilege-level far call A far call to a procedure in a segment at a different privilege level than that of the currently executing program or procedure.
Task switch A call to a procedure located in a different task.
The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 6, Task Management, in the IA-32 Intel Architecture Software Developer’s Manual, Volume 3, for information on performing task switches with the CALL instruction. https://c9x.me/x86/html/file_module_x86_id_26.html
In other words, a near call behaves roughly like these two steps:
push ReturnAddress   ; the address of the next instruction after the call
jmp SomeFunc         ; change EIP/RIP to the address of SomeFunc
Ret
Transfers program control to a return address located on the top of the stack. The address is usually placed on the stack by a CALL instruction, and the return is made to the instruction that follows the CALL instruction.
The optional source operand specifies the number of stack bytes to be released after the return address is popped; the default is none. This operand can be used to release parameters from the stack that were passed to the called procedure and are no longer needed. It must be used when the CALL instruction used to switch to a new procedure uses a call gate with a non-zero word count to access the new procedure. Here, the source operand for the RET instruction must specify the same number of bytes as is specified in the word count field of the call gate.
The RET instruction can be used to execute three different types of returns:
Near return A return to a calling procedure within the current code segment (the segment currently pointed to by the CS register), sometimes referred to as an intrasegment return.
Far return A return to a calling procedure located in a different segment than the current code segment, sometimes referred to as an intersegment return.
Inter-privilege-level far return A far return to a different privilege level than that of the currently executing program or procedure.
The inter-privilege-level return type can only be executed in protected mode. See the section titled “Calling Procedures Using Call and RET” in Chapter 6 of the IA-32 Intel Architecture Software Developer’s Manual, Volume 1, for detailed information on near, far, and inter-privilege-level returns.
When executing a near return, the processor pops the return instruction pointer (offset) from the top of the stack into the EIP register and begins program execution at the new instruction pointer. The CS register is unchanged.
When executing a far return, the processor pops the return instruction pointer from the top of the stack into the EIP register, then pops the segment selector from the top of the stack into the CS register. The processor then begins program execution in the new code segment at the new instruction pointer.
The mechanics of an inter-privilege-level far return are similar to an intersegment return, except that the processor examines the privilege levels and access rights of the code and stack segments being returned to determine if the control transfer is allowed to be made. The DS, ES, FS, and GS segment registers are cleared by the RET instruction during an inter-privilege-level return if they refer to segments that are not allowed to be accessed at the new privilege level. Since a stack switch also occurs on an inter-privilege level return, the ESP and SS registers are loaded from the stack.
If parameters are passed to the called procedure during an inter-privilege level call, the optional source operand must be used with the RET instruction to release the parameters on the return.
In other words, a return that also releases parameters behaves roughly like these two steps:
add esp, 18h   ; increase the stack pointer, shrinking the stack, usually by the size of the arguments that were actually pushed when the callee is responsible for cleaning them up (the stack “grows” downward, so adding shrinks it)
pop eip        ; pop the top of the stack into the instruction pointer, effectively a “jmp” there
Push
Decrements the stack pointer and then stores the source operand on the top of the stack. The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits), and the operand-size attribute of the current code segment determines the amount the stack pointer is decremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is decremented by 4 and, if they are 16, the 16-bit SP register is decremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the source operand.) Pushing a 16-bit operand when the stack address-size attribute is 32 can result in a misaligned stack pointer (that is, the stack pointer is not aligned on a doubleword boundary).
The PUSH ESP instruction pushes the value of the ESP register as it existed before the instruction was executed. Thus, if a PUSH instruction uses a memory operand in which the ESP register is used as a base register for computing the operand address, the effective address of the operand is computed before the ESP register is decremented.
In the real-address mode, if the ESP or SP register is 1 when the PUSH instruction is executed, the processor shuts down due to a lack of stack space. No exception is generated to indicate this condition. https://c9x.me/x86/html/file_module_x86_id_269.html
In other words, a push behaves roughly like these two steps:
sub esp, 4       ; subtract 4 bytes (on 32-bit) from the stack pointer, growing the stack
mov [esp], eax   ; store the item being pushed where the stack pointer now points
Pop
Loads the value from the top of the stack to the location specified with the destination operand and then increments the stack pointer. The destination operand can be a general-purpose register, memory location, or segment register.
The address-size attribute of the stack segment determines the stack pointer size (16 bits or 32 bits, which is the source address size), and the operand-size attribute of the current code segment determines the amount the stack pointer is incremented (2 bytes or 4 bytes). For example, if these address- and operand-size attributes are 32, the 32-bit ESP register (stack pointer) is incremented by 4 and, if they are 16, the 16-bit SP register is incremented by 2. (The B flag in the stack segment’s segment descriptor determines the stack’s address-size attribute, and the D flag in the current code segment’s segment descriptor, along with prefixes, determines the operand-size attribute and also the address-size attribute of the destination operand.) If the destination operand is one of the segment registers DS, ES, FS, GS, or SS, the value loaded into the register must be a valid segment selector. In protected mode, popping a segment selector into a segment register automatically causes the descriptor information associated with that segment selector to be loaded into the hidden (shadow) part of the segment register and causes the selector and the descriptor information to be validated (see the “Operation” section below).
A null value (0000-0003) may be popped into the DS, ES, FS, or GS register without causing a general protection fault. However, any subsequent attempt to reference a segment whose corresponding segment register is loaded with a null value causes a general protection exception (#GP). In this situation, no memory reference occurs and the saved value of the segment register is null.
The POP instruction cannot pop a value into the CS register. To load the CS register from the stack, use the RET instruction.
If the ESP register is used as a base register for addressing a destination operand in memory, the POP instruction computes the effective address of the operand after it increments the ESP register. For the case of a 16-bit stack where ESP wraps to 0h as a result of the POP instruction, the resulting location of the memory write is processor-family-specific.
The POP ESP instruction increments the stack pointer (ESP) before data at the old top of stack is written into the destination.
A POP SS instruction inhibits all interrupts, including the NMI interrupt, until after execution of the next instruction. This action allows sequential execution of POP SS and MOV ESP, EBP instructions without the danger of having an invalid stack during an interrupt. However, use of the LSS instruction is the preferred method of loading the SS and ESP registers. https://c9x.me/x86/html/file_module_x86_id_248.html
In other words, a pop behaves roughly like these two steps:
mov eax, [esp]   ; move the value on top of the stack into the destination being popped into
add esp, 4       ; increase esp, shrinking the stack
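The four expansions (call, ret, push, pop) fit into a tiny emulator, which I find handy for checking stack math before writing inline assembly. This is a toy 32-bit model with made-up names (Cpu32 and its methods); it only tracks ESP, EIP and the stack memory.

```cpp
#include <cstdint>
#include <map>

struct Cpu32 {
    std::uint32_t esp = 0x1000;  // arbitrary initial top of stack
    std::uint32_t eip = 0;
    std::map<std::uint32_t, std::uint32_t> mem;  // address -> dword

    // push: sub esp, 4 then mov [esp], value
    void Push(std::uint32_t value) {
        esp -= 4;
        mem[esp] = value;
    }

    // pop: mov dest, [esp] then add esp, 4
    std::uint32_t Pop() {
        const std::uint32_t value = mem[esp];
        esp += 4;
        return value;
    }

    // near call: push the address of the next instruction, then jmp target
    void Call(std::uint32_t target, std::uint32_t instrLen = 5) {
        Push(eip + instrLen);
        eip = target;
    }

    // ret / ret N: pop eip, then release N extra bytes of parameters
    void Ret(std::uint32_t releaseBytes = 0) {
        eip = Pop();
        esp += releaseBytes;
    }
};
```

For example, pushing one argument, calling a function and returning with Ret(4) leaves ESP exactly where it started, which is what a callee-cleans convention is supposed to do.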
Gadget/ROP Chaining
The idea
The technicality
Let’s get our hands dirty!
WARNING: ALL DETAILS BELOW ARE FOR EDUCATIONAL PURPOSES ONLY.
Now, our goal is to spoof the return address so that we do not trip the return checks and, as a result, do not get our account banned.
Normal call
As you can see in the example image, our application module ranges from 0x500000 to 0x600000. Any valid return address must fall within this range; otherwise, the application knows that the function was called from a different module.
Now, to complicate things: what if our function call comes from outside the application module, say, from an injected DLL?
Call outside of main module
As you can see above, we are calling the function from somewhere in 0x700000 ~ 0x800000, which is not a valid range for the return check, and this would get our account banned.
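A return check of this kind boils down to a range test. The sketch below shows the idea with made-up names (AddressRange, IsReturnAddressAllowed); how a real anti-cheat obtains the return address is an assumption on my side (with MSVC it could be the _ReturnAddress intrinsic), so here the address is simply passed in.

```cpp
#include <cstdint>
#include <vector>

struct AddressRange {
    std::uintptr_t begin;  // inclusive
    std::uintptr_t end;    // exclusive

    bool Contains(std::uintptr_t address) const {
        return address >= begin && address < end;
    }
};

// True if the return address is inside the main module or a whitelisted range.
inline bool IsReturnAddressAllowed(std::uintptr_t returnAddress,
                                   const AddressRange& mainModule,
                                   const std::vector<AddressRange>& whitelist) {
    if (mainModule.Contains(returnAddress)) {
        return true;
    }
    for (const AddressRange& range : whitelist) {
        if (range.Contains(returnAddress)) {
            return true;
        }
    }
    return false;
}
```

With the ranges from the images, a return address such as 0x512345 passes, while one from the injected DLL such as 0x712345 fails unless that range is whitelisted.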
Hands-on: Our target application (Game)
When we look at the function we want to call, there is a return check inside it.
Return check
The Solution
static void Engine::CastSpellSelf(int SlotID) {
    if (me->IsAlive()) {
        DWORD spellbook = (DWORD)me + (DWORD)oObjSpellBook;
        auto spellslot = me->GetSpellSlotByID(SlotID);
        Vector* objPos = &me->GetPos();
        Vector* mePos = &me->GetPos();
        DWORD objNetworkID = 0;
        DWORD SpoofAddress = (DWORD)GetModuleHandle(NULL) + (DWORD)oRetAddr; //retn instruction
        DWORD CastSpellAddr = (DWORD)GetModuleHandle(NULL) + (DWORD)oCastSpell;//CastSpell
        // The gadget must be a plain retn (opcode 0xC3).
        if (((*(DWORD*)SpoofAddress) & 0xFF) != 0xC3)
            return; //This isn't the instruction we're looking for
        __asm
        {
            push retnHere //where the gadget's retn lands: back in our own code
            mov ecx, spellbook //If the function is a __thiscall don't forget to set ECX
            push objNetworkID
            push mePos
            push objPos
            push SlotID
            push spellslot
            push SpoofAddress //the return address the function will see
            jmp CastSpellAddr //jmp instead of call, so no real return address is pushed
        retnHere :
        }
    }
}
In the inline assembly above, mov ecx plus the five pushes from objNetworkID to spellslot set up the original function’s parameters. Right after them, I also push SpoofAddress, which is our gadget.
Our gadget
When the function finishes executing, it returns into our gadget, whose retn instruction then returns to where we originally called the function (outside of the application). The return address the function sees is our gadget, which is inside the application module, so the return check is bypassed.
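To double check the stack layout, the sequence can be replayed on a plain vector acting as the stack. This is only a simulation with made-up names (SpoofTrace, SimulateSpoofedCall); it assumes a callee that cleans its own arguments (ret N), as in the __thiscall case here, and a gadget consisting of a single retn.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

struct SpoofTrace {
    std::uint32_t returnSeenByCallee;  // what a return check would inspect
    std::uint32_t finalEip;            // where execution ends up
    std::size_t dwordsLeftOnStack;     // should be 0 if the stack is balanced
};

inline SpoofTrace SimulateSpoofedCall(std::uint32_t retnHere, std::uint32_t gadget,
                                      const std::vector<std::uint32_t>& argsInPushOrder) {
    std::vector<std::uint32_t> stack;  // back() is the top of the stack

    stack.push_back(retnHere);         // push retnHere
    for (std::uint32_t arg : argsInPushOrder) {
        stack.push_back(arg);          // push each argument
    }
    stack.push_back(gadget);           // push SpoofAddress
    // jmp to the function: nothing is pushed, so [esp] is the gadget.

    SpoofTrace trace{};
    trace.returnSeenByCallee = stack.back();

    // Callee executes ret N: pop the return address, then release its arguments.
    stack.pop_back();
    stack.resize(stack.size() - argsInPushOrder.size());

    // Gadget executes retn: pop the next dword into EIP.
    trace.finalEip = stack.back();
    stack.pop_back();

    trace.dwordsLeftOnStack = stack.size();
    return trace;
}
```

The trace shows both properties we want: the callee sees the gadget as its return address, and execution still ends up at retnHere with a balanced stack.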
Additional Note (Another example)
The function above is a __thiscall function. As per the Microsoft documentation, such a function cleans up the passed parameters itself, which is why our gadget only needs a retn instruction. If the function does not clean up its parameters, you will want a gadget inside the application module that releases the passed parameters before the retn.
Target function
The function above will be our target, and we want to spoof the return address when we call it. Since it is __cdecl, the caller has to clean up the parameters after the function executes. Just find a gadget inside the module that has the following instructions:
add esp, 28
ret
We need to release 28 bytes (0x1C) from the stack, which is the total size of the parameters: the function takes 7 parameters, so 7 x 4 bytes = 28. After that, the gadget returns.
Thankfully, there is a site where you can easily transform instructions to opcodes so you can easily search the module.
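If you prefer doing the search in code, the gadget bytes are easy to build by hand: on 32-bit x86, add esp with an 8-bit immediate encodes as 83 C4 followed by the immediate, and ret is C3. The sketch below builds that pattern and scans a byte buffer for it; CdeclCleanupGadget and FindPattern are made-up names, and reading the module bytes into the buffer is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Bytes of "add esp, argCount*4 ; ret" for 32-bit code.
// Returns an empty vector if the size does not fit the imm8 form.
inline std::vector<std::uint8_t> CdeclCleanupGadget(unsigned argCount) {
    const unsigned bytes = argCount * 4;
    if (bytes > 0x7F) {
        return {};
    }
    return {0x83, 0xC4, static_cast<std::uint8_t>(bytes), 0xC3};
}

// Returns the offset of the first match in image, or -1 if there is none.
inline std::ptrdiff_t FindPattern(const std::vector<std::uint8_t>& image,
                                  const std::vector<std::uint8_t>& pattern) {
    if (pattern.empty()) {
        return -1;
    }
    const auto it = std::search(image.begin(), image.end(),
                                pattern.begin(), pattern.end());
    return it == image.end() ? -1 : it - image.begin();
}
```

For 7 parameters the pattern is 83 C4 1C C3. Add the module base to the returned offset to get the address to push as the spoofed return address.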
Instruction to opcode
IDA: Binary Search
A lot of results you can choose from
Testing if our spoof works
It’s easy to tell if your spoof works. Just run the application and see if you get banned after a few days.
BAN
Or just write code that reads the variable where the flag is stored.
If you are lucky enough to bypass the check, then you are now safe from bans (or so you think).
Sample
Conclusion
First of all, I want to say thank you to the people of UC for providing some quite good materials and resources. Second, I want to thank PITSF for inspiring a lot of people who are interested in ethical hacking and security. Mabuhay po kayo. And last but not least, I want to thank the readers who finished reading this post. I am sorry if there are grammatical or terminology errors; English is not my mother tongue.
A ROP chaining attack is easy to execute, but an additional layer of security is enough to catch intruders. Some anti-cheats enumerate the modules, some implement a whitelist of modules, some hook system functions to gain more control over the system, and so on.