[NSA2024] Task 5 – The #153 – (Reverse Engineering, Cryptography)

Disclaimer

This blog post is part of the NSA Codebreaker 2024 writeup series.

The challenge content is a PURELY FICTIONAL SCENARIO created by the NSA for EDUCATIONAL PURPOSES only. The mention and use of any actual products, tools, and techniques are similarly contrived for the sake of the challenge alone, and do not represent the intent of any company, product owner, or standards body.

Any similarities to real persons, entities, or events is coincidental.

Synopsis

Great job finding out what the APT did with the LLM! GA was able to check their network logs and figure out which developer copy and pasted the malicious code; that developer works on a core library used in firmware for the U.S. Joint Cyber Tactical Vehicle (JCTV)! This is worse than we thought!

You ask GA if they can share the firmware, but they must work with their legal teams to release copies of it (even to the NSA). While you wait, you look back at the data recovered from the raid. You discover an additional drive that you haven’t yet examined, so you decide to go back and look to see if you can find anything interesting on it. Sure enough, you find an encrypted file system on it, maybe it contains something that will help!

Unfortunately, you need to find a way to decrypt it. You remember that Emiko joined the Cryptanalysis Development Program (CADP) and might have some experience with this type of thing. When you reach out, he’s immediately interested! He tells you that while the cryptography is usually solid, the implementation can often have flaws. Together you start hunting for something that will give you access to the filesystem.

What is the password to decrypt the filesystem?

Downloads

disk image of the USB drive which contains the encrypted filesystem (disk.dd.tar.gz)
Interesting files from the user’s directory (files.zip)
Interesting files from the bin/ directory (bins.zip)

Prompt

Enter the password (hope it works!)

Solution

So we start off by checking the given files.

After some poking around, we learned that we are doing forensics on 570RM's workstation, and that 570RM sent passwords to 4C1D, PL46U3, and V3RM1N.
We have the public keys of 4C1D, PL46U3, and V3RM1N, but not their private keys.
Lastly, we have 570RM's own private and public keys.

Going back to the sent passwords: based on context clues, we assume all three messages carry the same plaintext, just encrypted for different recipients. Normally we could not decrypt any of them without the private key of 4C1D, PL46U3, or V3RM1N. However, all three public keys use the exponent e = 3, so with three ciphertexts of the same plaintext under three different moduli we can combine them with the Chinese Remainder Theorem to obtain the plaintext cubed as a plain integer, and then take its cube root (Håstad's broadcast attack).
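To see why three ciphertexts are enough, here is a toy-sized sketch of the same idea. It assumes textbook RSA with e = 3 and no padding; the small primes and the message are made up for illustration and have nothing to do with the challenge keys.

```python
from math import prod

def crt(moduli, residues):
    """Combine x = r_i (mod n_i) into x mod prod(n_i); moduli must be pairwise coprime."""
    N = prod(moduli)
    x = 0
    for n_i, r_i in zip(moduli, residues):
        N_i = N // n_i
        # pow(a, -1, n) is the modular inverse (Python 3.8+)
        x += r_i * N_i * pow(N_i, -1, n_i)
    return x % N

def icbrt(x):
    """Integer cube root by binary search: the largest r with r**3 <= x."""
    lo, hi = 0, 1 << ((x.bit_length() + 2) // 3 + 1)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= x:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Hypothetical toy moduli (products of small primes), pairwise coprime.
moduli = [1009 * 1013, 1019 * 1021, 1031 * 1033]
e = 3
m = 424242  # hypothetical message, smaller than every modulus

# The same message is encrypted once per recipient.
ciphertexts = [pow(m, e, n) for n in moduli]

# m**3 is smaller than n1*n2*n3, so the CRT result is m**3 itself, not a residue.
x = crt(moduli, ciphertexts)
recovered = icbrt(x)
print(recovered == m)  # True
```

The key point is that no private key is involved anywhere: once m**3 is known as an ordinary integer, the cube root is cheap.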

import base64
from Crypto.PublicKey import RSA
import gmpy2

# Function to decode base64 and convert to integer
def decode_ciphertext(encrypted_message):
    return int.from_bytes(base64.b64decode(encrypted_message), byteorder='big')

# Decode ciphertexts
c1 = decode_ciphertext("B+QWncX2NQpwUWIA+1+PXw7Y9x7eL53vfixIL+N9dRMG9ZKQnOyZARtV+tG1Zfs3z/r0shpW9fhfA9kOVUw/PGx6UpIRbgRXwKd3EZ0MomhxYXeaaxkXbI2lHfCHOhcWHqsGWgaMsSYxykDe9dX8hPtVeZMwXnGKGcGaZLoQ71WNG9e1kQaMB35UozCrNeqjfrvOJu0A5jIEjZkbaiJkhv01Z9SgE9E8ToCoPU2H/6g0j0j+PnDCjCjvaBS7A2AGP+L3twl3XQmrD8GqM38kIcvvdziZoSZwaB13Uzfzli+LBXKBr9RGjwuleQTeInfSBtW9obW1/I4803mqFj7NvQ==")
c2 = decode_ciphertext("ZtMuN9EjxCv+xtsKAhl1ECIi8wIe3CVC7L1HTTBap73V6MZSEjyEf3Ea7HWyW4juyTp2+PdfDBTBmvvLOYSA2Fm3ydGXBuLav98+7nNMcfEw38x6u9NpbsC0d5qgfhks5tSaFQCkgEHH89T+yrkjT6xkJ5kw64Q+jCVWB2uygzueK5RQbmJO9qRDtiOrxN/I+GW1MLjXpiZiPZcDLnKmBbLLq0P1efakIkkRvIHrbeyyZDRvlUu2d9HLXTVKqsqAh9umxjRKTm24wGbAm1jR9iBFEdGhn2PRDPaUMKEsryjbqzGvcyr1OCr3PS8cQBoejCOLia2L/HtwbRJwMXPEqQ==")
c3 = decode_ciphertext("VRvYIQ3rOrAgQpHyInyBfNpqEHUQJEbTM89+l+Os+3BtInbawuVQ/jc/xjuRQwe40wISJPMnh+uDJZiKn2jQZCWK8AqDZN3I7BXcmvSaSLHJI0lOezlEY/7Ps60wr71YXuozxqhQwJ9dgaNSdAv0BaFPvMN1V5+HGQJfc7VqxvdFpIOq1QwVQwvq9a9HGBaUJRv/sCHDt+EHQtXHNyXJ0U1ox9YqmkOBn+nGVKK5D/WI3iMy8qPYu9F3nGYU4gx644wZSbt8Ks0aTJxKs6TYZPez5+sk0Z7qow8tvKvAXInMb4CH2CsYZnfP8EZD2OG7LpBasSOw6QiE+eL1lkxokw==")

# Extract public keys
public_key1 = RSA.import_key('''-----BEGIN RSA PUBLIC KEY-----
MIIBCAKCAQEArqHDiJwi0hddQv1LCxZcPErAT/WRD6PdUoth/ZNqbv+BZq5JIQJg
AEzeEEqh1Wafv/Ks2fMXAMsslW413zm4Lssk5+os/0JLuUje9OKAhKPTacUt4P74
ZfjDIMOIUfcFmtjcM9nQwY7e/SWXzFeSQsrSp+XdYvB3sCDZtthCUTEtW8hKtPe2
H36K+eyQKzDoMcs/BNV+XiSJoeRK1zDqrOYDNy5Jrob/q4vElEd3BlhCAnlyJg0C
wKSnTrDFDccPWJFM+cPjneSsxTyThWZ8Vr2UcZkcO0VJvFedkb0xUpiTdrHyu9l8
JBqG4CEKs+y941WxoXwNa076GMkmmbCZEQIBAw==
-----END RSA PUBLIC KEY-----''')
public_key2 = RSA.import_key('''-----BEGIN RSA PUBLIC KEY-----
MIIBCAKCAQEAt88A0ixTOgd2GpyA4ihONMkmWyEQ89vvCRVtjtcc/lp3SeXZqLpR
tSIrUt0dsBMVIss+aHrquYs7PkN2FmiHCr+uEa5mB2FvxC04iits7mbYjqoZHpHo
cZAntnSUqW4xVZJEqLh/9L/g/U5WhZ4Ta78eJFpDlo2b/vKPQQ/aBNTmCxedpK6k
KW2EEdND0etrKjh2cl4vHz6d7+OmR3X32QTDBXIjjH+nYU09xrCItfx9s27457sA
yXJ6XY1ry4/DxvAY7yRks4Zd7GynI+kUaXuzhf2WZQIKUc/BrkAnhKaZmb9p+j79
Vx5zefStg4JcFQmAMghbJ3XoUYS6DtaukwIBAw==
-----END RSA PUBLIC KEY-----''')
public_key3 = RSA.import_key('''-----BEGIN RSA PUBLIC KEY-----
MIIBCAKCAQEAyKFLqgFkvwrRt4fBSbDXVjiPdR2jo2vkrUfefAzn7YXmgcy8YM06
SWo3jNVy0/MwrMFwymFHSf31OG3WLcY9epGpg0EP4Ha7go66fy6dv47kTzEnbxSk
o4rMTRiapDFaJRWzGbfZRboS/wuQYTsk+itdMwiFMd3jt5xlDs1ULMQfS/xfcbaR
p1BX5DbdmF45CaoTzv+uBI8piGn5eAFG/Yn3L0L09xDZl5Jtw7JlMeZIo8gzOXE5
HL6eBNZ+1bi4x4dwjXHEFNyeFvbKO4EI8nPk7eRMOyZoPFoY9vrFNVlJxgL4bkaP
RxTQVVtkRsC/FEPq6fKxOnG9odDRtDsfWwIBAw==
-----END RSA PUBLIC KEY-----''')

# Get moduli
n1 = public_key1.n
n2 = public_key2.n
n3 = public_key3.n

# Implementing a custom CRT function
def custom_crt(moduli, residues):
    N = 1
    for n in moduli:
        N *= n

    result = 0
    for n_i, a_i in zip(moduli, residues):
        N_i = N // n_i
        # Modular inverse of N_i modulo n_i
        inv = gmpy2.invert(N_i, n_i)
        result += a_i * N_i * inv

    return result % N

# Use the custom CRT function to find x
x = custom_crt([n1, n2, n3], [c1, c2, c3])

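# Step 4: Find the cube root of x (since e=3 for Håstad's attack)
m, exact = gmpy2.iroot(x, 3)
if not exact:
    raise ValueError("x is not a perfect cube; the attack assumptions do not hold")
m = int(m)  # plain int, so int.to_bytes works regardless of the gmpy2 version

# Step 5: Convert the integer m back to bytes
# (the minimal-length conversion drops any leading 0x00 byte)
plaintext_bytes = m.to_bytes((m.bit_length() + 7) // 8, byteorder='big')

# Step 6: Strip the PKCS#1 v1.5 padding, if present
# (the block is 0x00 0x02 <padding> 0x00 <message>; the leading 0x00 is usually gone by now)
if plaintext_bytes.startswith((b'\x00\x02', b'\x02')):
    # Find the first 0x00 separator after the padding
    separator_index = plaintext_bytes.find(b'\x00', 1)
    if separator_index != -1:
        plaintext_bytes = plaintext_bytes[separator_index + 1:]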

# Convert to string and print the recovered plaintext
plaintext = plaintext_bytes.decode('utf-8', errors='ignore')
print("Recovered plaintext:", plaintext)

Got it!

We will come back to this information later. For now, we need to check the other binaries.

We now shift our focus to the pm binary from bins.zip. Upon inspecting it, we learned that it is a Python application packaged into an executable, so we use https://pyinstxtractor-web.netlify.app/ to extract the .pyc files and then https://pylingual.io/ to decompile them into readable Python.

Reading the decompiled code, we learned that each password file has a simple structure: the first 16 bytes are the IV and the rest is the ciphertext. We were not able to recover the master password.

Investigating further, we can see that the AWS and USB password files are encrypted with the same key.

We have the IV and ciphertext of both the AWS and USB files, and we already know the AWS plaintext from an earlier engagement. As the data below shows, the two IVs are identical too, so both files are XORed with the same keystream. That allows a keystream reuse attack: XOR the AWS ciphertext with its known plaintext to get the keystream, then XOR that keystream with the USB ciphertext to recover the USB plaintext.

# No imports are needed here: the attack is plain XOR on byte strings.

# Provided data
aws_encrypted = b'\x7c\x30\x03\x7e\xec\x00\xb2\xe1\xf4\x09\xea\x92\x27\x2d\x1e\x80\x71\x89\x2b\xb9\xa1\x4e\x39\xf2\x05\x7b\xda\xa6\x83\x0b\x2d\xbc\xff\xdc'
aws_plaintext = "r2s^PKT=lW2L(wmG06"
usb_encrypted = b'\x7c\x30\x03\x7e\xec\x00\xb2\xe1\xf4\x09\xea\x92\x27\x2d\x1e\x80\x38\xc8\x01\xdb\xa5\x43\x5c\xe2\x2c\x76\x8b\xc0\xdd\x54\x2e\xac\x1b\xbb'


# Extract IV (first 16 bytes)
iv = usb_encrypted[:16]

# Extract ciphertexts (excluding IV)
usb_ciphertext = usb_encrypted[16:]
aws_ciphertext = aws_encrypted[16:]

# Convert plaintext to bytes
aws_plaintext_bytes = aws_plaintext.encode()

# Function to derive the keystream and decrypt usb_ciphertext until non-UTF-8 encountered
def brute_force_until_invalid_utf8(aws_plaintext_bytes, aws_ciphertext, usb_ciphertext):
    possible_plaintexts = []
    # Brute-force until a non-UTF-8 character is encountered
    for length in range(1, len(aws_plaintext_bytes) + 1):
        # Derive the partial keystream for the current length
        keystream = bytes(c ^ p for c, p in zip(aws_ciphertext[:length], aws_plaintext_bytes[:length]))

        # Attempt to recover the usb plaintext using the partial keystream
        usb_plaintext = bytes(c ^ k for c, k in zip(usb_ciphertext[:length], keystream))
        
        # Attempt to recover aws plaintext to validate against the original plaintext
        aws_recovered = bytes(c ^ k for c, k in zip(aws_ciphertext[:length], keystream))
        
        try:
            usb_plaintext_string = usb_plaintext.decode('utf-8')
            aws_recovered_string = aws_recovered.decode('utf-8')
            
            # Check if aws_recovered matches aws_plaintext for validation
            if aws_recovered_string == aws_plaintext[:length]:
                possible_plaintexts.append((length, usb_plaintext_string, aws_recovered_string))
        except UnicodeDecodeError:
            # Stop if non-UTF-8 character is encountered
            break
    
    return possible_plaintexts

# Perform brute-force decryption
decrypted_plaintexts = brute_force_until_invalid_utf8(aws_plaintext_bytes, aws_ciphertext, usb_ciphertext)

# Output all possible plaintexts to a file
with open('usb_plaintext.txt', 'w') as f:
    for length, usb_plaintext, aws_recovered in decrypted_plaintexts:
        f.write(f"{usb_plaintext}\n")

After a few iterations, I found that the first 16 characters decode cleanly, but the 17th and 18th do not come out as printable ASCII. Presumably the recovered keystream stops matching after the first 16-byte block, so instead I wrote a script to brute-force the last two characters against the unlock binary on the USB image.
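The 16-character prefix used in the brute-force script below can be reproduced in a few lines. This is a condensed sketch that reuses the byte strings from the script above (written as hex here); treating the cipher as a plain XOR keystream is an assumption that only appears to hold for the first 16-byte block.

```python
# Same data as in the script above: 16-byte IV followed by the ciphertext.
aws_encrypted = bytes.fromhex(
    "7c30037eec00b2e1f409ea92272d1e80" "71892bb9a14e39f2057bdaa6830b2dbcffdc"
)
usb_encrypted = bytes.fromhex(
    "7c30037eec00b2e1f409ea92272d1e80" "38c801dba5435ce22c768bc0dd542eac1bbb"
)
aws_plaintext = b"r2s^PKT=lW2L(wmG06"

def xor(a, b):
    """Byte-wise XOR of two byte strings (truncated to the shorter one)."""
    return bytes(x ^ y for x, y in zip(a, b))

# Both files start with the same IV, which is what makes the keystream repeat.
same_iv = aws_encrypted[:16] == usb_encrypted[:16]

aws_ct, usb_ct = aws_encrypted[16:], usb_encrypted[16:]

# C_aws XOR C_usb cancels the keystream and leaves P_aws XOR P_usb;
# XOR-ing the known AWS plaintext back in yields the USB plaintext candidate.
candidate = xor(xor(aws_ct, usb_ct), aws_plaintext)

prefix = candidate[:16].decode("ascii")
print(same_iv)                # True
print(prefix)                 # ;sY<TF1-EZc*v(nW
print(candidate[16:].hex())   # the remaining two bytes, shown as hex
```

The first 16 bytes match the fixed prefix in the script below; the 17th byte falls outside the ASCII range, which is why the tail has to be brute-forced.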

We need to mount disk.dd first.

Once the disk is mounted, we should see the unlock and lock binaries.

Here is the brute-force script that unlocks the USB content.

import subprocess
import string
import sys
from itertools import product

# Define the fixed prefix of the password
fixed_prefix = ";sY<TF1-EZc*v(nW"

# Character set for brute-forcing the last two characters
charset = string.ascii_letters + string.digits + string.punctuation

# Function to attempt unlocking
def try_password(password):
    process = subprocess.Popen(['/mnt/unlock'], stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate(input=f"{password}\n".encode())
    
    # Check if the output does not contain "Password incorrect."
    if b"Password incorrect." not in stdout + stderr:
        return True
    return False

# Generate all combinations of two characters from the charset
for combo in product(charset, repeat=2):
    # Form the password by appending the brute-forced characters to the fixed prefix
    password = fixed_prefix + ''.join(combo)
    
    # Try the password
    if try_password(password):
        print(f"Password found: {password}")
        sys.exit(0)  # Exit the script once a valid password is found
    else:
        print(f"Tried password: {password} - Incorrect")

The three files are needed for Task 6 and Task 7. The only thing we need to submit to complete Task 5 is the password.

[NSA2024] Task 4 – LLMs never lie – (Programming, Forensics)

Synopsis

Great work! With a credible threat proven, NSA’s Cybersecurity Collaboration Center reaches out to GA and discloses the vulnerability with some indicators of compromise (IoCs) to scan for.

New scan reports in hand, GA’s SOC is confident they’ve been breached using this attack vector. They’ve put in a request for support from NSA, and Barry is now tasked with assisting with the incident response.

While engaging the development teams directly at GA, you discover that their software engineers rely heavily on an offline LLM to assist in their workflows. A handful of developers vaguely recall once getting some confusing additions to their responses but can’t remember the specifics.

Barry asked for a copy of the proprietary LLM model, but approvals will take too long. Meanwhile, he was able to engage GA’s IT Security to retrieve partial audit logs for the developers and access to a caching proxy for the developers’ site.

Barry is great at DFIR, but he knows what he doesn’t know, and LLMs are outside of his wheelhouse for now. Your mutual friend Dominique was always interested in GAI and now works in Research Directorate.

The developers use the LLM for help during their work duties, and their AUP allows for limited personal use. GA IT Security has bound the audit log to an estimated time period and filtered it to specific processes. Barry sent a client certificate for you to authenticate securely with the caching proxy using https://[REDACTED]/?q=query%20string.

You bring Dominique up to speed on the importance of the mission. They receive a nod from their management to spend some cycles with you looking at the artifacts. You send the audit logs their way and get to work looking at this one.

Find any snippet that has been purposefully altered.

Downloads

TTY audit log of a developer’s shell activity (audit.log)

Prompt

A maliciously altered line from a code snippet

Solution

After downloading the file, we are greeted by a huge audit log that contains raw keystrokes and escaped control characters.

We need to write a parser to make it human readable.

import sys
import re

# Precompile the regular expression for CSI (Control Sequence Introducer) sequences
CSI_PATTERN = re.compile(r'\x1b\[(.*?)([@-~])')

def process_line(line, history):
    # Try to decode the escaped sequences to actual control characters
    try:
        line_decoded = bytes(line, "utf-8").decode("unicode_escape")
    except UnicodeDecodeError:
        # If decoding fails, return the line as-is
        return line.strip()
    
    buffer = []
    cursor = 0
    i = 0
    interrupted = False  # Flag to indicate if Ctrl+C was pressed
    history_index = len(history)  # Start at the end of history (no history navigation)
    while i < len(line_decoded):
        c = line_decoded[i]
        # Handle control characters and escape sequences
        if c == '\x03':  # Ctrl+C (Interrupt)
            # Indicate that Ctrl+C was pressed
            interrupted = True
            i += 1
            break  # Stop processing the current line
        elif c == '\x1b':  # Escape character
            # Check if it's a CSI sequence
            if i + 1 < len(line_decoded) and line_decoded[i + 1] == '[':
                # Try to match CSI sequence
                m = CSI_PATTERN.match(line_decoded, i)
                if m:
                    full_seq = m.group(0)
                    params = m.group(1)
                    final_byte = m.group(2)
                    seq_length = len(full_seq)
                    # Now process known CSI sequences
                    if full_seq == '\x1b[H':  # Cursor to Home
                        cursor = 0
                    elif full_seq == '\x1b[2J':  # Clear Screen
                        buffer = []
                        cursor = 0
                    elif full_seq == '\x1b[3~':  # Delete key
                        if cursor < len(buffer):
                            del buffer[cursor]
                    elif full_seq == '\x1b[D':  # Left Arrow
                        if cursor > 0:
                            cursor -= 1
                    elif full_seq == '\x1b[C':  # Right Arrow
                        if cursor < len(buffer):
                            cursor += 1
                    elif full_seq == '\x1b[A':  # Up Arrow (Previous Command)
                        if history:
                            history_index = max(history_index - 1, 0)
                            buffer = list(history[history_index])
                            cursor = len(buffer)
                    elif full_seq == '\x1b[B':  # Down Arrow (Next Command)
                        if history:
                            history_index = min(history_index + 1, len(history))
                            if history_index < len(history):
                                buffer = list(history[history_index])
                                cursor = len(buffer)
                            else:
                                # If beyond the latest command, clear buffer
                                buffer = []
                                cursor = 0
                    else:
                        # For unhandled CSI sequences, keep the raw sequence as one buffer element
                        buffer.insert(cursor, full_seq)
                        cursor += 1  # one list element was inserted, so advance by one
                    # Advance index by length of the sequence
                    i += seq_length
                    continue
                else:
                    # Unrecognized CSI sequence, leave it escaped
                    escaped_seq = line_decoded[i].encode('unicode_escape').decode()
                    buffer.insert(cursor, escaped_seq)
                    cursor += 1  # one list element was inserted, so advance by one
                    i += 1
            else:
                # Not a CSI sequence, leave it escaped
                escaped_seq = line_decoded[i].encode('unicode_escape').decode()
                buffer.insert(cursor, escaped_seq)
                cursor += 1  # one list element was inserted, so advance by one
                i += 1
        elif c == '\x08':  # Backspace
            if cursor > 0:
                del buffer[cursor - 1]
                cursor -= 1
            i += 1
        elif c == '\x01':  # Ctrl+A (Home)
            cursor = 0
            i += 1
        elif c == '\x05':  # Ctrl+E (End)
            cursor = len(buffer)
            i += 1
        elif c == '\x0d' or c == '\x0a':  # Carriage Return (Enter) or Line Feed (Newline)
            # End of command; break if needed
            i += 1
            break  # Stop processing the current line
        else:
            # Insert character at cursor position
            buffer.insert(cursor, c)
            cursor += 1
            i += 1
    command = ''.join(buffer).strip()
    if interrupted:
        command += ' [Ctrl+C pressed]'
    return command

def parse_file_content(input_file):
    commands = []
    with open(input_file, 'r', encoding='utf-8') as f:
        lines = f.readlines()
        i = 0
        while i < len(lines):
            line = lines[i].rstrip('\n')
            # Pass the command history to process_line
            processed_command = process_line(line, commands)
            if processed_command:
                commands.append(processed_command)
            i += 1
    return commands

def main():
    if len(sys.argv) != 3:
        print("Usage: python transform.py <input_file> <output_file>")
        sys.exit(1)
    input_file = sys.argv[1]
    output_file = sys.argv[2]

    # Parse the content and write the commands to the output file
    parsed_commands = parse_file_content(input_file)
    with open(output_file, 'w', encoding='utf-8') as f:
        for cmd in parsed_commands:
            f.write(cmd + '\n')

if __name__ == "__main__":
    main()

Now we have a somewhat human-readable log.
We are only interested in the prompts, so we remove every line that does not contain the gagpt keyword.

We also saw some lines containing the literal string Ctrl+C, so we remove those as well.

Finally, we strip everything that precedes the actual prompt string on each line.
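These clean-up steps can be scripted too. The sketch below is only a rough guess at the log layout: it assumes each relevant line contains the gagpt keyword followed by the prompt in double quotes. The sample lines, including the -m flag, are hypothetical and may not match the real log.

```python
def extract_prompts(lines, keyword="gagpt", interrupt_marker="Ctrl+C"):
    """Keep only gagpt prompt lines, drop interrupted ones, strip the prefix."""
    prompts = []
    for line in lines:
        # Step 1 and 2: keep gagpt lines, skip the ones marked as interrupted.
        if keyword not in line or interrupt_marker in line:
            continue
        # Step 3 (assumption): the prompt is whatever follows the first double quote.
        _, quote_found, rest = line.partition('"')
        if not quote_found:
            continue
        prompt = rest.rstrip().rstrip('"')
        if prompt:
            prompts.append(prompt)
    return prompts

# Hypothetical sample of parser output, for illustration only.
sample = [
    'ls -la',
    'gagpt -m "How do I reverse a list in Python?"',
    'gagpt -m "How do I rev [Ctrl+C pressed]',
    'gagpt -m "What is a mutex?"',
]

prompts = extract_prompts(sample)
for p in prompts:
    print(p)
```

Writing one prompt per line gives a file that the scraper in the next step can read as its queries file.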

The final output should look like this.

Next, we need to create a .p12 file from the client certificate so that we can interact with the LLM server programmatically.

import httpx
import json
import os
import argparse
import tempfile
from urllib.parse import quote
from cryptography.hazmat.primitives.serialization.pkcs12 import load_key_and_certificates
from cryptography.hazmat.primitives import serialization
from cryptography.hazmat.backends import default_backend

# Function to load .p12 file
def load_p12_certificate(p12_path, p12_password):
    with open(p12_path, "rb") as p12_file:
        p12_data = p12_file.read()
    private_key, certificate, additional_certs = load_key_and_certificates(
        p12_data, 
        p12_password.encode(), 
        default_backend()
    )
    return private_key, certificate

# Function to create a filename-safe version of a query string
def sanitize_filename(query, max_length=255):
    # Replace spaces with underscores and remove invalid filename characters
    sanitized = ''.join(c if c.isalnum() or c in ['_', '-'] else '_' for c in query)
    # Truncate if too long, and leave space for file extension
    if len(sanitized) > max_length - 5:  # Reserving space for ".json"
        sanitized = sanitized[:max_length - 5]
    return sanitized

# Function to make a request and save response as JSON
def make_request_and_save(query, client, save_location):
    url = f"https://[REDACTED]/?q={quote(query)}"
    headers = {
        "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/115.0",
        "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8",
        "Accept-Language": "en-US,en;q=0.5",
        "Upgrade-Insecure-Requests": "1",
        "Sec-Fetch-Dest": "document",
        "Sec-Fetch-Mode": "navigate",
        "Sec-Fetch-Site": "none",
        "Sec-Fetch-User": "?1",
        "Te": "trailers"
    }
    
    response = client.get(url, headers=headers)
    
    body = response.json()
    output_data = {
        "q": query,
        "body": body
    }
    
    # Sanitize filename and save as JSON
    filename = sanitize_filename(query) + ".json"
    full_path = os.path.join(save_location, filename)
    with open(full_path, "w") as f:
        json.dump(output_data, f, indent=4)
    print(f"Saved response to {full_path}")

# Main function to read queries and perform requests
def main(p12_path, p12_password, queries_file, save_location):
    private_key, certificate = load_p12_certificate(p12_path, p12_password)
    
    # Serialize private key and certificate to PEM format
    private_key_pem = private_key.private_bytes(
        encoding=serialization.Encoding.PEM,
        format=serialization.PrivateFormat.TraditionalOpenSSL,
        encryption_algorithm=serialization.NoEncryption()
    )
    certificate_pem = certificate.public_bytes(serialization.Encoding.PEM)
    
    # Create temporary files for the certificate and private key
    with tempfile.NamedTemporaryFile(delete=False) as cert_file, tempfile.NamedTemporaryFile(delete=False) as key_file:
        cert_file.write(certificate_pem)
        key_file.write(private_key_pem)
        cert_file_path = cert_file.name
        key_file_path = key_file.name
    
    # Using httpx client with HTTP/2 and certificate for mutual TLS
    with httpx.Client(http2=True, verify=False, cert=(cert_file_path, key_file_path)) as client:
        # Read queries from the file
        with open(queries_file, "r") as f:
            queries = [line.strip().strip('"') for line in f.readlines() if line.strip()]
        
        # Process each query
        for query in queries:
            make_request_and_save(query, client, save_location)
    
    # Clean up temporary files
    os.remove(cert_file_path)
    os.remove(key_file_path)

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Scraper to make HTTP/2 requests with a .p12 key and save results as JSON.")
    parser.add_argument("--p12", required=True, help="Path to the .p12 certificate file.")
    parser.add_argument("--p12_password", required=True, help="Password for the .p12 certificate file.")
    parser.add_argument("--queries_file", required=True, help="Path to the file containing queries.")
    parser.add_argument("--save_location", required=True, help="Directory to save the resulting JSON files.")
    
    args = parser.parse_args()
    
    # Create save directory if it does not exist
    os.makedirs(args.save_location, exist_ok=True)
    
    main(args.p12, args.p12_password, args.queries_file, args.save_location)

This produces a lot of results.

A few queries fail with errors, which we must also correct.

To do that, connect to the server with Burp Suite.
First, set up Burp Suite by importing the .p12 file.

Then manually pull the responses that the script could not retrieve because of character-encoding problems.

Repeat this for the rest of the failed queries.

When that is done, we are ready to combine all the JSON files into one big JSON file.

import json
import glob
import sys
import os

def combine_json_files(input_dir, output_file):
    # Ensure input directory exists
    if not os.path.isdir(input_dir):
        print(f"Error: The directory '{input_dir}' does not exist.")
        return

    # Find all JSON files in the input directory
    json_files = glob.glob(os.path.join(input_dir, '*.json'))

    if not json_files:
        print(f"No JSON files found in directory '{input_dir}'.")
        return

    combined_json = []

    # Read and combine all JSON files
    for file in json_files:
        with open(file, 'r') as f:
            data = json.load(f)
            combined_json.append(data)

    # Save the combined JSON array to the specified output file
    # Use a separate name for the handle so output_file stays a path for the message below
    with open(output_file, 'w') as out:
        json.dump(combined_json, out, indent=4)

    print(f"Combined JSON file created successfully as '{output_file}'.")

if __name__ == '__main__':
    if len(sys.argv) != 3:
        print("Usage: python combine_json.py <input_directory> <output_file>")
        sys.exit(1)
    
    input_directory = sys.argv[1]
    output_filename = sys.argv[2]
    
    combine_json_files(input_directory, output_filename)

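In the first line of the combined file, prepend "const jsonData = " so the array becomes a JavaScript variable, and save the file as gagpt_catalogue.js (the name the HTML template below loads).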
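For a large file, this can be done with a small script instead of by hand. The sketch below assumes the combined output is a JSON array and uses the jsonData variable and gagpt_catalogue.js file name that the HTML template below expects; combined.json is a placeholder name.

```python
import json
import os
import tempfile

def json_to_js_variable(json_path, js_path, variable="jsonData"):
    """Wrap a JSON file as `const <variable> = ...;` so a <script> tag can load it."""
    with open(json_path, "r", encoding="utf-8") as f:
        data = json.load(f)  # also validates the combined file
    with open(js_path, "w", encoding="utf-8") as f:
        f.write(f"const {variable} = ")
        json.dump(data, f, indent=4)
        f.write(";\n")
    return len(data)

# Demo in a throwaway directory with one made-up record shaped like the scraper output.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "combined.json")  # placeholder input name
    dst = os.path.join(tmp, "gagpt_catalogue.js")
    record = {"q": "example", "body": {"prompt": "example", "fulfillment": [{"text": "hi"}]}}
    with open(src, "w", encoding="utf-8") as f:
        json.dump([record], f)
    count = json_to_js_variable(src, dst)
    with open(dst, encoding="utf-8") as f:
        js_text = f.read()

print(count)
print(js_text.splitlines()[0])
```

Loading the data through a script tag avoids fetch(), which browsers typically block for pages opened directly from disk.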

Now use this HTML template to view the JSON data in a human-readable display.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>JSON Viewer</title>
    <style>
        #container {
            max-width: 800px;
            margin: auto;
            padding: 20px;
            border: 1px solid #ccc;
            border-radius: 5px;
            font-family: Arial, sans-serif;
        }
        #markdown {
            padding: 10px;
            border: 1px solid #ddd;
            border-radius: 5px;
            background-color: #f9f9f9;
        }
        #buttons {
            margin-top: 20px;
            text-align: center;
        }
        button {
            margin: 5px;
            padding: 10px;
        }
    </style>
    <!-- Correct CDN link for marked.js -->
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
    <!-- Load JSON data as a script -->
    <script src="./gagpt_catalogue.js"></script>
</head>
<body>
    <div id="container">
        <h2>JSON Viewer</h2>
        <h3 id="prompt"></h3>
        <div id="markdown"></div>
        <div id="buttons">
            <button onclick="previousObject()">Previous</button>
            <button onclick="nextObject()">Next</button>
        </div>
    </div>

    <script>
        let currentIndex = 0;
        let data = [];

        document.addEventListener('DOMContentLoaded', () => {
            // Assign the loaded JSON data to the variable
            data = jsonData;
            displayObject(currentIndex);
        });

        // Display the current object
        function displayObject(index) {
            const obj = data[index];
            const promptText = obj.body.prompt;
            const fulfillmentText = obj.body.fulfillment[0].text;

            document.getElementById('prompt').textContent = promptText;
            document.getElementById('markdown').innerHTML = marked.parse(fulfillmentText);
        }

        // Navigate to the next object
        function nextObject() {
            if (currentIndex < data.length - 1) {
                currentIndex++;
                displayObject(currentIndex);
            }
        }

        // Navigate to the previous object
        function previousObject() {
            if (currentIndex > 0) {
                currentIndex--;
                displayObject(currentIndex);
            }
        }
    </script>
</body>
</html>

We now have a human-readable display of the prompts and responses.

The last piece of the puzzle is to review all of them and find something suspicious.

And yes, it is very time consuming, especially when you don’t know exactly what you are looking for.

[NSA2024] Task 3 – How did they get in? – (Reverse Engineering, Vulnerability Research)

Synopsis

Great work finding those files! Barry shares the files you extracted with the blue team who share it back to Aaliyah and her team. As a first step, she ran strings across all the files found and noticed a reference to a known DIB, “Guardian Armaments” She begins connecting some dots and wonders if there is a connection between the software and the hardware tokens. But what is it used for and is there a viable threat to Guardian Armaments (GA)?

She knows the Malware Reverse Engineers are experts at taking software apart and figuring out what it’s doing. Aaliyah reaches out to them and keeps you in the loop. Looking at the email, you realize your friend Ceylan is touring on that team! She is on her first tour of the Computer Network Operations Development Program

Barry opens up a group chat with three of you. He wants to see the outcome of the work you two have already contributed to. Ceylan shares her screen with you as she begins to reverse the software. You and Barry grab some coffee and knuckle down to help.

Figure out how the APT would use this software to their benefit

Downloads

Executable from ZFS filesystem (server)
Retrieved from the facility, could be important? (shredded.jpg)

Prompt

Enter a valid JSON that contains the (3 interesting) keys and specific values that would have been logged if you had successfully leveraged the running software. Do ALL your work in lower case.

Solution

I downloaded the attachments and started poking at the files. Based on the error message, the executable tries to connect to a server of some sort.

Digging further, it turns out to be an application with embedded protobuf definitions.

We extract the protobuf definitions using https://github.com/arkadiyt/protodump

With the protobuf definitions in hand, we can write a server simulator to observe the application's behavior and dig deeper into it. To generate the Python stubs, we can use https://pypi.org/project/grpcio-tools/
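Generating the Python stubs is a single protoc invocation. The sketch below only builds that command; it assumes the dumped definition was saved as auth.proto (inferred from the auth_pb2 imports in the simulator below) and that grpcio-tools is installed when the command is actually run.

```python
import subprocess
import sys

def build_protoc_command(proto_file="auth.proto", proto_dir=".", out_dir="."):
    """Return the grpc_tools.protoc command that generates *_pb2.py and *_pb2_grpc.py."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        f"-I{proto_dir}",                 # where to look for the .proto file
        f"--python_out={out_dir}",        # message classes (auth_pb2.py)
        f"--grpc_python_out={out_dir}",   # service stubs (auth_pb2_grpc.py)
        proto_file,
    ]

def generate_stubs(proto_file="auth.proto"):
    """Run protoc; raises CalledProcessError if grpcio-tools is missing or the file is invalid."""
    subprocess.run(build_protoc_command(proto_file), check=True)

command = build_protoc_command()
print(" ".join(command[1:]))
```

Calling generate_stubs() in the directory that holds the .proto file should produce the two modules the simulator imports.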

We can now write our server simulator.

from concurrent import futures
import time
import grpc
import logging

import auth_pb2
import auth_pb2_grpc

_ONE_DAY_IN_SECONDS = 60 * 60 * 24

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s"
)

# Interceptor for handling logging on method not found or decoding errors
class LoggingInterceptor(grpc.ServerInterceptor):
    def intercept_service(self, continuation, handler_call_details):
        method = handler_call_details.method
        logging.info(f"Incoming request for method: {method}")
        
        try:
            # Call the actual service method
            response = continuation(handler_call_details)
            return response
        except grpc.RpcError as e:
            # Log error details
            if e.code() == grpc.StatusCode.UNIMPLEMENTED:
                logging.error(f"Method not found: {method}")
            elif e.code() == grpc.StatusCode.INVALID_ARGUMENT:
                logging.error(f"Request decoding error for method: {method}")
            else:
                logging.error(f"Error during request handling: {e}")
            raise e

class AuthService(auth_pb2_grpc.AuthServiceServicer):
    def log_metadata(self, context):
        # Log the incoming metadata (headers)
        metadata = context.invocation_metadata()
        logging.info("Received headers:")
        for key, value in metadata:
            logging.info(f"{key}: {value}")

    def Ping(self, request, context):
        # Log headers and request details
        self.log_metadata(context)
        logging.debug(f"Received Ping request: {request}")
        
        # Return the Ping response
        return auth_pb2.PingResponse(response=1)

    def Authenticate(self, request, context):
        # Log headers and request details
        self.log_metadata(context)
        logging.debug(f"Received Authenticate request: {request}")
        
        # Return the Authenticate response
        return auth_pb2.AuthResponse(success=True)

    def RegisterOTPSeed(self, request, context):
        # Log headers and request details
        self.log_metadata(context)
        logging.debug(f"Received RegisterOTPSeed request: {request}")
        
        # Return the RegisterOTPSeed response
        return auth_pb2.RegisterOTPSeedResponse(success=False)

    def VerifyOTP(self, request, context):
        # Log headers and request details
        self.log_metadata(context)
        logging.debug(f"Received VerifyOTP request: {request}")
        
        # Return the VerifyOTP response
        return auth_pb2.VerifyOTPResponse(success=True,token="000000")

def serve():
    # Add the interceptor to the server
    server = grpc.server(
        futures.ThreadPoolExecutor(max_workers=10),
        interceptors=[LoggingInterceptor()]
    )
    
    auth_pb2_grpc.add_AuthServiceServicer_to_server(AuthService(), server)
    server.add_insecure_port("[::]:50052")
    server.start()

    # Log the server start event
    logging.info("gRPC server started on port 50052")

    try:
        while True:
            time.sleep(_ONE_DAY_IN_SECONDS)
    except KeyboardInterrupt:
        # Log the server stop event
        logging.info("Stopping gRPC server...")
        server.stop(grace=0)
        logging.info("gRPC server stopped")

if __name__ == "__main__":
    serve()

We also write a client script that connects to the challenge server, which listens on port 50051.

import argparse

import grpc

import seed_generation_pb2
import seed_generation_pb2_grpc

def run(host):
    channel = grpc.insecure_channel(host)
    stub = seed_generation_pb2_grpc.SeedGenerationServiceStub(channel)

    response = stub.GetSeed(seed_generation_pb2.GetSeedRequest(username="jasper_05376",password="test"))
    print("SeedGenerationService client received: Seed=" + str(response.seed) + ", Count=" + str(response.count))

if __name__ == "__main__":
    parser = argparse.ArgumentParser(
        description=__doc__, formatter_class=argparse.RawDescriptionHelpFormatter
    )
    parser.add_argument("--host", default="localhost:50051", help="The server host.")
    args = parser.parse_args()
    run(args.host)

Based on the context clues given, it seems we need to submit a JSON object with the following keys: username, seed, and count.

Digging further, the seeds turn out to be deterministic, so the random number generator must be initialized with a fixed seed.

Another piece of information is the auth module.

The auth code is not safe. The application performs an authentication check, but not in a safe manner.
Therefore, if we can produce a combination of username and seed that satisfies the conditions, we might get authenticated without needing the Auth Service.

Here is the snippet of the algorithm used by the application:

v7 = currentRand;
  for ( i = 0LL; username.len > (__int64)i; i += 4LL )
  {
    if ( username.len < (__int64)(i + 4) )
    {
      v10 = username.len - i;
      if ( username.len - i == 1 )
      {
        if ( username.len <= i )
          runtime_panicIndex();
        v9 = username.str[i];
      }
      else if ( v10 == 2 )
      {
        if ( username.len <= i )
          runtime_panicIndex();
        if ( username.len <= i + 1 )
          runtime_panicIndex();
        v9 = *(unsigned __int16 *)&username.str[i];
      }
      else if ( v10 == 3 )
      {
        if ( username.len <= i )
          runtime_panicIndex();
        if ( username.len <= i + 1 )
          runtime_panicIndex();
        if ( username.len <= i + 2 )
          runtime_panicIndex();
        v9 = *(unsigned __int16 *)&username.str[i] | (username.str[i + 2] << 16);
      }
      else
      {
        v9 = 0;
      }
    }
    else
    {
      if ( username.len <= i )
        runtime_panicIndex();
      if ( username.len <= i + 1 )
        runtime_panicIndex();
      if ( username.len <= i + 2 )
        runtime_panicIndex();
      if ( username.len <= i + 3 )
        runtime_panicIndex();
      v9 = *(_DWORD *)&username.str[i];
    }
    v7 ^= v9;
  }
  if ( v7 == -1972368894 )
  {
    // we need to get into this block by using a VALID and KNOWN username.
  }
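To make the decompiled loop easier to follow, here is a small Python sketch of what I understand it to do: the username is read as little-endian 32-bit chunks (a short final chunk behaves as if it were zero-padded) and each chunk is XORed into the accumulator. I am assuming that v7 starts as the low 32 bits of the current random value; the function names are mine.

```python
import struct

# -1972368894 from the decompilation, written as an unsigned 32-bit value.
TARGET = 0x8A700A02


def fold_username(username: str) -> int:
    """XOR together the little-endian 32-bit chunks of the username."""
    data = username.encode()
    data += b"\x00" * (-len(data) % 4)  # zero-pad a 1-3 byte tail
    acc = 0
    for (chunk,) in struct.iter_unpack("<I", data):
        acc ^= chunk
    return acc


def required_rand_low32(username: str, target: int = TARGET) -> int:
    """Low 32 bits the random value must have for the check to pass."""
    return target ^ fold_username(username)


def is_bypass(current_rand: int, username: str, target: int = TARGET) -> bool:
    """Replay the check: low 32 bits of the random value XOR the folded username."""
    return ((current_rand & 0xFFFFFFFF) ^ fold_username(username)) == target
```

Because XOR is its own inverse, the low 32 bits that the random value needs for a given username can be computed directly; what still has to be searched for is the position in the random sequence where such a value appears.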

Another context clue is the shredded.jpg file. With those context clues, we can try using jasper_05376, the username found in Task 1.

With all of the information above, we can emulate the algorithm and brute-force a username and seed combination that meets our goal.

package main

import (
	"encoding/binary"
	"fmt"
	"math/rand"
)

// getChunks splits a username into 4-byte chunks for XOR operations
func getChunks(username string) []uint32 {
	usernameBytes := []byte(username)
	padding := (4 - len(usernameBytes)%4) % 4
	usernameBytes = append(usernameBytes, make([]byte, padding)...)

	var chunks []uint32
	for i := 0; i < len(usernameBytes); i += 4 {
		chunks = append(chunks, binary.LittleEndian.Uint32(usernameBytes[i:i+4]))
	}
	return chunks
}

// performXOR performs XOR on the initial uVar2 with chunks from the username
func performXOR(uVar2 uint32, chunks []uint32) uint32 {
	for _, chunk := range chunks {
		uVar2 ^= chunk
	}
	return uVar2
}

// simulateAuthBypass checks if the current random value XOR'd with the username meets the bypass condition
func simulateAuthBypass(username string, uVar2Initial uint32, targetUVar2 uint32) (bool, uint32) {
	usernameChunks := getChunks(username)
	finalUVar2 := performXOR(uVar2Initial, usernameChunks)

	// Return true if the bypass condition is met
	if finalUVar2 == targetUVar2 {
		return true, finalUVar2
	}
	return false, finalUVar2
}

func main() {
	// Fixed initial random seed from the server code
	seed := int64(0x76546CC2CA2D7)
	rand.Seed(seed)

	// Bypass target value
	targetUVar2 := uint32(0x8A700A02)

	// Iterate over seed count (1,000,000,000,000,000 iterations)
	maxIterations := 1000000000000000

	username := "jasper_05376"

	fmt.Println("Starting bypass detection...")

	prevRand := int64(0)

	showNextRand := false

	for count := 1; count <= maxIterations; count++ {
		// Get the current random value (uVar2Initial is the lower 32-bits of currentRand)
		currentRand := rand.Int63()
		uVar2Initial := uint32(currentRand & 0xFFFFFFFF)

		bypass, finalUVar2 := simulateAuthBypass(username, uVar2Initial, targetUVar2)

		// this will be hit on next loop after finding the bypass
		if showNextRand {
			showNextRand = false
			fmt.Printf("NextRand: %d\n\n", currentRand)
		}

		if bypass {
			// Display the bypass information
			fmt.Printf("\nBypass detected!\n")
			fmt.Printf("Username: %s\n", username)
			fmt.Printf("Seed Count: %d\n", count)
			fmt.Printf("Initial uVar2: 0x%x\n", uVar2Initial)
			fmt.Printf("Final uVar2: 0x%x (Matches Bypass Value)\n", finalUVar2)
			fmt.Printf("CurrentRand: %d\n\n", currentRand)
			fmt.Printf("PrevRand: %d\n\n", prevRand)
			showNextRand = true
		}

		prevRand = currentRand

		// Show progress
		if count%100000000 == 0 {
			fmt.Printf("Processed %d seed iterations...\n", count)
		}
	}

	fmt.Println("Bypass detection completed.")
}

Gotcha! We now have our bypass!

username: jasper_05376
count: 181182686
seed: 350024956464939860
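For completeness, a minimal sketch of turning those three values into the JSON for the prompt. The key names come from the context clues mentioned earlier; whether the checker expects the numbers as JSON numbers or as strings is something I am assuming here, and the helper name is mine.

```python
import json


def build_answer(username, seed, count):
    """Serialize the three values; the prompt asks for everything in lower case."""
    return json.dumps({"username": username.lower(), "seed": seed, "count": count})


answer = build_answer("jasper_05376", 350024956464939860, 181182686)
print(answer)
```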

I am not really sure why NextRand should be the value of the seed key in the JSON; when I tried submitting CurrentRand, my answer was not accepted. I also forgot whether the count had to be the Seed Count plus or minus one. I was mixing and matching the values until the system accepted the answer. It’s kind of confusing, but it is what it is.

In real life, I think this exploit would be very hard to pull off, because the attacker must track the current seed and count of the target server and send the payload with pixel-perfect timing.

[NSA2024] Task 2 – Driving Me Crazy – (Forensics, DevOps)

Synopsis

Having contacted the NSA liaison at the FBI, you learn that a facility at this address is already on a FBI watchlist for suspected criminal activity.

With this tip, the FBI acquires a warrant and raids the location.

Inside they find the empty boxes of programmable OTP tokens, but the location appears to be abandoned. We’re concerned about what this APT is up to! These hardware tokens are used to secure networks used by Defense Industrial Base companies that produce critical military hardware.

The FBI sends the NSA a cache of other equipment found at the site. It is quickly assigned to an NSA forensics team. Your friend Barry enrolled in the Intrusion Analyst Skill Development Program and is touring with that team, so you message him to get the scoop. Barry tells you that a bunch of hard drives came back with the equipment, but most appear to be securely wiped. He managed to find a drive containing what might be some backups that they forgot to destroy, though he doesn’t immediately recognize the data. Eager to help, you ask him to send you a zip containing a copy of the supposed backup files so that you can take a look at it.

If we could recover files from the drives, it might tell us what the APT is up to. Provide a list of unique SHA256 hashes of all files you were able to find from the backups. Example (2 unique hashes):

471dce655395b5b971650ca2d9494a37468b1d4cb7b3569c200073d3b384c5a4
0122c70e2f7e9cbfca3b5a02682c96edb123a2c2ba780a385b54d0440f27a1f6

Downloads

disk backups (archive.tar.bz2)

Prompt

Provide your list of SHA256 hashes

Solution

Upon checking, it looks like we are given ZFS snapshots, and we need to restore the images chronologically to get the unique files.

We then moved to Ubuntu, which natively supports ZFS.

We first create the disk, then create the pool, then create the dataset.

We then import the starting backup.

We then create a folder to hold the files from each backup.

Copy the current contents of the pool into the created folder.

Just repeat the process: import a backup, make a folder for its contents, copy the contents over, and repeat.

I know, this is a tedious process because the backups are not labeled in order (or maybe I missed a clue about it). This can also be automated, but I chose the hard way.

After importing every backup and extracting its contents, we can proceed to the next step.

We then recursively compute the SHA256 hash of every file.

Then pipe them to sort and uniq.
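The hashing and de-duplication step can also be sketched in a few lines of Python. This is not what I ran at the time (I used shell tools), just an equivalent under the assumption that all extracted backup folders live under one root directory; the function names and the example directory name are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()


def unique_hashes(root):
    """Return the sorted, de-duplicated SHA256 hashes of every file under root."""
    return sorted({sha256_file(p) for p in Path(root).rglob("*") if p.is_file()})


# Hypothetical usage, assuming the copies were placed under ./extracted:
#   print("\n".join(unique_hashes("extracted")))
```

Using a set gives the same result as piping through sort and uniq: identical files that appear in several backups are counted once.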

Submitted these and voilà! Task 2 is done!

[NSA2024] Task 1 – No Token Left Behind – (File Forensics)

Synopsis

Aaliyah is showing you how Intelligence Analysts work. She pulls up a piece of intelligence she thought was interesting. It shows that APTs are interested in acquiring hardware tokens used for accessing DIB networks. Those are generally controlled items, how could the APT get a hold of one of those?

DoD sometimes sends copies of procurement records for controlled items to the NSA for analysis. Aaliyah pulls up the records but realizes it’s in a file format she’s not familiar with. Can you help her look for anything suspicious?

If DIB companies are being actively targeted by an adversary the NSA needs to know about it so they can help mitigate the threat.

Help Aaliyah determine the outlying activity in the dataset given

Downloads

DoD procurement records (shipping.db)

Prompt

Provide the order id associated with the order most likely to be fraudulent.

Solution

Upon inspecting the file, it has a .db extension, which doesn’t really make much sense for its contents.
So I checked its actual MIME type to determine the right tool for it.

After checking some documentation and other references, I think it is an .ods file.

References:
https://stackoverflow.com/questions/31489757/what-is-correct-mimetype-with-apache-openoffice-files-like-odt-ods-odp
https://www.iana.org/assignments/media-types/application/vnd.oasis.opendocument.spreadsheet
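One way to double-check this in code, as a sketch: OpenDocument files are ZIP archives that carry their media type in an entry named mimetype. The helper below is my own illustration, not the tool I used during the challenge.

```python
import zipfile

ODS_MIMETYPE = "application/vnd.oasis.opendocument.spreadsheet"


def detect_odf_mimetype(path):
    """Return the declared OpenDocument media type, or None if there is none."""
    if not zipfile.is_zipfile(path):
        return None
    with zipfile.ZipFile(path) as archive:
        if "mimetype" not in archive.namelist():
            return None
        return archive.read("mimetype").decode("ascii").strip()


# Hypothetical usage:
#   detect_odf_mimetype("shipping.db") == ODS_MIMETYPE
```

If the function returns the spreadsheet media type, the file can simply be renamed to .ods and opened in a spreadsheet application.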

Upon opening it in a spreadsheet application, we are greeted by a gigantic dataset.

Our goal is to find something suspicious, so I tried sorting the records to find an outlier in the dataset.

With the records sorted, I manually checked for an outlier. And there we go! We spotted it.

I submitted the order id and voilà! Task 1 is done!

Remember this information as we will be needing this in the later task: jasper_05376

[NSA2024] NSA Codebreaker 2024

Hi everyone! I recently participated in the NSA Codebreaker Challenge 2024, which had over 6,693 participants. I’m proud to share that I was one of 30 people who managed to complete all the tasks!

Without further ado, I’ll be sharing my experience, the challenge binaries, and detailed write-ups.

This blog post is divided into seven parts:
Task 1 – No Token Left Behind – (File Forensics)
Task 2 – Driving Me Crazy – (Forensics, DevOps)
Task 3 – How did they get in? – (Reverse Engineering, Vulnerability Research)
Task 4 – LLMs never lie – (Programming, Forensics)
Task 5 – The #153 – (Reverse Engineering, Cryptography)
Task 6 – It’s always DNS – (Reverse Engineering, Cryptography, Vulnerability Research, Exploitation)
Task 7 – Location (un)compromised – (Vulnerability Research, Exploitation, Reverse Engineering)

I hope you enjoy reading!

Background

Foreign adversaries have long strived to gain an advantage against the might of the United States Armed Forces. While matching the USA on the battlefield is a costly and risky proposition, our adversaries are always looking for ways to balance the playing field. A serious and real threat is the infiltration and sabotage of military operations before the fight even breaks out.

Fortunately, the NSA is always recruiting bright young individuals to help protect our country! In fact, a bunch of your friends graduated last year and have been busy at work in their Developmental Programs.

You have returned to NSA on your final Cooperative Education tour and are visiting your friend Aaliyah who is currently employed full-time in the Intelligence Analysis Development Program. Intelligence Analysts are always scouring through collected Signals Intelligence (SIGINT) for threat indicators. Aaliyah recently attended a briefing that highlighted Nation-State Advanced Persistent Threats (APT) targeting our Defense Industrial Base (DIB) contractors.