Challenge Overview

CTF: TAMU CTF 2025 (gigem)
Category: Misc / Rev
Challenge: pittrap

The challenge gave us an ONNX neural network model with no source, no architecture diagram, and one job: find the input that makes it output something specific. Classic black-box optimization problem.

Initial Recon

First thing — inspect what we’re working with:

import onnxruntime as ort
import numpy as np

sess = ort.InferenceSession("pittrap.onnx")

# check input/output shape
for inp in sess.get_inputs():
    print(inp.name, inp.shape, inp.type)

for out in sess.get_outputs():
    print(out.name, out.shape, out.type)

The output told us the expected input shape and the output format. The model takes a fixed-length float vector and emits a single score. Goal: maximize (or hit a threshold on) that score.

No gradients available — it’s ONNX, inference only. No white-box access.
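For completeness: gradients can still be *estimated* without white-box access via finite differences, but each ascent step then costs N+1 oracle queries for an N-dimensional input, which gets expensive fast. A minimal sketch on a stand-in quadratic oracle (`oracle` and `finite_diff_ascent` are illustrative names, not part of the challenge; the real `oracle` would wrap the ONNX session):

```python
import numpy as np

def oracle(x):
    # Stand-in for the black-box model: quadratic bowl, peak score 0.0 at x == 1
    return float(-np.sum((x - 1.0) ** 2))

def finite_diff_ascent(x, eps=1e-4, lr=0.1, steps=50):
    for _ in range(steps):
        base = oracle(x)                  # 1 query
        grad = np.zeros_like(x)
        for i in range(len(x)):           # plus N queries per step
            bumped = x.copy()
            bumped[i] += eps
            grad[i] = (oracle(bumped) - base) / eps
        x = x + lr * grad                 # plain gradient ascent
    return x

x = finite_diff_ascent(np.zeros(4))
print(oracle(x))  # close to the optimum of 0.0
```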

The Approach: Simulated Annealing

Since we can’t backpropagate, we treat the model as a pure oracle: feed it inputs, read the output score, iterate.

Simulated annealing is perfect here:

  • Explores the input space without getting trapped in local optima
  • Accepts worse solutions with decreasing probability over time
  • No gradient required

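The acceptance rule behind the second bullet is the standard Metropolis criterion: always take improvements, and take a downhill move of size delta with probability exp(delta / T). It can be sanity-checked in isolation with toy values before wiring it to the model:

```python
import math
import random

def metropolis_accept(delta, T):
    # Always accept improvements; accept worse moves with prob exp(delta / T)
    return delta > 0 or random.random() < math.exp(delta / T)

# An uphill move is always taken
assert metropolis_accept(0.5, T=1.0)

# A downhill move (delta = -1) is taken ~37% of the time while hot (T = 1.0)
# and almost never once cold (T = 0.01)
random.seed(0)
hot = sum(metropolis_accept(-1.0, T=1.0) for _ in range(10_000)) / 10_000
cold = sum(metropolis_accept(-1.0, T=0.01) for _ in range(10_000)) / 10_000
print(hot, cold)  # hot near exp(-1) ≈ 0.368, cold near 0
```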
import onnxruntime as ort
import numpy as np
import math
import random

sess = ort.InferenceSession("pittrap.onnx")
input_name = sess.get_inputs()[0].name
input_shape = sess.get_inputs()[0].shape  # e.g. [1, N]; dynamic axes show up as None/str and need concrete ints

def score(x):
    x_input = x.reshape(input_shape).astype(np.float32)
    result = sess.run(None, {input_name: x_input})
    return float(result[0].flatten()[0])

# Initialize random candidate
n = input_shape[-1]
current = np.random.uniform(-1, 1, n)
current_score = score(current)

best = current.copy()
best_score = current_score

# Annealing schedule
T = 1.0
T_min = 1e-5
alpha = 0.995
step_size = 0.1

iteration = 0
while T > T_min:
    # Perturb candidate
    neighbor = current + np.random.randn(n) * step_size
    neighbor_score = score(neighbor)

    delta = neighbor_score - current_score

    # Accept if better, or probabilistically if worse
    if delta > 0 or random.random() < math.exp(delta / T):
        current = neighbor
        current_score = neighbor_score

    if current_score > best_score:
        best = current.copy()
        best_score = current_score
        print(f"[+] New best: {best_score:.6f} @ iter {iteration}")

    T *= alpha
    iteration += 1

print(f"\n[*] Final score: {best_score}")
print(f"[*] Best input: {best}")

Getting the Flag

Once the score crossed the threshold the model was checking for, the challenge server returned the flag. The model was essentially a learned classifier — it was trained to recognize a specific “correct” input pattern, and SA found it through pure exploration.

Key Takeaways

  • ONNX inference-only models are black boxes — treat them like APIs, not source code
  • Simulated annealing substantially outperforms naive random search in continuous input spaces
  • Cooling schedule matters — too fast and you get stuck, too slow and you waste time
  • Start with a wide step_size and narrow it as temperature drops for better convergence
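That last point, coupling step_size to the temperature, is a one-line change to the loop. A self-contained sketch on a toy objective (`toy_score` stands in for the ONNX oracle; the peak location and schedule constants are made up for the demo):

```python
import math
import random

import numpy as np

def toy_score(x):
    # Stand-in black-box objective: single peak of 1.0 at x == 0.5
    return float(np.exp(-np.sum((x - 0.5) ** 2)))

rng = np.random.default_rng(1)
random.seed(1)

n = 2
current = rng.uniform(-1, 1, n)
start_score = current_score = toy_score(current)
best, best_score = current.copy(), current_score

T, T_min, alpha = 1.0, 1e-5, 0.99
base_step = 0.5

while T > T_min:
    step = base_step * T  # wide moves while hot, fine moves while cold
    neighbor = current + rng.normal(0.0, step, n)
    neighbor_score = toy_score(neighbor)
    delta = neighbor_score - current_score

    if delta > 0 or random.random() < math.exp(delta / T):
        current, current_score = neighbor, neighbor_score
        if current_score > best_score:
            best, best_score = current.copy(), current_score

    T *= alpha

print(f"start={start_score:.4f} best={best_score:.4f}")
```

The fixed-step version spends its cold phase proposing moves that are almost always rejected; scaling the step with T keeps the late-stage proposals small enough to be accepted and refine the solution.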

— 0xAdham | RootRunners