Tournament Protocols

Two tracks are complete and two are planned. Each isolates a different LLM capability.

Defined Tracks: 4
Completed: 2
LLMs Tested: 8
Baselines: 5

Track A
T001 — Minimal Prompt

Protocol

Each agent receives the game rules and must produce a single fixed build. The prompt provides basic mechanics but no formulas, no meta-information, and no structured output format.

What It Tests

Raw strategic reasoning from minimal information. Can the LLM infer good stat allocations from qualitative descriptions alone?

Key Result

LLM average win rate: 37.5%
Baselines outperform most LLMs. SmartAgent (#1) dominates.

Track B
T002 — Engineered Prompt

Protocol

Same game and same simulator, but the prompt adds exact damage and HP formulas, meta-information about matchup archetypes, and a structured JSON output format. Agents may also adapt their build between games in a series.
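As an illustration of what a structured JSON output format buys the harness, here is a minimal sketch of validating an agent's reply. The field names `atk` and `hp`, the `Build` type, and the `parse_build` helper are all hypothetical, chosen only to match the stats named in the formulas; the actual T002 schema may differ.

```python
import json
from dataclasses import dataclass


# Hypothetical schema: "atk" and "hp" are assumed field names for
# illustration only; the real T002 output format is not shown here.
@dataclass(frozen=True)
class Build:
    atk: int
    hp: int


def parse_build(raw: str) -> Build:
    """Parse a JSON reply into a Build, rejecting malformed stats."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    stats = {}
    for key in ("atk", "hp"):
        value = data.get(key)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{key!r} must be a non-negative integer")
        stats[key] = value
    return Build(**stats)


if __name__ == "__main__":
    print(parse_build('{"atk": 6, "hp": 3}'))
```

With free-text output, as in T001, the harness would instead have to extract the build from prose, which is one plausible source of noise that a fixed schema removes.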

What It Tests

Whether LLMs can leverage explicit formulas and structured guidance to make near-optimal decisions.

Key Result

LLM average win rate: 89.75%
LLMs dominate the baselines. gpt-5.2-codex ranks #1 at BT 1.0.

What Changed Between T001 and T002

| Component | T001 | T002 |
| --- | --- | --- |
| Damage Formulas | "Higher ATK deals more damage" | base_dmg = atk - 1 |
| HP Formula | "More HP means more health" | max_hp = 50 + hp * 10 |
| Meta Information | None | Archetype descriptions, matchup tips |
| Output Format | Free text | Structured JSON |
| Adaptation | None (fixed build) | Can adapt between games |
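To show how the two T002 formulas can be used for explicit reasoning, the sketch below encodes them directly and derives a hits-to-defeat count. The `hits_to_defeat` helper is hypothetical and assumes every hit deals exactly `base_dmg` with no defense, critical hits, or other modifiers, which the simulator may not guarantee.

```python
def base_dmg(atk: int) -> int:
    """Damage formula as given in the T002 prompt: base_dmg = atk - 1."""
    return atk - 1


def max_hp(hp: int) -> int:
    """HP formula as given in the T002 prompt: max_hp = 50 + hp * 10."""
    return 50 + hp * 10


def hits_to_defeat(attacker_atk: int, defender_hp: int) -> int:
    """Hits needed to deplete the defender's max HP.

    Assumption (not from the source): each hit deals exactly base_dmg,
    with no defense, crits, or other modifiers.
    """
    dmg = base_dmg(attacker_atk)
    if dmg <= 0:
        raise ValueError("attacker deals no damage at this ATK value")
    total = max_hp(defender_hp)
    # Integer ceiling division avoids floating-point rounding.
    return -(-total // dmg)


if __name__ == "__main__":
    # ATK 6 deals 5 per hit; 3 HP points give 80 max HP; 80 / 5 = 16 hits.
    print(hits_to_defeat(attacker_atk=6, defender_hp=3))
```

Under these assumptions an agent can compare candidate builds numerically, which the qualitative T001 descriptions do not allow.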

Planned Tracks

Track C — Meta-Conditioned

Agents receive information about the current meta (popular builds, dominant strategies) and must adapt. Tests whether LLMs can reason about population-level dynamics and find counter-strategies.

PLANNED

Track D — Tool-Augmented

Agents can call the simulator as a tool to test builds before committing. Tests whether LLMs can effectively use computational tools for hypothesis-driven optimization.

PLANNED

Ablation Study

Which T002 components matter most? Five variants isolate individual contributions.

| Variant | Description | Components Included |
| --- | --- | --- |
| formulas-only | Add exact formulas to T001 | +formulas |
| meta-only | Add meta-info to T001 | +meta |
| adaptation-only | Add between-game adaptation | +adapt |
| structured-output-only | Add JSON output format | +json |
| formulas-no-meta | T002 minus meta-info | +formulas +json +adapt |
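The variants above can be written down as a small configuration map, which makes it easy to check what each one adds to T001 and what it still lacks relative to T002. This is a sketch only: the names `VARIANTS`, `ALL_COMPONENTS`, and `missing_components` are hypothetical, and treating T001 as the empty set and T002 as all four components is an assumption drawn from the comparison of the two tracks.

```python
# Component labels follow the "Components Included" column.
# Assumption: T002 is exactly these four components; T001 has none of them.
ALL_COMPONENTS = frozenset({"formulas", "meta", "json", "adapt"})

VARIANTS = {
    "formulas-only": frozenset({"formulas"}),
    "meta-only": frozenset({"meta"}),
    "adaptation-only": frozenset({"adapt"}),
    "structured-output-only": frozenset({"json"}),
    "formulas-no-meta": frozenset({"formulas", "json", "adapt"}),
}


def missing_components(variant: str) -> frozenset:
    """Components that the full T002 prompt has but this variant lacks."""
    return ALL_COMPONENTS - VARIANTS[variant]


if __name__ == "__main__":
    for name in VARIANTS:
        included = ", ".join(sorted(VARIANTS[name]))
        missing = ", ".join(sorted(missing_components(name)))
        print(f"{name}: includes [{included}], missing [{missing}]")
```

Read this way, the first four variants each add a single component to T001, while formulas-no-meta removes a single component from T002.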