About Moreau Arena

A benchmark designed to be impossible to memorize.

The Problem

Most LLM benchmarks suffer from data contamination — models have seen the test questions during training. Moreau Arena solves this by presenting a novel game with mechanics that don't exist in any training corpus. LLMs must reason from first principles using only the rules provided in the prompt.

How It Works

1. Choose an Animal

Select from 14 animal types, each with a unique passive ability and two active abilities. Animals create asymmetric matchups: no single choice dominates all others.

2. Allocate Stats

Distribute 20 points across HP, ATK, SPD, and WIL (minimum 1 each). The allocation determines max health, damage, dodge chance, and ability resistance.
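
The allocation rule can be checked mechanically. The sketch below is illustrative only: the `Build` class and `count_builds` helper are hypothetical names, not part of Moreau Arena, and it assumes the full 20-point budget must be spent.

```python
from dataclasses import dataclass
from itertools import product

STAT_BUDGET = 20  # total points, from the rule above
STAT_MIN = 1      # minimum per stat, from the rule above


@dataclass(frozen=True)
class Build:
    """Hypothetical container for one stat allocation."""
    hp: int
    atk: int
    spd: int
    wil: int

    def __post_init__(self):
        stats = (self.hp, self.atk, self.spd, self.wil)
        if any(s < STAT_MIN for s in stats):
            raise ValueError(f"every stat needs at least {STAT_MIN} point")
        # Assumption: all points must be spent, none left over.
        if sum(stats) != STAT_BUDGET:
            raise ValueError(f"stats must sum to {STAT_BUDGET}, got {sum(stats)}")


def count_builds() -> int:
    """Count legal allocations by brute force over HP, ATK, and SPD."""
    highest = STAT_BUDGET - 3 * STAT_MIN  # 17: the most any one stat can get
    count = 0
    for hp, atk, spd in product(range(STAT_MIN, highest + 1), repeat=3):
        wil = STAT_BUDGET - hp - atk - spd  # WIL takes whatever is left
        if wil >= STAT_MIN:
            count += 1
    return count
```

Under these assumptions there are 969 legal allocations per animal, which is the number of ways to split 20 points into four positive parts.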

3. Simulate Combat

The deterministic engine runs tick-by-tick combat. Faster creatures attack first. Abilities proc based on WIL. Matches end when one creature reaches 0 HP.
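
As a rough illustration of the tick loop, here is a minimal sketch that is not the actual engine. It uses the stat formulas from the Stat System section, leaves out all abilities and WIL effects, and makes several assumptions: determinism comes from a seeded random generator, both creatures attack once per tick, speed ties go to the first creature, and a tick cap guards against builds that deal 0 damage. The names `Fighter` and `simulate` are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class Fighter:
    """Hypothetical creature record; name must be unique within a match."""
    name: str
    hp: int
    atk: int
    spd: int
    wil: int  # kept for completeness; abilities are not modeled here

    @property
    def max_hp(self) -> int:
        return 50 + self.hp * 10

    @property
    def base_dmg(self) -> int:
        return self.atk - 1

    @property
    def dodge(self) -> float:
        return self.spd * 0.025


def simulate(a: Fighter, b: Fighter, seed: int, max_ticks: int = 1000) -> Optional[str]:
    """Return the winner's name, or None if nobody reaches 0 HP in time."""
    rng = random.Random(seed)  # same seed gives the same match
    health = {a.name: a.max_hp, b.name: b.max_hp}
    # Faster creature attacks first; ties go to `a` (assumption).
    order = [a, b] if a.spd >= b.spd else [b, a]
    for _ in range(max_ticks):
        for attacker in order:
            defender = b if attacker is a else a
            if rng.random() >= defender.dodge:  # the attack was not dodged
                health[defender.name] -= attacker.base_dmg
            if health[defender.name] <= 0:
                return attacker.name
    return None
```

Running `simulate` twice with the same fighters and seed returns the same result, which is one way an engine with dodge rolls can still be deterministic.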

4. Best-of-7 Series

Each matchup is a best-of-7 series to reduce variance. Agents play against all opponents in round-robin format. Rankings use Bradley-Terry scoring.
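
The series and round-robin structure can be sketched as follows. This is an illustration under assumptions, not the project's harness: `play_series` and `round_robin` are hypothetical names, the caller supplies the `play_game` callback, and a series is assumed to stop as soon as one side has a majority (4 wins in a best-of-7).

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# play_game(a, b, game_index) must return the name of the winner, a or b.
PlayGame = Callable[[str, str, int], str]


def play_series(play_game: PlayGame, a: str, b: str, best_of: int = 7) -> Tuple[int, int]:
    """Play until one side holds a majority of `best_of` games."""
    needed = best_of // 2 + 1  # 4 wins for a best-of-7
    wins_a = wins_b = 0
    game_index = 0
    while wins_a < needed and wins_b < needed:
        if play_game(a, b, game_index) == a:
            wins_a += 1
        else:
            wins_b += 1
        game_index += 1
    return wins_a, wins_b


def round_robin(agents: List[str], play_game: PlayGame,
                best_of: int = 7) -> Dict[Tuple[str, str], Tuple[int, int]]:
    """Run one series for every unordered pair of agents."""
    return {
        (a, b): play_series(play_game, a, b, best_of)
        for a, b in combinations(agents, 2)
    }
```

With n agents this schedule contains n * (n - 1) / 2 series, and each series lasts between 4 and 7 games.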

The Animals

14 animals with unique abilities that create non-transitive dynamics.

| Animal | Passive | Ability 1 | Ability 2 | Archetype |
| --- | --- | --- | --- | --- |
| Bear | Fury Protocol: +50% ATK when HP < 50%; duration 3 ticks | Berserker Rage: 3.5% per tick; +60% ATK, −40% dodge; duration 3 ticks | Last Stand: 3.5% per tick; +100% ATK when HP < 15%; one-time use | Tank/Bruiser |
| Tiger | Ambush Wiring: first attack +100% dmg; requires SPD > opponent | Pounce: 4.5% per tick; +70% ATK, skip opponent attack | Hamstring: 4.5% per tick; −55% SPD, −10% dodge; duration 4 ticks | Assassin |
| Wolf | Pack Sense: passive scaling damage | Pack Howl: 4.5% per tick; +30% ATK; duration 4 ticks | Rend: 4.5% per tick; 5% max HP DoT; duration 3 ticks | Scaling DPS |
| Buffalo | Thick Hide: first hit −50% damage | Thick Hide (Active): 4.5% per tick; blocks all damage; duration 1 tick | Iron Will: 3.5% per tick; heal 12% max HP; one-time use | Sustain Tank |
| Boar | Charge: first attack +50% damage | Stampede: 4.5% per tick; +50% ATK, skip opponent attack | Gore: 3.5% per tick; −40% ATK, ignores dodge | Berserker |
| Monkey | Primate Cortex: enhanced cognitive ability | Chaos Strike: 4.5% per tick; 0.8×–2.2× damage multiplier | Mimic: 3.5% per tick; copy 75% of opponent ability; cannot copy Iron Will, Last Stand, Mimic | Trickster |
| Crocodile | Death Roll (Passive): grapple damage bonus | Death Roll (Active): 4.5% per tick; grapple attack | Thick Scales: 4.5% per tick; damage reduction; duration 2 ticks | Grappler |
| Eagle | Aerial Strike: aerial advantage in combat | Dive: 3.5% per tick; +100% ATK, ignores dodge | Keen Eye: 4.5% per tick; +20% dodge chance; duration 3 ticks | Burst DPS |
| Snake | Venom Glands: passive poison on attacks | Venom: 4.5% per tick; 3% max HP DoT, stacks ×3; duration 3 ticks per stack | Coil: 4.5% per tick; guaranteed dodge; duration 1 tick | DoT |
| Raven | Omen: foresight passive | Shadow Clone: 4.5% per tick; creates a shadow clone; one-time use | Curse: 4.5% per tick; debuff opponent; duration 3 ticks | Utility |
| Shark | Blood Frenzy (Passive): bonus damage vs low-HP targets | Blood Frenzy (Active): 3.5% per tick; enhanced finishing damage | Bite: 4.5% per tick; sustained damage; duration 2 ticks | Finisher |
| Owl | Night Vision: enhanced perception in combat | Foresight: 4.5% per tick; predict opponent actions; duration 2 ticks | Silent Strike: 4.5% per tick; precision attack | Precision |
| Fox | Cunning: passive evasion bonus | Evasion: 4.5% per tick; +50% dodge chance; duration 3 ticks | Trick: 4.5% per tick; negates opponent's next proc; duration 1 tick | Trickster |
| Scorpion | Paralytic Sting: passive stun chance on attacks | Sting: 4.5% per tick; skip opponent attack | Exoskeleton: 4.5% per tick; block 15% damage; duration 1 tick | Poison Tank |
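
To get a feel for the "per tick" rates, the sketch below computes the chance that an ability fires at least once within a given number of ticks. It assumes each rate is an independent chance rolled every tick and ignores any WIL adjustment, since the exact interaction is not specified here. `proc_within` is a hypothetical helper name.

```python
def proc_within(p_per_tick: float, ticks: int) -> float:
    """Chance of at least one proc in `ticks` independent rolls."""
    if not 0.0 <= p_per_tick <= 1.0:
        raise ValueError("p_per_tick must be a probability between 0 and 1")
    if ticks < 0:
        raise ValueError("ticks must be non-negative")
    # Complement of "no proc on any tick".
    return 1.0 - (1.0 - p_per_tick) ** ticks


# Compare the two rates used in the table over a 20-tick fight.
common = proc_within(0.045, 20)  # abilities listed at 4.5% per tick
rare = proc_within(0.035, 20)    # abilities listed at 3.5% per tick
```

Under the same independence assumption, the average wait for a first proc is 1 / p ticks, so the 3.5% abilities take noticeably longer to appear than the 4.5% ones.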

Stat System

HP (Health Points)

Determines maximum health: max_hp = 50 + hp * 10. Range: 60–220 HP. Higher HP means more damage absorbed before death.

ATK (Attack)

Determines base damage: base_dmg = atk - 1. Range: 0–16 damage per tick. Monotonic — more ATK always means more damage.

SPD (Speed)

Determines dodge chance: dodge = spd * 2.5%. Range: 2.5%–42.5%. Also affects attack order — faster creatures strike first.

WIL (Will)

Determines ability resistance: resist = wil * 3.3%. Range: 3.3%–56.1%. WIL affects both how often a creature's own abilities proc and how well it resists opponent abilities.
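
The stated ranges follow from the allocation rule: with 20 points and a minimum of 1 in each of four stats, any single stat runs from 1 to 17 (20 minus 3). The short sketch below applies the four formulas at both ends. `derived_stats` is a hypothetical helper, and percentages are kept in percent units.

```python
def derived_stats(hp: int, atk: int, spd: int, wil: int) -> dict:
    """Apply the four stat formulas from this section."""
    return {
        "max_hp": 50 + hp * 10,
        "base_dmg": atk - 1,
        "dodge_pct": spd * 2.5,
        "resist_pct": wil * 3.3,
    }


# Lowest value of every stat at once.
low = derived_stats(1, 1, 1, 1)
# Highest value of every stat at once. This is not a legal build (it would
# cost 68 points); it is used only to read off each stat's maximum.
high = derived_stats(17, 17, 17, 17)

for key in low:
    print(f"{key}: {low[key]:g} to {high[key]:g}")
```

A legal build trades these off against each other: putting 17 points into one stat leaves every other stat at its minimum.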

Scoring Methodology

Moreau Arena uses Bradley-Terry (BT) scoring rather than Elo for rankings. BT handles non-transitive dominance structures (where A beats B, B beats C, but C beats A) that break traditional rating systems. Scores are normalized to [0, 1] with bootstrap confidence intervals (N=1000) to quantify uncertainty.
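
For readers unfamiliar with BT, here is a minimal sketch of one common way to fit it, the iterative minorization-maximization update, followed by a percentile bootstrap. It is not the project's scoring code. The function names are hypothetical, and it assumes that scores are normalized to sum to 1 (which keeps each in [0, 1]) and that the bootstrap resamples individual games rather than whole series.

```python
import random
from typing import List, Tuple

Game = Tuple[int, int]  # (winner index, loser index)


def wins_matrix(games: List[Game], n: int) -> List[List[int]]:
    """wins[i][j] = number of games player i won against player j."""
    wins = [[0] * n for _ in range(n)]
    for winner, loser in games:
        wins[winner][loser] += 1
    return wins


def fit_bradley_terry(wins: List[List[int]], iters: int = 100) -> List[float]:
    """Estimate BT strengths with the standard MM update."""
    n = len(wins)
    p = [1.0 / n] * n
    for _ in range(iters):
        new_p = []
        for i in range(n):
            denom = 0.0
            for j in range(n):
                played = wins[i][j] + wins[j][i]
                if j != i and played > 0:
                    denom += played / (p[i] + p[j])
            # A player with no games in this sample keeps its old value.
            new_p.append(sum(wins[i]) / denom if denom > 0 else p[i])
        total = sum(new_p)
        p = [x / total for x in new_p]  # normalize so scores sum to 1
    return p


def bootstrap_ci(games: List[Game], n: int, n_boot: int = 1000,
                 alpha: float = 0.05, seed: int = 0) -> List[Tuple[float, float]]:
    """Percentile confidence interval per player from resampled games."""
    rng = random.Random(seed)
    samples: List[List[float]] = [[] for _ in range(n)]
    for _ in range(n_boot):
        resampled = rng.choices(games, k=len(games))  # with replacement
        scores = fit_bradley_terry(wins_matrix(resampled, n))
        for i in range(n):
            samples[i].append(scores[i])
    lo_idx = int(alpha / 2 * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    intervals = []
    for s in samples:
        s.sort()
        intervals.append((s[lo_idx], s[hi_idx]))
    return intervals
```

Because the fit uses all results at once, it does not depend on the order in which games were played. In a perfectly balanced cycle (A beats B, B beats C, and C beats A by the same margin) this fit returns equal scores for all three.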

Elo ratings are also computed for reference but should not be used for definitive rankings due to path-dependency issues in non-transitive environments.
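
The path dependency is easy to reproduce. The sketch below applies the standard Elo update to the same three results in two different orders. The K-factor of 32 and starting rating of 1500 are conventional defaults chosen for illustration, not values taken from Moreau Arena, and `elo_after` is a hypothetical name.

```python
from typing import List, Tuple

Game = Tuple[int, int]  # (winner index, loser index)


def elo_after(games: List[Game], n_players: int,
              k: float = 32.0, start: float = 1500.0) -> List[float]:
    """Apply sequential Elo updates in the order the games are given."""
    ratings = [start] * n_players
    for winner, loser in games:
        # Expected score of the winner before the game.
        expected = 1.0 / (1.0 + 10 ** ((ratings[loser] - ratings[winner]) / 400.0))
        delta = k * (1.0 - expected)
        ratings[winner] += delta
        ratings[loser] -= delta
    return ratings


# A three-way cycle: 0 beats 1, 1 beats 2, 2 beats 0.
cycle = [(0, 1), (1, 2), (2, 0)]
forward = elo_after(cycle, 3)
backward = elo_after(list(reversed(cycle)), 3)
```

Both runs see exactly the same results, yet `forward` and `backward` rank the three players differently, which is why the Elo numbers are reported for reference only.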