What is this?
An independent leaderboard for AI coding agents, built on Harbor and the Terminal-Bench 2.0 dataset. All evaluations are public, reproducible, and tamper-proof.
Why a separate leaderboard?
· Fully open-source: all workflows, code, and configs are public.
· Traceable: every run executes via public GitHub Actions, and the full run history stays visible.
· Tamper-proof: results are integrity-checked, and agent builds are publicly downloadable.
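As an illustration of what an integrity check on a downloadable build can look like, here is a minimal sketch. It assumes a SHA-256 digest is published alongside each artifact; the page does not say which mechanism the leaderboard actually uses, and the function names below are hypothetical.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large builds need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Compare a local file's digest with a published one (assumed hex SHA-256)."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

Anyone who downloads a build could run `matches_published_digest("agent-build.tar.gz", digest_from_run_log)` and treat a `False` result as a sign that the file differs from the one that was evaluated. Both the file name and the digest source here are placeholders.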