Band of Agents · Track 2

Code doesn't merge until the jury reaches quorum.

A multi-agent coding system where reviewers from four model families read every diff and must agree before the change lands. A single big model is cheap but unchecked; a swarm is checked but costly. YouTopia is the third option.

$ youtopia run "add a slugify helper with tests" --auto
8 agents through Band
4 model families on the jury
~34% cheaper per task
α gpt-5-nano · β gemini-3.1 · γ deepseek-v4 · σ qwen3.5 · JUDGE

The problem

Two ways to build with agents. Both leave something on the table.

cheap · unchecked

The strong lone agent

One model plans and codes. Token-efficient, fast — but there is no independent check. A confident mistake ships.

vs.
checked · expensive

The orchestrator swarm

Many agents collaborate, but nobody controls their cost or accuracy. Adversarial review, where it exists, is limited to two frameworks.

YouTopia combines a tightly-controlled editing brain with a cross-family adversarial jury — generalizing two-framework review to N model families, and recruiting more jurors only on disagreement or high risk.

The core idea

An adaptive jury, not a panel that always pays for four.

Reviewers read only the diff on a cheap model. The expensive model runs once for the Coder, and again only when the jury asks for changes. Extra seats convene only where it matters.

Two reviewers open

Two jurors from different families read the diff and vote. Unanimous and low-risk? Merge on quorum.

Escalate on disagreement

A split verdict or a high-risk diff convenes a third family. A fourth, open-source seat joins only if round two still splits.

Judge reconciles

When no new jurors remain and consensus hasn't formed, a Judge weighs the findings and casts the deciding vote.
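
The escalation steps above can be sketched as a small loop. This is a minimal sketch, not the project's implementation: the names `Vote`, `run_jury`, `openers`, `reserves`, and `judge` are hypothetical, and it assumes "split" means "not unanimous" and that a high-risk diff forces exactly one extra seat.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Vote:
    family: str        # model family of the juror, e.g. "gpt"
    approve: bool
    finding: str = ""  # the defect named when the juror rejects


Reviewer = Callable[[str], Vote]      # reads a diff, returns a vote
Judge = Callable[[List[Vote]], bool]  # weighs all votes, returns the verdict


def unanimous(votes: List[Vote]) -> bool:
    """Assumption: consensus means every juror voted the same way."""
    return len({v.approve for v in votes}) == 1


def run_jury(
    diff: str,
    openers: List[Reviewer],
    reserves: List[Reviewer],
    judge: Judge,
    high_risk: bool = False,
) -> Tuple[bool, List[Vote]]:
    # Round one: the opening jurors read the diff and vote.
    votes = [review(diff) for review in openers]
    pending = list(reserves)
    # A high-risk diff recruits one extra seat even when round one agrees.
    force_next = high_risk
    # Recruit one reserve seat at a time while the verdict is still split.
    while pending and (force_next or not unanimous(votes)):
        votes.append(pending.pop(0)(diff))
        force_next = False
    if unanimous(votes):
        return votes[0].approve, votes
    # No seats left and no consensus: the Judge casts the deciding vote.
    return judge(votes), votes
```

Under the unanimity assumption, a round-one split always ends up seating every reserve, because a 2-1 vote is still split. A majority-based quorum would stop earlier; the source does not say which rule is used.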

Recode against objections

A rejection must name the defect. The Judge compiles the findings and feeds them to the Coder — never a blind regenerate.
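
One way to enforce "a rejection must name the defect" is at the data level, so an empty objection cannot reach the Coder. The sketch below is illustrative only: `Verdict` and `compile_feedback` are hypothetical names, and the feedback wording is an assumption.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Verdict:
    reviewer: str
    approve: bool
    defect: str = ""  # required whenever approve is False

    def __post_init__(self) -> None:
        # Refuse a rejection that does not say what is wrong.
        if not self.approve and not self.defect.strip():
            raise ValueError(f"{self.reviewer}: a rejection must name the defect")


def compile_feedback(verdicts: List[Verdict]) -> str:
    """Collect every named defect into one message for the Coder."""
    objections = [f"- [{v.reviewer}] {v.defect}" for v in verdicts if not v.approve]
    if not objections:
        return ""  # nothing to fix, so no recode is requested
    return "Revise the diff to address these findings:\n" + "\n".join(objections)
```

The Coder then receives the specific objections rather than a bare "try again".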

Tests gate the merge

Quorum is necessary but not sufficient. The Tester writes a pytest gate; a real failing assertion blocks the merge.
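
The two-condition gate can be written as a single predicate. This is a sketch under assumptions: `run_pytest` and `may_merge` are hypothetical helpers, and the only pytest behavior relied on is that the process exits with code 0 when every collected test passes.

```python
import subprocess
import sys


def run_pytest(test_path: str) -> int:
    """Run pytest on one path and return its exit code (0 means all tests passed)."""
    result = subprocess.run([sys.executable, "-m", "pytest", "-q", test_path])
    return result.returncode


def may_merge(quorum_reached: bool, pytest_exit_code: int) -> bool:
    """Quorum is necessary but not sufficient: the test gate must also pass."""
    return quorum_reached and pytest_exit_code == 0
```

Any non-zero exit code blocks the merge, including pytest's code 5 for "no tests collected", so an empty test file cannot pass the gate by accident.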

Real per-token ledger

Every call is costed against list prices in a live ledger. No estimation — the benchmark totals are measured spend.
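
A per-token ledger of this kind could look like the sketch below. The class and field names are hypothetical, and any prices passed in are placeholders; the source does not list the actual rates.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Price:
    input_per_m: Decimal   # USD per one million input tokens
    output_per_m: Decimal  # USD per one million output tokens


@dataclass
class Ledger:
    prices: Dict[str, Price]
    entries: List[Tuple[str, str, Decimal]] = field(default_factory=list)

    def record(self, role: str, model: str, input_tokens: int, output_tokens: int) -> Decimal:
        """Cost one call from its reported token counts and append it to the ledger."""
        price = self.prices[model]
        cost = (
            price.input_per_m * input_tokens + price.output_per_m * output_tokens
        ) / Decimal(1_000_000)
        self.entries.append((role, model, cost))
        return cost

    def total(self) -> Decimal:
        """Sum of every recorded call."""
        return sum((cost for _, _, cost in self.entries), Decimal(0))
```

`Decimal` avoids float rounding drift when many sub-cent calls are summed.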

The evidence

Head-to-head vs the single big model. Same work.

Same 5 SWE-bench Lite tasks, same AI/ML API provider, same real per-token prices. The adaptive jury costs a third less per task than letting one big model plan and code alone.

Configuration | Resolved | Avg cost / task | Total spend
OURS: Adaptive jury (cheap flash coder + 4-family jury) | 0/5 | $0.0245 | $0.123
BASELINE: Single big model (deepseek-v4-pro plans & codes alone) | 0/5 | $0.0369 | $0.185
−34% cost per task vs the single-big-model baseline, on the same tasks with the same grading. Resolution rate is identical: both configurations resolved 0 of 5.

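
The headline percentage follows directly from the two reported averages. A quick check, using only the figures above:

```python
# Figures reported in the benchmark: average cost per task, in USD.
ours_avg = 0.0245
baseline_avg = 0.0369
tasks = 5

# Relative saving of the adaptive jury against the baseline.
saving = (baseline_avg - ours_avg) / baseline_avg
print(f"saving per task: {saving:.1%}")  # saving per task: 33.6%

# Averages times task count should land near the reported totals.
ours_total = ours_avg * tasks
baseline_total = baseline_avg * tasks
print(f"totals: ${ours_total:.4f} vs ${baseline_total:.4f}")
```

The 33.6% rounds to the quoted 34%, and the recomputed totals agree with $0.123 and $0.185 to within rounding of the averages.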
The flow

Every handoff happens through Band.

Rooms, @mentions, shared memory. Band isn't a final notification — it's the collaboration layer between every stage.

Plan

reason over the task, pick files

Code

emit files on a feature branch

Jury review

diff-only, cross-family

Test

pytest gate must pass

Merge

only on quorum + tests
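
The five stages can be read as one loop with two gates. The sketch below is an assumption-heavy illustration: every stage is a caller-supplied function, `max_rounds` is a hypothetical cap, and treating a test failure as feedback for another Coder pass is a guess the source does not confirm.

```python
from typing import Callable, List, Optional, Tuple


def run_pipeline(
    task: str,
    plan: Callable[[str], List[str]],
    code: Callable[[List[str], str], str],
    review: Callable[[str], Tuple[bool, str]],
    test: Callable[[str], bool],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return the diff that may merge, or None if it never clears both gates."""
    files = plan(task)  # Plan: reason over the task, pick files
    feedback = ""
    for _ in range(max_rounds):
        diff = code(files, feedback)       # Code: emit the change
        approved, feedback = review(diff)  # Jury review: diff only
        if not approved:
            continue  # recode against the named objections
        if test(diff):  # Test: the pytest gate must pass
            return diff  # Merge: quorum and tests both hold
        feedback = "pytest gate failed"
    return None
```

Nothing merges from inside the stage functions; only the loop returns a diff, and only after both gates pass.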

The roster

Eight roles, four families, one room.

The jury deliberately spans distinct model families so a disagreement catches a different failure mode, not the same blind spot twice.

Planner
DeepSeek V4 Pro
deepseek · big
Reasons over the whole task, chooses files, orders the work.
Coder
DeepSeek V4 Flash
deepseek · flash
Applies the plan. Runs once on the happy path, again only on jury feedback.
Reviewer α
GPT-5 Nano
gpt · reasoning
Correctness and edge cases.
Reviewer β
Gemini 3.1 Flash
gemini · flash-lite
API misuse and contract violations.
Reviewer γ
DeepSeek V4 Flash
deepseek · recruited
Security and performance. Convenes on disagreement.
Reviewer σ
Qwen 3.5 Flash
qwen · open seat
Idioms and maintainability. Joins when round two still splits.
Judge
GPT-5 Nano
gpt · reasoning
Reconciles split verdicts and casts the deciding vote.
Tester
Gemini 3.1 Flash
gemini · flash-lite
Writes the pytest regression gate.
gpt gemini deepseek qwen (open-source seat)
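
The roster above can be written down as plain data, which makes the "eight roles, four families" claim checkable. The structure and names below are hypothetical; the model names are the display names from the roster, not confirmed API model IDs.

```python
# (role, model display name, family) as listed in the roster above.
ROSTER = [
    ("Planner", "DeepSeek V4 Pro", "deepseek"),
    ("Coder", "DeepSeek V4 Flash", "deepseek"),
    ("Reviewer α", "GPT-5 Nano", "gpt"),
    ("Reviewer β", "Gemini 3.1 Flash", "gemini"),
    ("Reviewer γ", "DeepSeek V4 Flash", "deepseek"),
    ("Reviewer σ", "Qwen 3.5 Flash", "qwen"),
    ("Judge", "GPT-5 Nano", "gpt"),
    ("Tester", "Gemini 3.1 Flash", "gemini"),
]

roles = [role for role, _, _ in ROSTER]
families = {family for _, _, family in ROSTER}
jury_families = {family for role, _, family in ROSTER if role.startswith("Reviewer")}

print(len(roles), sorted(families))
```

All four families appear among the four reviewer seats, so each juror comes from a different family.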
Built on

The collaboration layer, the models, and an open seat.

Band
Rooms, @mentions, shared memory — the actual collaboration layer between every agent.
HOST PLATFORM
AI/ML API
All four jury families (gpt, gemini, deepseek, qwen) through one OpenAI-compatible endpoint.
PARTNER PRIZE — AI/ML API
Featherless
The open-source fourth seat — recruited in deep round-two splits, serverless inference.
PARTNER PRIZE — FEATHERLESS
Zero keys required

Run the full jury in one command.

The hosted demo injects the project's AI/ML API key and runs Band in-process — clone, one install, run. No accounts, no keys of your own.

$ git clone https://github.com/ayushhroyy/YouTopia-Build && cd YouTopia-Build && ./setup.sh