YouTopia Build — code doesn't merge until the jury reaches quorum

The problem

Two ways to build with agents. Both leave something on the table.

cheap · unchecked

The strong lone agent

One model plans and codes. Token-efficient, fast — but there is no independent check. A confident mistake ships.

vs.

checked · expensive

The orchestrator swarm

Many agents collaborate, but nobody controls their cost or accuracy. Adversarial review, where it exists, is limited to two frameworks.

YouTopia combines a tightly-controlled editing brain with a cross-family adversarial jury — generalizing two-framework review to N model families, and recruiting more jurors only on disagreement or high risk.

The core idea

An adaptive jury, not a panel that always pays for four.

Reviewers read only the diff on a cheap model. The expensive model runs once for the Coder, and again only when the jury asks for changes. Extra seats convene only where it matters.

Two reviewers open

Two jurors from different families read the diff and vote. Unanimous and low-risk? Merge on quorum.

Escalate on disagreement

Split verdict or high risk convenes a third family. A fourth open-source seat joins only if round two still splits.

Judge reconciles

When no new jurors remain and consensus hasn't formed, a Judge weighs the findings and casts the deciding vote.

Recode against objections

A rejection must name the defect. The Judge compiles the findings and feeds them to the Coder — never a blind regenerate.

Tests gate the merge

Quorum is necessary but not sufficient. The Tester writes a pytest gate; a real failing assertion blocks the merge.

Real per-token ledger

Every call is costed against list prices in a live ledger. No estimation — the benchmark totals are measured spend.

The evidence

Head-to-head vs the single big model. Same work.

Same 5 SWE-bench Lite tasks, same AI/ML API provider, same real per-token prices. The adaptive jury costs a third less per task than letting one big model plan and code alone.

Resolved

Avg cost / task

Total spend

OURS Adaptive jury cheap flash coder + 4-family jury

0/5

$0.0245

$0.123

BASELINE Single big model deepseek-v4-pro plans & codes alone

0/5

$0.0369

$0.185

Adaptive jury

$0.0245

Single big model

$0.0369

−34% cost per task vs the single-big-model baseline — same accuracy, same tasks, same grading.

The flow

Every handoff happens through Band.

Rooms, @mentions, shared memory. Band isn't a final notification — it's the collaboration layer between every stage.

Plan

reason over the task, pick files

Code

emit files on a feature branch

Jury review

diff-only, cross-family

Test

pytest gate must pass

Merge

only on quorum + tests

The roster

Eight roles, four families, one room.

The jury deliberately spans distinct model families so a disagreement catches a different failure mode, not the same blind spot twice.

Planner

DeepSeek V4 Pro

deepseek · big

Reasons over the whole task, chooses files, orders the work.

Coder

DeepSeek V4 Flash

deepseek · flash

Applies the plan. Runs once on the happy path, again only on jury feedback.

Reviewer α

GPT-5 Nano

gpt · reasoning

Correctness and edge cases.

Reviewer β

Gemini 3.1 Flash

gemini · flash-lite

API misuse and contract violations.

Reviewer γ

DeepSeek V4 Flash

deepseek · recruited

Security and performance. Convenes on disagreement.

Reviewer σ

Qwen 3.5 Flash

qwen · open seat

Idioms and maintainability. Joins in round two splits.

Judge

GPT-5 Nano

gpt · reasoning

Reconciles split verdicts and casts the deciding vote.

Tester

Gemini 3.1 Flash

gemini · flash-lite

Writes the pytest regression gate.

gpt gemini deepseek qwen (open-source seat)

Built on

The collaboration layer, the models, and an open seat.

Band

Rooms, @mentions, shared memory — the actual collaboration layer between every agent.

HOST PLATFORM

AI/ML API

All four jury families (gpt, gemini, deepseek, qwen) through one OpenAI-compatible endpoint.

PARTNER PRIZE — AI/ML API

Featherless

The open-source fourth seat — recruited in deep round-two splits, serverless inference.

PARTNER PRIZE — FEATHERLESS

Zero keys required

Run the full jury in one command.

The hosted demo injects the project's AI/ML API key and runs Band in-process — clone, one install, run. No accounts, no keys of your own.

$ git clone https://github.com/ayushhroyy/YouTopia-Build && cd YouTopia-Build && ./setup.sh

View on GitHub Read the design