⚠ Facilitator material - spoilers. This page reveals planted bugs, answers and scoring. Don't show it to participants. All facilitator docs →

Scoring Rubric - Spec-First Build

The deliberate message in the maths: “it works” is worth 6 points. The human review gate is worth 12. The test pyramid is worth 14. Doing it spec-first is worth 10. A team whose endpoint barely runs but is specified, contract-tested, and signed by a named non-author beats a team with a slick demo and no rigor. That ranking is the entire teaching point - say it before you score.

Scored per team, out of 100. Every line maps to the Definition of Done and a corporate-standards ID. Facilitators score live during the demo showdown; witness slips from Gate 2 settle the review points.


The scorecard

#CriterionWhat earns itCite(s)Max
1Spec-first order observedAcceptance criteria + frozen contract existed before any feature code (Gate 1 initialled).QUAL-PRINC-SHIFT-LEFT, Way #110
2Machine-readable contractOpenAPI is real and in sync: $ref-ed response, shared error envelope, explicit version, consistent conventions.REQ-API-2, REQ-API-4, REQ-API-510
3Idempotency reasoning in ADRADR explains why GET is idempotent and what would change if it became a write.REQ-API-64
4ADR qualityCaptures the decision and the why - a real trade-off, durable, simplest-thing-that-works.ENG-PRIN-DOC-STMT, ENG-PRIN-SIMPLE-STMT6
5Test pyramidFirst commit = stub + failing contract test; behaviours tested at the lowest viable layer; no duplicated coverage; right-shaped pyramid.QTEST-EARLY, QTEST-NO-DUP, REQ-API-714
6Human review gateA named non-author approved against the DoD; AI assisted not approved; no self-merge.ENG-PRIN-REVIEW-STMT12
7Cross-role handoffBuild visibly consumed Product’s criteria, Architecture’s ADR, QA’s tests - quality as a team property.QUAL-PRINC-TEAM10
8UX statesLoading / empty / error / populated all handled; error state reads the contract’s Problem envelope.QUAL-PRINC-SDLC, REQ-API-56
9Endpoint actually worksThe happy path returns the agreed shape, live.REQ-API-26
10Standards technique spottingFacilitator-observed standards moves during build (see table below), 2 pts each.(various)8
11”Most Standards-Aligned” (audience)Top-voted team after demos.-5
12Sabotage survivalSurvived the delivered card the right way (re-assigned reviewer / updated contract+test).ENG-PRIN-REVIEW-STMT, REQ-API-2, QTEST-NO-DUP3
TOTAL94 + 6 bonus

Where the weight sits - on purpose: Review (12) + Tests (14) + Spec-first (10) = 36 points for rigor. “Endpoint actually works” = 6. Rigor outscores results 6-to-1. A team cannot demo their way to the top; they have to earn it through the process.

Scoring notes

  • Gate is a multiplier on shipping, not just points. A team that never passes Gate 2 (no named human approval) forfeits criterion 6 (12 pts) and cannot be ranked as “shipped” - they’re scored on spec/test merit only. This is ENG-PRIN-REVIEW-STMT made literal.
  • No contract, capped low. If criterion 2 scores 0 (no machine-readable contract), cap criterion 9 at 0 too - you can’t have “it works to spec” with no spec.
  • Partial credit is fine. A right-shaped pyramid with one missing layer still earns most of criterion 5. Award against the DoD MUST/SHOULD bands.

Standards technique spotting (criterion 10)

Watch the build. When you see one, call it out loudly: “Team B just wrote the failing contract test before any code - 2 points!” Max 4 techniques scored per team (8 pts).

TechniqueWhat to look forCite
Contract-firstWrites the OpenAPI $ref schema before touching implementationREQ-API-2, Way #1
Red-firstCommits a failing contract test as commit #1QTEST-CONTRACT, ENG-PRIN-INCR-STMT
Lowest-layer testPushes a rule down to a unit test instead of an E2EQTEST-EARLY
No-dup disciplineRefuses to re-assert the same rule at three layersQTEST-NO-DUP, QUAL-PRINC-WASTE
Idempotency call-outNames idempotency / Idempotency-Key in the ADR unpromptedREQ-API-6
Envelope reuseReuses Problem / Lesson rather than copying a shapeREQ-API-5
AI-assisted, human-signedUses AI to review the diff, then a human types the approvalENG-PRIN-REVIEW-STMT, Way #2
Wrote down the whyCaptures a real trade-off in the ADR, not just the choiceENG-PRIN-DOC-STMT, Way #5

Leaderboard template

Project this. Update live and announce every change - don’t move numbers silently.

TeamStack / Substrate1. Spec-first /102. Contract /103. Idemp. /44. ADR /65. Tests /146. Review /127. Handoff /108. UX /69. Works /610. Techniques /811. Audience /512. Sabotage /3TOTAL
Team 1
Team 2
Team 3

Spot prizes (award on the day):

  • Most Standards-Aligned - audience vote (criterion 11)
  • Best ADR - clearest why (criterion 4)
  • Greenest Pyramid - best-shaped test suite (criterion 5)
  • Cleanest Gate 2 - most rigorous human review against the DoD (criterion 6)

Update after: Gate 1 (mark spec-first locked), demos (criteria 1–10), audience vote (11), sabotage resolution (12). Make the rigor categories the dramatic ones - that’s where the lesson lands.


Part of Innovation Day. The five ways: ways-of-working.md. The bar being scored: definition-of-done.md. How to run it: facilitator-guide.md.