Facilitator Guide - Spec-First Build (2 hours)
The pitch in one line: “Same 501 endpoint as last time - completely different rigor. We’re not racing to make it work. We’re racing to make it right, on the record, and signed by a human.”
This is the technical session of Innovation Day. Three cross-functional teams of ~5 (Dev, Architecture, QA, Product, UI/UX) take a feature from spec → OpenAPI contract → code → six-layer tests → human review gate. Two hard gates are graded above everything else: the contract is frozen before any feature code, and a named non-author human approves the merge against the Definition of Done.
The five rules of the road are the same as the Skill session - see The 5 Ways We Work With AI. This session lives and dies on Way #1: Describe before you generate (the contract) and Way #2: A human always signs the work (the review gate). Tool-agnostic where stated: any AI assistant (Claude, Copilot, Cursor, Gemini) and either stack.
You need:
- 1 lead facilitator (runs the room, holds the gate pen, scores)
- 1–2 roaming facilitators (unblock teams, initial contracts at Gate 1, witness Gate 2 approvals, deliver sabotage cards)
Companion docs (read them before the session): Definition of Done · Scoring Rubric · OpenAPI contract.
Pre-Session Checklist (30 min before)
- Leaderboard projected (use the template in scoring-rubric.md)
- Definition of Done printed, one per person - it is the grading sheet and the takeaway
- Scoring rubric printed for facilitators only
- Gate pen + initials stamp ready - Gate 1 is physical: a facilitator initials the contract
- openapi.yaml open on the big screen, scrolled to /compare (the worked example)
- Substrate cards (a / b / c) at each table - one scaffold, one DoD, one rubric across all three
- Stack choice noted per team (Next.js or Spring Boot)
- Both stacks verified: Next.js on :3099; Spring Boot on :8080 + :3099
- Two sabotage cards printed on bright stock (see below)
- Countdown timer projected
- Gong/bell tested
- Monday-commitment cards + pens at each seat
- Wi-Fi + repo URL on screen
- Power strips at every table
- Synthetic-data-only reminder on screen (Way #3) - no real client data in any prompt
The Headline Arc
SPEC ──► CONTRACT ──► CODE ──► 6-LAYER TESTS ──► HUMAN REVIEW GATE
- SPEC: words
- CONTRACT: OpenAPI, frozen at Gate 1
- CODE: stub + failing test 1st
- 6-LAYER TESTS: test at the lowest viable layer
- HUMAN REVIEW GATE: named non-author approves vs DoD; AI assists, never approves
Print this. Say it out loud at 0:00 and again at every phase change. Every team that ships should be able to point at where they are on this line.
Minute-by-Minute (total 120)
| Clock | Phase | Min | What happens |
|---|---|---|---|
| 0:00 | Intro | 10 | Pitch the arc. “Same 501, different rigor.” Walk the diagram. Name the two gates and that they outscore “it works”. Point at the DoD on every desk. |
| 0:10 | Team formation + role claim + substrate & stack | 10 | Form 3 teams of ~5. Each person claims a role (Dev, Arch, QA, Product, UI/UX). Team picks a substrate (a/b/c) and a stack. Write both on the leaderboard. |
| 0:20 | SPEC PHASE | 15 | Product writes acceptance criteria; Architecture drafts the ADR (incl. idempotency reasoning); the team authors the OpenAPI response schema. No feature code. Ends at GATE 1. |
| 0:35 | 🚦 GATE 1 | - | A facilitator initials the OpenAPI contract at the table. No initials = no implementation. (No slot of its own: it is absorbed into the 15-minute spec and 30-minute build phases; budget ~2 min of roaming per team.) |
| 0:35 | BUILD | 30 | Dev’s first commit = stub + a failing contract test (red), then implement to green. Small commits (Way #4). QA stacks the test pyramid. UI/UX builds the states. |
| 1:05 | 🚦 REVIEW & MERGE GATE | 15 | GATE 2. A named non-author human approves the PR against the DoD. AI may assist the review; AI may not approve. No approval = not merged = not scored as shipped. |
| 1:20 | Demo showdown | 15 | 4 min/team. Contract on screen FIRST, then the failing-then-passing test, then the UI. Gong at cutoff. |
| 1:35 | Standards pub quiz | 7 | 7 questions on the standards techniques (see rubric’s “technique spotting”). |
| 1:42 | Leaderboard + spot prizes | 6 | Tally live. Award “Most Standards-Aligned” (audience vote) + spot prizes. Build tension. |
| 1:48 | Monday commitments + blockers + close | 12 | Each person writes one thing they’ll do Monday. Capture blockers on the board. Close on the arc. Photo. |
| 2:00 | END | - | - |
Why the gates sit where they do: Gate 1 makes Way #1 physical - you cannot generate before you’ve described. Gate 2 makes Way #2 physical - nothing ships unread, and the approver is a different human from the author (ENG-PRIN-REVIEW-STMT, no self-merge).
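If you adjust any phase length, it is worth re-deriving the Clock column rather than editing it by hand. The following is a minimal sketch, not part of the session materials: the durations are copied from the table above, Gate 1 is left out because it has no slot of its own, and the names `phases`, `clock` and `buildTimetable` are hypothetical.

```typescript
// Phase durations in minutes, copied from the Minute-by-Minute table.
// Gate 1 is omitted because it has no slot of its own.
interface Phase {
  label: string;
  minutes: number;
}

const phases: Phase[] = [
  { label: "Intro", minutes: 10 },
  { label: "Team formation", minutes: 10 },
  { label: "Spec phase", minutes: 15 },
  { label: "Build", minutes: 30 },
  { label: "Review & merge gate", minutes: 15 },
  { label: "Demo showdown", minutes: 15 },
  { label: "Standards pub quiz", minutes: 7 },
  { label: "Leaderboard + spot prizes", minutes: 6 },
  { label: "Monday commitments + close", minutes: 12 },
];

// Format elapsed minutes as H:MM, matching the Clock column.
function clock(elapsed: number): string {
  const h = Math.floor(elapsed / 60);
  const m = elapsed % 60;
  return `${h}:${m < 10 ? "0" + m : String(m)}`;
}

// Derive each phase's start clock and the total session length.
function buildTimetable(list: Phase[]): { rows: string[]; total: number } {
  let elapsed = 0;
  const rows: string[] = [];
  for (const p of list) {
    rows.push(`${clock(elapsed)} ${p.label} (${p.minutes} min)`);
    elapsed += p.minutes;
  }
  return { rows, total: elapsed };
}

const timetable = buildTimetable(phases);
console.log(timetable.rows.join("\n"));
console.log(`END at ${clock(timetable.total)}`);
```

Changing one `minutes` value shifts every later start time, which makes it obvious when a longer BUILD silently eats into the review gate.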
The Two Gates - Run Them Like This
🚦 Gate 1 - Contract Freeze (end of Spec Phase, ~0:35)
The facilitator walks to each table and reads the contract aloud against four checks. Initial it only if all four pass.
| Check | Standard | Pass looks like |
|---|---|---|
| /recommend 200 response is a real $ref, not a TODO | REQ-API-2, REQ-API-5 | RecommendResponse defined under components/schemas, reusing Lesson/DimensionScore |
| Versioning + error envelope present | REQ-API-4, REQ-API-5 | info.version set; errors use the shared Problem (problem+json) |
| Idempotency reasoned, not just stated | REQ-API-6 | ADR says why GET is safe/idempotent and what would change if it became a write |
| Acceptance criteria written before code | QUAL-PRINC-SHIFT-LEFT | Product’s criteria exist on paper/screen; no feature commits yet |
Once initialled, the contract is frozen. Changing it later costs a re-initial - and the sabotage card may force exactly that.
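The first two checks are mechanical, so a roaming facilitator could pre-screen a contract before reading it aloud. The sketch below is only an illustration under stated assumptions: it takes the contract already parsed into a plain object, assumes success bodies sit under `application/json` and error bodies under `application/problem+json`, and the function names are hypothetical. The idempotency reasoning and the acceptance criteria still need a human read.

```typescript
// Loosely typed slice of a parsed OpenAPI document; only what the checks touch.
type Json = { [key: string]: any };

interface CheckResult {
  check: string;
  pass: boolean;
}

const SCHEMA_PREFIX = "#/components/schemas/";

// Resolve a local $ref to its schema name, or null if it is missing or dangling.
function resolvedSchemaName(doc: Json, ref: unknown): string | null {
  if (typeof ref !== "string") return null;
  if (ref.slice(0, SCHEMA_PREFIX.length) !== SCHEMA_PREFIX) return null;
  const schemaName = ref.slice(SCHEMA_PREFIX.length);
  return doc.components?.schemas?.[schemaName] !== undefined ? schemaName : null;
}

// Run the mechanical Gate 1 checks against one GET path.
function gate1MechanicalChecks(doc: Json, path: string): CheckResult[] {
  const responses: Json = doc.paths?.[path]?.get?.responses ?? {};
  const okRef = responses["200"]?.content?.["application/json"]?.schema?.$ref;

  // Every non-2xx response must point at the shared Problem schema.
  const errorCodes = Object.keys(responses).filter((code) => code.charAt(0) !== "2");
  const errorsUseProblem =
    errorCodes.length > 0 &&
    errorCodes.every((code) => {
      const ref = responses[code]?.content?.["application/problem+json"]?.schema?.$ref;
      return resolvedSchemaName(doc, ref) === "Problem";
    });

  const version = doc.info?.version;
  return [
    { check: "200 response is a real $ref", pass: resolvedSchemaName(doc, okRef) !== null },
    { check: "info.version set", pass: typeof version === "string" && version.length > 0 },
    { check: "errors use the shared Problem", pass: errorsUseProblem },
  ];
}

// Hypothetical draft contract, reduced to the parts the checks read.
const draft: Json = {
  info: { version: "0.1.0" },
  paths: {
    "/recommend": {
      get: {
        responses: {
          "200": {
            content: {
              "application/json": { schema: { $ref: "#/components/schemas/RecommendResponse" } },
            },
          },
          "400": {
            content: {
              "application/problem+json": { schema: { $ref: "#/components/schemas/Problem" } },
            },
          },
        },
      },
    },
  },
  components: { schemas: { RecommendResponse: { type: "object" }, Problem: { type: "object" } } },
};

for (const result of gate1MechanicalChecks(draft, "/recommend")) {
  console.log(`${result.pass ? "PASS" : "FAIL"} ${result.check}`);
}
```

A `$ref` that points at a schema name missing from `components/schemas` fails the first check, which is the "TODO dressed up as a reference" case the table warns about.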
🚦 Gate 2 - Review & Merge (1:05–1:20)
The single most-weighted moment in the day. Run it strictly.
- Author opens the PR/diff. They do not approve their own work.
- A named non-author on the team reviews against the printed DoD, ticking MUST items.
- AI may assist (summarise the diff, suggest missing tests, spot a guess) - the human types the approval and owns it (ENG-PRIN-REVIEW-STMT, ENG-PRIN-OWN-STMT).
- A roaming facilitator witnesses and records: approver name, MUST items met, merged y/n.
No approval, no ship. A working endpoint with no human approval scores the test/spec points but forfeits the 12 review points and cannot win “shipped”. Say this at the start so no one is surprised.
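The gate rule can be written down as a small decision function, which helps when two facilitators need to rule consistently. This is a sketch under assumptions: the slip fields beyond approver name, MUST items and merged y/n (namely `author` and `approverIsHuman`) are hypothetical additions, the MUST item labels are placeholders rather than the real DoD text, and it assumes every MUST item has to be ticked before merge.

```typescript
// One MUST line from the printed DoD, as ticked by the reviewer.
interface MustItem {
  item: string;
  met: boolean;
}

// What the roaming facilitator records at Gate 2 (extra fields are hypothetical).
interface WitnessSlip {
  author: string;
  approver: string | null; // null when nobody approved
  approverIsHuman: boolean; // false when the "approval" came from an AI assistant
  mustItems: MustItem[];
}

interface GateDecision {
  merged: boolean;
  reasons: string[]; // why the merge was refused; empty when merged
}

// Compare names loosely so "Sam " and "sam" count as the same person.
function samePerson(a: string, b: string): boolean {
  return a.trim().toLowerCase() === b.trim().toLowerCase();
}

function gate2Decision(slip: WitnessSlip): GateDecision {
  const reasons: string[] = [];
  if (slip.approver === null || slip.approver.trim() === "") {
    reasons.push("no named approver");
  } else if (samePerson(slip.approver, slip.author)) {
    reasons.push("author cannot approve their own work");
  }
  if (!slip.approverIsHuman) {
    reasons.push("AI may assist the review but may not approve");
  }
  for (const must of slip.mustItems) {
    if (!must.met) reasons.push(`MUST item not met: ${must.item}`);
  }
  return { merged: reasons.length === 0, reasons };
}

// Placeholder slip: names and MUST labels are illustrative only.
const exampleSlip: WitnessSlip = {
  author: "Dev A",
  approver: "QA B",
  approverIsHuman: true,
  mustItems: [
    { item: "placeholder MUST item 1", met: true },
    { item: "placeholder MUST item 2", met: true },
  ],
};

console.log(gate2Decision(exampleSlip));
```

Returning the reasons alongside the verdict gives the facilitator something concrete to read back to the team when a merge is refused.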
Sabotage Cards (standards-flavoured)
Print on bright stock. Deliver one per team during BUILD or REVIEW, timed to bite. These are not random chaos - each one tests a specific standard.
Card A - “Reviewer is out”
+-----------------------------------------------------------+
| SABOTAGE CARD - THE REVIEWER IS OUT |
| |
| Your designated reviewer just got pulled into an |
| incident. You may NOT self-merge. |
| (ENG-PRIN-REVIEW-STMT: no change reaches trunk without |
| review by another human.) |
| |
| Find a different named non-author to approve against |
| the DoD - or the work does not merge. |
| |
| Survive it: +3 BONUS POINTS |
+-----------------------------------------------------------+
Tests: ENG-PRIN-REVIEW-STMT. The wrong move is to merge unreviewed “to save time”. The right move is to re-assign the reviewer.
Card B - “The criterion changed”
+-----------------------------------------------------------+
| SABOTAGE CARD - ACCEPTANCE CRITERION CHANGED |
| |
| Product just changed an acceptance criterion |
| (e.g. "top 5" becomes "top 3, exclude completed"). |
| |
| Update the CONTRACT *and* its CONTRACT TEST before you |
| merge. Spec, code, and test must agree. |
| (REQ-API-2 keep contract in sync; REQ-API-7 contract |
| testing; QTEST-NO-DUP - change it in ONE place.) |
| |
| Survive it: +3 BONUS POINTS |
+-----------------------------------------------------------+
Tests: REQ-API-2, REQ-API-7, QTEST-NO-DUP. The wrong move is to patch the code and leave the contract/test stale. The right move is contract-first, re-initial at Gate 1, update the failing test, then implement.
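To make the "change it in ONE place" point concrete, here is a rough sketch of how the card plays out when the limit lives in the contract and both the contract test and the implementation read it from there. Everything named here is hypothetical: the `RecommendContract` shape, the `Lesson` fields and the sample data are illustrative, not taken from the real openapi.yaml.

```typescript
// Hypothetical slice of the frozen contract: only the constraints the card changes.
interface RecommendContract {
  maxItems: number;
  excludeCompleted: boolean;
}

// Illustrative lesson shape; the real Lesson schema lives in the contract.
interface Lesson {
  id: string;
  completed: boolean;
  score: number;
}

// Contract test: an empty list means green, anything else means red.
function contractViolations(contract: RecommendContract, response: Lesson[]): string[] {
  const violations: string[] = [];
  if (response.length > contract.maxItems) {
    violations.push(`returned ${response.length} items, contract allows ${contract.maxItems}`);
  }
  if (contract.excludeCompleted && response.some((l) => l.completed)) {
    violations.push("response contains a completed lesson");
  }
  return violations;
}

// Stale implementation: "top 5" is hard-coded, so it cannot follow the contract.
function recommendHardCoded(lessons: Lesson[]): Lesson[] {
  return lessons.slice().sort((a, b) => b.score - a.score).slice(0, 5);
}

// Implementation that reads its limits from the contract, the single place to change.
function recommend(contract: RecommendContract, lessons: Lesson[]): Lesson[] {
  const eligible = contract.excludeCompleted ? lessons.filter((l) => !l.completed) : lessons;
  return eligible.slice().sort((a, b) => b.score - a.score).slice(0, contract.maxItems);
}

const catalogue: Lesson[] = [
  { id: "l1", completed: true, score: 9 },
  { id: "l2", completed: false, score: 8 },
  { id: "l3", completed: false, score: 7 },
  { id: "l4", completed: false, score: 6 },
  { id: "l5", completed: false, score: 5 },
  { id: "l6", completed: false, score: 4 },
];

const frozenContract: RecommendContract = { maxItems: 5, excludeCompleted: false };
const changedContract: RecommendContract = { maxItems: 3, excludeCompleted: true };

// Before the card: stale code is green against the frozen contract.
console.log(contractViolations(frozenContract, recommendHardCoded(catalogue)));
// After the card: the contract is updated first, so the same code goes red.
console.log(contractViolations(changedContract, recommendHardCoded(catalogue)));
// Then the implementation follows the contract and the test returns to green.
console.log(contractViolations(changedContract, recommend(changedContract, catalogue)));
```

The order of the three calls mirrors the right move on the card: update the contract, watch the contract test fail, then implement.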
Demo Format (4 min/team, gong-enforced)
Order is mandatory - the contract goes up first. This is the whole point of the session.
- Contract (60s): Show /recommend in openapi.yaml. “Here’s the shape we froze at Gate 1.” Point at the $ref, the error envelope, the version.
- Test pyramid (60s): Show the first commit (stub + failing contract test, red) and the same test passing (green). Name the layers you covered (unit / component / contract / system).
- The endpoint (60s): Hit it live. Show the real response matches the frozen schema.
- UI states (45s): Loading, empty, error, populated.
- The signature (15s): Name the approver and the DoD items they ticked. “Signed by [name], not the author.”
Facilitator scores on the spot using scoring-rubric.md. Gong at 4:00, no exceptions.
The Three Substrate Options
All three share one scaffold, one DoD, one rubric. Teams pick at 0:10. Let mixed-confidence teams take (a); let strong teams take (b) or (c).
| Option | Substrate | What they build | Best for |
|---|---|---|---|
| a | Guided rise-mini | Finish /compare and/or /recommend in the rise-mini repo. /compare is fully specified in openapi.yaml as the worked example; /recommend is deliberately partial - they author its schema first. | Teams who want rails. The contract is half-written; they complete it. |
| b | Greenfield from a written spec | A brand-new endpoint from a one-paragraph product spec the facilitator hands out. Same arc, blank repo. | Teams who want to own the contract end-to-end. |
| c | Bring-your-own | A real endpoint from the team’s own backlog (synthetic data only - Way #3). | Teams who want Monday-relevant output. |
The trick: the grading sheet doesn’t care which substrate you chose. Spec → contract → test → review applies identically. A team on (a) and a team on (c) are scored on the same DoD and rubric.
The worked example for everyone: /compare in openapi.yaml is fully specified - $ref-ed response, error envelope, idempotency note. /recommend is the TODO. Project /compare at 0:20 and say: “Make /recommend look like this - that’s Gate 1.”
If Running Behind
Protect the review gate above all else. Gate 2 is the point of the day. Cut from the bottom up:
| Priority | What to cut | Saves | Never cut |
|---|---|---|---|
| 1st | Drop quiz from 7 → 4 questions | ~3 min | - |
| 2nd | Demos to 3 min, top scorers only | ~3–4 min | - |
| 3rd | Skip spot prizes, keep leaderboard tally | ~3 min | - |
| 4th | Shorten BUILD by 5 min (announce a hard scope-cut: stub + one happy path) | ~5 min | - |
| NEVER | Gate 1 contract freeze · Gate 2 human review · Monday commitments | - | These are the session. |
If a team is drowning in BUILD, the rescue is to scope down to one happy path so they still reach Gate 2 - a smaller thing, properly reviewed, beats a bigger thing unreviewed. That is the lesson.
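For a quick read on how many cuts a given delay needs, the table can be walked in priority order. This is a rough sketch only: the savings are the approximate figures from the table (the lower bound is used for "~3–4 min"), and `cuts` and `planCuts` are hypothetical names.

```typescript
// One row of the cut table, in priority order.
interface Cut {
  action: string;
  savesMin: number;
}

const cuts: Cut[] = [
  { action: "Drop quiz from 7 to 4 questions", savesMin: 3 },
  { action: "Demos to 3 min, top scorers only", savesMin: 3 }, // table says ~3-4; lower bound used
  { action: "Skip spot prizes, keep leaderboard tally", savesMin: 3 },
  { action: "Shorten BUILD by 5 min", savesMin: 5 },
];

interface CutPlan {
  applied: string[];
  recovered: number; // minutes won back by the applied cuts
  stillBehind: number; // minutes the cuts cannot recover
}

// Apply cuts from the top of the list until the delay is covered or cuts run out.
function planCuts(minutesBehind: number): CutPlan {
  const applied: string[] = [];
  let recovered = 0;
  for (const cut of cuts) {
    if (recovered >= minutesBehind) break;
    applied.push(cut.action);
    recovered += cut.savesMin;
  }
  return { applied, recovered, stillBehind: Math.max(0, minutesBehind - recovered) };
}

console.log(planCuts(5));
```

Note that the BUILD cut only helps if you spot the delay before BUILD ends, and that the gates and Monday commitments never appear in the list, so no amount of delay can select them.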
Materials Checklist (hand-out)
- Definition of Done - one printed per person (grading sheet + takeaway)
- Substrate cards a / b / c - one set per table
- Greenfield spec paragraph (for substrate b) - printed
- The two sabotage cards - bright stock, 3 of each
- Gate pen / initials stamp (Gate 1)
- Gate 2 witness slip per team: approver name · MUST items met · merged y/n
- The arc diagram, projected
- openapi.yaml open on /compare
- Monday-commitment cards + pens
- Blockers board
- Standards quiz loaded (Mentimeter/Slido)
- Leaderboard (scoring-rubric.md template)
Common Problems & Solutions
| Problem | Solution |
|---|---|
| “We just want to start coding.” | Hold the line: no Gate 1 initials, no implementation. That’s ENG-PRIN-INCR-STMT + Way #1. |
| Dev writes the impl before the test | Reset to the first-commit rule: stub + failing contract test first. It’s worth 14 points; “it works” is worth 6. |
| Author wants to approve their own PR | Hard stop. ENG-PRIN-REVIEW-STMT - no self-merge. This is Sabotage Card A whether or not you handed it out. |
| Contract drifts from the code | That’s exactly what REQ-API-2 forbids. Point them at the contract test (REQ-API-7) - it should be failing. |
| AI “approved” the review | AI assists, never approves. A human types and owns the sign-off (Way #2, ENG-PRIN-OWN-STMT). |
| Team over-builds (3 abstractions “for later”) | Way #5 / ENG-PRIN-SIMPLE-STMT (KISS/YAGNI). Score the ADR down, reward the simplest thing that meets the contract. |
| Someone pastes real client data | Stop, synthetic only (Way #3). Substrate (c) teams especially. |
Part of Innovation Day. The five ways of working: ways-of-working.md. Definition of Done: definition-of-done.md. Scoring: scoring-rubric.md.