⚠ Facilitator material - spoilers. This page reveals planted bugs, answers and scoring. Don't show it to participants.

Facilitator Guide - Spec-First Build (2 hours)

The pitch in one line: “Same 501 endpoint as last time - completely different rigor. We’re not racing to make it work. We’re racing to make it right, on the record, and signed by a human.”

This is the technical session of Innovation Day. Three cross-functional teams of ~5 (Dev, Architecture, QA, Product, UI/UX) take a feature from spec → OpenAPI contract → code → six-layer tests → human review gate. Two hard gates are graded above everything else: the contract is frozen before any feature code, and a named non-author human approves the merge against the Definition of Done.

The five rules of the road are the same as in the Skill session - see The 5 Ways We Work With AI. This session lives and dies on Way #1: Describe before you generate (the contract) and Way #2: A human always signs the work (the review gate). The session is tool-agnostic where stated: teams may use any AI assistant (Claude, Copilot, Cursor, Gemini) and either stack.

You need:

  • 1 lead facilitator (runs the room, holds the gate pen, scores)
  • 1–2 roaming facilitators (unblock teams, initial contracts at Gate 1, witness Gate 2 approvals, deliver sabotage cards)

Companion docs (read them before the session): Definition of Done · Scoring Rubric · OpenAPI contract.


Pre-Session Checklist (30 min before)

  • Leaderboard projected (use the template in scoring-rubric.md)
  • Definition of Done printed, one per person - it is the grading sheet and the takeaway
  • Scoring rubric printed for facilitators only
  • Gate pen + initials stamp ready - Gate 1 is physical: a facilitator initials the contract
  • openapi.yaml open on the big screen, scrolled to /compare (the worked example)
  • Substrate cards (a / b / c) at each table - one scaffold, one DoD, one rubric across all three
  • Stack choice noted per team (Next.js or Spring Boot)
  • Both stacks verified: Next.js on :3099; Spring Boot on :8080 + :3099
  • Two sabotage cards printed on bright stock (see below)
  • Countdown timer projected
  • Gong/bell tested
  • Monday-commitment cards + pens at each seat
  • Wi-Fi + repo URL on screen
  • Power strips at every table
  • Synthetic-data-only reminder on screen (Way #3) - no real client data in any prompt

The Headline Arc

  SPEC  ──►  CONTRACT  ──►  CODE  ──►  6-LAYER TESTS  ──►  HUMAN REVIEW GATE
 (words)   (OpenAPI,        (stub +    (test at the         (named non-author
            frozen at        failing    lowest viable        approves vs DoD;
            Gate 1)          test 1st)  layer)               AI assists,
                                                             never approves)

Print this. Say it out loud at 0:00 and again at every phase change. Every team that ships should be able to point at where they are on this line.


Minute-by-Minute (total 120)

| Clock | Phase | Min | What happens |
|---|---|---|---|
| 0:00 | Intro | 10 | Pitch the arc. “Same 501, different rigor.” Walk the diagram. Name the two gates and that they outscore “it works”. Point at the DoD on every desk. |
| 0:10 | Team formation + role claim + substrate & stack | 10 | Form 3 teams of ~5. Each person claims a role (Dev, Arch, QA, Product, UI/UX). Team picks a substrate (a/b/c) and a stack. Write both on the leaderboard. |
| 0:20 | SPEC PHASE | 15 | Product writes acceptance criteria; Architecture drafts the ADR (incl. idempotency reasoning); the team authors the OpenAPI response schema. No feature code. Ends at GATE 1. |
| 0:35 | 🚦 GATE 1 | - | A facilitator initials the OpenAPI contract at the table. No initials = no implementation. (Worked into the 15+30; budget ~2 min of roaming per team.) |
| 0:35 | BUILD | 30 | Dev’s first commit = stub + a failing contract test (red), then implement to green. Small commits (Way #4). QA stacks the test pyramid. UI/UX builds the states. |
| 1:05 | 🚦 REVIEW & MERGE GATE | 15 | GATE 2. A named non-author human approves the PR against the DoD. AI may assist the review; AI may not approve. No approval = not merged = not scored as shipped. |
| 1:20 | Demo showdown | 15 | 4 min/team. Contract on screen FIRST, then the failing-then-passing test, then the UI. Gong at cutoff. |
| 1:35 | Standards pub quiz | 7 | 7 questions on the standards techniques (see rubric’s “technique spotting”). |
| 1:42 | Leaderboard + spot prizes | 6 | Tally live. Award “Most Standards-Aligned” (audience vote) + spot prizes. Build tension. |
| 1:48 | Monday commitments + blockers + close | 12 | Each person writes one thing they’ll do Monday. Capture blockers on the board. Close on the arc. Photo. |
| 2:00 | END | | |
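If you retime the session, it helps to re-derive the clock column rather than edit it by hand. The sketch below is one way to do that, assuming only the phase lengths from the timetable above; the `Phase` class and function names are illustrative, not part of any session tooling.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Phase:
    name: str
    minutes: int


# Phase lengths copied from the timetable. Gate 1 has no slot of its own:
# it is worked into the SPEC and BUILD phases.
PHASES = [
    Phase("Intro", 10),
    Phase("Team formation + role claim + substrate & stack", 10),
    Phase("SPEC PHASE", 15),
    Phase("BUILD", 30),
    Phase("REVIEW & MERGE GATE", 15),
    Phase("Demo showdown", 15),
    Phase("Standards pub quiz", 7),
    Phase("Leaderboard + spot prizes", 6),
    Phase("Monday commitments + blockers + close", 12),
]


def clock(minutes: int) -> str:
    """Format elapsed minutes as H:MM, matching the Clock column."""
    return f"{minutes // 60}:{minutes % 60:02d}"


def start_times(phases: List[Phase]) -> List[Tuple[str, str]]:
    """Return (clock, phase name) pairs, ending with the END marker."""
    elapsed = 0
    rows = []
    for phase in phases:
        rows.append((clock(elapsed), phase.name))
        elapsed += phase.minutes
    rows.append((clock(elapsed), "END"))
    return rows


if __name__ == "__main__":
    for time, name in start_times(PHASES):
        print(time, name)
```

Changing any phase length and re-running shows immediately whether the session still ends at 2:00.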

Why the gates sit where they do: Gate 1 makes Way #1 physical - you cannot generate before you’ve described. Gate 2 makes Way #2 physical - nothing ships unread, and the approver is a different human than the author (ENG-PRIN-REVIEW-STMT, no self-merge).


The Two Gates - Run Them Like This

🚦 Gate 1 - Contract Freeze (end of Spec Phase, ~0:35)

The facilitator walks to each table and reads the contract aloud against four checks. Initial it only if all four pass.

| Check | Standard | Pass looks like |
|---|---|---|
| /recommend 200 response is a real $ref, not a TODO | REQ-API-2, REQ-API-5 | RecommendResponse defined under components/schemas, reusing Lesson/DimensionScore |
| Versioning + error envelope present | REQ-API-4, REQ-API-5 | info.version set; errors use the shared Problem (problem+json) |
| Idempotency reasoned, not just stated | REQ-API-6 | ADR says why GET is safe/idempotent and what would change if it became a write |
| Acceptance criteria written before code | QUAL-PRINC-SHIFT-LEFT | Product’s criteria exist on paper/screen; no feature commits yet |
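A roaming facilitator could pre-check the machine-readable parts of the first two checks before reading the contract aloud. The sketch below is a rough aid, not a replacement for the initials: it assumes the contract has already been loaded into a plain dict, that /recommend is a GET, that the 200 body is served as application/json, and that response status keys are strings. The function name and the sample specs are hypothetical. The idempotency reasoning and the acceptance criteria still need a human read.

```python
from typing import Dict, List

SCHEMA_PREFIX = "#/components/schemas/"


def gate1_findings(spec: Dict, path: str = "/recommend") -> List[str]:
    """List reasons the contract would fail the machine-checkable Gate 1 checks."""
    findings: List[str] = []

    # Versioning: info.version must be set.
    if not spec.get("info", {}).get("version"):
        findings.append("info.version is not set")

    # Assumption: the operation is a GET, as the idempotency check implies.
    responses = spec.get("paths", {}).get(path, {}).get("get", {}).get("responses", {})
    schemas = spec.get("components", {}).get("schemas", {})

    # The 200 response must be a real $ref to a schema that exists.
    ok_schema = (
        responses.get("200", {})
        .get("content", {})
        .get("application/json", {})
        .get("schema", {})
    )
    ref = ok_schema.get("$ref", "")
    if not ref.startswith(SCHEMA_PREFIX):
        findings.append(f"{path} 200 response is not a $ref")
    elif ref[len(SCHEMA_PREFIX):] not in schemas:
        findings.append(f"{path} 200 response points at an undefined schema")

    # Error envelope: every non-2xx response uses the shared Problem schema.
    error_statuses = [s for s in responses if not str(s).startswith("2")]
    if not error_statuses:
        findings.append(f"{path} declares no error responses")
    for status in error_statuses:
        err_schema = (
            responses[status]
            .get("content", {})
            .get("application/problem+json", {})
            .get("schema", {})
        )
        if err_schema.get("$ref") != SCHEMA_PREFIX + "Problem":
            findings.append(f"{path} {status} does not use the shared Problem envelope")

    return findings


# Hypothetical contracts: one ready to initial, one still a TODO.
FROZEN = {
    "info": {"version": "1.0.0"},
    "paths": {"/recommend": {"get": {"responses": {
        "200": {"content": {"application/json": {
            "schema": {"$ref": "#/components/schemas/RecommendResponse"}}}},
        "400": {"content": {"application/problem+json": {
            "schema": {"$ref": "#/components/schemas/Problem"}}}},
    }}}},
    "components": {"schemas": {"RecommendResponse": {"type": "object"},
                               "Problem": {"type": "object"}}},
}
DRAFT = {
    "info": {},
    "paths": {"/recommend": {"get": {"responses": {"200": {"description": "TODO"}}}}},
    "components": {"schemas": {}},
}
```

An empty list from `gate1_findings(FROZEN)` means nothing mechanical blocks the initials; `gate1_findings(DRAFT)` returns one finding per gap.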

Once initialled, the contract is frozen. Changing it later costs a re-initial - and the sabotage card may force exactly that.

🚦 Gate 2 - Review & Merge (1:05–1:20)

The single most-weighted moment in the day. Run it strictly.

  1. Author opens the PR/diff. They do not approve their own work.
  2. A named non-author on the team reviews against the printed DoD, ticking MUST items.
  3. AI may assist (summarise the diff, suggest missing tests, spot a guess) - the human types the approval and owns it (ENG-PRIN-REVIEW-STMT, ENG-PRIN-OWN-STMT).
  4. A roaming facilitator witnesses and records: approver name, MUST items met, merged y/n.

No approval, no ship. A working endpoint with no human approval scores the test/spec points but forfeits the 12 review points and cannot win “shipped”. Say this at the start so no one is surprised.
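The merge rule is simple enough to state as code, which can help when a team argues an edge case. This is a minimal sketch assuming the witness slip records the author, the approver, and whether the approval was typed by a human; the `WitnessSlip` fields and function names are illustrative. The 12 review points come from this guide; how points are awarded when the gate is passed is left to the scoring rubric.

```python
from dataclasses import dataclass
from typing import List, Optional

REVIEW_POINTS = 12  # from this guide: forfeited when there is no human approval


@dataclass(frozen=True)
class WitnessSlip:
    author: str
    approver: Optional[str]   # name of whoever typed the approval, if anyone
    approver_is_human: bool   # False if an AI assistant "approved"


def merge_blockers(slip: WitnessSlip) -> List[str]:
    """Return the reasons the PR may not merge; an empty list means it may."""
    if not slip.approver or not slip.approver.strip():
        return ["no named approver"]
    reasons = []
    if slip.approver.strip().lower() == slip.author.strip().lower():
        reasons.append("approver is the author (no self-merge)")
    if not slip.approver_is_human:
        reasons.append("AI may assist the review but may not approve")
    return reasons


def review_points_forfeited(slip: WitnessSlip) -> int:
    """Points lost at Gate 2: all 12 if anything blocks the merge, else none."""
    return REVIEW_POINTS if merge_blockers(slip) else 0
```

For example, a slip with the author's own name as approver is blocked and forfeits all 12 points, while a slip naming a different human teammate is clear to merge.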


Sabotage Cards (standards-flavoured)

Print on bright stock. Deliver one per team during BUILD or REVIEW, timed to bite. These are not random chaos - each one tests a specific standard.

Card A - “Reviewer is out”

+-----------------------------------------------------------+
|  SABOTAGE CARD  -  THE REVIEWER IS OUT                    |
|                                                           |
|  Your designated reviewer just got pulled into an         |
|  incident. You may NOT self-merge.                        |
|  (ENG-PRIN-REVIEW-STMT: no change reaches trunk without   |
|   review by another human.)                               |
|                                                           |
|  Find a different named non-author to approve against     |
|  the DoD - or the work does not merge.                    |
|                                                           |
|  Survive it: +3 BONUS POINTS                              |
+-----------------------------------------------------------+

Tests: ENG-PRIN-REVIEW-STMT. The wrong move is to merge unreviewed “to save time”. The right move is to re-assign the reviewer.

Card B - “The criterion changed”

+-----------------------------------------------------------+
|  SABOTAGE CARD  -  ACCEPTANCE CRITERION CHANGED           |
|                                                           |
|  Product just changed an acceptance criterion             |
|  (e.g. "top 5" becomes "top 3, exclude completed").       |
|                                                           |
|  Update the CONTRACT *and* its CONTRACT TEST before you   |
|  merge. Spec, code, and test must agree.                  |
|  (REQ-API-2 keep contract in sync; REQ-API-7 contract     |
|   testing; QTEST-NO-DUP - change it in ONE place.)        |
|                                                           |
|  Survive it: +3 BONUS POINTS                              |
+-----------------------------------------------------------+

Tests: REQ-API-2, REQ-API-7, QTEST-NO-DUP. The wrong move is to patch the code and leave the contract/test stale. The right move is contract-first, re-initial at Gate 1, update the failing test, then implement.
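One way to picture "change it in ONE place" is to keep the criterion in a single object that both the implementation and the contract test read. The sketch below uses the card's own example ("top 5" becomes "top 3, exclude completed"); the `Criterion` class, the lesson fields (`id`, `score`, `completed`) and the function names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Criterion:
    """The acceptance criterion, stated once and shared by code and test."""
    max_items: int
    exclude_completed: bool


def recommend(lessons: List[Dict], criterion: Criterion) -> List[Dict]:
    """Rank lessons by score and apply the criterion."""
    pool = [
        lesson for lesson in lessons
        if not (criterion.exclude_completed and lesson["completed"])
    ]
    ranked = sorted(pool, key=lambda lesson: lesson["score"], reverse=True)
    return ranked[: criterion.max_items]


def contract_violations(response: List[Dict], criterion: Criterion) -> List[str]:
    """Contract test: check a response against the same criterion."""
    problems = []
    if len(response) > criterion.max_items:
        problems.append(
            f"returned {len(response)} items, contract allows {criterion.max_items}"
        )
    if criterion.exclude_completed:
        for lesson in response:
            if lesson["completed"]:
                problems.append(f"completed lesson {lesson['id']} returned")
    return problems


# Synthetic data only (Way #3).
LESSONS = [
    {"id": 1, "score": 90, "completed": False},
    {"id": 2, "score": 80, "completed": True},
    {"id": 3, "score": 70, "completed": False},
    {"id": 4, "score": 60, "completed": False},
    {"id": 5, "score": 50, "completed": False},
    {"id": 6, "score": 40, "completed": False},
]

BEFORE = Criterion(max_items=5, exclude_completed=False)  # "top 5"
AFTER = Criterion(max_items=3, exclude_completed=True)    # "top 3, exclude completed"
```

Checking the old response against the new criterion, `contract_violations(recommend(LESSONS, BEFORE), AFTER)`, reports two violations, which is the red test the card is asking for. Re-running `recommend` with `AFTER` brings it back to green without touching a second copy of the rule.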


Demo Format (4 min/team, gong-enforced)

Order is mandatory - the contract goes up first. This is the whole point of the session.

  1. Contract (60s): Show /recommend in openapi.yaml. “Here’s the shape we froze at Gate 1.” Point at the $ref, the error envelope, the version.
  2. Test pyramid (60s): Show the first commit (stub + failing contract test, red) and the same test passing (green). Name the layers you covered (unit / component / contract / system).
  3. The endpoint (60s): Hit it live. Show the real response matches the frozen schema.
  4. UI states (45s): Loading, empty, error, populated.
  5. The signature (15s): Name the approver and the DoD items they ticked. “Signed by [name], not the author.”

Facilitator scores on the spot using scoring-rubric.md. Gong at 4:00, no exceptions.
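For facilitators who want a mental model of what "failing-then-passing" should look like in step 2, here is a stack-neutral sketch. It assumes the stub answers 501 (as in the pitch) and that the frozen schema requires a `recommendations` list; that field name, the handler functions and the sample payloads are hypothetical stand-ins for whatever the team froze at Gate 1.

```python
from typing import Callable, Dict, List, Tuple

Handler = Callable[[], Tuple[int, Dict]]

# Hypothetical: the fields the frozen response schema marks as required.
REQUIRED_FIELDS = ("recommendations",)


def stub_handler() -> Tuple[int, Dict]:
    """First commit: the endpoint exists but is not implemented."""
    return 501, {"title": "Not Implemented", "status": 501}


def implemented_handler() -> Tuple[int, Dict]:
    """Later commit: returns a body shaped like the frozen schema."""
    return 200, {"recommendations": [{"lessonId": "l-1", "score": 0.9}]}


def contract_failures(handler: Handler) -> List[str]:
    """Contract test: compare a handler's response with the frozen shape."""
    status, body = handler()
    if status != 200:
        return [f"expected status 200, got {status}"]
    failures = [
        f"missing required field '{field}'"
        for field in REQUIRED_FIELDS
        if field not in body
    ]
    if "recommendations" in body and not isinstance(body["recommendations"], list):
        failures.append("'recommendations' must be a list")
    return failures
```

Run against `stub_handler`, the check reports the 501 (red); run against `implemented_handler`, it returns no failures (green). The demo should show both states for the same test.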


The Three Substrate Options

All three share one scaffold, one DoD, one rubric. Teams pick at 0:10. Let mixed-confidence teams take (a); let strong teams take (b) or (c).

| Substrate | What they build | Best for |
|---|---|---|
| (a) Guided rise-mini | Finish /compare and/or /recommend in the rise-mini repo. /compare is fully specified in openapi.yaml as the worked example; /recommend is deliberately partial - they author its schema first. | Teams who want rails. The contract is half-written; they complete it. |
| (b) Greenfield from a written spec | A brand-new endpoint from a one-paragraph product spec the facilitator hands out. Same arc, blank repo. | Teams who want to own the contract end-to-end. |
| (c) Bring-your-own | A real endpoint from the team’s own backlog (synthetic data only - Way #3). | Teams who want Monday-relevant output. |

The trick: the grading sheet doesn’t care which substrate you chose. Spec → contract → test → review applies identically. A team on (a) and a team on (c) are scored on the same DoD and rubric.

The worked example for everyone: /compare in openapi.yaml is fully specified - $ref-ed response, error envelope, idempotency note. /recommend is the TODO. Project /compare at 0:20 and say: “Make /recommend look like this - that’s Gate 1.”


If Running Behind

Protect the review gate above all else. Gate 2 is the point of the day. Cut from the bottom up:

| Priority | What to cut | Saves | Never cut |
|---|---|---|---|
| 1st | Drop quiz from 7 → 4 questions | ~3 min | - |
| 2nd | Demos to 3 min, top scorers only | ~3–4 min | - |
| 3rd | Skip spot prizes, keep leaderboard tally | ~3 min | - |
| 4th | Shorten BUILD by 5 min (announce a hard scope-cut: stub + one happy path) | ~5 min | - |
| NEVER | Gate 1 contract freeze · Gate 2 human review · Monday commitments | - | These are the session. |

If a team is drowning in BUILD, the rescue is scope down to one happy path so they still reach Gate 2 - a smaller thing, properly reviewed, beats a bigger thing unreviewed. That is the lesson.
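To decide quickly how far down the cut list to go, the savings can be added up in priority order. The sketch below assumes the lower bound of each "Saves" estimate (so demos count as 3 minutes); the function name is illustrative. The gates and Monday commitments are deliberately absent from the list because they are never cut.

```python
from typing import List, Tuple

# (cut, minutes saved) in priority order, using the lower bound of each estimate.
CUTS: List[Tuple[str, int]] = [
    ("Drop quiz from 7 to 4 questions", 3),
    ("Demos to 3 min, top scorers only", 3),
    ("Skip spot prizes, keep leaderboard tally", 3),
    ("Shorten BUILD by 5 min", 5),
]


def plan_cuts(minutes_behind: int) -> Tuple[List[str], int]:
    """Pick cuts in priority order until the deficit is covered.

    Returns the chosen cuts and the minutes they recover. If the deficit is
    larger than every cut combined, all cuts are returned and the caller can
    see the shortfall from the recovered total.
    """
    chosen: List[str] = []
    recovered = 0
    for name, saves in CUTS:
        if recovered >= minutes_behind:
            break
        chosen.append(name)
        recovered += saves
    return chosen, recovered
```

For instance, `plan_cuts(4)` selects the first two cuts and recovers about 6 minutes; anything beyond roughly 14 minutes cannot be recovered from this list alone.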


Materials Checklist (hand-out)

  • Definition of Done - one printed per person (grading sheet + takeaway)
  • Substrate cards a / b / c - one set per table
  • Greenfield spec paragraph (for substrate b) - printed
  • The two sabotage cards - bright stock, 3 of each
  • Gate pen / initials stamp (Gate 1)
  • Gate 2 witness slip per team: approver name · MUST items met · merged y/n
  • The arc diagram, projected
  • openapi.yaml open on /compare
  • Monday-commitment cards + pens
  • Blockers board
  • Standards quiz loaded (Mentimeter/Slido)
  • Leaderboard (scoring-rubric.md template)

Common Problems & Solutions

| Problem | Solution |
|---|---|
| “We just want to start coding.” | Hold the line: no Gate 1 initials, no implementation. That’s ENG-PRIN-INCR-STMT + Way #1. |
| Dev writes the impl before the test | Reset to the first-commit rule: stub + failing contract test first. It’s worth 14 points; “it works” is worth 6. |
| Author wants to approve their own PR | Hard stop. ENG-PRIN-REVIEW-STMT - no self-merge. This is Sabotage Card A whether or not you handed it out. |
| Contract drifts from the code | That’s exactly what REQ-API-2 forbids. Point them at the contract test (REQ-API-7) - it should be failing. |
| AI “approved” the review | AI assists, never approves. A human types and owns the sign-off (Way #2, ENG-PRIN-OWN-STMT). |
| Team over-builds (3 abstractions “for later”) | Way #5 / ENG-PRIN-SIMPLE-STMT (KISS/YAGNI). Score the ADR down, reward the simplest thing that meets the contract. |
| Someone pastes real client data | Stop, synthetic only (Way #3). Substrate (c) teams especially. |

Part of Innovation Day. The five ways of working: ways-of-working.md. Definition of Done: definition-of-done.md. Scoring: scoring-rubric.md.