Facilitator Guide - Spec-First Build (2 hours)

The pitch in one line: “Same 501 endpoint as last time - completely different rigor. We’re not racing to make it work. We’re racing to make it right, on the record, and signed by a human.”

This is the technical session of Innovation Day. Three cross-functional teams of ~5 (Dev, Architecture, QA, Product, UI/UX) take a feature from spec → OpenAPI contract → code → six-layer tests → human review gate. Two hard gates are graded above everything else: the contract is frozen before any feature code, and a named non-author human approves the merge against the Definition of Done.

The five rules of the road are the same as the Skill session - see The 5 Ways We Work With AI. This session lives and dies on Way #1: Describe before you generate (the contract) and Way #2: A human always signs the work (the review gate). Tool-agnostic where stated: any AI assistant (Claude, Copilot, Cursor, Gemini) and either stack.

You need:

1 lead facilitator (runs the room, holds the gate pen, scores)
1–2 roaming facilitators (unblock teams, initial the contracts at Gate 1, witness Gate 2 approvals, deliver sabotage cards)

Companion docs (read them before the session): Definition of Done · Scoring Rubric · OpenAPI contract.

Pre-Session Checklist (30 min before)

The Headline Arc

  SPEC  ──►  CONTRACT  ──►  CODE  ──►  6-LAYER TESTS  ──►  HUMAN REVIEW GATE
 (words)   (OpenAPI,        (stub +    (test at the         (named non-author
            frozen at        failing    lowest viable        approves vs DoD;
            Gate 1)          test 1st)  layer)               AI assists,
                                                             never approves)

Print this. Say it out loud at 0:00 and again at every phase change. Every team that ships should be able to point at where they are on this line.

Minute-by-Minute (total 120)

Clock	Phase	Min	What happens
0:00	Intro	10	Pitch the arc. “Same 501, different rigor.” Walk the diagram. Name the two gates and that they outscore “it works”. Point at the DoD on every desk.
0:10	Team formation + role claim + substrate & stack	10	Form 3 teams of ~5. Each person claims a role (Dev, Arch, QA, Product, UI/UX). Team picks a substrate (a/b/c) and a stack. Write both on the leaderboard.
0:20	SPEC PHASE	15	Product writes acceptance criteria; Architecture drafts the ADR (incl. idempotency reasoning); the team authors the OpenAPI response schema. No feature code. Ends at GATE 1.
0:35	🚦 GATE 1	-	A facilitator initials the OpenAPI contract at the table. No initials = no implementation. (No slot of its own: it is absorbed into the 15-minute spec phase and the 30-minute build; budget ~2 min of roaming per team.)
0:35	BUILD	30	Dev’s first commit = stub + a failing contract test (red), then implement to green. Small commits (Way #4). QA stacks the test pyramid. UI/UX builds the states.
1:05	🚦 REVIEW & MERGE GATE	15	GATE 2. A named non-author human approves the PR against the DoD. AI may assist the review; AI may not approve. No approval = not merged = not scored as shipped.
1:20	Demo showdown	15	4 min/team. Contract on screen FIRST, then the failing-then-passing test, then the UI. Gong at cutoff.
1:35	Standards pub quiz	7	7 questions on the standards techniques (see rubric’s “technique spotting”).
1:42	Leaderboard + spot prizes	6	Tally live. Award “Most Standards-Aligned” (audience vote) + spot prizes. Build tension.
1:48	Monday commitments + blockers + close	12	Each person writes one thing they’ll do Monday. Capture blockers on the board. Close on the arc. Photo.
2:00	END
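If you change a phase length on the day, the Clock column has to be re-derived. A minimal sketch of that arithmetic, with the phase names and minutes copied from the table above; the `clock` helper is illustrative, not part of the session materials.

```python
# Phase lengths in minutes, copied from the table (Gate 1 has no slot of its own).
PHASES = [
    ("Intro", 10),
    ("Team formation", 10),
    ("Spec phase", 15),
    ("Build", 30),
    ("Review & merge gate", 15),
    ("Demo showdown", 15),
    ("Standards pub quiz", 7),
    ("Leaderboard + spot prizes", 6),
    ("Monday commitments + close", 12),
]


def clock(phases):
    """Return (start, name) rows with start as H:MM, plus the total minutes."""
    rows, elapsed = [], 0
    for name, minutes in phases:
        rows.append((f"{elapsed // 60}:{elapsed % 60:02d}", name))
        elapsed += minutes
    return rows, elapsed


rows, total = clock(PHASES)
for start, name in rows:
    print(start, name)
print("total", total)
```

With the durations as printed, BUILD starts at 0:35, Gate 2 at 1:05, and the total comes to 120 minutes.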

Why the gates sit where they do: Gate 1 makes Way #1 physical - you cannot generate before you’ve described. Gate 2 makes Way #2 physical - nothing ships unread, and the approver is a different human than the author (ENG-PRIN-REVIEW-STMT, no self-merge).

The Two Gates - Run Them Like This

🚦 Gate 1 - Contract Freeze (end of Spec Phase, ~0:35)

The facilitator walks to each table and reads the contract aloud against four checks. Initial it only if all four pass.

Check	Standard	Pass looks like
`/recommend` 200 response is a real `$ref`, not a TODO	REQ-API-2, REQ-API-5	`RecommendResponse` defined under `components/schemas`, reusing `Lesson`/`DimensionScore`
Versioning + error envelope present	REQ-API-4, REQ-API-5	`info.version` set; errors use the shared `Problem` (problem+json)
Idempotency reasoned, not just stated	REQ-API-6	ADR says why GET is safe/idempotent and what would change if it became a write
Acceptance criteria written before code	QUAL-PRINC-SHIFT-LEFT	Product’s criteria exist on paper/screen; no feature commits yet

Once initialled, the contract is frozen. Changing it later costs a re-initial - and the sabotage card may force exactly that.
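The first two rows of the table are mechanical enough to pre-check before you walk to the table. A rough sketch, assuming the contract has already been parsed from `openapi.yaml` into a plain dict; the `get` method, the `application/json` and `application/problem+json` media types, and the example version string are assumptions for illustration, since the real contract is not reproduced in this guide. The idempotency reasoning and the acceptance criteria still need a human read.

```python
def gate1_mechanical_checks(spec: dict) -> dict:
    """Pass/fail for the checks a script can see in the parsed contract."""
    get_op = spec.get("paths", {}).get("/recommend", {}).get("get", {})
    responses = get_op.get("responses", {})
    schemas = spec.get("components", {}).get("schemas", {})

    # Row 1: the 200 response is a $ref to a schema that is actually defined.
    ok_content = responses.get("200", {}).get("content", {})
    ok_ref = ok_content.get("application/json", {}).get("schema", {}).get("$ref", "")
    ref_name = ok_ref.rsplit("/", 1)[-1]
    ref_is_real = ok_ref.startswith("#/components/schemas/") and ref_name in schemas

    # Row 2a: a version is set.
    has_version = bool(spec.get("info", {}).get("version"))

    # Row 2b: every non-2xx response body uses the shared Problem schema.
    error_refs = [
        media.get("schema", {}).get("$ref")
        for code, resp in responses.items()
        if not str(code).startswith("2")
        for media in resp.get("content", {}).values()
    ]
    errors_use_problem = bool(error_refs) and all(
        ref == "#/components/schemas/Problem" for ref in error_refs
    )

    return {
        "ref_is_real": ref_is_real,
        "has_version": has_version,
        "errors_use_problem": errors_use_problem,
    }


# Illustrative contracts: one ready for initials, one still a TODO.
frozen = {
    "info": {"version": "1.0.0"},
    "paths": {"/recommend": {"get": {"responses": {
        "200": {"content": {"application/json": {
            "schema": {"$ref": "#/components/schemas/RecommendResponse"}}}},
        "400": {"content": {"application/problem+json": {
            "schema": {"$ref": "#/components/schemas/Problem"}}}},
    }}}},
    "components": {"schemas": {
        "RecommendResponse": {"type": "object"},
        "Problem": {"type": "object"},
    }},
}
todo = {"info": {}, "paths": {"/recommend": {"get": {"responses": {
    "200": {"description": "TODO"}}}}}}

print(gate1_mechanical_checks(frozen))
print(gate1_mechanical_checks(todo))
```

A script like this can only tell you a team is not ready; it cannot replace reading the contract aloud, and it says nothing about whether the ADR's reasoning holds.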

🚦 Gate 2 - Review & Merge (1:05–1:20)

The single most-weighted moment in the day. Run it strictly.

Author opens the PR/diff. They do not approve their own work.
A named non-author on the team reviews against the printed DoD, ticking MUST items.
AI may assist (summarise the diff, suggest missing tests, spot a guess) - the human types the approval and owns it (ENG-PRIN-REVIEW-STMT, ENG-PRIN-OWN-STMT).
A roaming facilitator witnesses and records: approver name, MUST items met, merged y/n.

No approval, no ship. A working endpoint with no human approval scores the test/spec points but forfeits the 12 review points and cannot win “shipped”. Say this at the start so no one is surprised.
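The Gate 2 rules can be written down as a small decision function, which may help roaming facilitators apply them consistently. This is a sketch under assumptions: the `Review` record and the MUST item names are hypothetical (the DoD is a separate document), and treating any unticked MUST item as blocking is this sketch's reading of "reviews against the DoD", not a rule quoted from it.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# Assistant names listed in this guide; an approver with one of these names is not a human.
AI_ASSISTANTS = {"claude", "copilot", "cursor", "gemini"}


@dataclass
class Review:
    author: str
    approver: Optional[str] = None
    must_items: Dict[str, bool] = field(default_factory=dict)  # DoD MUST item -> ticked?


def gate2_verdict(review: Review) -> Tuple[bool, str]:
    """Return (merged, reason) for one team's review record."""
    if not review.approver:
        return False, "no approval, no ship"
    approver = review.approver.strip().lower()
    if approver == review.author.strip().lower():
        return False, "self-approval: the approver must be a non-author"
    if approver in AI_ASSISTANTS:
        return False, "AI may assist the review; AI may not approve"
    unmet = [item for item, ticked in review.must_items.items() if not ticked]
    if unmet:
        return False, "MUST items not met: " + ", ".join(unmet)
    return True, f"approved by {review.approver}"


print(gate2_verdict(Review(author="Ana", approver="Ana")))
print(gate2_verdict(Review(author="Ana", approver="Claude")))
print(gate2_verdict(Review(author="Ana", approver="Ben",
                           must_items={"contract test passes": True})))
```

Sabotage Card A is the first branch in practice: when the designated reviewer leaves, the record has no approver until the team names another non-author.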

Sabotage Cards (standards-flavoured)

Print on bright stock. Deliver one per team during BUILD or REVIEW, timed to bite. These are not random chaos - each one tests a specific standard.

Card A - “Reviewer is out”

+-----------------------------------------------------------+
|  SABOTAGE CARD  -  THE REVIEWER IS OUT                    |
|                                                           |
|  Your designated reviewer just got pulled into an         |
|  incident. You may NOT self-merge.                        |
|  (ENG-PRIN-REVIEW-STMT: no change reaches trunk without   |
|   review by another human.)                               |
|                                                           |
|  Find a different named non-author to approve against     |
|  the DoD - or the work does not merge.                    |
|                                                           |
|  Survive it: +3 BONUS POINTS                              |
+-----------------------------------------------------------+

Tests: ENG-PRIN-REVIEW-STMT. The wrong move is to merge unreviewed “to save time”. The right move is to re-assign the reviewer.

Card B - “The criterion changed”

+-----------------------------------------------------------+
|  SABOTAGE CARD  -  ACCEPTANCE CRITERION CHANGED           |
|                                                           |
|  Product just changed an acceptance criterion             |
|  (e.g. "top 5" becomes "top 3, exclude completed").       |
|                                                           |
|  Update the CONTRACT *and* its CONTRACT TEST before you   |
|  merge. Spec, code, and test must agree.                  |
|  (REQ-API-2 keep contract in sync; REQ-API-7 contract     |
|   testing; QTEST-NO-DUP - change it in ONE place.)        |
|                                                           |
|  Survive it: +3 BONUS POINTS                              |
+-----------------------------------------------------------+

Tests: REQ-API-2, REQ-API-7, QTEST-NO-DUP. The wrong move is to patch the code and leave the contract/test stale. The right move is contract-first: update the contract, have a facilitator re-initial it (Gate 1 again), update the contract test so it fails, then implement.
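One way to make "change it in ONE place" concrete is a contract test that reads its limit from the contract instead of repeating the number. The sketch below assumes, purely for illustration, that `RecommendResponse` has a `lessons` array capped with `maxItems`; the real schema may look different, and the "exclude completed" half of the example criterion is not covered here.

```python
def contract_violations(contract: dict, response: dict) -> list:
    """Check a response against limits read from the contract itself."""
    lessons_schema = (
        contract["components"]["schemas"]["RecommendResponse"]["properties"]["lessons"]
    )
    max_items = lessons_schema["maxItems"]  # the only place the limit is written down
    lessons = response.get("lessons", [])
    problems = []
    if len(lessons) > max_items:
        problems.append(f"{len(lessons)} lessons returned, contract allows {max_items}")
    return problems


def make_contract(max_items: int) -> dict:
    """Build a tiny stand-in for the parsed openapi.yaml (hypothetical shape)."""
    return {"components": {"schemas": {"RecommendResponse": {
        "type": "object",
        "properties": {"lessons": {"type": "array", "maxItems": max_items}},
    }}}}


# The code still returns "top 5" (synthetic data).
old_response = {"lessons": [{"id": n} for n in range(5)]}

print(contract_violations(make_contract(5), old_response))  # before the card
print(contract_violations(make_contract(3), old_response))  # after editing only the contract
```

Editing the contract alone turns the unchanged test red against the old code, which is the order the card is asking for: contract, then failing test, then implementation.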

Demo Format (4 min/team, gong-enforced)

Order is mandatory - the contract goes up first. This is the whole point of the session.

Contract (60s): Show /recommend in openapi.yaml. “Here’s the shape we froze at Gate 1.” Point at the $ref, the error envelope, the version.
Test pyramid (60s): Show the first commit (stub + failing contract test, red) and the same test passing (green). Name the layers you covered (unit / component / contract / system).
The endpoint (60s): Hit it live. Show the real response matches the frozen schema.
UI states (45s): Loading, empty, error, populated.
The signature (15s): Name the approver and the DoD items they ticked. “Signed by [name], not the author.”

Facilitator scores on the spot using scoring-rubric.md. Gong at 4:00, no exceptions.
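If a team asks what "failing-then-passing" should look like on screen, something like the following is enough. It is a framework-free sketch: the handler names, the `lessons` field, and the response bodies are hypothetical stand-ins for the team's real endpoint and frozen schema.

```python
# Hypothetical: stands in for the required fields of the frozen RecommendResponse schema.
REQUIRED_FIELDS = {"lessons"}


def recommend_stub():
    """First commit: the endpoint exists but is not implemented."""
    return 501, {"title": "Not Implemented", "status": 501}


def recommend_impl():
    """Later commit: one happy path over synthetic data."""
    return 200, {"lessons": [{"id": "lesson-1", "title": "Synthetic lesson"}]}


def contract_test(handler) -> list:
    """Return the contract failures for a handler; an empty list means green."""
    status, body = handler()
    failures = []
    if status != 200:
        failures.append(f"expected 200, got {status}")
    missing = REQUIRED_FIELDS - set(body)
    if missing:
        failures.append(f"missing fields: {sorted(missing)}")
    return failures


print("stub:", contract_test(recommend_stub))  # red: two failures
print("impl:", contract_test(recommend_impl))  # green: no failures
```

The point to score is that the same test runs against both commits; only the handler changes between red and green.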

The Three Substrate Options

All three share one scaffold, one DoD, one rubric. Teams pick at 0:10. Let mixed-confidence teams take (a); let strong teams take (b) or (c).

Option	Substrate	What they build	Best for
a	Guided rise-mini	Finish `/compare` and/or `/recommend` in the rise-mini repo. `/compare` is fully specified in `openapi.yaml` as the worked example; `/recommend` is deliberately partial - they author its schema first.	Teams who want rails. The contract is half-written; they complete it.
b	Greenfield from a written spec	A brand-new endpoint from a one-paragraph product spec the facilitator hands out. Same arc, blank repo.	Teams who want to own the contract end-to-end.
c	Bring-your-own	A real endpoint from the team’s own backlog (synthetic data only - Way #3).	Teams who want Monday-relevant output.

The trick: the grading sheet doesn’t care which substrate you chose. Spec → contract → test → review applies identically. A team on (a) and a team on (c) are scored on the same DoD and rubric.

The worked example for everyone: /compare in openapi.yaml is fully specified - $ref-ed response, error envelope, idempotency note. /recommend is the TODO. Project /compare at 0:20 and say: “Make /recommend look like this - that’s Gate 1.”

If Running Behind

Protect the review gate above all else. Gate 2 is the point of the day. Cut from the bottom up:

Priority	What to cut	Saves	Never cut
1st	Drop quiz from 7 → 4 questions	~3 min	-
2nd	Demos to 3 min, top scorers only	~3–4 min	-
3rd	Skip spot prizes, keep leaderboard tally	~3 min	-
4th	Shorten BUILD by 5 min (announce a hard scope-cut: stub + one happy path)	~5 min	-
NEVER	Gate 1 contract freeze · Gate 2 human review · Monday commitments	-	These are the session.

If a team is drowning in BUILD, the rescue is to scope down to one happy path so they still reach Gate 2 - a smaller thing, properly reviewed, beats a bigger thing unreviewed. That is the lesson.

Materials Checklist (hand-out)

Common Problems & Solutions

Problem	Solution
“We just want to start coding.”	Hold the line: no Gate 1 initials, no implementation. That’s `ENG-PRIN-INCR-STMT` + Way #1.
Dev writes the impl before the test	Reset to the first-commit rule: stub + failing contract test first. It’s worth 14 points; “it works” is worth 6.
Author wants to approve their own PR	Hard stop. `ENG-PRIN-REVIEW-STMT` - no self-merge. This is Sabotage Card A whether or not you handed it out.
Contract drifts from the code	That’s exactly what REQ-API-2 forbids. Point them at the contract test (REQ-API-7) - it should be failing.
AI “approved” the review	AI assists, never approves. A human types and owns the sign-off (Way #2, `ENG-PRIN-OWN-STMT`).
Team over-builds (3 abstractions “for later”)	Way #5 / `ENG-PRIN-SIMPLE-STMT` (KISS/YAGNI). Score the ADR down, reward the simplest thing that meets the contract.
Someone pastes real client data	Stop, synthetic only (Way #3). Substrate (c) teams especially.

Part of Innovation Day. The five ways of working: ways-of-working.md. Definition of Done: definition-of-done.md. Scoring: scoring-rubric.md.

Source: docs/innovation-day/spec-first-build/facilitator-guide.md