Spec-First Build - Role Cards (Print-Ready)
The session in one line: nobody writes feature code until the contract is frozen. Each role produces a spec artifact and hands it down the spine. The five ways anchor everything we do here - see
../ways-of-working.md.
Five roles, five deliverables, one handoff spine:
PRODUCT ──▶ ARCHITECTURE ──▶ ┌─ DEV ─┐
(what + (contract + │ QA │ ──▶ REVIEWER
why) ADR, Gate 1) └─ UX ─┘ (a human signs)
The two endpoints under construction are /compare (fully specified) and /recommend (deliberately partial) in ../../../stacks/nextjs/openapi.yaml. That file is the single contract both stacks implement.
Print as A5 cards, double-sided. Laminate if possible. Colour-code by role: Product (purple), Architecture (blue), Dev (amber), QA (green), UI/UX (pink).
How to read a card
Every card has the same five lines:
| Line | What it tells you |
|---|---|
| Deliverable | The one spec artifact you produce. No feature code until it exists. |
| Satisfies | The corporate-standard requirement IDs your artifact ticks off. |
| Hands off | What you give, and to which downstream role on the spine. |
| AI techniques | 2-3 tool-agnostic ways to let AI accelerate - you still sign it (Way #2). |
| Maps to DoD | How your artifact moves the Definition of Done forward (QUAL-PRINC-SDLC). |
Product - Feature Spec
+-----------------------------------------------------------+
| SPEC-FIRST CARD: PRODUCT |
| Spine position: HEAD -- you set the destination |
| |
| DELIVERABLE |
| A Feature Spec for /compare AND /recommend, with |
| Given/When/Then acceptance criteria, named consumers, |
| and exactly ONE success metric per feature. |
| |
| SATISFIES |
| REQ-API-1 -- the API is a product; the spec defines |
| its contract of value, not just routes. |
| QUAL-PRINC-SHIFT-LEFT -- you prevent defects at |
| inception by describing "good" first. |
| |
| HANDS OFF |
| -> ARCHITECTURE. Your Given/When/Then becomes the |
| source of truth for the OpenAPI response schema |
| and the ADR's decision context. |
| |
| AI TECHNIQUES |
| [1] Draft G/W/T from a one-line brief, then tighten |
| each scenario by hand. |
| [2] Ask AI to surface the edge cases you missed |
| (N/A dimensions, empty team, completed lessons). |
| [3] Pressure-test your success metric: "is this |
| measurable, and does it prove the feature worked?" |
| |
| MAPS TO DoD (QUAL-PRINC-SDLC) |
| Your acceptance criteria ARE the DoD's first column. |
| If it isn't in a G/W/T here, it isn't "done" later. |
+-----------------------------------------------------------+
What “good” looks like (use the seed data):
| /compare | /recommend | |
|---|---|---|
| Given | Alice is on team Platform | Alice has two N/A dimensions |
| When | she requests scope=team | she requests recommendations |
| Then | she sees her score next to the team average per dimension, averaging only members who responded | she gets her top 5 lessons by gap, excluding ones she has completed |
| Consumer | the dashboard radar overlay (UI/UX) | the “Recommended for you” list |
| Success metric | e.g. % of users who view their gap vs team | e.g. recommendation click-through rate |
Worked example numbers live in the contract’s
examples:block - Alice’sai_literacyteam average is(0.72+0.65+0.50)/3 ≈ 0.62, responded-only. Hand Architecture the rule, not just the number.
Architecture - Contract + ADR (owns Gate 1)
+-----------------------------------------------------------+
| SPEC-FIRST CARD: ARCHITECTURE |
| Spine position: gatekeeper -- you FREEZE the contract |
| |
| DELIVERABLE |
| Extend openapi.yaml: author the /recommend 200 schema |
| (the file ships it as a TODO) and reuse the shared |
| Recommendation / Problem shapes. PLUS one ADR |
| capturing a real decision + its trade-off. |
| |
| SATISFIES |
| REQ-API-2 -- machine-readable OpenAPI, kept in sync. |
| REQ-API-4 -- explicit, governed version. |
| REQ-API-5 -- consistent conventions: shared schemas, |
| problem+json error envelope, no copy. |
| REQ-API-6 -- idempotency: GET is safe; note in the |
| ADR what changes if it ever becomes write|
| ENG-PRIN-DOC-STMT -- the ADR records the WHY so the |
| decision outlives the conversation. |
| |
| HANDS OFF |
| -> DEV, QA, UI/UX (all three, in parallel). The frozen |
| contract is the only thing they build against. |
| |
| AI TECHNIQUES |
| [1] Have AI draft the RecommendResponse schema from |
| Product's G/W/T, then reconcile it against the |
| existing Recommendation schema -- reuse, don't dupe.|
| [2] Generate an ADR skeleton, then YOU write the |
| trade-off in your own words. |
| [3] Ask AI to lint the contract for REQ-API-5 |
| consistency (naming, error envelope, enums). |
| |
| GATE 1 (you own it) |
| Contract is frozen and committed BEFORE any feature |
| code. After this gate, the schema is law. Changes need |
| a new commit + a one-line ADR note. |
| |
| MAPS TO DoD (QUAL-PRINC-SDLC) |
| "Contract published and versioned" + "decision |
| recorded" are DoD entries only you can tick. |
+-----------------------------------------------------------+
A real decision to put in your ADR (pick one, state the trade-off):
| Decision | Trade-off to capture |
|---|---|
| Compute team/org averages on the fly vs. precompute | On-the-fly is simplest (ENG-PRIN-SIMPLE-STMT); dataset is tiny so caching is YAGNI - but won’t scale to large orgs |
gapScore rounding & tie-break (gapScore desc, then stage: exploring first) | Deterministic order makes the contract testable; hides ties that a relevance signal might break differently |
One Problem envelope for every error vs. per-route shapes | Reuse satisfies REQ-API-5; less expressive than bespoke error bodies |
This is Way #5: the simplest thing that works, and a note on why. One ADR line beats three speculative abstractions.
Dev - Implementation Behind the Frozen Contract
+-----------------------------------------------------------+
| SPEC-FIRST CARD: DEV |
| Spine position: builds AGAINST the contract, not ahead |
| |
| DELIVERABLE |
| /compare and /recommend implemented to match the |
| frozen openapi.yaml exactly. Built on a short-lived |
| branch, stub + failing contract test FIRST, then |
| fill in until the test goes green. |
| |
| SATISFIES |
| ENG-PRIN-INCR-STMT -- many small commits; stub, |
| then logic, then edge cases. |
| ENG-PRIN-TRUNK-STMT -- short-lived branch, integrate |
| often, no week-long divergence. |
| ENG-PRIN-LOCAL-STMT -- run it on your machine; prove |
| it works before you push. |
| REQ-API-7 -- contract-based testing: the contract, |
| not your assumptions, decides "correct". |
| |
| HANDS OFF |
| -> REVIEWER. A human reads every line and approves; |
| no self-merge (ENG-PRIN-REVIEW-STMT, Way #2). |
| |
| AI TECHNIQUES |
| [1] Feed the agent the frozen contract as context, |
| then: "make this failing contract test pass -- |
| no extra endpoints, no extra fields." |
| [2] Implement responded-only averaging and the |
| gapScore + tie-break in one small commit each; |
| ask AI to review each diff before the next. |
| [3] Keep AI honest: "does this response match the |
| schema field-for-field?" before you call it done. |
| |
| MAPS TO DoD (QUAL-PRINC-SDLC) |
| "Behaviour matches contract", "runs locally", |
| "small reviewable commits" -- your green test is the |
| evidence behind those DoD ticks. |
+-----------------------------------------------------------+
The order matters (Way #4 - small steps, not big leaps):
- Stub the route -> returns a hard-coded valid-shaped body.
- Write the failing contract test against
openapi.yaml. - Implement averaging / gapScore until it goes green.
- Add edge cases (N/A dims, completed-lesson exclusion) as separate commits.
Build only what the contract says. Extra fields “for later” are exactly what
ENG-PRIN-SIMPLE-STMTwarns against.
QA - Test Pyramid at the Lowest Viable Layer
+-----------------------------------------------------------+
| SPEC-FIRST CARD: QA |
| Spine position: parallel to Dev -- you test the contract |
| |
| DELIVERABLE |
| A test plan that hits each rule at its LOWEST viable |
| layer: |
| - UNIT: gapScore math, responded-only averaging, |
| sort order (gapScore desc, stage tie-break)|
| - CONTRACT: responses validated vs openapi.yaml |
| - E2E: exactly ONE happy path, no more. |
| |
| SATISFIES |
| QTEST-EARLY -- test at the lowest viable layer; pure |
| logic belongs in unit tests, not E2E. |
| QTEST-NO-DUP -- don't re-assert the same rule at three |
| layers; each rule, one home. |
| REQ-API-7 -- contract-based testing against the |
| machine-readable spec. |
| |
| HANDS OFF |
| -> REVIEWER (with Dev). Your green pyramid is the |
| evidence the reviewer needs to sign. |
| |
| AI TECHNIQUES |
| [1] Generate the unit cases from Product's G/W/T, |
| including the N/A-dimension gapScore = 1.0 cases. |
| [2] Ask AI: "which of these assertions belong in unit |
| vs contract vs E2E?" -- then push each down a layer.|
| [3] Have AI generate the contract test harness that |
| loads openapi.yaml and validates real responses. |
| |
| MAPS TO DoD (QUAL-PRINC-SDLC) |
| "Tested at the right layer, no duplication" is a DoD |
| entry. A wall of E2E tests is waste (QUAL-PRINC-WASTE), |
| not done. |
+-----------------------------------------------------------+
Where each rule lives (QTEST-NO-DUP - one home per rule):
| Rule | Layer | Why here |
|---|---|---|
gapScore = 1.0 - value; N/A dims -> 1.0 | Unit (QTEST-UNIT) | Pure function, no I/O |
Average only members where responded:true | Unit | Pure math; Alice’s ai_literacy team avg ≈ 0.62 |
| Sort: gapScore desc, then stage (exploring first) | Unit | Deterministic, fast |
Response matches openapi.yaml field-for-field | Contract (QTEST-CONTRACT) | Catches drift the contract owns |
| Alice loads dashboard, sees compare + top-5 recommend | E2E (QTEST-E2E) - ONE | Proves the wiring end-to-end, once |
One E2E. The math is already covered below it - re-testing it through the browser is duplication, and slow duplication at that.
UI/UX - State + Data-Viz Spec
+-----------------------------------------------------------+
| SPEC-FIRST CARD: UI/UX |
| Spine position: parallel to Dev -- you spec the surface |
| |
| DELIVERABLE |
| A UX spec covering FOUR states for each feature -- |
| loading, populated, empty, error -- plus the data-viz |
| spec for overlaying the team average on the radar |
| chart. |
| |
| SATISFIES |
| QUAL-PRINC-SDLC -- the experience is part of the |
| Definition of Done, not a bolt-on. A |
| feature with no empty/error state is not |
| done. |
| |
| HANDS OFF |
| -> DEV. Your four states + overlay spec tell Dev |
| exactly what to render for each contract response |
| (including the error envelope and N/A dimensions). |
| |
| AI TECHNIQUES |
| [1] Ask AI to enumerate every state from the contract: |
| what does empty look like? what renders on a |
| problem+json error? |
| [2] Generate quick mockups for each of the four states |
| from a text description, then refine. |
| [3] Map each DimensionScore field to a visual element |
| -- and have AI flag where a wrong mapping would |
| mislead the user. |
| |
| MAPS TO DoD (QUAL-PRINC-SDLC) |
| "All four states designed and handled" is a DoD line. |
| Populated-only is a demo, not a done feature. |
+-----------------------------------------------------------+
The four states (spec all four - this is the DoD line nobody remembers):
| State | /compare | /recommend |
|---|---|---|
| Loading | radar with skeleton overlay | shimmer on the list |
| Populated | user line + team-average line on one radar | top 5 cards with reason + gap |
| Empty | team has one responder -> show “not enough data” | no gaps (all dims high) -> celebratory empty state |
| Error | render the problem+json title, not a blank chart | same - show the error envelope’s title/detail |
Teachable correctness link - the radar overlay meets a planted bug.
The compare feature overlays the team average on the same radar chart that carries planted Bug #1: reversed axis labels (.reverse() in RadarChart.tsx misaligns labels with values). Your overlay only tells the truth if the axes are correct. A pixel-perfect overlay on a mislabelled chart is a confidently wrong visualisation - the most dangerous kind. Spec the axis-to-dimension mapping explicitly so the bug surfaces here, not in front of a user. Good data-viz is a correctness concern, not decoration.
This is Way #1 for the interface: describe every state - including the ugly ones - before anyone generates the component.
The handoff, on one line per role
| Role | Produces | Freezes/Gates | Hands to | Top requirement |
|---|---|---|---|---|
| Product | Feature Spec (G/W/T, consumers, metric) | - | Architecture | REQ-API-1 |
| Architecture | OpenAPI schema + ADR | Gate 1: contract frozen | Dev, QA, UX | REQ-API-2/4/5/6 |
| Dev | Implementation, stub-then-test | short-lived branch | Reviewer | REQ-API-7 |
| QA | Layered test pyramid | green at lowest layer | Reviewer | QTEST-EARLY |
| UI/UX | Four-state + viz spec | - | Dev | QUAL-PRINC-SDLC |
| Reviewer | Approval (a human signs) | no self-merge | trunk | ENG-PRIN-REVIEW-STMT |
Nothing reaches trunk unread. AI may assist the review; a human approves it (Way #2,
ENG-PRIN-REVIEW-STMT).
Printing Notes
- Print each card as A5 (half of A4); card stock 160gsm+ if available.
- Colour-code by role: Product (purple), Architecture (blue), Dev (amber), QA (green), UI/UX (pink).
- One card per person for their role; print a spare set of the Architecture card - Gate 1 is the bottleneck everyone references.
- Keep the contract (
openapi.yaml) and the five ways projected throughout.