Build-a-Skill - The Four Scenarios

Who this is for: programme managers, ops, and admin - non-technical, mixed confidence. No code, no setup. You pick a real, repetitive task and turn it into a one-page, reusable Skill you could paste into any chatbot on Monday.

The whole session is tool-agnostic. ChatGPT, Gemini, Claude, Copilot - whatever you already have open. The Skill is the deliverable, not the tool.

The one rule that runs through everything: describe what “good” looks like before you prompt. That’s Way #1: Describe before you generate. The Skill one-pager template is built to force it - the Output section sits above the prompt for a reason.

How the session runs


Length	2 hours. Each scenario is sized for ~30–40 min end to end.
You will	Pick one scenario, fill in the SKILL template, run it once on the supplied sample data, then read every line of the output and find what the AI got wrong.
Scenario 1	The guided build: we do it together, step by step, as the warm-up. Then you pick Scenario 2, 3, or 4 to do solo or in a pair.
Golden rule	Use the sample data only (`sample-data/`). It is synthetic. Never paste real names, client data, or confidential figures - Way #3.
The two beats that win the room	(1) the privacy check - what you strip before pasting; (2) the human sign-off - the one thing the AI got confidently wrong. Every scenario is rigged so there’s something to catch.

Each scenario below gives you four things: the real task, the input file, the spec-first move (what to define before you prompt), and what good looks like (your checklist). Each points to a finished worked example in skill-examples/ you can compare against.

Scenario 1 - Weekly status report (the guided build)

The real task. Every Monday someone turns messy sync notes into a clean status update for stakeholders. It’s the single most repeated programme-management chore in the building. We’ll build the Skill for it together.

For: programme / project managers writing the weekly stakeholder update.

Input: sample-data/sample-meeting-notes.txt - raw, lower-case, half-finished sync notes from “Project Helix”.

Spec-first - define the Output before you write a single prompt. Open the template, go straight to section 3, and pin down the shape:

A short table: Summary | Key decisions | Risks (R/A/G) | Actions (Owner, Due).
A rule for missing owners: if the notes don’t clearly state who owns an action, write TBC - do not guess. (This one matters - see the worked example.)
A separate Open questions / unconfirmed list so nothing gets quietly smoothed over.
Neutral, factual tone. Max ~8 rows.

Only once that shape is written do you draft the prompt (section 4) and paste it in.

What good looks like ✅

Decisions, risks, and actions are separated - not one mushy paragraph.
Every action has an owner or an honest TBC. None invented.
The unconfirmed bits (legal review, the analytics-numbers issue) are surfaced as open questions, not dropped.
You read every line and fixed at least one thing the AI got wrong (Way #2).
Before pasting, you stripped the names, the vendor, and the confidential figure (Way #3).

Worked example: skill-examples/status-report-skill.md

Scenario 2 - Expense report cleanup

The real task. A monthly expenses export lands as a messy CSV - inconsistent categories, typos, the odd duplicate. Someone has to tidy and sanity-check it before it goes to finance. The Skill turns “stare at a spreadsheet for 40 minutes” into “review the AI’s flagged list in 5.”

For: ops / finance admin preparing an expense file for sign-off.

Input: sample-data/sample-expenses.csv - 27 rows, deliberately imperfect.

Spec-first - define the Output before you prompt. In section 3 of the template, decide what “clean” means:

A cleaned table (consistent category casing, consistent date format) plus a separate “Flags for review” list - the AI proposes, a human decides.
The cast-iron rule: flag, don’t delete. Anything suspect (possible duplicate, odd value, wrong format) goes on the flag list with a reason. The AI never silently removes a row.
Do not convert or total across currencies. If amounts are in mixed currencies, sub-total per currency and say so. (Mixing GBP/USD/EUR into one number is the classic trap - see worked example.)
A privacy line: strip anything personal from the notes column before pasting.

What good looks like ✅

Categories normalised (e.g. travel/Travel → one form); dates in one format.
Duplicates and oddities are flagged with a reason, not deleted.
Currencies are sub-totalled separately - no single blended total.
A typo or two is caught and flagged rather than guess-corrected.
You spotted the personal email hiding in a notes cell and removed it (Way #3).

Worked example: skill-examples/expense-cleanup-skill.md

Scenario 3 - Incoming request triage

The real task. A shared inbox or request log fills up with mixed asks - some genuinely urgent, most not. Someone triages priority and owner each morning. The Skill drafts that triage in seconds; a human still decides.

For: ops / admin running a shared request queue or service desk.

Input: sample-data/sample-requests.csv - 12 requests, varied urgency, plain-English wording.

Spec-first - define the Output before you prompt. In section 3:

A triage table: ID | One-line summary | Priority (High / Medium / Low) | Suggested owner/team | Why this priority.
The “Why” column is non-negotiable - it forces the AI to justify the rating so a human can sanity-check it rather than trust a bare label.
Read urgency from the consequence and the deadline, not from the sender’s tone. “URGENT” in capitals isn’t automatically High; a quiet note about no heating for two days might be.
A redaction rule: personal contact details and ID numbers must be flagged and stripped, never echoed into the triage output.

What good looks like ✅

Every row has a priority and a one-line justification.
Genuinely time-critical items (out of paper before tomorrow’s board pack; heating off for days) are rated High - and you checked the AI didn’t under-rate them.
Low-stakes “whenever convenient” items aren’t inflated to High just because they’re recent.
No personal phone number or employee ID appears anywhere in the output - they were flagged and removed (Way #3).
You overrode at least one priority the AI got wrong (Way #2).

Worked example: skill-examples/request-triage-skill.md

Scenario 4 - Multi-document summary

The real task. A decision needs a one-page read-out drawn from several short docs - a policy note, a vendor update, a risk extract. Someone reads all three and writes the brief. The Skill drafts it; the human checks it holds together. The hard part isn’t summarising - it’s noticing where the documents disagree.

For: programme managers / ops preparing a decision brief from a small document pack.

Input: sample-data/sample-docs/ - three files: policy-note.txt, vendor-update.txt, risk-extract.txt.

Spec-first - define the Output before you prompt. In section 3:

A one-page brief: Background | Key facts | Conflicts / open questions | Recommended next step.
The make-or-break instruction: a dedicated “Conflicts / open questions” section. Tell the AI explicitly to surface anything where the documents disagree, rather than picking one version and moving on.
An anti-smoothing rule: do not resolve a contradiction by choosing the more confident source. If dates or facts clash, report the clash.
Privacy line: strip the named individuals and the vendor name before pasting.

What good looks like ✅

The brief names the date conflict explicitly - the fixed go-live vs the vendor’s later readiness date - as an open question.
It doesn’t quietly state a single go-live date as settled fact.
The flagged risk about the clash is carried through, not dropped.
The recommended next step is “escalate / confirm the date,” not a false “all on track.”
Names and vendor anonymised before pasting (Way #3); you verified the conflict made it through (Way #2).

Worked example: skill-examples/doc-summary-skill.md

Wrap-up

By the end you have a one-page Skill you can reuse, and you’ve felt both halves of working with AI: it drafts fast, and a human still signs the work. File your Skill where your team can find it - that’s the Reuse notes section, and Way #5: keep it simple, write down the why.

The same five ways of working underpin the Build session next door - the only difference is the team writes an API contract instead of a Skill one-pager. Same destination, faster road.

Facilitator only - planted elements

Cut this section before printing participant handouts. It exists so the Skill Scout (roaming facilitator) can nudge people toward the two beats that make the session land: the privacy strip and the human catch. Don’t hand people the answer - ask the question that gets them to find it. The whole point is that a lazy “just summarise this” prompt produces a confidently wrong result, and a spec-first prompt with an anti-guessing rule doesn’t.

Each scenario is ~30–40 min. Every dataset carries (a) privacy bait - something that should never reach the chatbot, and (b) at least one oversight error for a human to catch - something the AI gets wrong unless the spec tells it not to.

Scenario 1 - `sample-meeting-notes.txt`

	Planted element	What it teaches	Scout nudge
Privacy bait	Full name “Maria Gonzalez”; vendor “Brightwave Ltd”; budget “480k (confidential)”	Way #3: names, vendor, and confidential figures must be stripped/disguised before pasting.	“What in those notes would you not want a public chatbot to keep?”
Oversight catch (primary)	Line 11: “dave picking up the vendor chase i think? not 100% sure who owns that actually.”	The owner is not stated. A spec-first prompt that says “write TBC if owner not stated, do not guess” outputs owner = TBC. A lazy prompt makes the AI confidently assign Dave.	“The report says Dave owns the vendor chase - does the source actually say that?”
Oversight catch (secondary)	Line 13: the analytics dashboard “showing wrong numbers again” is easy to drop because it’s the last, throwaway line.	Surfacing-not-smoothing: an open issue should survive into the output, not be trimmed.	“Is everything from the notes accounted for, including the last line?”

Scenario 2 - `sample-expenses.csv`

	Planted element	What it teaches	Scout nudge
Privacy bait	A personal email address in the notes cell of the coffee row.	PII hides in free-text columns, not just obvious fields.	“Did you read every cell - including the notes column - before pasting?”
Oversight catch - flag-don’t-delete	Two identical “Monitor stand” rows (£45.99 each).	These should be FLAGGED as a possible duplicate, not silently deleted. A genuine double-purchase is plausible - a human decides.	“The AI removed a row - should it have, or should it have asked you?”
Oversight catch - normalise	Inconsistent category casing (`travel`/`Travel`, `meals`/`Meals`); a typo “Taxii”.	Cleaning = consistency + flagging typos, not guess-correcting them.	“Are your categories one form now? And did it flag ‘Taxii’ or quietly change it?”
Oversight catch - dates	Mixed date formats - most ISO, one “12/03/26”.	Spec should pin one date format; the AI must convert visibly.	-
Oversight catch - currency (the big one)	Mixed currencies: GBP / USD / EUR.	The AI will often sum across currencies or invent a conversion rate. Correct behaviour: sub-total per currency, no blended total, no made-up FX.	“That single total - is it adding pounds, dollars and euros together?”
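For facilitators who want a quick answer key, the flag-don’t-delete and per-currency rules can be sketched in a few lines of Python. This is an illustrative sketch, not part of the session kit: it assumes hypothetical column names (`description`, `amount`, `currency`) and plain numeric amounts with no currency symbol, so adjust it to the real headers in `sample-expenses.csv`. It never removes a row; it only reports possible duplicates and sub-totals each currency separately.

```python
import csv
import io
from collections import Counter, defaultdict
from decimal import Decimal


def check_expenses(csv_text):
    """Return (per-currency subtotals, possible-duplicate flags) for an expense CSV.

    Assumes hypothetical columns named description, amount and currency,
    with amount as a plain number (no currency symbol).
    """
    rows = list(csv.DictReader(io.StringIO(csv_text)))

    # Sub-total per currency: no FX conversion and no blended grand total.
    subtotals = defaultdict(Decimal)
    for row in rows:
        subtotals[row["currency"].strip().upper()] += Decimal(row["amount"].strip())

    # Same description (ignoring case), amount and currency = a *possible* duplicate.
    # Rows are only flagged for a human to decide; nothing is deleted.
    seen = Counter(
        (
            row["description"].strip().lower(),
            row["amount"].strip(),
            row["currency"].strip().upper(),
        )
        for row in rows
    )
    flags = [
        f"Possible duplicate ({count} rows): {desc} {amount} {currency}"
        for (desc, amount, currency), count in seen.items()
        if count > 1
    ]
    return dict(subtotals), flags


# Invented three-row example for illustration, not the workshop file.
sample = """date,description,amount,currency
2026-03-02,Monitor stand,45.99,GBP
2026-03-02,Monitor stand,45.99,GBP
2026-03-05,Hotel,120.00,USD
"""
subtotals, flags = check_expenses(sample)
print(subtotals)
print(flags)
```

The flag list is where the human decision happens: two identical monitor stands could still be a genuine double purchase.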

Scenario 3 - `sample-requests.csv`

	Planted element	What it teaches	Scout nudge
Privacy bait	REQ-003: “Priya Nair, mobile 07700 900142”; REQ-009: “employee ID 48817.”	Contact details and ID numbers must be FLAGGED and stripped, never echoed into the triage output.	“Search your output for that phone number - is it in there? It shouldn’t be.”
Oversight catch - under-rated High	REQ-007 (out of printer paper, board pack due tomorrow AM) and REQ-011 (heating off two days, people in coats) are genuinely High.	Urgency lives in the consequence and deadline, not in capital letters. The AI frequently under-rates these because the wording is calm.	“No paper before tomorrow’s board pack - is that really Medium?”
Oversight catch - over-rated / correctly Low	REQ-002, REQ-006, REQ-010 are genuinely Low (“when there’s a chance,” “whenever convenient,” “order more markers”).	Recency and politeness aren’t priority. Don’t inflate.	“Anything rated High here that’s actually a ‘whenever’ job?”

Note: REQ-005 (“URGENT, client demo at 2pm”) is legitimately High - it’s the control case. The teaching point is that the AI should reach the same High for REQ-007/011 without the word URGENT.
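The Scout’s “search your output” check for Scenario 3 can also be done mechanically. The Python sketch below is an optional helper under stated assumptions, not part of the kit: the mobile-number pattern only covers UK-style numbers starting 07, and the employee ID is matched as the literal planted value. It does not look for names, so a human still reads every line.

```python
import re

# Patterns for the personal details planted in sample-requests.csv.
# Assumption: the mobile pattern only matches UK-style 11-digit numbers
# starting 07, with or without a space after the first five digits.
LEAK_PATTERNS = {
    "mobile number": re.compile(r"\b07\d{3}\s?\d{6}\b"),
    "employee ID 48817": re.compile(r"\b48817\b"),
}


def find_leaks(triage_output):
    """Return the labels of planted personal details still present in the output."""
    return [
        label
        for label, pattern in LEAK_PATTERNS.items()
        if pattern.search(triage_output)
    ]


# Invented triage lines for illustration only.
clean = "Meeting-room booking query | Low | Facilities | Whenever convenient"
leaky = "Call 07700 900142 about the request | Medium | Facilities"
print(find_leaks(clean))
print(find_leaks(leaky))
```

An empty list means none of the planted details survived; anything else goes back to the team as a Way #3 catch.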

Scenario 4 - `sample-docs/`

	Planted element	What it teaches	Scout nudge
Privacy bait	“Sandra Whitfield” (policy-note); “Brightwave Ltd” (vendor-update).	Names and vendor anonymised before pasting.	“Who’d you swap out before pasting the pack?”
Oversight catch - the CONTRADICTION (whole point of this scenario)	`policy-note.txt`: go-live is fixed 1 June, “will not move.” `vendor-update.txt`: integration testing can’t start until 2 June, readiness 9 June. `risk-extract.txt` R-15 explicitly flags the clash and says “Needs escalation.”	A good summary must surface this conflict as an open question - not pick the confident source and state “go-live 1 June” as settled fact. The three docs are designed so the truth only appears when you cross-reference them.	“Your brief says go-live is 1 June - does the vendor agree? What does R-15 say?”

The failure mode to watch for: a “just summarise these three documents” prompt almost always reports a clean “go-live 1 June” and buries or omits the vendor’s 9-June readiness, because the policy note is the most authoritative-sounding source. The spec-first fix is a dedicated “Conflicts / open questions” section plus an explicit “do not resolve contradictions by choosing the more confident source” instruction. If a team’s brief surfaces the date clash, they’ve nailed Way #2. Call it out loudly.

Built at Innovation Day. The five ways of working: ways-of-working.md. Skill template: SKILL-template.md.

Source: docs/innovation-day/build-a-skill/scenarios.md
