SKILL: Clean and standardise a messy expenses CSV

A worked Skill from Innovation Day. Built spec-first (Way #1), with a privacy check before any data is pasted (Way #3) and a human verifying what the AI flagged before anything is deleted (Way #2).


1. Goal - what this Skill does, and when to use it

Take an exported expenses CSV that’s full of inconsistent categories, mixed date formats and the odd typo, and turn it into a clean, consistent table - with the dodgy rows flagged, not silently fixed. Use it before you submit a claim batch or reconcile a month.

It replaces the squinting-at-spreadsheet-cells pass everyone does by hand.

2. Inputs - what you paste in for it to work

  • Input 1: the expenses CSV (columns: date, description, category, amount, currency, notes).
  • Input 2: your team’s fixed category list (see the prompt - using a fixed list is what stops the AI inventing new categories).

PRIVACY CHECK (Way #3): expense exports leak personal data in the notes column.

In the fileDo this
[email protected] (a personal email in a notes cell)Remove it before pasting. Personal email = PII.
Any name, claimant ID, or “sent receipt to…” trailStrip it - the AI doesn’t need it to clean dates and categories.

Better still - don’t paste the data at all. See the stretch path in section 4: ask the AI for a formula or script, then run it on your own machine. The real data never leaves your laptop. That’s the strongest version of Way #3.

3. Output - what “good” looks like (write this BEFORE the prompt)

A cleaned table with exactly these columns:

ColumnRule
Date (ISO)every date as YYYY-MM-DD; fix any that aren’t
Categorymapped to ONE of the fixed list - never a new value
Amountunchanged numeric value
Currencyunchanged (GBP/USD/EUR) - do NOT convert
Flagsa short note on anything suspect: duplicate, typo, format fixed, PII spotted

Below the table: a short Flags summary listing duplicates, typos and anything needing a human decision. Nothing is deleted - suspect rows stay in, flagged.

4. Instructions / Prompt - the exact text to paste

You are helping me clean an expenses CSV. Do NOT delete any rows.

Output a table with these exact columns:
Date (ISO) | Category | Amount | Currency | Flags

Rules:
- Convert every date to ISO format YYYY-MM-DD. If a date is ambiguous
  (e.g. 12/03/26), flag it as "date format assumed DD/MM/YY - confirm".
- Map every category to ONE of this fixed list ONLY:
  Travel, Meals, Accommodation, Training, Equipment, Software, Office.
  Fix casing/variants (e.g. "travel" -> Travel, "meals" -> Meals).
  If something doesn't fit the list, put it in Flags - do not invent a category.
- Leave Amount and Currency exactly as they are.
- DO NOT convert currencies and DO NOT sum amounts across currencies.
- In Flags, mark: likely duplicates, spelling typos, any date you reformatted,
  and any personal data you notice (emails, names). Flag duplicates - do not remove them.

After the table, give a short "Flags summary" listing every duplicate, typo,
and anything that needs a human to decide.

CSV:
[PASTE CSV WITH PERSONAL DATA REMOVED]

Stretch path (the privacy win) - ask for a script, run it yourself. Instead of pasting the file, paste only the header row and ask:

Here is the header of my expenses CSV: date,description,category,amount,currency,notes
Write me an Excel formula (or a short Python/Power Query script) that:
- converts the date column to ISO YYYY-MM-DD,
- standardises the category column to this fixed list: Travel, Meals,
  Accommodation, Training, Equipment, Software, Office,
- adds a "Flags" column marking duplicate rows and rows with mixed currencies.
Explain how to run it. Do not ask me to paste my data.

You run the formula/script on your own machine. The actual expense rows - and that personal email - never touch the chatbot. Same result, zero data exposure.

5. Worked Example - one real run, anonymised

Input (a few rows, personal email already stripped):

12/03/26,taxi airport,travel,42.00,GBP,
2026-03-09,Monitor stand,Equipment,45.99,GBP,
2026-03-09,Monitor stand,Equipment,45.99,GBP,
2026-03-12,Software licence (annual),Software,180.00,USD,
2026-03-14,Taxii to office,Travel,9.80,GBP,
2026-03-16,Client dinner,Meals,156.00,EUR,Madrid office visit

Output it produced:

Date (ISO)CategoryAmountCurrencyFlags
2026-03-12Travel42.00GBPReformatted from 12/03/26 - assumed DD/MM/YY, confirm
2026-03-09Equipment45.99GBPPossible duplicate (see next row)
2026-03-09Equipment45.99GBPPossible duplicate of “Monitor stand” above - keep both, human to decide
2026-03-12Software180.00USDNon-GBP currency
2026-03-14Travel9.80GBPTypo in description: “Taxii” → likely “Taxi”
2026-03-16Meals156.00EURNon-GBP currency

Flags summary

  • Duplicate: two identical “Monitor stand” rows on 2026-03-09 (£45.99 each). Both kept - a human decides whether it’s a genuine double purchase or a double-entry. (The AI did NOT delete one - correct behaviour.)
  • Typo: “Taxii to office” → “Taxi”.
  • Ambiguous date: 12/03/26 reformatted assuming DD/MM/YY → 2026-03-12.
  • Mixed currencies: rows in GBP, USD and EUR. Not converted and not summed - totalling across currencies would be wrong.
  • PII reminder: a personal email was present in the original notes and was removed before pasting. (Verify your own file before sharing.)

The catch (Way #2): the two failure modes here are silent ones. A naive “tidy this up” prompt will (a) delete one of the duplicate monitor-stand rows - destroying data you needed to see - and (b) helpfully convert and sum the USD/EUR rows into one GBP total that’s quietly wrong. The fixed prompt forbids both. Always eyeball the duplicate decision and never trust a single total that mixes currencies.

6. Guardrails - how to stay safe and sane

  • Privacy (Way #3): strip personal emails/names from the notes column first - or use the stretch path and don’t paste the data at all.
  • Human check (Way #2): you decide what happens to flagged duplicates. Never let the AI delete rows. And never trust a total that mixes GBP/USD/EUR - that’s a human (or a proper FX step) job.
  • Limits: this standardises format, not truth. It won’t know a £156 “client dinner” breaches policy, or that a 0.00 row is a free webinar. It cleans; you judge.

7. Reuse notes - so it isn’t lost

  • Owner: (your name)
  • Last updated: (date)
  • Lives in: Finance/Ops shared drive → Skills folder. Pin the fixed category list next to it so everyone maps the same way - Way #5.

Built at Innovation Day. The five ways of working: ways-of-working.md.

Downloads for this session

Grab the templates and sample files used here.