# SKILL: Clean and standardise a messy expenses CSV

> A worked Skill from Innovation Day. Built spec-first ([Way #1](../../ways-of-working.md)), with a privacy check before any data is pasted ([Way #3](../../ways-of-working.md)) and a human verifying what the AI flagged before anything is deleted ([Way #2](../../ways-of-working.md)).

---

## 1. Goal - what this Skill does, and when to use it

Take an exported expenses CSV that's full of inconsistent categories, mixed date formats and the odd typo, and turn it into a clean, consistent table - **with the dodgy rows flagged, not silently fixed.** Use it before you submit a claim batch or reconcile a month.

It replaces the squinting-at-spreadsheet-cells pass everyone does by hand.

## 2. Inputs - what you paste in for it to work

- **Input 1:** the expenses CSV (columns: date, description, category, amount, currency, notes).
- **Input 2:** your team's fixed category list (see the prompt - using a fixed list is what stops the AI inventing new categories).

> **PRIVACY CHECK** ([Way #3](../../ways-of-working.md)): expense exports leak personal data in the *notes* column.
>
> | In the file | Do this |
> |---|---|
> | `j.okoro@personalmail.com` (a personal email in a notes cell) | **Remove it before pasting.** Personal email = PII. |
> | Any name, claimant ID, or "sent receipt to…" trail | Strip it - the AI doesn't need it to clean dates and categories. |
>
> **Better still - don't paste the data at all.** See the stretch path in section 4: ask the AI for a *formula or script*, then run it on your own machine. The real data never leaves your laptop. That's the strongest version of [Way #3](../../ways-of-working.md).

## 3. Output - what "good" looks like *(write this BEFORE the prompt)*

A cleaned table with exactly these columns:

| Column | Rule |
|---|---|
| **Date (ISO)** | every date as `YYYY-MM-DD`; fix any that aren't |
| **Category** | mapped to ONE of the fixed list - never a new value |
| **Amount** | unchanged numeric value |
| **Currency** | unchanged (GBP/USD/EUR) - **do NOT convert** |
| **Flags** | a short note on anything suspect: duplicate, typo, format fixed, PII spotted |

Below the table: a short **Flags summary** listing duplicates, typos and anything needing a human decision. **Nothing is deleted** - suspect rows stay in, flagged.

## 4. Instructions / Prompt - the exact text to paste

```
You are helping me clean an expenses CSV. Do NOT delete any rows.

Output a table with these exact columns:
Date (ISO) | Category | Amount | Currency | Flags

Rules:
- Convert every date to ISO format YYYY-MM-DD. If a date is ambiguous
  (e.g. 12/03/26), flag it as "date format assumed DD/MM/YY - confirm".
- Map every category to ONE of this fixed list ONLY:
  Travel, Meals, Accommodation, Training, Equipment, Software, Office.
  Fix casing/variants (e.g. "travel" -> Travel, "meals" -> Meals).
  If something doesn't fit the list, put it in Flags - do not invent a category.
- Leave Amount and Currency exactly as they are.
- DO NOT convert currencies and DO NOT sum amounts across currencies.
- In Flags, mark: likely duplicates, spelling typos, any date you reformatted,
  and any personal data you notice (emails, names). Flag duplicates - do not remove them.

After the table, give a short "Flags summary" listing every duplicate, typo,
and anything that needs a human to decide.

CSV:
[PASTE CSV WITH PERSONAL DATA REMOVED]
```

> **Stretch path (the privacy win) - ask for a script, run it yourself.** Instead of pasting the file, paste only the *header row* and ask:
> ```
> Here is the header of my expenses CSV: date,description,category,amount,currency,notes
> Write me an Excel formula (or a short Python/Power Query script) that:
> - converts the date column to ISO YYYY-MM-DD,
> - standardises the category column to this fixed list: Travel, Meals,
>   Accommodation, Training, Equipment, Software, Office,
> - adds a "Flags" column marking duplicate rows and rows with mixed currencies.
> Explain how to run it. Do not ask me to paste my data.
> ```
> You run the formula/script on your own machine. The actual expense rows - and that personal email - never touch the chatbot. Same result, zero data exposure.

## 5. Worked Example - one real run, anonymised

> **Input (a few rows, personal email already stripped):**
>
> ```
> 12/03/26,taxi airport,travel,42.00,GBP,
> 2026-03-09,Monitor stand,Equipment,45.99,GBP,
> 2026-03-09,Monitor stand,Equipment,45.99,GBP,
> 2026-03-12,Software licence (annual),Software,180.00,USD,
> 2026-03-14,Taxii to office,Travel,9.80,GBP,
> 2026-03-16,Client dinner,Meals,156.00,EUR,Madrid office visit
> ```
>
> **Output it produced:**
>
> | Date (ISO) | Category | Amount | Currency | Flags |
> |---|---|---|---|---|
> | 2026-03-12 | Travel | 42.00 | GBP | Reformatted from `12/03/26` - assumed DD/MM/YY, confirm |
> | 2026-03-09 | Equipment | 45.99 | GBP | **Possible duplicate** (see next row) |
> | 2026-03-09 | Equipment | 45.99 | GBP | **Possible duplicate** of "Monitor stand" above - keep both, human to decide |
> | 2026-03-12 | Software | 180.00 | USD | Non-GBP currency |
> | 2026-03-14 | Travel | 9.80 | GBP | Typo in description: "Taxii" → likely "Taxi" |
> | 2026-03-16 | Meals | 156.00 | EUR | Non-GBP currency |
>
> **Flags summary**
> - **Duplicate:** two identical "Monitor stand" rows on 2026-03-09 (£45.99 each). Both kept - **a human decides** whether it's a genuine double purchase or a double-entry. *(The AI did NOT delete one - correct behaviour.)*
> - **Typo:** "Taxii to office" → "Taxi".
> - **Ambiguous date:** `12/03/26` reformatted assuming DD/MM/YY → 2026-03-12.
> - **Mixed currencies:** rows in GBP, USD and EUR. **Not converted and not summed** - totalling across currencies would be wrong.
> - **PII reminder:** a personal email was present in the original notes and was removed before pasting. *(Verify your own file before sharing.)*
>
> **The catch ([Way #2](../../ways-of-working.md)):** the two failure modes here are *silent* ones. A naive "tidy this up" prompt will (a) **delete** one of the duplicate monitor-stand rows - destroying data you needed to see - and (b) helpfully **convert and sum** the USD/EUR rows into one GBP total that's quietly wrong. The fixed prompt forbids both. Always eyeball the duplicate decision and never trust a single total that mixes currencies.

## 6. Guardrails - how to stay safe and sane

- **Privacy** ([Way #3](../../ways-of-working.md)): strip personal emails/names from the notes column first - or use the stretch path and don't paste the data at all.
- **Human check** ([Way #2](../../ways-of-working.md)): you decide what happens to flagged duplicates. Never let the AI delete rows. And never trust a total that mixes GBP/USD/EUR - that's a human (or a proper FX step) job.
- **Limits:** this standardises *format*, not *truth*. It won't know a £156 "client dinner" breaches policy, or that a 0.00 row is a free webinar. It cleans; you judge.

## 7. Reuse notes - so it isn't lost

- **Owner:** *(your name)*
- **Last updated:** *(date)*
- **Lives in:** Finance/Ops shared drive → *Skills* folder. Pin the fixed category list next to it so everyone maps the same way - [Way #5](../../ways-of-working.md).

---

*Built at Innovation Day. The five ways of working: [ways-of-working.md](../../ways-of-working.md).*
