Cassette-based document import automation
How an operations team stopped rebuilding the same logic every time a new supplier appeared
“The same cassette applies the same logic every time — which is exactly what you need when accuracy is the whole point.”
— Operations Lead, water risk & compliance platform
The client
A market-leading Dutch SaaS platform for water risk and compliance, managing thousands of locations across the Netherlands and the UK — its value depends entirely on the quality of the asset data inside it.
The challenge
New customers arrive with documents, not clean data. Risk assessment reports from previous suppliers land in every format (PDF, DOCX, CSV, XLSX), and each is structured differently: asset names, floor conventions, and water classifications are all inconsistent, and all need manual mapping before anything can enter the platform. As the customer base grows, that work doesn't get easier; it gets slower and more error-prone.
The solution
I designed and built Technical Import, a browser-based internal tool organised around a cassette architecture. Each cassette encodes the extraction and transformation logic for one combination of supplier, document type, and country variant. A shared runtime handles parsing, normalisation, and export, producing a structured XLSX aligned to the platform's import schema every time.
An operator picks the relevant cassette, uploads the supplier document, previews the extracted assets, edits site information if needed, and exports, staying in control of validation without doing the extraction by hand. At delivery, the tool shipped with 16 cassette definitions covering suppliers across the UK and the Netherlands, with input support for PDF, DOCX, CSV, XLSX, and XLS through one common pipeline.
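To make the split between cassette and runtime concrete, here is a minimal sketch in Python. It is an illustration of the pattern, not the Technical Import code: the `Cassette` fields, the `TARGET_COLUMNS` schema, the `run_cassette` function, and the sample supplier are all hypothetical, and it reads CSV and returns plain rows where the real tool handles several input formats and exports XLSX.

```python
import csv
import io
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical target columns; the platform's real import schema is not shown here.
TARGET_COLUMNS = ["asset_name", "floor", "water_class"]


@dataclass(frozen=True)
class Cassette:
    """Supplier-specific logic for one supplier, document type, and country variant."""
    supplier: str
    document_type: str
    country: str
    version: str
    field_map: dict[str, str]  # target column -> source column
    transforms: dict[str, Callable[[str], str]] = field(default_factory=dict)


def run_cassette(cassette: Cassette, csv_text: str) -> list[dict[str, str]]:
    """Shared runtime: parse, map, and normalise. Knows nothing about any one supplier."""
    rows = []
    for source_row in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for target in TARGET_COLUMNS:
            raw = source_row[cassette.field_map[target]].strip()
            transform = cassette.transforms.get(target, lambda value: value)
            row[target] = transform(raw)
        rows.append(row)
    return rows


def normalise_floor(value: str) -> str:
    """Made-up floor convention: map ground-floor labels to "0"."""
    return {"G": "0", "GF": "0", "Ground": "0"}.get(value, value)


# A made-up supplier, used only to exercise the sketch.
EXAMPLE_CASSETTE = Cassette(
    supplier="example-supplier",
    document_type="csv",
    country="UK",
    version="1.0.0",
    field_map={"asset_name": "Outlet", "floor": "Level", "water_class": "Type"},
    transforms={"floor": normalise_floor, "water_class": str.lower},
)

SAMPLE = "Outlet,Level,Type\nKitchen tap,GF,Cold\nShower 2, 1 ,HOT\n"
rows = run_cassette(EXAMPLE_CASSETTE, SAMPLE)
```

In this sketch, supporting a new supplier means adding a new `Cassette` value; `run_cassette` stays unchanged.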
Why it works
Supplier-specific logic is isolated, documented, and versioned inside each cassette — a quirk in one supplier's format can't bleed into another import. Every cassette's logic is readable and auditable on its own. When the team previews extracted assets before export, they're reviewing a deterministic output, not guessing whether the mapping ran correctly.
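One way to picture isolation and determinism together is a registry keyed by supplier, document type, and country, plus a fingerprint of each run's output. The sketch below is an assumption about how such a check could look, not a description of the tool's internals; `REGISTRY`, `register`, `run`, `fingerprint`, and both example extractors are hypothetical names.

```python
import hashlib
import json

# Hypothetical registry: one entry per (supplier, document type, country).
REGISTRY = {}


def register(supplier, document_type, country, version, extract):
    """Add a cassette; refuse to silently overwrite another cassette's logic."""
    key = (supplier, document_type, country)
    if key in REGISTRY:
        raise ValueError(f"cassette already registered for {key}")
    REGISTRY[key] = {"version": version, "extract": extract}


def fingerprint(rows):
    """Stable hash of the extracted rows, so two runs can be compared exactly."""
    payload = json.dumps(rows, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def run(supplier, document_type, country, lines):
    """Look up exactly one cassette and apply only its logic."""
    entry = REGISTRY[(supplier, document_type, country)]
    rows = entry["extract"](lines)
    return {
        "cassette_version": entry["version"],
        "rows": rows,
        "fingerprint": fingerprint(rows),
    }


def extract_supplier_a(lines):
    # Made-up format: "name;floor"
    return [dict(zip(("asset_name", "floor"), line.split(";"))) for line in lines]


def extract_supplier_b(lines):
    # Made-up format: "floor|name"
    rows = []
    for line in lines:
        floor, name = line.split("|")
        rows.append({"asset_name": name, "floor": floor})
    return rows


register("supplier-a", "csv", "UK", "1.0.0", extract_supplier_a)
register("supplier-b", "csv", "NL", "2.1.0", extract_supplier_b)

first = run("supplier-a", "csv", "UK", ["Kitchen tap;0"])
second = run("supplier-a", "csv", "UK", ["Kitchen tap;0"])
other = run("supplier-b", "csv", "NL", ["0|Kitchen tap"])
```

Here `first` and `second` carry the same fingerprint because the same cassette saw the same input, and `other` reaches the same rows through entirely separate logic.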
Going further
Building new cassettes was the remaining bottleneck — each one meant analysing a supplier document, identifying fields, writing mapping logic, and validating against a reference. I built an AI-assisted workflow that speeds this up: given a sample document and a target schema, it analyses structure, proposes field mappings, drafts a cassette, and flags ambiguous cases for human review. What used to take hours of careful engineering now takes a fraction of the time, with the engineer focused on validation, not extraction.
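The "propose mappings, flag ambiguous cases" step can be sketched without any AI at all. The example below swaps in plain string similarity from Python's standard `difflib` as a stand-in for the model, purely to show the shape of the output: confident matches go into a draft field map, and anything under a cut-off is queued for human review. The function name, the threshold, and the headers are assumptions for illustration.

```python
import re
from difflib import SequenceMatcher

# Assumed cut-off; in practice this would be tuned against reference documents.
CONFIDENCE_THRESHOLD = 0.6


def _normalise(label):
    """Lowercase and collapse punctuation so "Asset Name" and "asset_name" compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def propose_mappings(source_headers, target_fields):
    """Draft a field map from header similarity and flag weak matches for review."""
    if not source_headers:
        raise ValueError("sample document has no headers to map")
    field_map = {}
    needs_review = []
    for target in target_fields:
        scored = [
            (SequenceMatcher(None, _normalise(target), _normalise(header)).ratio(), header)
            for header in source_headers
        ]
        score, best = max(scored)
        if score >= CONFIDENCE_THRESHOLD:
            field_map[target] = best
        else:
            # Too weak to trust: keep the best guess, but leave the decision to a person.
            needs_review.append(
                {"target": target, "best_guess": best, "score": round(score, 2)}
            )
    return {"field_map": field_map, "needs_review": needs_review}


# Made-up sample headers and target schema.
draft = propose_mappings(
    source_headers=["Asset Name", "Floor", "Description"],
    target_fields=["asset_name", "floor", "water_class"],
)
```

With these inputs, `asset_name` and `floor` are mapped automatically, while `water_class` has no convincing source column and lands in `needs_review`, which is where the engineer's attention goes.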
Outcomes
Import prep
No manual field mapping — a clear preview step before anything reaches the import pipeline.
Consistency
Same cassette, same logic, every run — no variance from person to person.
Scalability
Every new supplier format benefits immediately from the shared runtime and AI-assisted cassette creation.
Engineering lift
Cassette creation goes from an engineering bottleneck to a routine step — supplier onboarding scales without growing the team.
If supplier onboarding runs on manual mapping and tribal knowledge, that's a pattern I've solved before — happy to talk through what a cassette-style approach would look like for your team.
Get in touch →