Data Readiness for AI in Manufacturing: A Checklist
Data readiness for AI in manufacturing — a practical checklist for COOs. What actually has to be true before an agent works, and what's a myth.
The phrase "data readiness for AI" gets used as a reason to delay. "We can't do agents until our data is clean." I've heard it in every plant I've walked. It's half right. Your data does need to clear a bar, but the bar is far lower and far more specific than the consultants telling you to spend two years on a data lake first would like you to believe. I ran AI at a $250M furniture manufacturer with data that was, charitably, a mess. We shipped anyway. Here's the actual checklist.
The core misunderstanding is treating data readiness as one giant binary state your whole company has to reach. It isn't. Readiness is per use case. The data needed for a supplier-document agent has nothing to do with the data needed for a demand-planning agent. You don't get your data ready. You get the data for one agent ready, ship it, then do the next.
The five-question readiness check
For any agent you're considering, run its data through these five questions. This is the whole framework. If a use case passes, build it. If it fails on a question, you know exactly what to fix instead of waving at "data quality" in the abstract.
- Accessible — Can software read this data without a human typing? (DB query, API, scheduled export.) If a person has to copy-paste it, the agent can't use it.
- Complete enough — Are the fields the agent needs actually populated, most of the time? Not perfect. Most of the time.
- Consistent — Is an entity represented the same way across records? One spelling per supplier, one format per part number, consistent units.
- Current — Is the data fresh enough for the decision? A planning agent tolerates yesterday's data. A live-status agent doesn't.
- Trustworthy — Do the people who'd use the agent already trust this data source? If ops doesn't believe the ERP's lead-time field today, an agent reading it won't change their mind.
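Three of the five questions can be checked mechanically before anyone argues about them. Here is a minimal Python sketch, assuming rows have already been pulled into plain dicts (say, from a scheduled export). The field names, the crude normalization key, and the freshness windows are all hypothetical:

```python
from collections import defaultdict
from datetime import datetime, timedelta


def fill_rate(records, field):
    """Complete enough: share of records where `field` is present and non-blank."""
    if not records:
        return 0.0
    filled = 0
    for r in records:
        value = r.get(field)
        # A zero is a real value; only None and blank strings count as missing.
        if value is not None and str(value).strip() != "":
            filled += 1
    return filled / len(records)


def spelling_variants(records, field):
    """Consistent: group raw values under a crude key (lowercase, letters and digits only).

    Any key with more than one raw spelling is an inconsistency the agent would trip on.
    """
    groups = defaultdict(set)
    for r in records:
        raw = r.get(field)
        if raw:
            key = "".join(ch for ch in raw.lower() if ch.isalnum())
            groups[key].add(raw)
    return {key: sorted(vals) for key, vals in groups.items() if len(vals) > 1}


def is_current(last_refresh, max_age, now):
    """Current: is the newest data within the window this decision tolerates?"""
    return now - last_refresh <= max_age


# Hypothetical rows from an ERP supplier export.
rows = [
    {"supplier": "Acme Corp", "lead_time_days": 14},
    {"supplier": "ACME Corp.", "lead_time_days": None},
    {"supplier": "acme corp", "lead_time_days": 21},
    {"supplier": "Birch Supply", "lead_time_days": 0},
]

print(fill_rate(rows, "lead_time_days"))    # 0.75
print(spelling_variants(rows, "supplier"))  # {'acmecorp': ['ACME Corp.', 'Acme Corp', 'acme corp']}

nightly_export = datetime(2024, 5, 2, 2, 0)  # illustrative timestamps
now = datetime(2024, 5, 2, 9, 0)
print(is_current(nightly_export, timedelta(hours=26), now))    # True: fine for planning
print(is_current(nightly_export, timedelta(minutes=15), now))  # False: too stale for live status
```

Accessible and trustworthy don't reduce to a query the same way. The first is a yes/no about plumbing; the second is a conversation with the people who'd use the agent.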
Score each question 0-2 (no / partial / yes), for a maximum of 10. A use case scoring 8 or more is ready to build. One scoring 5-7 is buildable with scoping: narrow it to the clean subset. Below 5, fix the data first or pick a different agent.
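The scoring rule is simple enough to pin down in a few lines, which helps when several people score the same candidate. This is a sketch in Python. The question keys and the `readiness_verdict` name are illustrative; the 0-2 scale and the 8 and 5 thresholds come from the rule above:

```python
QUESTIONS = ("accessible", "complete", "consistent", "current", "trustworthy")


def readiness_verdict(scores):
    """Apply the five-question check to one candidate agent.

    `scores` maps each question to 0 (no), 1 (partial), or 2 (yes).
    Returns (total out of 10, verdict, questions that scored below 2).
    """
    for q in QUESTIONS:
        if scores.get(q) not in (0, 1, 2):
            raise ValueError(f"{q!r} needs a score of 0, 1, or 2, got {scores.get(q)!r}")
    total = sum(scores[q] for q in QUESTIONS)
    if total >= 8:
        verdict = "build"
    elif total >= 5:
        verdict = "build, scoped to the clean subset"
    else:
        verdict = "fix the data first or pick a different agent"
    # The weak questions say exactly what to fix, instead of "data quality" in general.
    weak = [q for q in QUESTIONS if scores[q] < 2]
    return total, verdict, weak


# Illustrative scores, not measurements.
supplier_docs = {"accessible": 2, "complete": 2, "consistent": 1, "current": 2, "trustworthy": 2}
defect_analysis = {"accessible": 1, "complete": 1, "consistent": 0, "current": 1, "trustworthy": 1}

print(readiness_verdict(supplier_docs))  # (9, 'build', ['consistent'])
print(readiness_verdict(defect_analysis))
```

Returning the weak questions alongside the verdict is the useful part: a 6 with "consistent" and "current" flagged tells you which subset to scope to.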
What readiness does NOT require
Just as important is killing the myths that cause the two-year delay:
- You do not need a data lake or warehouse. Nice to have. Not required. An agent reading a read replica or a nightly export works fine. We ran production agents with no central data platform at all.
- You do not need perfect data. You need data good enough that the agent's answers beat the status quo, which is often a human guessing or reading a stale report. The bar is "better than today," not "flawless."
- You do not need all your data. You need the slice one agent uses. Ignore the other 95% until an agent needs it.
- You do not need a master data management program first. MDM is a worthy multi-year effort. It is not a prerequisite for a supplier-lookup agent. Don't let the big project block the small win.
The instinct to fix everything first is how AI initiatives spend eighteen months and ship nothing.
Readiness by use case
Different agents have wildly different data demands. Here's the realistic picture for the common first agents:
| Agent | Data needed | Typical readiness | Why |
|---|---|---|---|
| Supplier-doc intelligence | PDFs: specs, certs, datasheets, POs | High | Documents don't need clean structure; retrieval-augmented generation (RAG) handles unstructured text |
| Order-status / exception lookup | ERP order + status tables | Medium-high | Usually accessible; depends on consistency |
| Ops-review prep | ERP + BI exports | Medium | Needs joined data; tolerates batch latency |
| Demand / inventory Q&A | Planning + inventory data | Medium | Needs consistency and currency to be trusted |
| Quality / defect analysis | MES + inspection records | Lower | Often messy, free-text, inconsistent capture |
Notice the pattern: document-heavy agents are the easiest to make ready because unstructured text doesn't need the clean schema that structured data does. A supplier-doc agent works on the PDFs sitting in a SharePoint folder right now. That's why it's often the best first build: the data is already "ready" by definition.
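To see why no schema is needed, here is a toy version of the retrieval step in Python. It assumes the text has already been extracted from the PDFs, uses plain keyword overlap where a real RAG setup would typically use embedding search, and leaves out the step where a language model writes the answer. The file names and contents are made up:

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase words and numbers; no schema, no field mapping."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Split each document's extracted text into paragraph chunks with word counts."""
    index = []
    for name, text in docs.items():
        for para in text.split("\n\n"):
            para = para.strip()
            if para:
                index.append((name, para, Counter(tokens(para))))
    return index


def retrieve(index, question, k=2):
    """Rank chunks by how often they contain the question's words; keep the top k."""
    q_words = set(tokens(question))
    scored = []
    for name, para, counts in index:
        score = sum(counts[w] for w in q_words)
        if score > 0:
            scored.append((score, name, para))
    scored.sort(key=lambda item: item[0], reverse=True)  # stable: ties keep document order
    return [(name, para) for _, name, para in scored[:k]]


# Made-up text standing in for PDFs in a shared folder.
docs = {
    "acme_iso_cert.pdf": "ISO 9001 certificate for Acme Corp.\n\nScope: fastener manufacturing.",
    "birch_veneer_datasheet.pdf": "Birch Supply oak veneer datasheet.\n\nVeneer moisture content: 8 to 10 percent.",
}
index = build_index(docs)
print(retrieve(index, "What is the moisture content of the oak veneer?", k=1))
# [('birch_veneer_datasheet.pdf', 'Veneer moisture content: 8 to 10 percent.')]
```

Nothing here required mapping the datasheet into columns first, which is the point: the documents are usable as they sit.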
Structured-data agents demand more on consistency and currency. Quality and defect agents tend to fail the consistency check because shop-floor capture is inconsistent and free-text-heavy. That doesn't mean never. It means later, after you've cleaned that domain.
Fix data inside the pipeline, not the source
When a use case scores partial on consistency or completeness, the wrong move is a project to clean the source system. That's slow and political. The right move is to handle it in the data pipeline feeding the agent:
- Normalize in the staging layer. Map supplier-name variants, standardize units and dates, trim part-number junk — all in the pipe, leaving the source untouched.
- Make the agent honest about gaps. When data is missing or ambiguous, the agent says "I don't have that" or "two possible matches." An agent that admits uncertainty builds more trust than one that confidently guesses.
- Validate before output. Range checks and business rules catch obviously-wrong data before it reaches a user.
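The three steps above can live in one small staging function. This is a minimal Python sketch, assuming the pipeline reads raw ERP rows as dicts. The alias table, the substring rule for ambiguous lookups, the inch-to-millimetre conversion, and the 0-365 day lead-time rule are hypothetical examples of the kind of mapping and business rule you'd write, not recommendations:

```python
# Hypothetical alias table, kept in the pipeline; the source ERP is never edited.
SUPPLIER_ALIASES = {
    "acme corp": "Acme Corp",
    "acme corporation": "Acme Corp",
    "acme fasteners": "Acme Fasteners",
    "birch supply": "Birch Supply Co",
    "birch supply co": "Birch Supply Co",
}
CANONICAL = sorted(set(SUPPLIER_ALIASES.values()))
MM_PER_INCH = 25.4


def _key(raw):
    """Lowercase, drop periods and commas, collapse whitespace."""
    return " ".join(raw.lower().replace(".", "").replace(",", "").split())


def lookup_supplier(query):
    """Honest about gaps: one match, several candidates, or nothing."""
    key = _key(query)
    if key in SUPPLIER_ALIASES:
        return {"status": "match", "supplier": SUPPLIER_ALIASES[key]}
    candidates = [name for name in CANONICAL if key and key in name.lower()]
    if len(candidates) == 1:
        return {"status": "match", "supplier": candidates[0]}
    if candidates:
        return {"status": "ambiguous", "candidates": candidates}
    return {"status": "not_found"}


def stage_row(row):
    """Normalize one raw row and list any problems before it reaches a user."""
    problems = []
    supplier = SUPPLIER_ALIASES.get(_key(row.get("supplier") or ""))
    if supplier is None:
        problems.append(f"unknown supplier {row.get('supplier')!r}")

    # Standardize lengths to millimetres; unknown units are flagged, not guessed.
    length, unit = row.get("length"), row.get("length_unit", "mm")
    if length is not None and unit == "in":
        length = round(length * MM_PER_INCH, 1)
    elif length is not None and unit != "mm":
        problems.append(f"unrecognized unit {unit!r}")
        length = None

    lead = row.get("lead_time_days")
    if lead is None:
        problems.append("lead_time_days missing")
    elif not 0 <= lead <= 365:  # hypothetical business rule
        problems.append(f"lead_time_days out of range: {lead}")

    return {"supplier": supplier, "length_mm": length, "lead_time_days": lead}, problems


print(lookup_supplier("ACME"))      # {'status': 'ambiguous', 'candidates': ['Acme Corp', 'Acme Fasteners']}
print(lookup_supplier("Oakworks"))  # {'status': 'not_found'}
print(stage_row({"supplier": "ACME Corp.", "length": 12, "length_unit": "in", "lead_time_days": 14}))
# ({'supplier': 'Acme Corp', 'length_mm': 304.8, 'lead_time_days': 14}, [])
```

The `problems` list is what the validation step acts on: a row that comes back with problems can be held back, or shown with its caveat, rather than presented as fact.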
This keeps cleanup scoped to what one agent needs, and it compounds — each agent you build leaves that data domain a little cleaner for the next.
The readiness sequence that works
Don't run a company-wide data-readiness program. Run this loop:
- Pick a candidate agent. Score its data on the five questions.
- If it scores 8+, build it. If 5-7, narrow scope to the clean subset and build that. If under 5, pick a different agent or fix the specific failing dimension.
- Ship, measure, clean the data the next agent will need while this one runs.
Start with a document-heavy agent — supplier-doc intelligence is the usual winner — because its data is ready today. Get a win on the board in 30 days. Use that momentum and the cleaned data to tackle the structured-data agents next.
Data readiness for AI is real, but it's a checklist you run per agent, not a destination you reach before starting.
Want to know which of your agents are data-ready right now? Grab the free First 5 Agents teardown — I'll score your top five candidate agents against the five-question check and tell you which to build first and which need a data fix. Then book a 20-minute call and we'll find the one agent your data can support this month.