
30 AI Vendor RFP Questions for Manufacturing Ops

By Jason Osajima, former VP of AI at a $250M manufacturer
Quick answer

30 AI vendor RFP questions for manufacturing ops, grouped by category, with the answers that separate shippers from demo shops.

Most AI vendor RFP questions are written by procurement and answered by sales, which is why they predict nothing about whether the agent ships. The standard template asks about uptime, certifications, and the model's context window. None of that tells you if a vendor can get an agent into your order queue and used by a CSR within 30 days. I was VP of AI at a $250M furniture manufacturer and read enough vendor responses to know the gap. These are the AI vendor RFP questions that actually separate the firms who ship from the ones who demo and disappear — grouped by what you're really trying to find out.

For each, I've noted the answer you want and the dodge that should worry you. Copy these straight into your RFP.

Domain and track record (questions 1-5)

You're testing whether they've done this in a setting like yours, not a B2C chatbot.

  1. Name an agent you shipped into manufacturing or distribution ops, the workflow, and the metric it moved. Want: specifics. Dodge: generic enterprise logos with no workflow named.
  2. What broke during that deployment and how did you catch it? Want: candid edge-case stories. Dodge: "it went smoothly."
  3. Which ERP/MES/WMS systems have you integrated with? Want: your stack, named. Dodge: "we integrate with everything."
  4. Give me two ops-leader references I can call who'll speak candidly. Want: live contacts. Dodge: case-study PDFs only.
  5. What manufacturing workflows do you decline to build for? Want: honest limits. Dodge: "we can do anything."

Time to value (questions 6-9)

You're testing whether they ship fast or hide in discovery.

  6. How long until one agent is live on a real workflow? Want: ~30 days. Dodge: a quarter of "discovery."
  7. What's the first paid milestone tied to? Want: a live agent. Dodge: a deliverables list.
  8. What do you need from us to hit that, and when? Want: a tight, specific list. Dodge: "full data access" with no scope.
  9. Walk me through your last project's timeline, week by week. Want: a real Gantt with a live date. Dodge: vague phases.

Evals and accuracy (questions 10-14)

This is where demo shops fall apart. You're testing for measurement discipline.

  10. How do you measure accuracy on our data before a user touches the agent? Want: evals on 100+ of your historical cases. Dodge: model benchmarks.
  11. What accuracy threshold do you ship at, and who sets it? Want: a number, agreed with you. Dodge: "it's very accurate."
  12. How do you handle the cases the agent gets wrong? Want: review gates, fallbacks, logging. Dodge: silence.
  13. Can I see an eval report from a past project? Want: a real, redacted one. Dodge: "we don't share those."
  14. How do you detect accuracy drift after launch? Want: ongoing monitoring. Dodge: "set it and forget it."
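
If you want to picture what a good answer to the eval questions looks like in practice, here is a minimal sketch in Python. It assumes each historical case can be reduced to an input and the outcome your team actually produced, and that exact match is a fair grading rule; real workflows often need fuzzier grading. `HistoricalCase`, `run_eval`, and the `agent` callable are hypothetical names for illustration, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class HistoricalCase:
    """One past case from your own data (hypothetical structure)."""
    case_id: str
    input_text: str  # e.g. the raw PO or customer email
    expected: str    # what your team actually did with it


def run_eval(
    cases: List[HistoricalCase],
    agent: Callable[[str], str],
    threshold: float,
    min_cases: int = 100,
) -> Dict[str, object]:
    """Score the agent on historical cases before any user touches it."""
    if len(cases) < min_cases:
        raise ValueError(f"need at least {min_cases} cases, got {len(cases)}")

    # Exact-match grading is an assumption; swap in your own comparison.
    failures = [c.case_id for c in cases if agent(c.input_text) != c.expected]
    accuracy = (len(cases) - len(failures)) / len(cases)

    return {
        "accuracy": accuracy,
        "passed": accuracy >= threshold,  # threshold is the number you agreed on
        "failures": failures,             # log these for human review
    }
```

The same function can be rerun after launch on a fresh sample of human-reviewed cases; a drop against the pre-launch accuracy is one simple drift signal.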

Integration and architecture (questions 15-19)

You're testing whether the agent lives in the workflow or beside it.

  15. Does the agent write back to our systems, or only read? Want: read and write. Dodge: read-only dashboard.
  16. Where does the agent surface: inside our existing tools or a new app? Want: embedded in the ERP/queue/email. Dodge: separate login.
  17. How do you handle our data formats (the malformed POs, the legacy SKUs)? Want: a real plan. Dodge: "clean data required."
  18. What's your rollback plan if an integration breaks production? Want: a tested one. Dodge: improvisation.
  19. What happens to the agent if your platform goes down? Want: graceful degradation. Dodge: hard failure.
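
Graceful degradation is easier to judge once you have seen its shape. The sketch below is one way it might look, assuming the vendor's agent is reachable through a single call that raises an error when the platform is down. `AgentUnavailable`, `process_order`, and the in-memory `manual_queue` are hypothetical stand-ins for illustration.

```python
from typing import Callable, Dict, List, Optional


class AgentUnavailable(Exception):
    """Raised by the (hypothetical) agent client when the platform is down."""


def process_order(
    order_id: str,
    agent_call: Callable[[str], str],
    manual_queue: List[str],
) -> Dict[str, Optional[str]]:
    """Try the agent first; if it is down, hand the order back to a person."""
    try:
        result = agent_call(order_id)
    except AgentUnavailable:
        # Degrade gracefully: the order lands in the existing manual queue
        # instead of being dropped or blocking the whole workflow.
        manual_queue.append(order_id)
        return {"order_id": order_id, "handled_by": "human", "result": None}
    return {"order_id": order_id, "handled_by": "agent", "result": result}
```

The point to probe with a vendor is the `except` branch: where work goes when the agent is unreachable, and whether anyone has tested that path.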

Guardrails and human-in-the-loop (questions 20-22)

You're testing whether they protect trust on high-stakes steps.

  20. Which steps run autonomously and which require human review? Want: review gates on anything customer-facing or compliance-related. Dodge: full autonomy by default.
  21. How does a user override or correct the agent? Want: a built-in path. Dodge: "they file a ticket."
  22. What's your guardrail against a confidently wrong output reaching a customer? Want: layered checks. Dodge: "the model is reliable."
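
A review gate can be as plain as a routing rule. Here is a minimal sketch, assuming each agent step is tagged with a category and that the agent reports some confidence score; both the tags and the 0.9 cutoff are illustrative assumptions, and a confidence score alone will not catch a confidently wrong output, which is why the category check comes first.

```python
from dataclasses import dataclass

# Categories that always go to a person, per the "want" answer above.
REVIEW_REQUIRED = {"customer_facing", "compliance"}


@dataclass
class AgentStep:
    """One action the agent proposes (hypothetical structure)."""
    name: str
    category: str      # e.g. "internal", "customer_facing", "compliance"
    confidence: float  # assumed score in [0, 1] reported by the agent


def needs_human_review(step: AgentStep, min_confidence: float = 0.9) -> bool:
    """Layered check: category gate first, then a confidence floor."""
    if step.category in REVIEW_REQUIRED:
        return True  # high-stakes steps are reviewed regardless of confidence
    return step.confidence < min_confidence
```

For example, `needs_human_review(AgentStep("send quote", "customer_facing", 0.99))` returns `True`: high confidence does not buy autonomy on a customer-facing step.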

Adoption and ownership (questions 23-26)

The 95% of pilots that fail, fail here. You're testing for an adoption plan.

  23. What's your plan to get our team to actually use this daily? Want: a real change plan with a named champion. Dodge: "we deliver, you adopt."
  24. How do you track adoption and usage after launch? Want: usage metrics, weekly. Dodge: "that's on you."
  25. What single business metric will this move, and how do we baseline it? Want: hours/errors/deflection, measured first. Dodge: "deployed = success."
  26. What happens to adoption when your team leaves? Want: knowledge transfer + an internal owner. Dodge: ongoing dependency.

Commercial and exit (questions 27-30)

You're testing for forecastable cost and freedom to leave.

  27. Give me total year-one cost within 20%, including integration. Want: a real number. Dodge: "depends on usage."
  28. Does our data train your models? Where does it live, and for how long? Want: explicit no on training, clear retention. Dodge: vague terms.
  29. If we leave, can we export and run what you built? Want: yes, with config and data. Dodge: total lock-in.
  30. Will you do a scoped paid proof on one of our workflows before a full contract? Want: yes. Dodge: "only after the master agreement."

How to score the responses

Don't average. Use a knockout rule. Any vendor who dodges questions 10, 15, 20, or 23 — evals, write-back integration, human-in-the-loop, adoption — is out, regardless of how strong the rest looks. Those four are the load-bearing walls. A vendor strong everywhere else but hollow on those will deliver a demo that dies in pilot, which is exactly what you're trying to avoid.

Category | Knockout question | Why it's load-bearing
Evals | #10 | No measured accuracy on your data = blind launch
Integration | #15 | Read-only = insights nobody acts on
Guardrails | #20 | No human-in-the-loop = one bad output kills trust
Adoption | #23 | No adoption plan = the 95% failure mode
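
If you track responses in a spreadsheet or script, the knockout rule is a few lines of Python. This is a sketch, assuming you have already marked which question numbers each vendor dodged; `score_vendor` and its output fields are hypothetical names.

```python
from typing import Dict, Iterable

# The four load-bearing questions from the scoring rule above.
KNOCKOUT_QUESTIONS = {
    10: "evals",
    15: "write-back integration",
    20: "human-in-the-loop",
    23: "adoption",
}


def score_vendor(dodged: Iterable[int]) -> Dict[str, object]:
    """Apply the knockout rule: any dodged knockout question eliminates."""
    dodged_set = set(dodged)
    knocked = sorted(q for q in dodged_set if q in KNOCKOUT_QUESTIONS)
    return {
        "eliminated": bool(knocked),
        "knockouts": [KNOCKOUT_QUESTIONS[q] for q in knocked],
        # Dodges elsewhere are recorded but never rescue or sink a vendor alone.
        "other_dodges": len(dodged_set) - len(knocked),
    }
```

A vendor who dodged questions 3 and 15 comes back eliminated on write-back integration with one other dodge noted; nothing is averaged, so strength elsewhere cannot offset a knockout.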

Skip the RFP theater — test on your own work

The fastest way to use these AI vendor RFP questions is to make a vendor answer them by doing, not writing. Send me one workflow your team wishes ran itself, and I'll build a working agent on it and screen-record the result — evals, integration, guardrails, and all. Or book a call and we'll run the First 5 Agents teardown so you know exactly which workflows to put in the RFP first.

