How AI Agents Work on the Plant Floor (Explained)

By Jason Osajima — former VP of AI at a $250M manufacturer · Updated June 2026

Quick answer

How AI agents work on the plant floor, explained by an operator: the perceive-decide-act-check loop, where agents fit first, and what they can't do yet.

Most explanations of how AI agents work start with a diagram of neural networks and end with nothing you can use on Monday. Here's the version a plant manager actually needs. An AI agent is software that watches a stream of data, decides what to do next based on a goal you gave it, takes an action through systems you already run, and checks whether the action worked. That's the whole loop. The interesting part isn't the model. It's that the agent closes the loop without a person clicking the button.

I ran this at a $250M manufacturer. We didn't start with anything exotic. We started with a scheduler that kept getting overridden at 6am because the night shift logged a downtime event nobody saw until standup. An agent that reads the MES event log, flags the conflict, and re-sequences the next four jobs before the morning meeting isn't magic. But it saved us roughly 40 minutes a day of expediting and prevented one missed customer shipment per month. That's the bar. Real, boring, measurable.

The four-step loop, in plant terms

Every agent, no matter how it's marketed, runs the same cycle:

Perceive — it pulls data. MES events, ERP work orders, a SCADA tag, an email from a supplier, a PDF packing slip. The agent reads the current state of the world.
Decide — it compares that state against a goal ("keep line 3 above 85% OEE," "don't let any PO go past due without a flag") and picks a next action. This is where the language model reasons.
Act — it does something. Updates a field in the ERP, sends a Teams message, drafts a reply, opens a ticket, re-sequences a job. The action runs through an API or an RPA bot into a system you control.
Check — it reads the result. Did the PO update stick? Did the line recover? If not, it tries again or escalates to a human.

The difference between an agent and the chatbot your team already pastes things into is the Act and Check steps. A chatbot answers. An agent does the thing and confirms it landed.
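If it helps to see the loop as code, here is a minimal sketch in Python. Nothing in it comes from a real product: `run_agent_cycle`, the `Action` class, the in-memory `erp` dictionary, and the two-attempt retry limit are all assumptions made for illustration, and the "decide" step is a plain rule standing in for the language model.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Action:
    name: str
    payload: dict


def run_agent_cycle(
    perceive: Callable[[], dict],
    decide: Callable[[dict], Optional[Action]],
    act: Callable[[Action], None],
    check: Callable[[Action], bool],
    escalate: Callable[[Action], None],
    max_attempts: int = 2,  # assumed retry limit, not from the article
) -> str:
    """Run one perceive-decide-act-check cycle and report how it ended."""
    state = perceive()              # Perceive: read the current state
    action = decide(state)          # Decide: compare the state to the goal
    if action is None:
        return "no_action"
    for _ in range(max_attempts):
        act(action)                 # Act: write through a system you control
        if check(action):           # Check: did the change actually land?
            return "done"
    escalate(action)                # Still not landed: hand it to a human
    return "escalated"


# Hypothetical in-memory stand-in for one ERP record.
erp = {"PO-1001": {"past_due": True, "flagged": False}}
escalations = []


def decide_flag(state: dict) -> Optional[Action]:
    # Goal: no PO goes past due without a flag.
    for po, record in state.items():
        if record["past_due"] and not record["flagged"]:
            return Action("flag_po", {"po": po})
    return None


def flag_po(action: Action) -> None:
    erp[action.payload["po"]]["flagged"] = True


def flag_landed(action: Action) -> bool:
    return erp[action.payload["po"]]["flagged"]


result = run_agent_cycle(lambda: erp, decide_flag, flag_po, flag_landed, escalations.append)
print(result)  # prints "done"
```

A chatbot would stop after `decide`. The two calls that follow it, `act` and `check`, are what make this an agent.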

What makes it an "agent" and not just automation

You already have automation. PLCs, fixed RPA scripts, scheduled reports. Those follow rules you hard-coded. They break the moment reality drifts off the script — a vendor renames a column, a form gets an extra field, a supplier writes "qty" instead of "quantity."

An agent handles the drift. Because the reasoning step uses a language model, it can read a packing slip it's never seen before, figure out which number is the quantity, and map it to your PO. When it's not sure, it asks. That tolerance for messy, unstructured, real-world input is the actual unlock — and the plant floor is nothing but messy input.
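The "map it or ask" behavior can be sketched without a model at all. The example below swaps the language model for plain string similarity from Python's standard `difflib` module, so it runs anywhere; a real agent would reason over the whole document, not one label. The field names, the label variants in `KNOWN_LABELS`, and the 0.75 cutoff are assumptions for illustration.

```python
import difflib

# Hypothetical canonical PO fields and the label variants seen so far.
KNOWN_LABELS = {
    "quantity": ["quantity", "qty", "qty shipped"],
    "part_number": ["part number", "part no", "item"],
    "po_number": ["po number", "po", "purchase order"],
}


def map_label(label: str, cutoff: float = 0.75):
    """Return (field, score) for a packing-slip label, or (None, score) to ask a human."""
    cleaned = label.strip().lower().rstrip(".:#")
    best_field, best_score = None, 0.0
    for field, variants in KNOWN_LABELS.items():
        for variant in variants:
            score = difflib.SequenceMatcher(None, cleaned, variant).ratio()
            if score > best_score:
                best_field, best_score = field, score
    if best_score < cutoff:
        return None, best_score  # not sure: escalate instead of guessing
    return best_field, best_score


for label in ["Qty.", "Quantitiy", "Carrier"]:
    field, score = map_label(label)
    print(label, "->", field if field else "ask a human")
```

A fixed script would need an exact match on "quantity". Here a misspelled or abbreviated label still maps, and a label that matches nothing well gets routed to a person instead of being forced into a field.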

	Fixed RPA / scripts	AI agent
Input	Structured, exact format	Messy, unstructured, varies
Breaks on change	Yes, silently	Adapts or asks
Handles a new vendor form	Needs a developer	Often handles it day one
Knows when it's unsure	No	Yes — escalates
Build time	Weeks per workflow	Days
Best for	High-volume, never-changes	Variable, judgment-light

Neither is better. They're different tools. The agent shines exactly where your scripts keep falling over.

Where agents actually fit first

Don't start with the moonshot. Start where you're already paying people to move data between two screens. The highest-return first agents I've seen across mid-market plants:

Order acknowledgment and entry — reading customer POs (PDF, email, EDI) and entering them into the ERP. For a clerk entering 60 a day at 4 minutes each, each order shrinks to a 30-second review.
Supplier follow-up — an agent that watches open POs, emails late vendors, parses their replies, and updates the promised date. Removes the "who's chasing this?" gap.
Downtime triage — reading MES fault codes, grouping them, and drafting the morning report with the top three loss buckets already ranked.
Quality NCR drafting — turning an inspector's three-line note plus the spec into a structured nonconformance record.
Shipping doc assembly — pulling the BOL, packing list, and cert of conformance into one packet per shipment.

Notice what these share: high volume, clear right answer, a human can verify the output in seconds, and a mistake is annoying but not catastrophic. That's the screening rule. If a single agent error could stop the line or ship bad product unchecked, that workflow waits until you've earned trust.
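Here is one way to turn that screening rule into a checklist you can run against a list of candidate workflows, with the order-entry numbers above as the worked example. The `Workflow` fields and the two thresholds (at least 20 items a day, review in under 60 seconds) are assumptions; the article only says "high volume" and "seconds," so set them to whatever fits your plant.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    items_per_day: int
    manual_minutes_per_item: float
    review_seconds_per_item: float
    clear_right_answer: bool
    error_is_catastrophic: bool


def passes_screen(
    w: Workflow,
    min_items_per_day: int = 20,       # assumed cutoff for "high volume"
    max_review_seconds: float = 60.0,  # assumed cutoff for "verify in seconds"
) -> bool:
    """Apply the four screening criteria for a first agent."""
    return (
        w.items_per_day >= min_items_per_day
        and w.clear_right_answer
        and w.review_seconds_per_item <= max_review_seconds
        and not w.error_is_catastrophic
    )


def minutes_saved_per_day(w: Workflow) -> float:
    """Manual time minus human review time, summed over a day."""
    per_item = w.manual_minutes_per_item - w.review_seconds_per_item / 60
    return w.items_per_day * per_item


# The order-entry example: 60 POs a day, 4 minutes each, 30-second review.
order_entry = Workflow("order entry", 60, 4.0, 30.0, True, False)
print(passes_screen(order_entry), minutes_saved_per_day(order_entry))  # True 210.0
```

The arithmetic: 60 orders at 4 minutes is 240 minutes of keying, 60 reviews at 30 seconds is 30 minutes, so the best case is 210 minutes a day. That assumes every draft passes review without rework, which it won't at first.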

The human stays in the loop (on purpose)

Nobody serious runs a plant agent fully unattended on day one. You run it in three stages:

Shadow — the agent does the work and shows you what it would do. You compare against your team for two weeks. You're measuring its accuracy, not trusting it yet.
Approve — the agent drafts the action, a person clicks yes. You watch the approval rate climb. Once it's getting 95%+ right, you move on.
Auto with exceptions — the agent acts on the clear cases and only routes the genuinely ambiguous ones to a person. That last 5% is where your people add value now.

This staging is also how you keep finance and quality comfortable. You're not asking them to trust a black box. You're showing them a measured accuracy number before anything goes live.
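The promotion decision between stages can be written down as a simple gate, which is also a handy thing to show finance and quality. This is a sketch under assumptions: the 95% bar and the two-week shadow period come from the stages above, but applying the same bar and the same minimum time to both promotions, and the minimum sample of 200 reviewed actions, are my placeholders.

```python
STAGES = ["shadow", "approve", "auto_with_exceptions"]


def next_stage(
    stage: str,
    correct: int,
    reviewed: int,
    days_in_stage: int,
    min_accuracy: float = 0.95,  # the 95%+ bar from the Approve stage
    min_days: int = 14,          # two weeks, as in the Shadow stage
    min_reviewed: int = 200,     # assumed minimum sample size
) -> str:
    """Return the stage the agent should be in after this review."""
    if stage == STAGES[-1]:
        return stage  # already at the last stage
    if reviewed == 0 or reviewed < min_reviewed or days_in_stage < min_days:
        return stage  # not enough evidence yet
    if correct / reviewed < min_accuracy:
        return stage  # measured accuracy is below the bar
    return STAGES[STAGES.index(stage) + 1]


print(next_stage("shadow", correct=195, reviewed=200, days_in_stage=14))   # approve
print(next_stage("approve", correct=180, reviewed=200, days_in_stage=30))  # approve
```

The second call stays in the approve stage because 180 of 200 is 90%, under the bar. The point of the gate is that nobody promotes an agent on a feeling.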

What agents can't do yet

Straight talk, because the hype skips this. Agents are weak where the cost of being wrong is high and the answer is genuinely judgment-heavy: pricing exceptions, safety calls, anything regulatory where you need a defensible audit trail of why. They also degrade quietly: an agent that was 96% accurate can slip to 88% when a supplier changes their format, and you won't notice unless you're tracking accuracy as a metric, not a vibe. Build the monitoring before you build trust. Finally, agents cost real money per action; a workflow that runs 50,000 times a month needs a unit-economics check, not just an accuracy check.
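Both checks are small enough to sketch. The first tracks accuracy over a rolling window of human-verified outcomes and raises a flag when it drops; the second nets the labor saved against the per-run cost. The window size, the 92% alert line, and every dollar figure in the example are assumptions, not numbers from a real deployment.

```python
from collections import deque


class AccuracyMonitor:
    """Rolling accuracy over the most recent human-verified outcomes."""

    def __init__(self, window: int = 100, alert_below: float = 0.92):
        self.outcomes = deque(maxlen=window)  # old outcomes fall off the front
        self.alert_below = alert_below

    def record(self, correct: bool) -> None:
        self.outcomes.append(correct)

    def accuracy(self):
        if not self.outcomes:
            return None
        return sum(self.outcomes) / len(self.outcomes)

    def should_alert(self) -> bool:
        # Only alert once the window is full, to avoid noise on tiny samples.
        full = len(self.outcomes) == self.outcomes.maxlen
        return full and self.accuracy() < self.alert_below


def monthly_net_value(
    runs_per_month: int,
    cost_per_run: float,
    minutes_saved_per_run: float,
    labor_rate_per_hour: float,
) -> float:
    """Labor value saved minus what the agent costs to run, per month."""
    savings = runs_per_month * minutes_saved_per_run * labor_rate_per_hour / 60
    return savings - runs_per_month * cost_per_run


monitor = AccuracyMonitor()
for i in range(100):           # a healthy stretch: 96 of 100 correct
    monitor.record(i >= 4)
print(monitor.accuracy(), monitor.should_alert())

for i in range(100):           # supplier changes format: 88 of 100 correct
    monitor.record(i >= 12)
print(monitor.accuracy(), monitor.should_alert())

# Hypothetical economics for a workflow that runs 50,000 times a month.
print(monthly_net_value(50_000, cost_per_run=0.05,
                        minutes_saved_per_run=0.5, labor_rate_per_hour=30.0))
```

The monitor only works if someone keeps verifying a sample of the agent's output after it goes to auto. Without those labeled outcomes there is nothing to feed `record`, and the slip from 96% to 88% stays invisible.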

The takeaway for an ops leader

How AI agents work on the plant floor comes down to one shift: software that doesn't just answer, it acts and checks its own work, and it tolerates the mess your existing scripts can't. The technology is ready for the boring, high-volume, judgment-light gaps between your systems. It's not ready to run the plant. Start where a clerk is retyping data, stage the trust, measure accuracy as a hard number.

Want to see which five workflows in your plant are the right first agents? We run a free First 5 Agents teardown — you walk us through your day, we map the five highest-return, lowest-risk candidates with rough hours saved on each. Book a 30-minute call and you'll leave with a ranked list whether or not we ever work together.

