A lot of the most valuable agentic work lives behind a firewall that nothing crosses: oil and gas engineering, defense, healthcare records, finance. "Send it to a frontier model API" is a non-starter. The good news is that on-prem agentic AI is not a downgrade; it is an architecture. This is the reference architecture we run for a regulated engineering customer whose documents legally cannot leave their environment.
The constraint shapes the stack
The rule is simple and absolute: no data egress. That single constraint cascades into every choice — models, storage, telemetry, and updates all live inside the boundary. So we build entirely on open-source components that run locally.
- Document understanding: PaddleOCR for text and layout; Table Transformer for cell-level table structure; a fine-tuned YOLOv10 for the engineering markup that OCR misses.
- Reasoning: a local vision-language model (Qwen2.5-VL class) as a tiebreaker when the extractors above disagree, never a hosted API.
- Retrieval: FAISS for vectors, OpenSearch for full-text and tag-centric traceability.
- The app: a typed service layer, an append-only audit log, SSO via OIDC/SAML, RBAC plus department-scoped ABAC.
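To make "RBAC plus department-scoped ABAC" concrete, here is a minimal sketch of how the two layers might combine into one decision. Everything in it is an assumption for illustration: the role names, the permission strings, the `Classification` levels, and the `is_allowed` helper are hypothetical and are not taken from the deployed system.

```python
from dataclasses import dataclass
from enum import IntEnum


class Classification(IntEnum):
    # Hypothetical ordering; a real site defines its own levels.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2


# Hypothetical role-to-permission map (the RBAC layer).
ROLE_PERMISSIONS = {
    "reviewer": {"document:read", "document:annotate"},
    "admin": {"document:read", "document:annotate", "document:delete"},
}


@dataclass(frozen=True)
class Principal:
    subject: str
    roles: frozenset
    department: str
    clearance: Classification


@dataclass(frozen=True)
class Resource:
    doc_id: str
    department: str
    classification: Classification


def is_allowed(principal: Principal, action: str, resource: Resource) -> bool:
    """Allow only if a role grants the action AND the attributes match."""
    # RBAC: at least one of the principal's roles must carry the permission.
    rbac_ok = any(
        action in ROLE_PERMISSIONS.get(role, set()) for role in principal.roles
    )
    # ABAC: same department, and the document sits at or below the clearance.
    abac_ok = (
        principal.department == resource.department
        and resource.classification <= principal.clearance
    )
    return rbac_ok and abac_ok
```

The point of the sketch is that both layers must agree. A role alone never grants access to another department's documents, and a matching department never grants an action the role does not include.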
Identity and audit are not optional on-prem
On-prem does not mean "trusted by default." If anything, regulated environments demand more traceability, not less. Every agent gets a scoped identity; every action lands in an immutable log; every decision is traceable per equipment tag across every document revision.
$ bytevon agent inspect --id agent:doc-reviewer --env on-prem
identity: spiffe://site/agents/doc-reviewer
egress: DENY ALL (air-gapped)
data-scope: dept=engineering classification<=internal
audit: append-only · tag-traceable
✓ POSTURE — compliant · no egress path
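One way to get an "append-only, tag-traceable" audit trail without a managed service is a hash chain: each entry commits to the one before it, so an edit to any earlier record is detectable. The sketch below is an in-memory illustration under stated assumptions. The `AuditLog` class, its field names, and the example tags are hypothetical, and a hash chain only makes tampering evident; actual immutability still depends on the storage underneath.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON so the same payload always hashes to the same value.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class AuditLog:
    """Hypothetical append-only log; entries are only ever added at the end."""

    def __init__(self):
        self._entries = []

    def append(self, agent: str, action: str, tag: str, revision: str, decision: str) -> str:
        payload = {
            "agent": agent,
            "action": action,
            "tag": tag,            # equipment tag the decision refers to
            "revision": revision,  # document revision the decision was made on
            "decision": decision,
        }
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        entry = {"payload": payload, "prev": prev, "hash": _digest(prev, payload)}
        self._entries.append(entry)
        return entry["hash"]

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            if entry["prev"] != prev or entry["hash"] != _digest(prev, entry["payload"]):
                return False
            prev = entry["hash"]
        return True

    def trace(self, tag: str) -> list:
        """Return every recorded decision for one equipment tag, in order."""
        return [e["payload"] for e in self._entries if e["payload"]["tag"] == tag]
```

Calling `trace("P-101")` on such a log would return the decisions for that tag across all recorded revisions, and `verify()` would return `False` if any stored payload had been altered after the fact.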
The part people underestimate
The hard part is not running a model locally. It is running it under the same governance (identity, policy, audit) that a cloud deployment would have, with none of the managed services that usually provide it.
You rebuild the control plane yourself: the policy gate, the audit store, the access model, the telemetry. That is most of the engineering. The model is a component; the environment around it is the product. Get that right and a regulated team gets agentic leverage without a single byte leaving the building — which, for them, is the only version that was ever going to ship.
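As a rough illustration of what "rebuilding the control plane" means at the smallest scale, the sketch below routes every agent action through a gate that checks policy and records the decision before anything runs. The `PolicyGate` class, the `PolicyDenied` exception, and the action names are hypothetical; this is a default-deny sketch under those assumptions, not the production design.

```python
from typing import Any, Callable


class PolicyDenied(Exception):
    """Raised when the gate refuses an action."""


class PolicyGate:
    """Hypothetical gate: default-deny, and every decision is audited."""

    def __init__(self, allow: set, audit_sink: Callable[[dict], None]):
        self._allow = allow        # set of (agent_id, action) pairs
        self._audit = audit_sink   # any callable that persists one record

    def run(self, agent_id: str, action: str, fn: Callable[..., Any], *args, **kwargs) -> Any:
        allowed = (agent_id, action) in self._allow
        # Record the decision before acting, so denials are logged too.
        self._audit({"agent": agent_id, "action": action, "allowed": allowed})
        if not allowed:
            raise PolicyDenied(f"{agent_id} is not permitted to {action}")
        return fn(*args, **kwargs)


# Usage: anything not explicitly allowed is refused, and both outcomes are logged.
events = []
gate = PolicyGate(
    allow={("agent:doc-reviewer", "document:read")},
    audit_sink=events.append,
)
result = gate.run("agent:doc-reviewer", "document:read", lambda: "page text")
```

The design choice worth noting is the ordering: the audit record is written before the allow/deny branch, so a refused action leaves the same trace as a permitted one.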