Platform Engineer, Observability
- Core Platform
- Remote (global)
- Full-time
- Senior (5–8 years)
Build the telemetry layer that makes a client's entire agent fleet visible in real time — every message, every gate decision, every plan step, queryable in milliseconds.
What you're signing up for
Nothing should run in the dark. You will own the observability discipline we engineer into client systems — structured telemetry from every surface of an agentic system, including agent messages, policy-gate decisions, plan-step completions, and health signals — made queryable in real time. This is a systems-heavy role focused on high-cardinality ingestion, fast aggregation, and the operator-facing tooling that turns raw signal into fleet control.
The work
- Design and build the telemetry pipeline we ship: ingestion, schema, storage, and retention for high-cardinality agent data.
- Build real-time trace and aggregation surfaces so operators can drill down from the whole fleet to a single agent decision.
- Define the structured event model shared across policy, planning, and security.
- Own performance: keep p95 query latency low as the fleet and event volume grow.
- Build the control hooks that let operators pause, steer, or inspect any agent or plan from the telemetry.
You bring
- 5+ years in platform, data, or infrastructure engineering with production ownership.
- Experience with time-series, columnar, or search-backed stores (Prometheus, ClickHouse, OpenSearch) at scale.
- Strong Go, Rust, or Python; comfort with streaming systems (Kafka, Flink, or similar).
- A track record of debugging performance in distributed pipelines under load.
- Care for the operator experience — telemetry is only useful if someone can act on it.
Bonus signal
- Experience with OpenTelemetry instrumentation and semantic conventions.
- Background in Grafana, dashboarding, or alerting systems.
- Familiarity with agent or LLM tracing.
What we offer
- Competitive salary and equity.
- Fully remote, async-first.
- Hardware budget and home-office stipend.
- Dedicated time for platform polish and tooling.
Our stack
Go · ClickHouse · OpenTelemetry · Kafka
How we interview
A short application, a 30-minute intro call, one focused technical session on a real problem, and a fast decision. No trick questions, no whiteboard theater.
Apply for Platform Engineer, Observability.
No cover-letter theater. A few fields, your résumé, and links to work that shows how you think. A real engineer reads every application.