Agentic Infrastructure Engineer
Aionia is sourcing for a small number of confidential roles at frontier AI labs building at the boundary of what intelligent systems can do. This is not an application-layer role. You will own the execution layer: the runtime infrastructure that determines how agents reason, act, fail, recover, and improve in production.
Where the Frontier Is Actually Being Built
Aionia is partnering with a frontier AI lab, one of a small number of organizations in the world building at the boundary of what intelligent systems can do. The lab is not productizing existing models; it is building the runtime infrastructure that determines how agents reason, act, and improve in production.
The organization has a founding team drawn from the top of the AI research world, a mission that is foundational rather than commercial, and a small, elite engineering team where every hire shapes the system architecture.
What You’ll Build
This is a systems engineering role at the frontier. You won’t be wiring up APIs. You’ll own the execution layer: the runtime infrastructure that makes intelligent agents reliable, observable, and continuously improving in production.
- Design and build agent harnesses that power different product and research experiences
- Build core runtime systems including execution frameworks and multi-model orchestration
- Develop control-plane logic for routing, planning, and tool invocation with strong safety guarantees
- Optimize agent systems for latency, reliability, and production correctness
- Analyze real-world failures and use data to drive iterative improvements
- Build and operate online experimentation and offline evaluation frameworks
- Improve observability, testing, and simulation systems for safe, measurable progress
- Create sandboxed environments where agents can act and self-validate safely
- Continuously adapt orchestration systems as model capabilities evolve
What They’re Looking For
- Strong experience building distributed systems or backend platforms in production environments
- A track record of improving system reliability, performance, and observability under real-world pressure
- Experience owning systems end-to-end — from design through production and iteration
- Comfort working in ambiguous, fast-moving environments with rapid iteration cycles
- Familiarity with experimentation, evaluation, or data-driven product improvement loops
- Ability to debug complex systems and identify root causes of failures — not just symptoms
- 3–15 years of backend or distributed systems engineering experience
- You’ve built or worked on agent harnesses, orchestration layers, or execution frameworks
- You think in terms of control planes, feedback loops, and system-level optimization — not just features
- You’re excited about diagnosing failure modes and iterating toward measurable improvements
- You care deeply about production quality — not just making systems work, but making them reliable, safe, and scalable
- You’re motivated by pushing the frontier of how intelligent systems behave in the real world
Why This Role Is Different
- You’re building the layer that makes AI reliable. Not above the model — around it. The execution layer is where reliability, safety, and capability actually meet.
- Research meets production. You’ll work directly alongside researchers and translate model capabilities into trustworthy systems that operate at scale.
- Small team, outsized impact. Every architectural decision you make shapes how intelligent systems behave in the world.
- The mission is the point. This lab exists to get superintelligence right. If that drives you, there is no comparable environment.
Compensation & Perks
$250,000 – $500,000+ total compensation depending on level, with equity that is competitive for an organization of this caliber. Aionia represents a small number of high-signal candidates for each role, and every submission is vetted and purposeful.
Interview Process
- Intro Call with Aionia: role alignment, background overview, and candidate brief review
- Technical Screen: distributed systems depth, execution framework thinking, and production reliability
- Practical Assessment: systems design or project-based challenge relevant to the agent runtime layer
- Founder / Team Interview: mission alignment, engineering culture fit, and vision conversation
