Site Reliability / Infrastructure Engineer
Medior–Senior · Remote now · Later hybrid (2–3 days in office)
Apply via email
2–3 lines + LinkedIn/GitHub is enough.
What you’ll build
- Reliable deployments and infrastructure for a real-time education platform
- Observability: metrics, logs, tracing, alerting, and SLO-driven ops
- Database reliability and performance (PostgreSQL), caching/streams (Redis)
- Tooling for safe iteration: CI, pre-submit checks, rollback strategies
What we’re looking for
- Strong operational mindset: you care about uptime and user experience
- Experience with production systems, incident response, and debugging
- Comfortable with cloud + containerized deploys (GCP preferred)
- Bonus: Terraform/IaC, Postgres tuning, streaming systems, WebRTC/LiveKit
Why Oktatron
- Small core team (4 people)
- High ownership
- Research-driven engineering
- Hungarian market → US/global
Who you’ll work with
- A small founder-led team (currently 4 people)
- Platform + AI engineers shipping to real teachers/students
- You’ll define operational standards and help scale the system safely
What success looks like (30 / 60 / 90 days)
30 days
- Understand the system end-to-end and current reliability risks
- Ship one concrete improvement (monitoring, deployment, or DB)
- Define a first set of SLOs and alerting priorities
60 days
- Own observability and incident playbooks
- Improve release safety: canaries, rollbacks, automated checks
- Stabilize DB + streaming workloads under load
90 days
- Establish reliable ops rhythms (on-call, postmortems, capacity)
- Make scaling “boring”: predictable performance and observability
- Help prepare infra for Hungary → US scale
Apply
Send a short note and links — we respond quickly.
- 2–3 lines about what you want to build
- LinkedIn or CV
- GitHub / projects (if relevant)