Site Reliability / Infrastructure Engineer
Mid-level to Senior · Remote now · Later hybrid (2–3 days in office)
Apply via email: 2–3 lines + LinkedIn/GitHub is enough.
What you’ll build
- Observability (metrics/logs/traces), alerting, and incident playbooks
- Deployment pipelines and release reliability for real-time systems
- Infra automation, cost/perf optimization, and DB reliability
- Capacity planning for video sessions and AI workloads
What we’re looking for
- Hands-on experience running production services
- Comfort with Docker, CI/CD, and cloud infrastructure
- Bonus: GCP, PostgreSQL tuning, Redis, video/WebRTC systems
Why Oktatron
- High-leverage role
- Own reliability end-to-end
- Small team, direct impact
- Global-scale ambition
Who you’ll work with
- A small founder-led team (currently 4 people)
- Platform + product engineering — reliability is a first-class feature
- You’ll work directly with founders to shape deployment, observability, and scaling
What success looks like (30 / 60 / 90 days)
30 days
- Get full visibility into current infra, deploys, and failure modes
- Ship one quick win (alerts, logging, CI hardening, or runbook)
- Map reliability needs for real-time sessions + AI workloads
60 days
- Own observability and incident response basics end-to-end
- Improve release reliability: deploy strategy + rollback safety
- Baseline performance/cost and create a scaling plan (Hungary → US)
90 days
- Set reliability standards and introduce SLO-style thinking
- Automate infrastructure workflows and reduce operational toil
- Be a core owner of scaling and operational excellence
Apply
Send a short note + CV/LinkedIn. If you’ve built tooling or run systems at scale, include details.
- 2–3 lines about what you want to build
- LinkedIn or CV
- GitHub / projects (if relevant)