Site Reliability / Infrastructure Engineer

Mid-level to Senior · Remote now · Hybrid later (2–3 days in office)

Apply via email: 2–3 lines + LinkedIn/GitHub is enough.

What you’ll build

  • Observability (metrics/logs/traces), alerting, and incident playbooks
  • Deployment pipelines and release reliability for real-time systems
  • Infra automation, cost/perf optimization, and DB reliability
  • Capacity planning for video sessions and AI workloads

What we’re looking for

  • Hands-on experience running production services
  • Comfort with Docker, CI/CD, and cloud infrastructure
  • Bonus: GCP, PostgreSQL tuning, Redis, video/WebRTC systems

Why Oktatron

  • High leverage role
  • Own reliability end-to-end
  • Small team, direct impact
  • Global scale ambition

Who you’ll work with

  • A small founder-led team (currently 4 people)
  • Platform + product engineering — reliability is a first-class feature
  • You’ll work directly with founders to shape deployment, observability, and scaling

What success looks like (30 / 60 / 90 days)

30 days
  • Get full visibility into current infra, deploys, and failure modes
  • Ship one quick win (alerts, logging, CI hardening, or a runbook)
  • Map reliability needs for real-time sessions + AI workloads
60 days
  • Own observability and incident response basics end-to-end
  • Improve release reliability: deploy strategy + rollback safety
  • Baseline performance and cost, and create a scaling plan (Hungary → US)
90 days
  • Set reliability standards and SLO-style thinking
  • Automate infrastructure workflows and reduce operational toil
  • Be a core owner of scaling and operational excellence

Apply

Send a short note + CV/LinkedIn. If you’ve built tooling or run systems at scale, include details.

  • 2–3 lines about what you want to build
  • LinkedIn or CV
  • GitHub / projects (if relevant)