
Site Reliability / Infrastructure Engineer

Mid-level to Senior · Remote now · Hybrid later (2–3 days in office)

Apply via email: 2–3 lines + LinkedIn/GitHub is enough.

What you’ll build

  • Reliable deployments and infrastructure for a real-time education platform
  • Observability: metrics, logs, tracing, alerting, and SLO-driven ops
  • Database reliability and performance (PostgreSQL), caching/streams (Redis)
  • Tooling for safe iteration: CI, pre-submit checks, rollback strategies

What we’re looking for

  • Strong operational mindset: you care about uptime and user experience
  • Experience with production systems, incident response, and debugging
  • Comfortable with cloud + containerized deploys (GCP preferred)
  • Bonus: Terraform/IaC, Postgres tuning, streaming systems, WebRTC/LiveKit

Why Oktatron

  • Small core team (4 people)
  • High ownership
  • Research-driven engineering
  • Hungarian market → US/global

Who you’ll work with

  • A small founder-led team (currently 4 people)
  • Platform + AI engineers shipping to real teachers/students
  • You’ll define operational standards and help scale the system safely

What success looks like (30 / 60 / 90 days)

30 days
  • Understand the system end-to-end and current reliability risks
  • Ship one concrete improvement (monitoring, deployment, or DB)
  • Define a first set of SLOs and alerting priorities
60 days
  • Own observability and incident playbooks
  • Improve release safety: canaries, rollbacks, automated checks
  • Stabilize DB + streaming workloads under load
90 days
  • Establish reliable ops rhythms (on-call, postmortems, capacity)
  • Make scaling “boring”: predictable performance and observability
  • Help prepare infra for Hungary → US scale

Apply

Send a short note and links — we respond quickly.

  • 2–3 lines about what you want to build
  • LinkedIn or CV
  • GitHub / projects (if relevant)
Apply via email