
Site Reliability / Infrastructure Engineer

Mid-level to Senior · Remote now · Hybrid later (2–3 days in office)

Apply via email
2–3 lines + LinkedIn/GitHub is enough.

What you’ll build

Reliable deployments and infrastructure for a real-time education platform
Observability: metrics, logs, tracing, alerting, and SLO-driven ops
Database reliability and performance (PostgreSQL), caching/streams (Redis)
Tooling for safe iteration: CI, pre-submit checks, rollback strategies

What we’re looking for

Strong operational mindset: you care about uptime and user experience
Experience with production systems, incident response, and debugging
Comfortable with cloud + containerized deploys (GCP preferred)
Bonus: Terraform/IaC, Postgres tuning, streaming systems, WebRTC/LiveKit

Why Oktatron

Small core team (4 people)
High ownership
Research-driven engineering
Hungarian market → US/global

Who you’ll work with

A small founder-led team (currently 4 people)
Platform + AI engineers shipping to real teachers/students
You’ll define operational standards and help scale the system safely

What success looks like (30 / 60 / 90 days)

30 days

Understand the system end-to-end and current reliability risks
Ship one concrete improvement (monitoring, deployment, or DB)
Define a first set of SLOs and alerting priorities

60 days

Own observability and incident playbooks
Improve release safety: canaries, rollbacks, automated checks
Stabilize DB + streaming workloads under load

90 days

Establish reliable ops rhythms (on-call, postmortems, capacity)
Make scaling “boring”: predictable performance and observability
Help prepare infra for Hungary → US scale

Apply

Send a short note and links — we respond quickly.

2–3 lines about what you want to build
LinkedIn or CV
GitHub / projects (if relevant)
