Site Reliability / Infrastructure Engineer

Mid-level to Senior · Remote now · Hybrid later (2–3 days in office)

Apply via email: 2–3 lines + LinkedIn/GitHub is enough.

What you’ll build

Observability (metrics/logs/traces), alerting, and incident playbooks
Deployment pipelines and release reliability for real-time systems
Infra automation, cost/perf optimization, and DB reliability
Capacity planning for video sessions and AI workloads

What we’re looking for

Hands-on experience running production services
Comfort with Docker, CI/CD, and cloud infrastructure
Bonus: GCP, PostgreSQL tuning, Redis, video/WebRTC systems

Why Oktatron

High leverage role
Own reliability end-to-end
Small team, direct impact
Global scale ambition

Who you’ll work with

A small founder-led team (currently 4 people)
Platform + product engineering — reliability is a first-class feature
You’ll work directly with founders to shape deployment, observability, and scaling

What success looks like (30 / 60 / 90 days)

30 days

Get full visibility into current infra, deploys, and failure modes
Ship one quick win (alerts, logging, CI hardening, or runbook)
Map reliability needs for real-time sessions + AI workloads

60 days

Own observability and incident response basics end-to-end
Improve release reliability: deploy strategy + rollback safety
Baseline performance/cost and create a scaling plan (Hungary → US)

90 days

Set reliability standards and SLO-style thinking
Automate infrastructure workflows and reduce operational toil
Be a core owner of scaling and operational excellence

Apply

Send a short note + CV/LinkedIn. If you’ve built tooling or run systems at scale, include details.

2–3 lines about what you want to build
LinkedIn or CV
GitHub / projects (if relevant)

Apply via email