Job Description
Join QuantumLeap Systems as a Senior Site Reliability Engineer and architect the future of cloud infrastructure at scale. You'll drive innovation in our next-generation fintech platform, ensuring mission-critical systems maintain 99.99% uptime while enabling rapid feature deployment. Collaborate with elite engineering teams to transform complex business challenges into elegant, automated solutions. We offer competitive equity packages, unlimited PTO, and opportunities to shape industry-wide SRE best practices.
Responsibilities
- Design, implement, and maintain scalable cloud infrastructure on AWS/GCP using infrastructure as code (Terraform/CloudFormation)
- Develop robust monitoring, alerting, and incident response systems using Prometheus/Grafana
- Automate deployment pipelines and CI/CD workflows (GitLab CI, Jenkins)
- Optimize system performance and cost efficiency through continuous improvement initiatives
- Lead post-mortem analyses and implement corrective actions for production incidents
- Mentor junior engineers on SRE principles and operational excellence
Qualifications
- 5+ years of experience in SRE, DevOps, or infrastructure engineering roles
- Expertise in containerization and container orchestration (Docker, Kubernetes) and in serverless architectures
- Strong proficiency in at least one programming or scripting language (Python, Go, or Bash)
- Deep knowledge of observability tools (Prometheus, Datadog, New Relic)
- Experience with cloud-native security frameworks and compliance standards
- Proven ability to design high-throughput, low-latency distributed systems
- Excellent problem-solving skills with a systems-thinking mindset