Senior Site Reliability Engineer

Remote · Sweden · Full-time

Join us as a Senior Site Reliability Engineer

- In this key role, you’ll improve and drive the availability, performance, efficiency, change management, monitoring, security, incident response, and capacity planning for our products and services
- You’ll enjoy significant stakeholder interaction, working in collaboration with engineers to ensure a principled approach to delivering change in a safe and secure way
- This is a chance to join an inclusive team with a collaborative ethos and a commitment to innovation and professional development
- You’ll need to have the flexibility to support the team by working shifts and weekends on rotation

What you’ll do

As a Senior Site Reliability Engineer, you’ll act as a hands-on expert responsible for ensuring the reliability, availability, and performance of critical production platforms. You’ll lead the adoption of Site Reliability Engineering (SRE) practices, embedding resilience, observability, and operational excellence into distributed systems running on AWS and Kubernetes. You’ll also take ownership of 24/7 production support models, ensuring systems are highly available and that incidents are effectively managed and learned from.

We’ll also expect you to design and operate highly resilient AWS-based Kubernetes platforms (EKS) aligned with enterprise standards, while owning and continuously improving production reliability, availability, and Service Level Agreement or Service Level Objective (SLA/SLO) frameworks. You’ll lead incident management, escalation, and 24/7 on-call practices, including post-incident reviews, and embed SRE principles such as error budgets, toil reduction, and reliability engineering into delivery teams. Furthermore, you’ll implement infrastructure and platform automation using Terraform and GitOps methodologies, and drive self-healing, auto-scaling, and failure recovery mechanisms using tools such as Karpenter.
In addition to this, you’ll be:

- Building secure and scalable networking and service communication using tools such as Cilium
- Defining and operating observability platforms using Grafana, Prometheus, Loki, and Tempo
- Partnering with DevOps and engineering teams to ensure production readiness and operational excellence
- Leading complex troubleshooting across distributed systems and cloud-native environments
- Developing reusable “golden paths,” operational runbooks, and reliability patterns
- Ensuring platforms meet regulatory, security, and operational risk requirements
- Using data, Service Level Indicators (SLIs), and metrics to drive continuous improvement and proactive reliability enhancements

The skills you’ll need

We’re looking for a highly experienced Site Reliability Engineer with a strong background in operating large-scale, business-critical platforms and a passion for reliability engineering. You must also have deep expertise in managing production systems on AWS and Kubernetes (EKS), along with strong experience in 24/7 support models, incident management, and on-call leadership. Moreover, you’ll need to demonstrate advanced knowledge of SRE principles such as SLIs, SLOs, error budgets, and toil reduction, as well as proficiency in Terraform, GitOps, and cloud automation practices. Hands-on experience with GitLab continuous integration and continuous delivery pipelines and Argo CD is also essential.
In addition, you’ll have to bring:

- A strong understanding of Kubernetes networking, security, and service mesh technologies, ideally using Cilium
- Experience scaling infrastructure using Karpenter and auto-scaling strategies
- Expertise in observability tooling, including Grafana, Prometheus, Loki, and Tempo
- A proven ability to troubleshoot and resolve complex, cross-system production issues
- Experience operating in regulated or high-security environments
- Strong leadership, mentoring, and stakeholder engagement capabilities
- The ability to balance reliability, risk, and delivery in a fast-paced environment

Hours: 35
Job Posting Closing Date: 03/06/2026
Ways of Working: Remote First
