Senior Site Reliability Engineer - Sunnyvale, CA, US

Remote · Brazil · Full-time

About the Role

Senior Site Reliability Engineer (Payments Infrastructure)

Kody is seeking a Senior Site Reliability Engineer to ensure the reliability, availability, scalability, and operational excellence of our global payment platform. You will own production observability, incident response, service-level management, and cloud infrastructure reliability across mission-critical payment processing systems operating in Europe, Asia, and North America.

Responsibilities

  • Participate in a follow-the-sun production on-call rotation as a primary incident responder.
  • Diagnose, triage, mitigate, and coordinate resolution of production incidents across payment services, Kubernetes platforms, databases, messaging systems, and cloud infrastructure.
  • Define and maintain SLOs, SLIs, error budgets, alerting standards, and operational readiness processes.
  • Drive reliability improvements through automation, observability, capacity planning, performance optimization, and post-incident reviews.
  • Partner with engineering teams to improve resilience, security, and operational maturity in PCI-DSS-regulated environments.
  • Lead incident management during SEV1/SEV2 events and improve response effectiveness and MTTR.

Requirements

  • 5+ years of experience in Site Reliability Engineering, Platform Engineering, DevOps, or Cloud Infrastructure roles supporting mission-critical production systems.
  • Strong hands-on experience with AWS, Kubernetes (EKS), Terraform, PostgreSQL, Redis, Kafka, Linux, networking, and modern observability platforms.
  • Deep understanding of distributed systems, cloud-native architectures, high availability, disaster recovery, capacity planning, and performance optimization.
  • Proven experience operating payment, banking, fintech, or other highly regulated systems with stringent security, compliance, and uptime requirements.
  • Strong knowledge of SRE principles, including SLOs, SLIs, error budgets, incident management, alert governance, and operational excellence.

Leadership & Operational Excellence

  • Demonstrates strong ownership and accountability, taking end-to-end responsibility for service reliability and customer impact.
  • Possesses a strong sense of urgency during production incidents while maintaining sound judgment and structured decision-making under pressure.
  • Applies a systematic and methodical approach to troubleshooting, root-cause analysis, and incident resolution in complex distributed environments.
  • Brings a data-driven mindset, using metrics, telemetry, trends, and service-level indicators to prioritize reliability investments and operational improvements.
  • Continuously drives engineering excellence through iterative improvement, automation, standardization, and elimination of operational toil.
  • Has a proven ability to lead cross-functional incident response efforts, coordinate stakeholders, and communicate effectively during high-severity production events.
  • Champions a culture of operational readiness, continuous learning, post-incident improvement, and blameless accountability.
  • Demonstrates strong mentoring and technical leadership skills, influencing engineering teams to build reliable, scalable, and resilient systems by design.

What We Offer

  • The opportunity to lead a dynamic, innovative team at a rapidly growing company.
  • A competitive compensation package.
  • A collaborative, inclusive environment where your contributions are recognized and valued.
