Senior Site Reliability Engineer
Seeking a self-motivated, driven Senior Site Reliability Engineer for a remote role. You will lead reliability initiatives for critical systems, design automation to reduce operational risk, and improve observability and incident response across teams.

Key responsibilities
- Design, implement, and operate highly available, scalable services in cloud environments, primarily Azure
- Define and evolve SLOs/SLIs and lead long-term reliability improvements for owned services
- Participate in, and often lead, incident response for Sev0-Sev2 events while mentoring other engineers on reliability best practices

Required qualifications
- 8+ years of experience in Site Reliability Engineering, DevOps, or Software Engineering roles with ownership of production systems
- Proficiency in at least one modern programming language (e.g., Go, Python, C#, Java) for automation and service development
- Practical experience with large-scale services in public cloud environments, preferably Azure
- Experience with observability stacks and with CI/CD systems, including deployment strategies
- Familiarity with incident management and on-call operations for 24x7 services