Job Title: Site Reliability Engineer (SRE) / Infrastructure Operations (Mid-Level)

Role Overview
Responsible for managing day-to-day infrastructure operations, including monitoring, alerting, and driving stability improvements across the environment.

Key Responsibilities
Monitor overall infrastructure health and system performance
Track key performance metrics such as CPU, memory, and disk utilization
Tune alerts to improve signal-to-noise ratio and reduce alert fatigue
Support disaster recovery (DR) rehearsals and readiness activities
Maintain and update runbooks, documentation, and operational reports

Required Experience
4–6 years of experience in Site Reliability Engineering (SRE) or infrastructure operations
Hands-on experience with VMware environments
Experience with monitoring tools such as PRTG, Datadog, or similar platforms
Strong incident management experience, including response and resolution processes

Core Skills & Competencies
Solid understanding of infrastructure performance metrics (CPU, memory, disk, etc.)
Experience with alert tuning and optimization
Ability to proactively detect and troubleshoot performance issues
Strong incident management and operational response capabilities

Screening Signals
Look for candidates who:
Understand CPU Ready thresholds and their impact on performance
Have hands-on experience tuning alerts to reduce noise
Can proactively identify and resolve performance bottlenecks
Demonstrate strong incident management experience in production environments

Site Reliability Engineer

INSIGHT GLOBAL
