Job Title: Site Reliability Engineer (SRE) / Infrastructure Operations (Mid-Level)

Role Overview

Responsible for managing day-to-day infrastructure operations, including monitoring, alerting, and driving stability improvements across the environment.

Key Responsibilities

- Monitor overall infrastructure health and system performance
- Track key performance metrics such as CPU, memory, and disk utilization
- Tune alerts to improve signal-to-noise ratio and reduce alert fatigue
- Support disaster recovery (DR) rehearsals and readiness activities
- Maintain and update runbooks, documentation, and operational reports

Required Experience

- 4–6 years of experience in Site Reliability Engineering (SRE) or infrastructure operations
- Hands-on experience with VMware environments
- Experience with monitoring tools such as PRTG, Datadog, or similar platforms
- Strong incident management experience, including response and resolution processes

Core Skills & Competencies

- Solid understanding of infrastructure performance metrics (CPU, memory, disk, etc.)
- Experience with alert tuning and optimization
- Ability to proactively detect and troubleshoot performance issues
- Strong incident management and operational response capabilities

Screening Signals

Look for candidates who:

- Understand CPU Ready thresholds and their impact on performance
- Have hands-on experience tuning alerts to reduce noise
- Can proactively identify and resolve performance bottlenecks
- Demonstrate strong incident management experience in production environments

Site Reliability Engineer

INSIGHT GLOBAL
