Senior SRE Analyst

Negotiable Salary

Indeed

Full-time

Onsite

No experience limit

No degree limit

Praça do Patriarca, 62 - Centro Histórico de São Paulo, São Paulo - SP, 01002-010, Brazil

Description

We are looking for a Senior SRE Analyst to join our team to maintain and evolve our infrastructure and production platforms, ensuring high availability, reliability, and performance of our services.

**What you will do:**

* Provide second-level (N2) support for incidents and requests related to MB infrastructure and production platforms, ensuring service availability and stability, with the autonomy to conduct investigations and propose fixes
* Support production infrastructure management by diagnosing and resolving issues to minimize downtime and ensure service continuity
* Execute scripts and procedural operations in production and non-production environments, aiming for standardization and fewer manual tasks
* Automate repetitive processes and tasks to improve operational efficiency and environment reliability
* Support the maintenance and provisioning of Infrastructure as Code (IaC) using Terraform, contributing improvements and versioning best practices
* Monitor systems and applications, investigating alerts and logs (including Kubernetes logs), analyzing issues, and implementing solutions
* Collaborate with developers and other teams to resolve problems and make systems more resilient, scalable, reliable, and performant
* Participate in post-mortem analyses and support the writing of incident reports, ensuring follow-up on action items and continuous improvement
* Suggest and implement performance, observability, and scalability improvements across services and platforms

**What we require from you:**

* Bachelor’s degree in Information Systems, Computer Science, Engineering, or a related field (or equivalent experience)
* Prior experience in SRE, DevOps, or technical support/infrastructure roles, including hands-on production support and incident response
* Solid knowledge of Linux and operating systems
* Practical knowledge of public clouds (GCP, AWS, Azure) and their core services
* Experience with containers and orchestration (Docker and Kubernetes)
* Familiarity with version control tools such as Git and with GitHub repositories
* Knowledge of and experience with Infrastructure as Code (IaC), particularly Terraform, including awareness of best practices for its maintenance and evolution
* Proficiency in automation scripting (Python, Bash, etc.)
* Familiarity with CI/CD tools (Jenkins, GitHub Actions, etc.)
* Organizational skills and strong communication abilities to handle tickets, prioritize requests in Jira, and interact with development teams
* Problem-solving skills and critical thinking

**Nice-to-have:**

* Prior experience with cloud production environments, especially GCP (Google Cloud Platform)
* Knowledge of and experience with monitoring and observability (e.g., Prometheus, Grafana, Stackdriver)
* Basic understanding of cloud security, access control, and identity management
* Exposure to SRE practices and DevOps culture
* Knowledge of and experience with complementary IaC/automation tools (Ansible and/or Terragrunt)
* Certifications in GCP, Kubernetes, or Terraform (or in SRE, DevOps, or cloud providers)
* Experience with databases and microservices development

Source: Indeed