




Job Summary: We are seeking an SRE to ensure the reliability, availability, and scalability of critical systems on AWS, with a focus on automation, observability, and Kubernetes, supporting both operational stability and platform evolution.

Key Highlights:
1. Experience working in AWS cloud environments with Kubernetes and automation
2. Focus on reliability, performance, and efficiency
3. Collaboration with engineering and operations teams

We are looking for a Site Reliability Engineer (SRE) to ensure the reliability, availability, and scalability of critical systems in AWS cloud environments, with a strong emphasis on automation, observability, and Kubernetes operations. This person will support the stability and evolution of production platforms, ensure SLA compliance, help reduce incidents and improve MTTR, and drive continuous improvements in performance, security, and cost efficiency. They will collaborate closely with engineering and operations teams and contribute to the organization's SRE maturity.

**Responsibilities:**
– Ensure that the defined SLAs, SLOs, and SLIs for critical services are met.
– Implement and improve proactive monitoring and alerting.
– Automate deployment, scaling, and operational processes.
– Troubleshoot and analyze incidents in production environments.
– Lead post-mortems and root cause analyses.
– Plan capacity and optimize costs in AWS cloud environments.
– Manage and operate Kubernetes clusters (Amazon EKS), including deployments, upgrades, and maintenance.
– Implement and maintain infrastructure as code using Terraform and Ansible.
– Build and maintain CI/CD pipelines (Jenkins, GitHub Actions, GitLab CI).
– Apply observability practices using Prometheus, Grafana, the ELK Stack, and CloudWatch.
– Enforce cloud security best practices, including IAM policies and access control.

**Requirements:**
– Solid experience as an SRE or in a similar role in cloud environments.
– Experience with AWS (EC2, S3, RDS, IAM, VPC, Auto Scaling, Elastic Load Balancing).
– Experience with Kubernetes, preferably Amazon EKS.
– Knowledge of infrastructure as code (Terraform and/or Ansible).
– Experience with CI/CD tools.
– Knowledge of observability (monitoring, logging, and metrics).
– Proficiency in at least one automation language (Python, Bash, or Go).
– Experience with cloud security practices.
– Knowledge of AWS cost management and resource optimization.

**Preferred Qualifications:**
– Experience managing large-scale workloads on EKS.
– Advanced knowledge of AWS networking and security.
– Experience in high-availability, mission-critical environments.
– An automation-first mindset focused on continuous improvement.

**Desired Profile:**
– Analytical mindset with strong problem-solving skills.
– Proactive, with a strong sense of ownership.
– Able to work autonomously.
– Strong communication and collaboration skills in cross-functional teams.
– Focus on reliability, performance, and efficiency.

**Important Information:**
– Work Model: Hybrid
– Location: Barueri/SP
– Working Hours: Monday to Friday, 09:00 to 18:00
– Employment Type: CLT


