




We are looking for a **Senior SRE Analyst** to ensure the high availability, scalability, and reliability of critical systems in a cloud environment. This person will play a strategic role in automation, observability, and continuous infrastructure improvement.

### **Key Responsibilities**

* Ensure compliance with **SLA, SLO, and SLI** targets for critical services.
* Implement and maintain **proactive monitoring and alerting**.
* Automate **deployment, scaling, and recovery** processes.
* Conduct **incident analysis and post-mortems**.
* Plan capacity, optimize costs, and improve the efficiency of the cloud environment.

### **Technical Requirements**

**AWS Cloud**

* EC2, S3, RDS, IAM, VPC
* Auto Scaling and Load Balancer
* Cost management and resource optimization

**Containers and Orchestration**

* Kubernetes (Amazon EKS)
* Helm
* Cluster, deployment, and upgrade management

**Infrastructure as Code**

* Terraform
* Ansible

**CI/CD**

* Jenkins
* GitHub Actions
* GitLab CI

**Observability**

* Prometheus
* Grafana
* ELK Stack
* CloudWatch

**Programming Languages**

* Python
* Bash
* Go (for automation and scripting)

**Security**

* IAM policies
* Access control
* Security best practices and compliance

### **Desired Profile**

* Strong **reliability engineering** mindset
* Analytical mindset with a focus on incident prevention
* Experience in critical, high-scale environments
* Strong communication skills and the ability to collaborate with technical teams


