
Site Reliability Engineer - SRE

R$100/day
Indeed
Full-time
Onsite
No experience limit
No degree limit

Description

Summary: Seeking a Senior Site Reliability Engineer to build and support reliable, high-capacity, and well-performing systems, collaborating with product development teams in a DevOps model to enhance predictability and accelerate time to market.

Highlights:

1. Collaborate with product development teams in a DevOps model
2. Drive initiatives to enhance system reliability and performance
3. Mentor and nurture engineers across varying levels of experience

**Senior SRE**

Location: Argentina, Bolivia, Mexico, Paraguay, Colombia

Are you looking for a career that makes a positive difference in your life and helps reimagine learning for learners and educators across the globe? Do you want to work with fun and social people in a positive and engaged virtual office environment?

We are hiring a **Senior Site Reliability Engineer** who will build and support reliable, high-capacity, and well-performing systems in support of our mission to protect and improve our customer platforms, with an ever-watchful eye on reliability, security, performance, cost, and operational excellence.

As a Senior Site Reliability Engineer, you will collaborate in a DevOps model with product development teams, designing, deploying, and managing automation tools that increase predictability, accelerate time to market, and reduce cost.

Our cloud stack includes:

* Cloud: AWS (CloudFront, S3, EC2, ECS, SES, SQS, SNS, Load Balancing, VPC, Config, Systems Manager, Lambda, API Gateway, DB services, and many more)
* Cloud: OCI, know-how a plus (ExaCS, OCI Compute, Load Balancers, Networking, VCN, Object Storage)
* Infrastructure as Code: Terraform
* Programming: Python, Golang, Bash, Ansible
* Containers: AWS ECS
* Security: Rapid7, WAF
* Web: Apache httpd, Apache Tomcat, Angular
* Config Management and provisioning: Ansible, Packer
* Telemetry: New Relic, CloudWatch, Datadog
* DevSecOps: Artifactory, Jenkins, CircleCI, SonarQube, JFrog Xray, Control Tower, GitHub Enterprise, and more

Your contributions

* Cloud Engineering
  * Collaborate with product development teams in a DevOps model, designing, deploying, and managing automation tools to enhance predictability and accelerate time to market
  * Identify the highest-impact opportunities to optimize existing systems, ensuring "right-sized" solutions in consideration of technical and business constraints
  * Drive initiatives to enhance system reliability and performance
  * Ensure repeatability, traceability, and transparency of our infrastructure automation (infrastructure-as-code, monitoring-as-code)
  * Participate in continual learning of the AWS ecosystem, game day scenarios, and professional conferences
  * Actively monitor AWS costs, using optimization tools to maximize ROI while meeting Service Level Objectives
* Observability Engineering
  * Own reliability, uptime, system security, cost, operations, capacity, resiliency, and the performance analysis thereof
  * Lead initiatives to improve the reliability and stability of applications and platforms, using data-driven analytics to improve service levels
  * Ensure that the architecture and deployment models are adequately designed to meet SLA commitments
  * Serve as the primary point of contact during major incidents for your application, and demonstrate the ability to identify and resolve issues that trigger on-call alarms
  * Maintain and enhance telemetry systems to improve visibility into application performance and business metrics, ensuring operational workloads are effectively managed
  * Develop, communicate, collaborate on, and monitor standard processes that promote the long-term health and sustainability of operational development tasks
* DevSecOps
  * Support healthy software development practices, including following agile software development methodology and building standards for code reviews, work packaging, and continuous delivery
  * Partner with CyberSecurity to develop plans and automation that respond to new risks and vulnerabilities
* Resiliency Engineering
  * Collaborate with dev teams to identify failure points and the blast radius of systems
  * Validate the effectiveness of monitoring and observability configurations
  * Coordinate failure injection testing
  * Observe and document steady-state production levels and growth patterns
  * Plan and forecast for seasonal growth, communicate trend lines to leadership, and enhance infrastructure scaling plans to accommodate 2x planned load
  * Coordinate improvements of existing software and infrastructure to meet resiliency goals
* Mentor and nurture engineers across varying levels of experience; foster growth by setting high-reaching goals and providing support to achieve them.
* Ability to expand and collaborate across different levels and stakeholder groups.
* Document and share knowledge within the organization via internal forums and communities of practice.
* Kubernetes experience (EKS or self-managed Kubernetes clusters) is good to have.
* Must have used Terraform to create infrastructure within AWS. Must bring an automation-first mindset to the team.
* On-call participation required. You will lead triage bridges when necessary.
* Will be expected to monitor customer experience, application metrics such as golden signals/KPIs, and infrastructure health.
* Needs to work proactively across team boundaries on a daily basis.
Qualifications

* Experience as a software engineer, with practical experience developing, debugging, and deploying enterprise applications
* Experience with infrastructure automation technologies, preferably Terraform
* Experience with container and container-fleet orchestration technologies, preferably EKS or ECS
* Versatility in troubleshooting diverse hosting technologies: web server platforms, application platforms, operating systems, network components, virtualization technologies, storage, and database platforms
* Experience with continuous-deployment-based software development lifecycles (e.g., CI/CD)
* Experience with application caching strategies and high-concurrency workloads
* Strong communication, problem-solving, root cause analysis, and systems engineering skills
* Ability to design and manage escalation response plans covering monitoring, reaction, response, remediation, and retrospectives in culturally aligned (proactive, customer-focused, collaborative, data-driven) ways
* Demonstrated expertise building and managing highly scaled production infrastructure in the cloud
* BS degree in Computer Science (or a related technical field) and/or equivalent industry experience

Job Type: Contract
Contract length: 12 months
Pay: R$100.00 per hour
Expected hours: 8 per week
Work Location: Remote

Source: Indeed
João Silva
Indeed · HR

Company

Indeed
© 2025 Servanan International Pte. Ltd.