Senior Site Reliability Engineer (SRE)

Indeed

Full-time

Onsite

No experience limit

No degree limit

Praça Mal. Deodoro, 174 - Historic Center, Porto Alegre - RS, 90010-300, Brazil

Description

Job Summary: We are seeking a Senior Site Reliability Engineer (SRE) to work on mission-critical distributed systems, ensuring reliability, scalability, and resilience at massive scale.

Key Highlights:

1. Tackle real-world challenges and create solutions that make a difference
2. Develop cutting-edge technology end-to-end in an innovative environment
3. Continuously elevate the standard of operational excellence

### **About Azion**

We are a global technology company specializing in digital applications and security. Our platform helps enterprises operate with greater agility, reducing response time and increasing system reliability.

At Azion, our purpose is to simplify application development and transform the future with cutting-edge technology. Here, you’ll have the opportunity to grow within an innovative environment alongside a high-performance team, tackling real challenges and creating impactful solutions.

### **About the Role**

At Azion, we develop all our technology end-to-end and support applications requiring ultra-high availability, low latency, and globally recognized security.

We seek a Senior Site Reliability Engineer (SRE) to work on mission-critical distributed systems, ensuring reliability, scalability, and resilience at massive scale. This position demands hands-on experience in complex environments, with strong technical expertise to handle critical incidents, build automation, design resilient architectures, and continuously elevate the standard of operational excellence.

### **Key Challenges**

* Ensure efficiency and resilience of services serving millions of users by monitoring availability, latency, performance, and capacity;
* Manage the full lifecycle of critical incidents: detection, on-call response, communication, root cause analysis (RCA), blameless postmortems, and follow-up on corrective actions;
* Define, implement, and monitor SLIs and SLOs, aligning technical metrics with business objectives;
* Develop and maintain observability, monitoring, and alerting systems (metrics, logs, traces);
* Design and operate distributed infrastructures (bare metal, cloud, and hybrid), with emphasis on performance, scalability, and security;
* Implement redundancy, fault isolation, and disaster recovery strategies;
* Create and evolve internal automation and tooling to reduce toil, accelerate operations, and increase reliability;
* Conduct capacity planning and forecasting, anticipating bottlenecks and ensuring sustainable growth;
* Promote SRE culture (error budget, best practices, readiness drills, chaos engineering).

### **Minimum Requirements**

* Solid experience with highly complex UNIX/Linux distributed architectures (microservices, layered systems);
* Hands-on experience with monitoring, on-call support, and incident management using tools such as Prometheus, Grafana, log managers, etc.;
* Practical experience defining and tracking SLIs/SLOs and error budgets;
* Advanced troubleshooting knowledge of Linux, networking, and protocols (HTTP, DNS, TCP/IP);
* Proficiency in orchestration and automation (Docker, Kubernetes, Terraform, Ansible, Puppet, Git, CI/CD);
* Proficiency in programming languages such as Python or Golang;
* Intermediate English proficiency.

### **Desirable Qualifications**

* Completed or ongoing degree in Information Technology or related fields;
* Experience in mission-critical environments (millions of users, low latency, high availability);
* Experience with cloud computing (AWS, GCP, Azure) and infrastructure-as-code;
* Experience with chaos engineering, DDoS mitigation, or large-scale capacity planning;
* Open-source contributions and/or active participation in SRE technical communities;
* Advanced English proficiency.

### **Benefits & Azion Way of Life**

* CLT employment contract;
* Health and dental insurance;
* Flexible meal and food allowance (Flash Card), including during vacation periods;
* Commuter allowance (no payroll deduction);
* Annual internal hackathons;
* Mobility allowance (additional amount for commuting);
* Freestyle (incentive for customizing your workstation);
* Stock options (per company policy);
* Birthday day off;
* TotalPass;
* Flexible working hours (truly flexible);
* Nomad Program allowing remote work from anywhere for up to 30 days per year (per policy);
* Annual international exchange program.

### **FlexWork Model**

We offer a FlexWork model prioritizing cultural immersion and collaboration. For the first three months, you will work **on-site** at the local office, a crucial phase for building strong relationships and forging a genuine connection with our values and goals. We believe this initial immersion not only strengthens the team but also fuels creativity and innovation.

After this period, you may apply for the **hybrid** model, working on-site at least three times per week. This approach balances in-person interaction with autonomy, fostering a dynamic and productive work environment.

At Azion, all applications are welcome, regardless of gender, sexual orientation, age, pregnancy, disability, ethnicity, skin color, country of origin, or religion. We believe an inclusive environment contributes to our success and that respect underpins all our relationships.

Join our team! We’re excited to meet you and walk together toward success in technology!

Source: Indeed

João Silva

Indeed · HR
