Senior Site Reliability Engineer (SRE)

Indeed

Full-time

Onsite

No experience limit

No degree limit

Praça Mal. Deodoro, 174 - Centro Histórico, Porto Alegre - RS, 90010-300, Brazil


Description

Job Summary: Azion is seeking a Senior Site Reliability Engineer (SRE) to work on mission-critical distributed systems, ensuring reliability, scalability, and resilience at massive scale.

Key Highlights:

1. Innovative environment with a high-performance team
2. End-to-end cutting-edge technology development
3. Real-world challenges and impactful solutions

### **About Azion**

We are a global technology company specializing in digital applications and security. Our platform helps businesses operate with greater agility, reducing response times and increasing system reliability.

At Azion, our purpose is to simplify application development and transform the future with cutting-edge technology. Here, you’ll have the opportunity to grow within an innovative environment alongside a high-performance team, tackling real-world challenges and building solutions that make a difference.

### **About the Role**

At Azion, we develop all our cutting-edge technology end-to-end and support applications requiring ultra-high availability, low latency, and globally recognized security.

We’re looking for a Senior Site Reliability Engineer (SRE) to work on mission-critical distributed systems, ensuring reliability, scalability, and resilience at massive scale. This role requires hands-on experience in complex environments, with strong technical expertise to handle critical incidents, build automation, design resilient architectures, and continuously elevate operational excellence.

### **Key Challenges**

* Ensure the efficiency and resilience of services serving millions of users by monitoring availability, latency, performance, and capacity;
* Manage the full lifecycle of critical incidents: detection, on-call response, communication, root cause analysis (RCA), blameless postmortems, and follow-up on corrective actions;
* Define, implement, and track SLIs and SLOs, aligning technical metrics with business objectives;
* Develop and maintain observability, monitoring, and alerting systems (metrics, logs, traces);
* Design and operate distributed infrastructures (bare metal, cloud, and hybrid), focusing on performance, scalability, and security;
* Implement redundancy, fault isolation, and disaster recovery strategies;
* Build and evolve internal automation and tooling to reduce toil, accelerate operations, and increase reliability;
* Conduct capacity planning and forecasting to anticipate bottlenecks and ensure sustainable growth;
* Promote SRE culture (error budgets, best practices, readiness drills, chaos engineering).

### **Minimum Requirements**

* Solid experience with highly complex distributed UNIX/Linux architectures (microservices, layered systems);
* Hands-on experience with monitoring, on-call support, and incident management using tools such as Prometheus, Grafana, and log management platforms;
* Practical experience defining and tracking SLIs/SLOs and error budgets;
* Advanced knowledge of Linux, network, and protocol troubleshooting (HTTP, DNS, TCP/IP);
* Experience with orchestration and automation (Docker, Kubernetes, Terraform, Ansible, Puppet, Git, CI/CD);
* Proficiency in programming languages such as Python or Golang;
* Intermediate English proficiency.

### **Preferred Qualifications**

* Completed or ongoing degree in Information Technology or related fields;
* Experience in mission-critical environments (millions of users, low latency, high availability);
* Hands-on experience with cloud computing (AWS, GCP, Azure) and infrastructure-as-code;
* Experience with chaos engineering, DDoS mitigation, or large-scale capacity planning;
* Open-source contributions and/or active participation in SRE technical communities;
* Advanced English proficiency.
### **Benefits & Azion Way of Life**

* CLT employment contract;
* Health and dental insurance;
* Flexible meal and food vouchers (VR and VA, Flash Card), including during vacation periods;
* Commuter allowance (no payroll deduction);
* Annual internal hackathons;
* Mobility allowance (additional amount for commuting);
* Freestyle (incentive for customizing your workstation);
* Stock options (per company policy);
* Birthday day off;
* TotalPass;
* Flexible working hours (truly flexible);
* Nomad Program allowing remote work from anywhere for up to 30 days per year (per company policy);
* Annual international exchange program.

### **FlexWork Model**

We offer a FlexWork model prioritizing cultural integration and collaboration. For the first three months, you will work **on-site** at the local office, a crucial phase for building strong relationships and forging a genuine connection with our values and goals. We believe this initial immersion not only strengthens the team but also drives creativity and innovation.

After this period, you may apply for the **hybrid** model, working on-site at least three times per week. This approach balances in-person interaction and autonomy, creating a dynamic and productive work environment.

At Azion, all applications are welcome regardless of gender, sexual orientation, age, pregnancy, disability, ethnicity, skin color, country of origin, or religion. We believe an inclusive environment contributes to our success and that respect underpins all our relationships.

Join our team! We’re excited to meet you and build a path to success in technology together!

Source: Indeed

João Silva

Indeed · HR

Company

Indeed
