




Job Summary: Ensure the stability, availability, and performance of Windows and Linux environments by applying SRE practices to improve reliability and automation.

Key Highlights:

1. Work in critical, high-availability environments.
2. Design system administration solutions.
3. Collaborate with teams to develop automation strategies.

**[Infra] Senior Site Reliability Engineering - SRE**
=======================================================

TOTVS | São Paulo - SP | Hybrid

Job Description
---------------

Ensure the stability, availability, and performance of Windows and Linux environments by applying SRE practices to increase platform reliability, reduce incidents, and promote service automation and standardization, ensuring operational continuity and quality.

Working in critical, high-availability environments, you will contribute to infrastructure evolution projects, monitoring and observability improvements, routine automation, and incident response, collaborating closely with Support teams.

Responsibilities and Duties
---------------------------

WHAT YOU WILL DO:

* Maintain on-premises servers (Windows and Linux) and cloud servers (GCP and Azure), creating and maintaining the configurations required to keep infrastructure systems stable.
* Implement and enhance monitoring, observability, and alerting solutions.
* Proactively and predictively identify potential failures before they become incidents.
* Design system administration solutions for various operational and project requirements.
* Maintain recommended best practices for managing Windows systems and services across all environments.
* Perform fault localization and error/log analysis to improve daily server maintenance.
* Create and modify scripts to automate tasks.
* Recommend ways to improve the stability, security, efficiency, and scalability of the Microsoft environment.
* Collaborate with other teams and team members to develop automation strategies and deployment processes.
* Expand your IT knowledge and skills alongside the team and implement new IT projects.
* Conduct **incident analysis and resolution**, troubleshooting in distributed environments.
* Analyze and resolve incidents and issues within the environment.

Requirements and Qualifications
-------------------------------

**WHAT WE EXPECT FROM YOU:**

* Advanced knowledge of servers (Windows and Linux) and public clouds (Azure and GCP).
* Knowledge of scripting languages such as VBScript, .NET, and PowerShell.
* Solid knowledge of monitoring, alerting, and observability practices and tools (e.g., Grafana, Datadog).
* Experience or familiarity with applications developed in Node.js, Java, and AdvPL, and with SQL Server, MongoDB, and Oracle databases.
* Experience investigating and resolving incidents and outages.
* Knowledge of CI/CD and deployment automation.

Salary Range
------------

To be determined

Employment Type
---------------

CLT

Benefits
--------

* TOTVS Online University: a corporate university offering free content and certifications for every TOTVS employee.
* +Healthy Program: supports each TOTVER with advisory services and initiatives focused on physical, mental, and financial well-being.
* +Advantages Program: Latin America's largest discount network, exclusively for our employees.
* +Care Program: a personal support program for employees and their families, offering guidance across multiple specialties, including psychology, social work, and pet consulting.
* Einstein Conecta: a free online medical consultation service provided by physicians from Hospital Israelita Albert Einstein.
* Health and dental insurance.
* Meal and/or food allowance.
* Transportation allowance and shuttle services at select metro stations.
* Extended maternity and paternity leave.
* Nursing room.
* Bicycle parking.
* Changing rooms.
* Life insurance.
* Daycare assistance.
* Private pension plan.
* Office designed to stimulate creativity and productivity, featuring snack areas, game rooms, billiard tables, and relaxation chairs.
* Gympass.

About the Company
-----------------

As a technology leader, we are a universe of nonconformist individuals driven by innovation, autonomy, learning, and performance. Together, we create opportunities, transform futures, and share knowledge.

Here, your professional development happens in an inclusive, respectful, and energizing environment: people supporting people! We pursue sustainable growth, leveraging data and AI to drive smarter, more efficient outcomes for our customers.

Join us in innovating and building the future of technology.

\#VemPraTOTVS \#SomosTOTVS


