




Job Summary: We are seeking a monitoring professional to work in a 24x7 operation, responsible for diagnosing and resolving issues in distributed applications and infrastructure, with a focus on process optimization and integrity assurance.

Key Highlights:
1. Monitoring of systems, networks, and distributed applications
2. Proactive analysis and resolution of incidents in 24x7 environments
3. Process optimization and continuous improvement in monitoring

Description:

1. Technical
* Applications and systems: Understanding of internal Hiper distributed applications and of data ingestion, transformation, and output processes.
* Basic infrastructure knowledge: Servers, databases, Google Cloud, and connectivity.
* Monitoring: Experience with infrastructure monitoring tools such as Nagios and Grafana.
* Analysis and troubleshooting: Ability to diagnose problems and respond to failures (logs, alerts, incidents, requests).
* Information security (SI): Awareness of information security practices and the LGPD (Brazil's General Data Protection Law).
* Data processing: Knowledge and oversight of data processing.

2. Behavioral
* Attention to detail: Ability to identify failures and inconsistencies in monitoring processes.
* Critical and analytical mindset: Ability to identify improvements and optimize workflows, and to analyze logs, alerts, and metrics to interpret potential failures and predict their impact.
* Investigative profile: Ability to identify likely root causes of problems and suggest corrective solutions.
* Commitment and accountability: Strict adherence to procedures and SLAs, remaining attentive and available throughout the assigned shift in the 24x7 operation.
* Strong communication skills: Ability to interact effectively with the Engineering, Development, Infrastructure, and Customer Support teams.
* Resilience and emotional control: Ability to handle incidents and tight deadlines without compromising quality, keeping composure and using emotional intelligence to make sound decisions during crises.
* Proactivity and adaptability: Anticipating problems and suggesting process improvements; demonstrating curiosity, willingness, and readiness to learn new technologies and keep developing professionally.
* Organization and documentation: Recording incidents, managing tickets, and documenting best practices for the team.

Responsibilities and Duties:
* Perform real-time monitoring of servers, networks, applications, and Google Cloud resources using Nagios and Grafana.
* Analyze incidents and classify their impact and urgency.
* Proactively resolve issues while maintaining SLA compliance.
* Log incidents clearly in Jira and ServiceNow, keeping a detailed history that includes applied solutions and root causes.
* Keep the Knowledge Base updated with recurring procedures and solutions.
* Prepare performance and availability reports for monitored resources.
* Propose and implement improvements to monitoring processes.
* Stay current with industry best practices in monitoring.
* Ensure the integrity and security of monitored information.
* Understand and follow the entire change management process (GMUD).

Requirements and Qualifications:
* Experience in system and network monitoring.
* Experience interpreting logs and analyzing failures.
* Familiarity with Jira and ServiceNow.
* Cloud knowledge (Google Cloud).
* Degree in Information Technology or a related field.


