




Provide visibility into, and continuous monitoring of, the health of systems and applications by implementing observability solutions that enable proactive identification, diagnosis, and resolution of issues, keeping service availability and performance high.

Responsibilities:

* Implement and maintain observability tools (monitoring, metrics, logs, and tracing).
* Create dashboards and alerts for tracking performance and availability.
* Analyze metrics and logs to identify trends and potential failures.
* Collaborate with development and infrastructure teams to improve system visibility.
* Support critical incidents with detailed analysis and improvement recommendations.
* Ensure compliance with SRE (Site Reliability Engineering) and DevOps practices.

Technical Requirements:

* Experience with tools such as Prometheus, Grafana, ELK Stack, Datadog, New Relic, or similar.
* Knowledge of metrics, logs, tracing, and observability concepts.
* Familiarity with cloud environments (AWS, Azure, GCP).
* Basic knowledge of automation and scripting (Python, Shell).
* Knowledge of containers and orchestration (Docker, Kubernetes).

Behavioral Competencies:

* Analytical ability and problem-solving skills.
* Proactivity and attention to detail.
* Strong communication and collaboration with multidisciplinary teams.

Job Type: Full-time, CLT

Compensation: R$10.500,00 - R$12.000,00 per month

Benefits:

* Health insurance
* Dental insurance
* Education allowance
* Profit sharing
* Life insurance
* Food voucher
* Meal voucher
* Transportation voucher

Work Location: Hybrid remote, Pinheiros, SP


