




Job Summary: Experienced professional in platform engineering, DevOps, and SRE to define and implement technical standards, tools, and operational architectures focused on reliability and scalability.

Key Highlights:
1. Solid experience in platform engineering, DevOps, and SRE.
2. Experience with CI/CD, observability, infrastructure as code, and cloud.
3. Lead PoCs, promote tech talks, and propose sustainable innovations.

Description:
* Solid experience in platform engineering, DevOps, and SRE
* Experience defining and deploying technical standards and tools in large-scale environments
* Experience defining and deploying CI/CD pipelines (Terraform, Ansible, GitLab CI, ArgoCD, Jenkins, Azure DevOps)
* Experience defining and deploying observability tools (e.g., Prometheus, Grafana, ELK, Datadog, New Relic, APM)
* Experience with infrastructure as code and with scripting languages such as Python, Bash, or Go for automation
* Experience with cloud observability and security (monitoring, logging, alerting, compliance)
* Experience with cloud platforms (GCP, AWS, or Azure)
* Knowledge of DevSecOps security practices
* Hands-on experience with Site Reliability Engineering (SRE), including incident management and capacity planning
* Certifications in SRE and/or cloud (e.g., SRE Foundation, GCP Professional Cloud Architect, AWS Solutions Architect, CKA, Azure Expert)
* Bachelor's degree in Computer Science, Software Engineering, or a related field
* Preferred: experience with GCP
* Advanced English
* Evaluate solutions and support the implementation of DevOps tools and practices, ensuring integration with security policies
* Define and evolve the reference architecture and tooling for observability (APM, Prometheus, Zabbix, etc.) and reliability management
* Define operational architecture standards for reliability and availability
* Define and maintain operational architecture frameworks and guidelines, focusing on resilience and scalability
* Define reliability metrics jointly with infrastructure and engineering teams
* Promote automation practices for incident response, auto-healing, chaos engineering, and automated runbooks
* Support technical decisions during critical incidents, capacity planning, and architectural changes
* Lead the implementation of tool PoCs
* Promote tech talks with architects and engineers, spreading SRE and DevOps engineering culture across the organization
* Maintain an external perspective, continuously updating knowledge and proposing sustainable innovations


