




Job Summary: Experienced professional in platform engineering, DevOps, and SRE to define and implement technical standards, tools, and operational architectures.

Key Highlights:
1. Solid experience in platform engineering, DevOps, and SRE
2. Definition and large-scale implementation of technical standards and tools
3. Promotion of incident automation, auto-healing, and chaos engineering

Description:
* Solid experience in platform engineering, DevOps, and SRE
* Experience defining and implementing technical standards and tools in large-scale environments
* Experience defining and implementing CI/CD pipelines (Terraform, Ansible, GitLab CI, ArgoCD, Jenkins, Azure DevOps)
* Experience defining and implementing observability tools (e.g., Prometheus, Grafana, ELK, Datadog, New Relic, APM)
* Experience with infrastructure as code and with languages such as Python, Bash, or Go for automation and scripting
* Experience with cloud observability and security (monitoring, logging, alerting, compliance)
* Experience with cloud platforms (GCP, AWS, or Azure)
* Knowledge of DevSecOps security practices
* Hands-on experience with Site Reliability Engineering (SRE), including incident management and capacity planning
* Certifications in SRE and/or Cloud (e.g., SRE Foundation, GCP Professional Cloud Architect, AWS Solutions Architect, CKA, Azure Expert)
* Bachelor's degree in Computer Science, Software Engineering, or a related field
* Preferred: Experience with GCP
* Advanced English proficiency
* Evaluate solutions and support the implementation of DevOps tools and practices, ensuring integration with security policies
* Define and evolve the reference architecture and tools for observability (APM, Prometheus, Zabbix, etc.) and reliability management
* Define operational architecture standards for reliability and availability
* Define and maintain operational architecture frameworks and guidelines, focusing on resilience and scalability
* Define reliability metrics jointly with infrastructure and engineering teams
* Promote practices for incident automation, auto-healing, chaos engineering, and automated runbooks
* Support technical decision-making during critical incidents, capacity planning, and architectural changes
* Lead proof-of-concept (PoC) implementations of tools
* Conduct tech talks with architects and engineers, promoting SRE and DevOps culture across the organization
* Maintain an outward-looking perspective, continuously updating knowledge and proposing sustainable innovations


