




Description:

Requirements to apply for the position:
* Experience managing Kubernetes clusters and their ecosystem (Helm, service meshes such as Istio/Linkerd, Ingress, OPA/Kyverno security policies, etc.);
* Experience with Observability (Logs, Traces, Metrics) using tools such as Datadog, Prometheus, OpenTelemetry, Grafana, ELK;
* Experience with CI/CD tools and deployment strategies (Canary, Blue-Green, etc.);
* Experience with cloud infrastructure (GCP, AWS, or Azure), with a preference for GCP;
* Knowledge of network protocols (TCP/IP, HTTP, DNS), cloud network topologies, and integration troubleshooting;
* Solid knowledge of at least one programming language (Python, NodeJS, Java, or Go) for automation and internal service development;
* Knowledge of database performance optimization (SQL and NoSQL).

It would be great if you have:
* A Bachelor's degree in Computer Science, Computer Engineering, Information Technology, or a related field;
* Kubernetes certifications (CKA, CKAD, CKS);
* Cloud certifications (Associate, Professional, or Architect);
* Experience or familiarity with financial industry compliance standards;
* Knowledge of Engineering Platform/IDP architecture;
* Software engineering knowledge (Design Patterns, best practices, and operation of different stacks).

Help us design the solution!
* Connect business needs from product teams with infrastructure and SRE, translating requirements into efficient, secure, and resilient platforms;
* Design, implement, and maintain automation and monitoring tools to ensure platform resilience and reliability;
* Define and implement functional and technical Observability requirements (APM, Logs, Metrics, and Tracing);
* Establish and monitor SLIs (Service Level Indicators) and SLOs (Service Level Objectives), managing error budgets to balance innovation and stability;
* Lead responses to critical incidents, minimizing business impact and conducting post-mortem analyses to identify root causes;
* Automate infrastructure provisioning and management as code (IaC) using Terraform, following GitOps best practices;
* Collaborate with development teams to promote a culture of continuous improvement and design solutions that are inherently scalable, secure, and resilient.


