




Job Summary: We are looking for a professional to develop and maintain scalable data pipelines, working with large volumes of data and implementing data engineering solutions.

Key Highlights:
1. Work with large volumes of data, ensuring quality and integrity.
2. Collaborate with technical and business teams to deliver efficient solutions.
3. Develop distributed processing and optimization solutions.

Responsibilities
* Develop and maintain scalable, high-performance data pipelines.
* Work with large volumes of structured and unstructured data, ensuring quality, integrity, and availability.
* Implement integrations via APIs (REST and/or messaging), performing data transformation and workflow automation.
* Develop distributed processing and query optimization solutions.
* Automate deployments and resource provisioning using infrastructure-as-code principles.
* Orchestrate data workflows, ensuring monitoring, versioning, and observability.
* Collaborate with technical and business teams to translate requirements into efficient data engineering solutions.

Technical Requirements (Hard Skills)
* Solid experience in Python for data manipulation, automation, and integrations.
* Solid experience in advanced SQL, including query optimization and data modeling.
* Knowledge of distributed and parallel processing, applying cluster computing concepts.
* Experience with cloud-based data architecture (preferably AWS).
* Data integration via APIs, including authentication, error handling, and integration patterns.
* Code versioning and automation practices (Git and DevOps/DataOps).
* Familiarity with infrastructure-as-code and automated provisioning.
* Knowledge of pipeline orchestration and governance of process execution.
* Familiarity with distributed query engines and large-scale processing optimization.
* Experience with data workflow orchestration using tools such as Airflow or similar (not mandatory, but desirable).

Desirable Knowledge (Tools)
Not mandatory, but considered differentiators for the full-level position:
* Version control and CI/CD platforms (e.g., GitLab);
* Data processing and cluster tools (Databricks, Spark);
* Data lake platforms and distributed SQL engines (Trino, Dremio);
* Pipeline orchestrators (Airflow);
* Cloud solutions (AWS) and infrastructure-as-code (Terraform);
* API integration and consumption.

Prior experience with these environments will be considered a differentiator, but we seek professionals with a solid technical foundation and strong learning capability.


