




Description:
* Bachelor's degree completed;
* Knowledge of Big Data technologies, solutions, and concepts;
* Proficiency in at least one programming language;
* Familiarity with Git version control;
* Understanding of the data lifecycle and concepts such as data lineage, governance, privacy, retention, and anonymization;
* Collaborative mindset for working with Data Science, Analytics, and business teams to understand and meet their data needs;
* Excellent communication skills, proactive knowledge sharing, and context-seeking to collaborate effectively across teams.

Differentiators:
* Experience developing and maintaining ETL pipelines that process large volumes of data while ensuring data quality and integrity;
* Proficiency in Python;
* Knowledge of GCP services: Storage Transfer Service, Cloud Storage, Cloud Functions, Pub/Sub, Dataflow, BigQuery, Dataplex, Cloud Composer;
* Experience with Apache Airflow or other data workflow orchestration tools;
* Experience with columnar storage solutions and/or data lakehouse concepts;
* Knowledge of CI/CD strategies, infrastructure as code (Terraform), and observability, among others;
* Understanding of how optimization works in scenarios such as query performance, service scalability, and cluster management, and how each can be improved;
* Curiosity about the rapidly evolving technologies in the data engineering domain and eagerness to learn and master them, bringing significant impact to our team.

Responsibilities:
* Develop systems and processes to collect data from various sources, such as databases, applications, sensors, and devices;
* Design and maintain databases and data storage systems, including relational and non-relational databases;
* Prepare and clean collected data so it is ready for analysis, including data transformation, aggregation, and standardization;
* Integrate data from multiple sources into a unified data environment, ensuring consistency and integrity;
* Create and maintain data pipelines that automate data transfer and transformation efficiently;
* Implement security measures to protect data against unauthorized access and leaks;
* Collaborate to provide analysis-ready datasets;
* Monitor data system performance and optimize for scalability and efficiency;
* Maintain complete, up-to-date documentation of processes, data flows, and system architecture.


