




**Responsibilities and duties:**

- Design and maintain efficient data pipelines for ingestion, transformation, and availability.
- Work with AWS Athena, Glue, and S3 to query and organize large volumes of data in Parquet format.
- Integrate and model data in MySQL and PostgreSQL for transactional and analytical systems.
- Implement and maintain ETL/ELT processes using tools such as Apache Spark, Airflow, and Databricks.
- Create and maintain dashboards and reports in Superset and Power BI, ensuring reliability and a good user experience.
- Collaborate with engineering, data science, and product teams to support data-driven decisions.
- Ensure data governance, quality, versioning, and documentation.
- Implement security and compliance best practices throughout the data lifecycle.

**Requirements and qualifications:**

- Practical experience with AWS Athena and Parquet formats.
- Strong knowledge of ETL/ELT and orchestration tools such as Apache Airflow.
- Experience with Python for data manipulation and transformation.
- Proficiency in SQL and experience with MySQL and PostgreSQL.
- Knowledge of Apache Spark and distributed data processing.
- Experience creating dashboards in Power BI and/or Apache Superset.
- Experience with data pipelines in cloud environments (AWS, Azure, or GCP).
- Ability to work independently, with strong communication and technical clarity.

**Desirable differentiators:**

- Experience with data governance and data cataloging (Glue Data Catalog, DataHub, Apache Atlas).
- Experience with streaming data pipelines (e.g., Kinesis, Kafka).
- Knowledge of integrations with data observability tools.
- Previous experience in agile teams and multidisciplinary squads.


