




We are looking for a Data Engineer to build data pipelines, both for ingesting new data and for constructing the layers of the Data Lake, as well as building variable books (feature stores). The work involves strategic projects with large-scale clients, delivered in an allocation model embedded within the client's team.

Technologies

* Spark (Python or Scala) and PySpark
* AWS: EMR, S3, IAM
* Airflow (orchestration)
* OCI Data Flow
* OCI Storage
* Shell Script
* Docker
* Git
* Bitbucket

Required Experience

* Experience building distributed pipelines with Spark
* Hands-on experience in AWS environments, especially EMR and S3
* Experience with pipeline orchestration
* Knowledge of cloud data architecture (AWS, OCI, or GCP)
* Ability to analyze existing pipelines and propose optimization adjustments during migration
* Experience with automation and shell scripting

Hard Skills

* Building and migrating medium- and large-scale Spark pipelines (see the PySpark sketch below)
* Proficiency in Airflow (DAGs, operators, sensors, best practices; see the DAG sketch below)
* Ability to debug and optimize distributed jobs
* Knowledge of version control (Git) and the GitFlow workflow
* Writing clean, secure, and testable code
* Ability to help define and improve the team's technical standards

Soft Skills

* Autonomy and ownership over deliverables and migration stages
* Clear communication with architecture, data, and infrastructure teams
* Proactivity in identifying and resolving issues
* Focus on quality, documentation, and governance of migrated pipelines
* Ability to solve problems creatively and efficiently
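For illustration, here is a minimal PySpark sketch of the kind of Data Lake pipeline work described above: reading a raw layer from S3, applying simple cleansing, and writing a curated layer back. The bucket names, paths, and column names are hypothetical.

```python
# Minimal sketch: raw layer -> curated layer on S3 with PySpark.
# Bucket names, paths, and columns are hypothetical examples.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("raw-to-curated-example")
    .getOrCreate()
)

# Ingest the raw layer (hypothetical path)
raw = spark.read.parquet("s3://example-datalake/raw/orders/")

# Basic cleansing and enrichment for the curated layer
curated = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_timestamp"))
       .filter(F.col("order_status").isNotNull())
)

# Write partitioned output to the curated layer (hypothetical path)
(
    curated.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-datalake/curated/orders/")
)

spark.stop()
```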
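Likewise, a minimal Airflow DAG sketch of how such a job might be orchestrated on EMR, assuming Airflow 2.4+ with the Amazon provider installed; the DAG id, script path, and cluster id are hypothetical.

```python
# Minimal sketch: a daily Airflow DAG that submits a Spark step to an
# existing EMR cluster and waits for it to finish.
# DAG id, S3 script path, and cluster id are hypothetical examples.
from datetime import datetime

from airflow import DAG
from airflow.providers.amazon.aws.operators.emr import EmrAddStepsOperator
from airflow.providers.amazon.aws.sensors.emr import EmrStepSensor

SPARK_STEP = [
    {
        "Name": "curated_orders",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": ["spark-submit", "s3://example-datalake/jobs/curated_orders.py"],
        },
    }
]

with DAG(
    dag_id="orders_curated_daily",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",
    catchup=False,
) as dag:
    # Submit the Spark step to a (hypothetical) running EMR cluster
    add_step = EmrAddStepsOperator(
        task_id="submit_spark_step",
        job_flow_id="j-EXAMPLECLUSTER",
        steps=SPARK_STEP,
    )

    # Wait for the submitted step to complete before marking the run done
    wait_for_step = EmrStepSensor(
        task_id="wait_for_spark_step",
        job_flow_id="j-EXAMPLECLUSTER",
        step_id="{{ task_instance.xcom_pull(task_ids='submit_spark_step')[0] }}",
    )

    add_step >> wait_for_step
```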


