




**Job Summary:** Data Engineering Intern to support the team in developing and maintaining data pipelines, with a focus on web data collection, processing, and organization.

**Key Highlights:**

1. Will support the development of reliable and scalable databases.
2. Focus on gaining practical experience and professional growth.
3. No prior professional experience required.

**Description and Responsibilities:**

**Schedule:** 30 hours per week; 4 days remote, 1 day onsite.

**Level:** Internship

**Employment Type:** Not specified

*We are seeking a Data Engineering Intern to support the team in developing and maintaining data pipelines, with a focus on automated data collection, processing, and organization of information sourced from the web.*

*The candidate will handle structured and unstructured data, contributing to the construction of reliable and scalable databases that support analytics, data products, and internal systems.*

**Key Responsibilities**

* Develop and maintain automated data collection routines from public internet sources
* Implement web scraping and web crawling processes
* Clean, transform, and structure data in various formats
* Parse content such as HTML, PDF, and other formats
* Support segmentation and organization of large volumes of text for use in information retrieval systems (e.g., chunking)
* Assist in building and automating pipelines using Python and/or SQL
* Document pipelines, data flows, and best practices

**The focus will be on gaining practical experience, aiming for a balance between productivity and professional growth.**

**Requirements:**

* Currently enrolled in an undergraduate program in Computer Science, Engineering, Information Systems, Data Science, or related fields
* Expected graduation: 2027.1 to 2028.1
* Basic knowledge of Python
* Familiarity with SQL and databases
* Interest in data engineering, automation, and data systems
* Analytical, organized profile with a strong willingness to learn
* Aptitude for handling raw and unstructured data
**Preferred Qualifications**

* Academic or personal experience with web scraping and web crawling
* Knowledge of libraries such as BeautifulSoup, Scrapy, Selenium, or similar
* Experience parsing PDF files
* Knowledge of Git and version control
* Exposure to data pipelines or cloud-based data environments

No prior professional experience is required, but personal or academic projects in **Data Engineering** will be valued.

**Benefits:** Transportation allowance
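To give candidates a feel for the day-to-day work, the HTML parsing and text chunking responsibilities described above can be sketched using only the Python standard library. This is an illustrative sketch, not the team's actual code: in practice a library such as BeautifulSoup or Scrapy (listed under Preferred Qualifications) would likely be used, and all function names and parameters here are made up for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text from HTML, skipping <script> and <style> blocks."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0  # >0 while inside script/style

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def html_to_text(html: str) -> str:
    """Extract plain text from an HTML document (illustrative helper)."""
    parser = TextExtractor()
    parser.feed(html)
    return " ".join(parser.parts)


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character chunks, as used when
    preparing documents for information retrieval systems."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # last chunk already covers the end of the text
    return chunks
```

A real pipeline would add fetching (e.g., `requests`), error handling, and token-based rather than character-based chunking, but the extract-then-chunk flow is the core pattern.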


