




**Job Summary:**

Dadoteca is seeking a professional to develop and implement OCR and Chemometrics solutions, driving innovation and professional development.

**Key Highlights:**

1. Working on innovative OCR and Chemometrics projects
2. Collaborating with multidisciplinary teams, with opportunities for professional development
3. Researching new technologies in AI and Chemometrics

Dadoteca is an innovative technology company dedicated to delivering high-quality solutions to our clients through a collaborative work environment that fosters professional development and innovation.

**Responsibilities:**

* Design and implement OCR models using advanced frameworks and libraries for extracting data from structured and unstructured documents.
* Analyze, prepare, and pre-process large volumes of textual and numerical data for use in machine learning and deep learning models.
* Develop data processing pipelines, including extraction, transformation, and storage of OCR results.
* Integrate OCR solutions with other tools and systems to automate workflows and data analysis processes.
* Train and fine-tune OCR models to improve accuracy in specific scenarios, such as different languages, fonts, formats, and noise levels.
* Explore and apply NLP (Natural Language Processing) techniques to enrich the analysis and categorization of extracted text.
* Apply Chemometrics and multivariate analysis techniques (e.g., PCA, PLS, multivariate regression, and classification methods) to model and interpret complex data and to support decision-making.
* Develop predictive models combining textual, numerical, and spectral data where applicable.
* Collaborate with multidisciplinary teams to ensure integration of OCR and Chemometrics solutions into broader data analysis projects.
* Monitor and improve model performance in production, ensuring scalability, robustness, and reliability.
* Research new technologies related to OCR, artificial intelligence, and Chemometrics, staying current with industry trends.

**Requirements:**

* Bachelor’s degree in Computer Science, Engineering, Mathematics, Statistics, Chemistry, Chemical Engineering, or related fields. A postgraduate degree or specialization in Data Science, AI, or Chemometrics is desirable.
* Proven experience in data science projects, with a focus on OCR, image processing, and/or multivariate analysis.
* Practical knowledge of Chemometrics, including techniques such as PCA, PLS, multivariate regression, classification methods, and model validation.
* Advanced proficiency in deep learning frameworks such as TensorFlow, PyTorch, or Keras.
* Experience with OCR libraries and services such as Tesseract, Google Vision, AWS Textract, ABBYY FineReader, or similar tools.
* Proficiency in image pre-processing techniques (OpenCV or PIL) to enhance document quality.
* Strong programming skills in Python or R, with emphasis on data science applications and statistical modeling.
* Familiarity with relational and non-relational databases for data storage and querying.
* Experience with code versioning tools (Git) and MLOps practices.

**Competencies:**

* Ability to translate complex business problems into efficient analytical and technical solutions.
* Ability to guide teams in high-complexity projects and contribute to colleagues’ technical growth.
* Commitment to delivering high-quality solutions with measurable impact.
* Ability to clearly and concisely present technical insights to both technical and non-technical audiences.
* Proactivity in proposing innovative solutions and overcoming technical challenges.

**Preferred Qualifications:**

* Applied experience in Chemometrics using real-world data, including model interpretation and communication of results to business units.
* Familiarity with pre-trained OCR and vision services such as Google Vision AI, AWS Textract, or Azure Cognitive Services.
* Knowledge of advanced NLP techniques for analyzing and organizing extracted text.
* Experience deploying OCR and chemometric models in scalable production environments, including cloud platforms (Azure, AWS, Google Cloud).
* Relevant certifications, such as Microsoft Certified: Azure AI Engineer Associate or Google Cloud Professional Data Engineer.


