Job Summary:
We are looking for an experienced Data Engineer – AI & ML to design, build, and maintain scalable data pipelines that power AI and machine learning applications. You will work with large datasets, optimize data flows, and collaborate with data scientists and analysts to enable real-time insights and advanced analytics. The ideal candidate will have expertise in data engineering, cloud platforms, ETL frameworks, and big data technologies.
Key Responsibilities:
Design and Build Data Pipelines
- Develop, test, and maintain ETL pipelines to move and transform data efficiently.
- Ensure scalability, efficiency, and maintainability of data pipelines for downstream analytics and AI/ML models.
- Collaborate with stakeholders to define data requirements and implement data processing solutions.
Data Integration
- Integrate data from internal databases, APIs, third-party vendors, and flat files into a unified data ecosystem.
- Optimize real-time and batch data ingestion systems.
- Structure data to support analytics and AI/ML use cases.
Data Storage and Management
- Design and manage data storage solutions such as data lakes, cloud storage, and NoSQL/relational databases.
- Implement data security, backup, and disaster recovery best practices.
- Optimize storage systems for performance and cost efficiency.
Data Transformation
- Develop data transformation logic to clean, enrich, and standardize raw data.
- Ensure data accuracy, consistency, and integrity through validation and processing frameworks.
Automation and Optimization
- Automate data extraction, transformation, and loading (ETL) tasks to enhance efficiency.
- Optimize data workflows to reduce processing time and improve pipeline performance.
- Troubleshoot and resolve data pipeline performance bottlenecks.
Collaboration with Data Teams
- Work with Data Scientists, Analysts, and Business Teams to ensure the availability of high-quality data.
- Assist in data preparation for AI/ML model training and deployment.
Data Quality Assurance
- Implement data validation checks to maintain accuracy and completeness.
- Establish data quality standards and proactively identify and resolve data inconsistencies.
Monitoring and Maintenance
- Set up monitoring and logging for pipelines to detect failures or delays.
- Perform regular maintenance and upgrades on data infrastructure.
- Stay up to date with emerging data engineering technologies.
Documentation and Reporting
- Maintain comprehensive documentation for data pipeline architectures, ETL processes, and schemas.
- Create reports on data pipeline performance, identifying areas for improvement.
Stay Updated with Technology Trends
- Explore and implement cutting-edge tools, technologies, and best practices in data engineering and AI/ML.
- Participate in industry conferences, webinars, and training.
Qualifications & Experience:
Education:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, Data Engineering, or a related field.
Technical Skills:
- Proficiency in Python, Java, or Scala for data processing.
- Strong knowledge of SQL and relational databases (e.g., MySQL, PostgreSQL, MS SQL Server).
- Experience with NoSQL databases (e.g., MongoDB, Cassandra, HBase).
- Hands-on experience with ETL frameworks and tools (e.g., Apache NiFi, Talend, Informatica, Airflow).
- Expertise in big data technologies (e.g., Hadoop, Apache Spark, Kafka).
- Experience with cloud platforms (AWS, Azure, Google Cloud) and related data storage/processing services.
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).
- Knowledge of data modeling concepts (star schema, snowflake schema).
- Understanding of data lakes, data warehousing architectures, and scalable storage solutions.
- Experience with version control systems (e.g., Git) and collaboration tools (e.g., Jira, Confluence).
Soft Skills:
- Strong problem-solving skills to address complex data challenges.
- Excellent communication skills for interacting with technical and non-technical stakeholders.
- Attention to detail with a commitment to data accuracy and integrity.
- Ability to thrive in a fast-paced, team-oriented environment.
Experience:
- 10+ years of total experience in software/data engineering.
- 5+ years of experience building and maintaining scalable data pipelines.
- Proven ability to implement data engineering solutions at scale.
- Experience working with data governance, compliance, and security.
Preferred Qualifications:
- Experience with machine learning and preparing data for AI/ML models.
- Knowledge of stream processing frameworks (e.g., Apache Kafka, Apache Flink).
- Cloud certifications (e.g., AWS Certified Big Data – Specialty, Google Cloud Professional Data Engineer).
- Experience with DevOps practices and CI/CD pipelines for data workflows.
- Familiarity with automation/orchestration tools (e.g., Apache Airflow, Luigi).
- Experience with data visualization and reporting tools (e.g., Tableau, Power BI).
Work Environment & Benefits:
- Collaborative and fast-paced work culture.
- Opportunity to work with cutting-edge data and AI/ML technologies.
- Career growth opportunities in data engineering and AI/ML fields.
If you are passionate about building next-generation data solutions that power AI and machine learning, we’d love to hear from you!
Please reach out to h1b@infyshine.com