
Senior Data / Machine Learning Engineer – AI Solutions

SpringCube

Full time - Manager

IT Data Centre & Infrastructure

Singapore (Hybrid)

Published 4 weeks ago

Salary: Above SGD15,000


Job Description

The SpringCube team curated the following job opportunity to help you in your job search. Explore the position below to find your next career move.

Senior Data / Machine Learning Engineer – AI Solutions

Company Overview:
This company specializes in AI-driven synthetic data platforms, enabling financial services, government agencies, and other industries to share data globally in a privacy-preserving manner. The platform anonymizes sensitive real data into synthetic data that mimics its statistical properties, supporting compliance with privacy laws.

Job Description:
As a Senior Data and Machine Learning Engineer, you will be responsible for transforming academic research into scalable, production-ready solutions for synthetic tabular data generation. This individual contributor (IC) role is ideal for someone with hands-on experience in scaling systems to handle large datasets and optimizing data pipelines for enterprise applications.

You will collaborate closely with research teams to optimize performance and ensure seamless system integration, while handling data from financial institutions, government agencies, and other large-scale entities.

Key Responsibilities:

  • Work with Machine Learning concepts and algorithms to transform AI and data science code into scalable, production-ready systems.
  • Ingest data from enterprise relational databases such as Oracle, SQL Server, PostgreSQL, and MySQL, and data warehouses like Snowflake, BigQuery, Redshift, and Azure Synapse for large-scale analytics.
  • Ensure data quality by validating incoming data, checking for missing values, duplicates, and outliers.
  • Design and build scalable data pipelines using tools like Spark, Dask, Ray, Polars, and DuckDB.
  • Choose appropriate storage formats such as Parquet, Arrow, or CSV based on data processing needs.
  • Ensure scalability of systems from a single node to multiple nodes, optimizing data storage and retrieval across large datasets.
  • Utilize GPU acceleration and parallel processing for large-scale model training and data processing.
  • Implement robust error handling, automatic retries, and data recovery mechanisms in the event of pipeline failures.
  • Write clear documentation for data pipelines, workflows, and system architectures to support team collaboration.
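To give a flavor of the data-quality responsibility above (checking for missing values, duplicates, and outliers), here is a minimal pandas sketch. The `amount` column, the IQR outlier rule, and the sample frame are illustrative assumptions, not part of the role description.

```python
import pandas as pd

def basic_quality_report(df: pd.DataFrame, value_col: str) -> dict:
    """Count missing values, duplicate rows, and IQR-based outliers in one column."""
    q1, q3 = df[value_col].quantile([0.25, 0.75])
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "missing": int(df.isna().sum().sum()),
        "duplicates": int(df.duplicated().sum()),
        "outliers": int(((df[value_col] < lower) | (df[value_col] > upper)).sum()),
    }

# Sample frame: one missing value, one duplicate row, one extreme outlier
df = pd.DataFrame({"amount": [10.0, 12.0, 11.0, 10.0, 13.0, None, 500.0]})
print(basic_quality_report(df, "amount"))
# → {'missing': 1, 'duplicates': 1, 'outliers': 1}
```

In a production pipeline these checks would typically run as a validation stage before ingestion, with rows failing the checks quarantined rather than silently dropped.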

Essential Skills and Qualifications:

  • 4+ years of experience in scaling data pipelines and machine learning systems.
  • Proficiency with Python and libraries such as Pandas, NumPy, Scikit-learn, Polars, and DuckDB.
  • Strong experience with data validation using tools like Pandera or Pydantic.
  • Hands-on experience with ETL/ELT pipelines across relational databases, data warehouses, and cloud storage.
  • Solid understanding of GPU parallelization for deep learning models, especially with PyTorch.
  • Experience with distributed computing frameworks like Dask or Ray.
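The data-validation skill mentioned above can be illustrated with a short Pydantic sketch that separates valid rows from rejected ones; the `Transaction` schema and the sample rows are made-up examples, not a schema used by the company.

```python
from pydantic import BaseModel, Field, ValidationError

class Transaction(BaseModel):
    """Hypothetical row schema: a non-empty account ID and a positive amount."""
    account_id: str
    amount: float = Field(gt=0)

rows = [
    {"account_id": "A1", "amount": 25.0},   # valid
    {"account_id": "A2", "amount": -5.0},   # rejected: amount must be > 0
]

valid, rejected = [], []
for row in rows:
    try:
        valid.append(Transaction(**row))
    except ValidationError:
        rejected.append(row)

print(len(valid), len(rejected))  # → 1 1
```

Schema-based validation like this catches bad records at ingestion time with explicit error messages, instead of letting them propagate into downstream training or analytics jobs.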

Good to Have:

  • Familiarity with data lineage and metadata management systems.
  • Experience with Pytest for testing and validating research code.
  • Background in logging and monitoring in production environments.

Disclaimer:
SpringCube curates tech job listings from various company websites to support tech professionals in Singapore during these challenging times.

  1. No Endorsement: Job ads on SpringCube do not imply endorsement of their authenticity or quality.
  2. No Client Relationship: This company is not a client of SpringCube unless stated.
  3. Application Process: Users must click to apply and are redirected to the employer’s career page.
  4. No Liability: SpringCube is not liable for inaccuracies.