Research Engineer (Foundation Model), Machine Learning Systems

SpringCube

Full time - Associate/Junior Executive

IT Services & Consulting

Singapore (Onsite)

Published 3 weeks ago

Salary: Disclosed upon interview


Job Description

The SpringCube team curated the following job opportunity to help you in your job search. Explore the position below to find your next career move.

Company Overview
An innovative global incubator for platforms shaping the future of commerce, content, and entertainment, the company connects over 2.5 billion people worldwide through its suite of products. With a mission to inspire creativity and enrich lives, it strives to build a positive and safe online environment across its extensive global presence, employing more than 110,000 people across 30+ countries.

Role Summary
The Machine Learning (ML) Systems team is seeking a Research Engineer to enhance machine learning infrastructure and optimize large-scale parallel training for advanced deep learning models. The role covers end-to-end system development, from hardware acceleration to deployment, for models such as LLMs and Stable Diffusion.

Key Responsibilities

  • Optimize large-scale parallel training for advanced deep learning algorithms, including LLMs, multimodal models, and reinforcement learning.
  • Research and develop accelerated computing architecture, management, and monitoring for machine learning systems.
  • Deploy distributed machine learning systems for training and inference.
  • Manage cross-layer optimization across system algorithms, AI algorithms, and hardware (GPU, ASIC) for ML tasks.

Qualifications

  • Minimum Requirements:
    • Bachelor’s degree or higher in distributed/parallel computing or a related field, with knowledge of recent advances in computing, networking, and hardware.
    • Proficiency with machine learning algorithms and frameworks such as PyTorch and JAX.
    • Understanding of GPU/ASIC architecture.
    • Expertise in programming in a Linux environment, with languages like C/C++, CUDA, or Python.
  • Preferred Qualifications:
    • Experience with GPU-based HPC, RDMA networks, and communication libraries (MPI, NCCL).
    • Knowledge of distributed training frameworks (e.g., DeepSpeed, FSDP).
    • Proficiency in AI compiler stacks (torch.fx, XLA, MLIR).
    • Background in cloud-based system design and large-scale data processing.
    • Expertise in CUDA programming and performance tuning (e.g., CUTLASS, Triton).

Disclaimer:
SpringCube curates tech job listings from various company websites to support tech professionals in Singapore during these challenging times.

  1. No Endorsement: Job ads on SpringCube do not imply endorsement of their authenticity or quality.
  2. No Client Relationship: This company is not a client of SpringCube unless stated.
  3. Users must click to apply, which redirects them to the employer’s career page.
  4. No Liability: SpringCube is not liable for inaccuracies.