Staff Platform Engineer – High Performance Computing Platform Management

SpringCube

Full-time – Senior Associate / Assistant Manager

IT Services & Consulting

Singapore, All Areas

Published 5 days ago

Salary: SGD5,000 - SGD10,000

Job Description

The SpringCube team curated the following job opportunity to help you in your job search. Explore the position below to find your next career move.

Company Overview
This organization is a leading provider of high-performance computing (HPC) solutions and cloud infrastructure services. It specializes in building resilient, scalable, and secure computing platforms for enterprise and research applications. The company fosters a culture of technical excellence, innovation, and collaboration, providing teams with cutting-edge tools and resources to drive performance and efficiency in HPC environments.

Job Title: Staff Platform Engineer – High Performance Computing Platform Management
Location: Singapore, Singapore
Employment Type: Full-time, On-site
Department: Cloud Infrastructure and Services – Big Data Platform

Responsibilities
• Lead a team to deliver resilient, scalable, and secure HPC platforms, including compute nodes, storage systems, networks, and job scheduling systems.
• Design, implement, and manage HPC infrastructure platforms to meet organizational needs.
• Develop storage solutions for HPC workloads to ensure efficient data storage and retrieval.
• Design and implement high-performance networking solutions, including InfiniBand, Ethernet, and other interconnects.
• Plan and manage HPC resource capacity, including forecasting, procurement, and deployment of new hardware and software.
• Manage HPC clusters, including optimization, monitoring, troubleshooting, and job scheduling/resource allocation.
• Ensure the security and compliance of the HPC infrastructure, including access controls, patch management, and regular security audits.
• Collaborate with stakeholders such as data scientists and developers to optimize application performance and provide technical support on HPC usage.

Qualifications
• Degree in Computer Science, Computer Engineering, or a related field.
• 8+ years of experience managing HPC systems running Linux, Unix, or equivalent operating systems.
• Strong understanding of HPC architectures, including clusters, grids, and cloud environments.
• Experience with HPC job scheduling systems such as Slurm, Torque, or LSF.
• Expertise in storage systems, including SAN, NAS, and object storage.
• Experience with high-performance networking, including InfiniBand and Ethernet interconnects.
• Familiarity with cloud computing platforms such as AWS, Azure, or Google Cloud.
• Proficiency in scripting languages like Python, Perl, or Bash.
• Experience with containerization and orchestration tools (Docker, Kubernetes) and related technologies, including Knative, Run:AI, Grafana, Prometheus, Kyverno, Argo CD, Rancher, NVIDIA Base Command Manager (BCM), and the NVIDIA SuperPOD architecture.
• Proven experience in leading engineering teams.

Nice to Have
• Certifications such as NVIDIA AI Infrastructure and Operations or Certified Kubernetes Administrator (CKA).
• Experience with machine learning or deep learning frameworks such as TensorFlow or PyTorch.
• Familiarity with agile development methodologies and version control systems like Git.

Disclaimer
SpringCube curates tech job listings from various company websites to support tech professionals in Singapore.

  1. No Endorsement: The appearance of a job ad on SpringCube does not imply that SpringCube endorses or verifies its authenticity or quality.
  2. No Client Relationship: This company is not a client of SpringCube unless stated.
  3. To Apply: Click the “Apply” button to be redirected to the hiring company’s application page for this job.
  4. No Liability: SpringCube is not liable for inaccuracies.