Back to Job Listings

Senior Responsible AI Scientist – AI Evaluation for Large Language Models (LLMs)

SpringCube

Full time - Manager

Artificial Intelligence AI

Singapore ( Hybrid )

Published 2 weeks ago

Salary: SGD10,000 - SGD15,000

Contact Employer
  • Share:
Send Feedback
Report This Job

Job Description

The SpringCube team curated the following job opportunity to help you in your job search. Explore the position below to find your next career move.

Senior Responsible AI Scientist – AI Evaluation for Large Language Models (LLMs)

Company Overview

This company is an AI assurance venture focused on ensuring the safe, reliable, and responsible deployment of artificial intelligence. Their work spans AI validation across critical areas such as accuracy, robustness, explainability, fairness, privacy, and security. They assist companies in evaluating and stress-testing AI systems to ensure they meet the highest standards for safety and performance.

Job Description

We are seeking a Senior Responsible AI Scientist to lead the design and implementation of robust frameworks for evaluating Large Language Models (LLMs) and generative AI systems. The role involves developing evaluation strategies for models like GPT, BERT, and T5, ensuring that they are safe and effective for real-world applications. You will collaborate with clients and teams to refine evaluation metrics and implement cutting-edge research to address the challenges of LLM evaluation.

As the Senior Responsible AI Scientist, you will:

  • Lead the creation and execution of frameworks to evaluate the performance of generative AI systems, including foundation models and fine-tuned models.
  • Establish metrics and benchmarks for model quality, including output fidelity, diversity, creativity, and bias detection.
  • Perform technical evaluations and “red-team” tests on LLMs, assessing them for robustness, performance, bias, and vulnerabilities such as prompt injection attacks.
  • Work closely with clients to design custom evaluation methods based on scientific research tailored to their needs.
  • Collaborate with product management teams to build AI evaluation frameworks and tools that assess the robustness, explainability, fairness, privacy, safety, and security of LLMs.
  • Curate and manage large, high-quality datasets for evaluating LLMs, ensuring the data is ethically sourced and free from bias.
  • Mentor junior data scientists and contribute to the advancement of LLM evaluation methodologies.
  • Stay up-to-date with the latest advancements in NLP and LLM evaluation, applying cutting-edge methods to improve model performance.

Key Responsibilities

  • Design and implement evaluation frameworks for LLMs, including models such as GPT, BERT, T5, and others.
  • Define metrics for assessing model performance, including perplexity, BLEU, ROUGE, accuracy, coherence, and bias detection.
  • Manage large language datasets, ensuring quality and ethical considerations in data curation.
  • Mentor junior team members and guide them in best practices for LLM evaluation.
  • Collaborate with research teams to integrate the latest advancements in NLP into evaluation methodologies.

Qualifications

  • 5 to 8 years of experience in deploying and evaluating LLMs in real-world applications.
  • Strong experience in evaluating LLMs using metrics like perplexity, BLEU, ROUGE, and human-centered evaluation techniques.
  • Proven experience with large, complex language datasets, including text preprocessing and tokenization.
  • Strong programming skills in Python, with experience in building automated model evaluation pipelines.
  • Excellent written and verbal communication skills, with the ability to explain technical concepts to non-technical stakeholders.
  • Passion for the safe and responsible use of AI, with a focus on LLMs.

Nice to Have

  • Published research in the field of generative AI or model evaluation.
  • Hands-on experience with model explainability tools and methods.
  • Familiarity with cloud-based platforms (e.g., AWS, GCP) for scalable model evaluation and deployment.

Disclaimer: SpringCube curates tech job listings from various company websites to support tech professionals in Singapore during these challenging times.

  1. No Endorsement: Job ads on SpringCube do not imply endorsement of their authenticity or quality.
  2. No Client Relationship: This company is not a client of SpringCube unless stated.
  3. To apply, click the “Apply” button to be redirected to the hiring company’s application page for this job.
  4. No Liability: SpringCube is not liable for inaccuracies.