Principal Architect – Cloud and Observability

SpringCube

Full time - Principal Engineer

Healthcare Services & Tech

Boston, Massachusetts, United States

Published 2 weeks ago

Salary: Disclosed upon interview

Job Description

The SpringCube team curated the following job opportunity to help you in your job search. Explore the position below to find your next career move.

Company Overview
A leading U.S.-based healthcare organization is seeking a Principal Architect to shape observability and hybrid cloud practices across its enterprise. The organization delivers innovative health solutions, connecting patients, providers, and systems through advanced technology, cloud infrastructure, and AI-driven analytics to improve healthcare experiences and operational efficiency.

Summary
The Principal Architect will own enterprise observability and hybrid cloud architecture, establishing standards, reference designs, and technical direction for monitoring and multi-cloud infrastructure. This hands-on role requires building pipelines, defining telemetry strategies, and guiding teams in best practices for observability and cloud operations.

Responsibilities

Observability

  • Own the enterprise observability reference architecture covering metrics, logs, traces, and events across all environments (cloud and on-prem)
  • Drive an OpenTelemetry-first instrumentation strategy, including standard libraries, semantic conventions, collector topologies, and pipeline design
  • Build and operate telemetry pipelines on Grafana Mimir, Loki, and Tempo, including multi-tenant configurations, retention policies, and capacity planning
  • Define reliability measures (SLOs, SLIs, and error budgets) and alerting frameworks, and apply them consistently across all lines of business
  • Integrate observability tooling with incident management platforms such as ServiceNow ITOM and xMatters
  • Establish telemetry schema standards to ensure teams emit actionable and technically compliant data

Hybrid Multi-Cloud

  • Build and maintain reference architectures for hybrid environments: OpenShift on-prem with KVM/libvirt and Dell PowerFlex storage, plus Azure, AWS, and GCP
  • Lead standards for workload identity and federation using SPIFFE/SPIRE and cloud-native IAM patterns
  • Provide guidance on compute runtime selection (containers, VMs, bare metal, serverless) with a clear decision framework
  • Connect autoscaling and capacity planning behavior to telemetry signals
  • Advance FinOps maturity by integrating cost data, establishing unit economics, and promoting open billing standards

AI + Observability

  • Identify opportunities for AI/ML in observability: anomaly detection, root cause analysis, log clustering, and smarter alerting
  • Define observability standards for AI-powered systems, covering latency, token costs, model drift, and related signals
  • Ensure new AI-powered platforms are instrumented correctly from day one

Required Qualifications

  • 10+ years in infrastructure, cloud architecture, platform engineering, or SRE
  • 8+ years of architecture experience in observability, cloud infrastructure, or both
  • Solid experience with at least two of Azure, AWS, and GCP, including networking, identity, compute, and storage
  • 5+ years with Kubernetes in production (OpenShift, EKS, AKS, or GKE)
  • 5+ years with OpenTelemetry or similar frameworks (collectors, SDKs, semantic conventions, pipeline design)
  • 5+ years with observability platforms: Grafana/Mimir/Loki/Tempo, Prometheus, Datadog, Splunk, Dynatrace, or comparable tools
  • Experience defining SLOs/SLIs and building alerting strategies at an organizational level
  • Proven track record of writing architecture standards that teams have adopted
  • Strong communication skills with engineers and senior leadership

Preferred Qualifications

  • On-prem/private cloud experience (OpenShift Virtualization, KVM/libvirt, VMware, Dell PowerFlex)
  • Workload identity (SPIFFE/SPIRE) and zero-trust networking
  • Infrastructure-as-code (Terraform, Pulumi, Helm, ArgoCD)
  • Streaming platforms such as Kafka or Confluent
  • AIOps or ML-based anomaly detection experience
  • FinOps background: cloud cost optimization, chargeback, unit economics
  • Service mesh (Istio, Envoy, Linkerd) or eBPF-based tools (Cilium, Pixie)
  • Engagement in open-source communities (CNCF, OpenTelemetry)
  • Healthcare, insurance, or financial services experience (HIPAA/SOX familiarity)
  • Cloud certifications are a plus but not required

Disclaimer
SpringCube curates tech job listings from various company websites to support tech professionals globally.

1. No Endorsement: Job ads on SpringCube do not imply endorsement of their authenticity or quality.
2. No Client Relationship: This company is not a client of SpringCube unless stated.
3. To Apply: Click the “Apply” button to be redirected to the hiring company’s application page for this job.
4. No Liability: SpringCube is not liable for inaccuracies.