Site Reliability Engineer Job at Berkley Hunt, San Jose, CA

  • Berkley Hunt
  • San Jose, CA

Job Description

Senior Site Reliability Engineer (GPU Compute) | Hybrid – Bay Area, CA

Berkley Hunt is supporting a fast-growing AI startup building a high-performance, cloud-native platform to power cutting-edge machine learning workloads. As they scale, they’re hiring a Senior/Staff Infrastructure Engineer to lead the development of a scalable GPU compute environment from the ground up.

About the Role:

This is a high-impact role for an experienced infrastructure engineer who thrives in fast-paced environments and wants to shape the future of AI infrastructure. You’ll design, build, and operate the systems that enable high-throughput GPU workloads at scale, collaborating closely with the core engineering team to optimize performance, efficiency, and reliability.

If you're excited about solving deep technical challenges in distributed compute and cloud automation, this could be a standout opportunity.

Responsibilities:

  • Build and maintain a large-scale, distributed GPU compute platform powering AI workloads.
  • Develop backend systems in Python to orchestrate GPU jobs and manage routing, observability, and capacity.
  • Design and implement infrastructure with tools like Terraform, Ansible, and Kubernetes across cloud and bare-metal environments.
  • Own the reliability, scalability, and performance of the platform, from provisioning to deployment and monitoring.
  • Collaborate with the engineering team to shape infrastructure vision and technical strategy over the next 1–5 years.
  • Drive automation and improvements to minimize operational overhead and scale efficiently.

Requirements:

  • 6+ years of experience in cloud infrastructure or backend engineering roles.
  • Deep knowledge of distributed compute systems, especially involving GPU orchestration.
  • Proficiency with Python and infrastructure-as-code tools (e.g., Terraform, Ansible).
  • Solid experience with Kubernetes and CI/CD pipelines.
  • Strong understanding of cloud platforms (AWS, GCP, or Azure); bare-metal experience is a plus.
  • Excellent problem-solving skills and a proactive, ownership-driven mindset.

Nice to Have:

  • Experience at a high-growth startup or in scaling large infrastructure systems.
  • Familiarity with GPU resource scheduling and performance optimization.
  • Hands-on experience with observability stacks (Prometheus, Grafana, Loki, Thanos).
  • A passion for automation, infrastructure design, and moving fast without breaking things.
