AI, ML, and GenAI on NVIDIA H100 GPUs with Red Hat OpenShift AI
Last updated: 7/2025
Duration: 1h 29m | MP4, 1920x1080, 30 fps | AAC, 44100 Hz, 2ch | 817.14 MB
Genre: eLearning | Language: English
OpenShift & OpenShift AI on NVIDIA H100: From Bare-Metal to Production in One Day
What you'll learn
- Stand up a bare-metal H100 node, validate firmware and BIOS settings, and join it to a fresh OpenShift cluster
- Install and tune the NVIDIA GPU Operator with Multi-Instance GPU (MIG) profiles for maximum utilization (see the MIG sketch after this list)
- Deploy Red Hat OpenShift AI (RHOAI) and run a real Mistral LLM workload with Ollama
- Monitor, troubleshoot, upgrade, and scale the platform in production
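To make the MIG step concrete, here is a minimal sketch of the workflow you'll practice. It assumes the GPU Operator is already installed, and the node name h100-worker-0 is purely illustrative:

```bash
# Hypothetical node name; take the real one from `oc get nodes`.
GPU_NODE=h100-worker-0

# The GPU Operator's MIG manager watches this label; all-1g.10gb
# slices an 80 GB H100 into seven 1g.10gb instances.
oc label node "$GPU_NODE" nvidia.com/mig.config=all-1g.10gb --overwrite

# Wait for the state label to flip from "pending" to "success".
oc get node "$GPU_NODE" \
  -o jsonpath='{.metadata.labels.nvidia\.com/mig\.config\.state}'

# Confirm the slices are advertised as schedulable resources.
oc describe node "$GPU_NODE" | grep nvidia.com/mig-1g.10gb
```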
Requirements
- One NVIDIA H100 server (or another Hopper- or Ampere-class GPU such as the A100), physical or virtualized
- A workstation that can SSH into the node and run the "oc" CLI (a quick preflight check is sketched after this list)
- (Optional) A Red Hat account to pull mirrored images
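Before the first lesson, a thirty-second preflight confirms both requirements. A minimal sketch, with h100.example.lab standing in for your server's address:

```bash
# Confirm the H100 is visible on the PCI bus (pre-driver, so lspci rather than nvidia-smi).
ssh root@h100.example.lab 'lspci | grep -i nvidia'

# Confirm the oc client is installed on the workstation.
oc version --client
```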
Description
Unlock the power of enterprise-grade AI in your own data center—step-by-step, from bare-metal to production-ready inference. In this hands-on workshop, you’ll learn how to transform a single NVIDIA H100 server and a lightweight virtualization host into a fully featured Red Hat OpenShift cluster running OpenShift AI, the NVIDIA GPU Operator, and real LLM workloads (Mistral-7B with Ollama). We skip the theory slides and dive straight into keyboards and terminals—every YAML, every BIOS toggle, every troubleshooting trick captured on video.
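Inside the cluster, Ollama runs as an ordinary deployment, so pulling the Mistral weights is one exec away. A minimal sketch, where the Deployment name "ollama" and namespace "llm" are illustrative:

```bash
# Pull the Mistral weights into the running Ollama pod
# ("ollama" Deployment and "llm" namespace are hypothetical names).
oc -n llm exec deploy/ollama -- ollama pull mistral

# Smoke-test generation from inside the pod before exposing a route.
oc -n llm exec deploy/ollama -- ollama run mistral "Say hello"
```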
What you’ll build
- A three-node virtual control plane + one bare-metal GPU worker, deployed via the new Agent-based Installer (sketched just below this list)
- GPU Operator with MIG slicing, UUID persistence, and live metrics in Grafana
- OpenShift AI (RHOAI) with Jupyter and model-serving pipelines
- A production-grade load balancer, DNS zone, and HTTPS ingress, with no managed cloud required
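The Agent-based Installer flow is short enough to preview here. A minimal sketch, assuming install-config.yaml and agent-config.yaml (both written during the course) already describe the three control-plane VMs and the GPU worker:

```bash
# Generate a bootable Agent ISO from the two config files in this directory.
mkdir agent-install && cd agent-install
# ... place install-config.yaml and agent-config.yaml here ...
openshift-install agent create image --dir .

# Boot every node from the resulting agent.x86_64.iso, then watch the rollout.
openshift-install agent wait-for install-complete --dir .
```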
Hands-on every step: you’ll inspect firmware through iDRAC, patch BIOS settings, generate a custom Agent ISO, boot the cluster, join the GPU node, and push an LLM endpoint you can curl in under a minute. Along the way, we’ll upgrade OpenShift, monitor GPU temps, and rescue a “Node Not Ready” scenario—because real life happens.
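And the payoff curl, once the Ollama route is up. Ollama's /api/generate endpoint is real; the route host below is hypothetical, so substitute your own:

```bash
# Hypothetical route; use the host OpenShift assigns to your Ollama service.
OLLAMA_URL=https://ollama-demo.apps.cluster.example.lab

# Ask Mistral for a completion via Ollama's REST API.
curl -s "$OLLAMA_URL/api/generate" \
  -d '{"model": "mistral", "prompt": "Why run LLMs on-prem?", "stream": false}'
```

On the monitoring side, GPU temperatures surface in Prometheus as the DCGM exporter's DCGM_FI_DEV_GPU_TEMP metric, which you'll graph in Grafana.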
Who should enroll
DevOps engineers, SREs, and ML practitioners who have access to a GPU server (H100, H800, or even an A100) and want a repeatable, enterprise-compatible install path. Basic Linux and kubectl skills are assumed; everything else is taught live.
By course end, you’ll have a battle-tested Git repository full of manifests, a private Agent ISO pipeline you can clone for new edge sites, and the confidence to stand up—or scale out—your own GPU-accelerated OpenShift AI platform. Join us and ship your first on-prem LLM workload today.
Who this course is for:
- Machine Learning Engineers
- DevOps Engineers
- Site Reliability Engineers (SREs)
- Python Developers Exploring Infrastructure
- Newcomers to AI Operations