AI Engineering with Modal
Published 6/2025
Duration: 3h 24m | .MP4 1280x720 30 fps(r) | AAC, 44100 Hz, 2ch | 1.98 GB
Genre: eLearning | Language: English
A hands-on guide to building and deploying scalable AI systems with Modal.
What you'll learn
- Build and deploy scalable AI infrastructure by defining custom container images, specifying GPU/CPU resources, and managing persistent data with Modal Volumes.
- Develop a complete Automatic Speech Recognition (ASR) pipeline to transcribe long audio files in parallel using GPU-accelerated models on Modal.
- Fine-tune a transformer encoder model for a text classification task on a custom dataset, leveraging Modal for GPU-powered training and experiment management.
- Deploy trained machine learning models as scalable, live web APIs using Modal's built-in FastAPI endpoints for real-time inference.
- Launch a high-throughput, OpenAI-compatible API for Large Language Model (LLM) inference using vLLM on Modal (see the sketch below).
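To make that last point concrete, here is a minimal sketch of what a vLLM deployment on Modal can look like. The model name, GPU type, and port are illustrative placeholders, not the course's exact code:

```python
import modal

app = modal.App("llm-inference")

# vLLM and its dependencies baked into the container image.
image = modal.Image.debian_slim(python_version="3.11").pip_install("vllm")

@app.function(image=image, gpu="A100")
@modal.web_server(port=8000)
def serve():
    import subprocess
    # vLLM's built-in server exposes OpenAI-compatible routes
    # such as /v1/chat/completions on the given port.
    subprocess.Popen(
        ["vllm", "serve", "Qwen/Qwen2.5-7B-Instruct", "--port", "8000"]
    )
```

Once deployed, any OpenAI client library can talk to it by pointing its base URL at the generated endpoint.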
Requirements
- Strong Python skills
- Comfortable with working in the terminal
Description
NOTE: This course is not yet complete; new content is added weekly. The price is therefore lower for now and will increase as more hours of content are added.
Welcome to AI Engineering with Modal, your hands-on guide to building and deploying production-grade AI systems with nothing but Python.
This course is designed to transform your workflow. We'll ditch the complex YAML files, Dockerfiles, and cloud configuration panels. Instead, you'll learn how to define your entire AI infrastructure—from custom container images and on-demand GPUs to persistent storage and scalable web endpoints—directly within your Python code.
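As a taste of that workflow, here is a minimal sketch, assuming current Modal APIs, of a container image, GPU reservation, and persistent Volume declared entirely in Python (names and sizes are illustrative):

```python
import modal

app = modal.App("ai-infra-demo")

# The container image: a Debian base plus pip dependencies, no Dockerfile.
image = modal.Image.debian_slim(python_version="3.11").pip_install(
    "torch", "transformers"
)

# A named, persistent Volume for datasets or model weights.
volume = modal.Volume.from_name("model-cache", create_if_missing=True)

@app.function(
    image=image,
    gpu="A100",          # on-demand GPU, released when the function exits
    cpu=4.0,             # CPU cores
    memory=16384,        # MiB of RAM
    volumes={"/cache": volume},
)
def train():
    # Anything written under /cache persists across runs.
    ...
```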
This is a project-based course where you won't just learn the theory; you'll build and deploy real-world AI applications from the ground up:
A Scalable ASR Pipeline: Build a robust system to transcribe long audio files in parallel using a GPU-accelerated Automatic Speech Recognition model (see the sketch after this list).
A Fine-Tuned Classification Model: Fine-tune a modern transformer model (ModernBERT) on a custom dataset for a text classification task and deploy it as a live API.
A High-Throughput LLM Endpoint: Launch an OpenAI-compatible API for a powerful Large Language Model (like Qwen or Gemma) using vLLM for blazingly fast inference.
And More To Come!
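The ASR project hinges on Modal's parallel fan-out. Here is a minimal sketch of the pattern, using Whisper purely as a stand-in for whichever ASR model the course uses:

```python
from pathlib import Path

import modal

app = modal.App("asr-pipeline")
image = (
    modal.Image.debian_slim()
    .apt_install("ffmpeg")           # required by Whisper for audio decoding
    .pip_install("openai-whisper")
)

@app.function(image=image, gpu="T4")
def transcribe(audio: bytes) -> str:
    import tempfile
    import whisper

    # For brevity the model is loaded per call; in practice you would
    # cache the weights in a Volume or a container-lifecycle hook.
    model = whisper.load_model("base")
    with tempfile.NamedTemporaryFile(suffix=".wav") as f:
        f.write(audio)
        f.flush()
        return model.transcribe(f.name)["text"]

@app.local_entrypoint()
def main():
    # .map() fans the chunks out across parallel GPU containers.
    chunks = [p.read_bytes() for p in sorted(Path("chunks").glob("*.wav"))]
    for text in transcribe.map(chunks):
        print(text)
```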
By the end of this course, you'll be able to confidently take any AI model from a local script to a scalable, cloud-native application ready for production.
What you will master:
Modal Fundamentals: Go from zero to hero with Modal's core concepts, running functions remotely and in parallel.
Infrastructure as Code: Build custom container images, reserve powerful GPUs (A100s, H100s), manage CPU/memory, and use persistent Volumes—all in Python.
End-to-End AI Pipelines: Structure complex AI tasks like model training, batch processing, and inference into clean, manageable Modal applications.
Model Fine-Tuning at Scale: Leverage on-demand GPUs to run and manage fine-tuning jobs for modern transformer models.
Effortless Deployment: Deploy your trained models and LLMs as fast, scalable web APIs with just a few lines of code (see the sketch after this list).
And More To Come!
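To illustrate the deployment bullet, here is a hedged sketch of serving a fine-tuned classifier as a web API. The model ID is a placeholder for whatever checkpoint your training run produced, and the decorator name follows recent Modal releases (older versions call it `web_endpoint`):

```python
import modal

app = modal.App("classifier-api")
image = modal.Image.debian_slim().pip_install(
    "fastapi[standard]", "torch", "transformers"
)

@app.function(image=image, gpu="T4")
@modal.fastapi_endpoint(method="POST")
def classify(item: dict) -> dict:
    from transformers import pipeline

    # Placeholder model ID; point this at your fine-tuned checkpoint.
    # Loading per request keeps the sketch short; a Modal class with a
    # container-startup hook would load the model once per container.
    clf = pipeline("text-classification", model="your-org/your-finetuned-model")
    return clf(item["text"])[0]
```

A single `modal deploy` then turns this into a persistent HTTPS endpoint that scales with traffic.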
This course is for anyone familiar with Python and interested in AI.
Who this course is for:
- Intermediate Python developers and software engineers looking to level up on building AI applications.