Machine Learning for Embedded Systems with ARM Ethos-U NPU
Published 9/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 1.70 GB | Duration: 4h 43m
Deploy CNNs and AI models on ARM-based embedded devices with Ethos-U NPU, TensorFlow Lite Micro, and Alif E7 ML kit
What you'll learn
Learn the full workflow for developing and deploying a tiny machine learning model on embedded devices
Understand how the TensorFlow Lite for Microcontrollers (TFLM) library parses and runs machine learning model inference on your embedded device
Understand the limitations of standard machine learning models on embedded systems and the need for a different, optimized flow for resource-constrained devices
Learn how ARM has created dedicated hardware, architectures, and a compiler to enable machine learning model inference on embedded devices
Learn the ARM machine learning hardware accelerator families (Ethos-U) and how these accelerators are integrated into System-on-Chip designs
Requirements
You should have some understanding of embedded devices and their limitations
Some basic understanding of ARM-based architectures and system integration
Description
Machine Learning for Embedded Systems with ARM Ethos-U

Are you ready to bring the power of machine learning to the world of embedded systems? This course gives you a complete, hands-on journey into how modern AI models, such as CNNs for vision and audio tasks, can be deployed efficiently on ARM-based platforms with dedicated NPUs. Unlike most machine learning courses that stop at training, here you will go end-to-end, from model design all the way to running inference on real embedded hardware.

What you'll learn

Core ML theory for embedded devices
Understand the key stages of a neural network execution pipeline.
Learn the roles of convolution, flattening, activation functions, and softmax in CNNs.
Build a strong foundation in how ML operations are optimized for resource-constrained devices.

Model preparation workflow
Train your model in TensorFlow.
Convert it to a lightweight .tflite model.
Optimize and compile it with the ARM Vela compiler to generate instructions for the Ethos-U NPU.

Running inference on embedded devices
See how the TensorFlow Lite Micro (TFLM) runtime executes models in C++.
Understand how ML operations are dispatched to CMSIS-NN kernels and the Ethos-U hardware accelerator for maximum efficiency.
Get a clear picture of the full inference path from model to silicon.

Hands-on with real hardware
Work with the Alif E7 ML development kit to put theory into practice.
Step through board setup and boot.
Explore the Alif E7 block diagram to understand its ML-capable architecture.
Clone, build, and deploy Keyword Spotting and Image Classification demos.
Run the models on the board and observe real-time outputs.

Why this course is unique
Bridges the gap between machine learning theory and embedded deployment.
Covers the complete workflow from training to NPU execution, not just pieces in isolation.
Demonstrates everything on a real ARM-based platform with AI acceleration.
Practical, hardware-driven approach using the Alif E7 ML dev kit, with projects you can reproduce on a Windows machine.

Whether you are an embedded engineer looking to break into AI, or a machine learning practitioner curious about deploying on hardware accelerators, this course will give you the knowledge and practical skills to run ML models efficiently on modern embedded systems.

Enroll now and start your journey into embedded machine learning with ARM Ethos-U!
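To make the model preparation workflow above concrete, here is a minimal Python sketch of the conversion step, assuming a trained Keras model saved as model.h5 and an input shape of (1, 96, 96, 1); the file names, input shape, and representative_data() generator are illustrative placeholders, not taken from the course materials.

import tensorflow as tf

# Load a trained Keras model (hypothetical path; any trained CNN works here).
model = tf.keras.models.load_model("model.h5")

# A small calibration generator so the converter can choose int8 quantization ranges.
def representative_data():
    for _ in range(100):
        # Replace with real samples shaped like the model input, e.g. (1, 96, 96, 1).
        yield [tf.random.uniform((1, 96, 96, 1), dtype=tf.float32)]

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data
# Force full-integer quantization, which the Ethos-U NPU requires.
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)

The resulting model_int8.tflite is the file the ARM Vela compiler takes as input to produce an Ethos-U optimized model, which the TFLM C++ runtime then executes on the device.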
Overview
Section 1: Machine Learning For Embedded Devices Architecture Overview
Lecture 1 Tiny Machine Learning Model Development Flow
Lecture 2 Different Machine Learning Models Overview
Lecture 3 Standard Trained ML Models Challenge on Embedded Devices
Lecture 4 CNN Model As Use Case for Embedded Devices
Lecture 5 Convolution Stage
Lecture 6 Activation (ReLU) Stage
Lecture 7 Pooling (Optional) Stage
Lecture 8 Stacking Multiple Layers
Lecture 9 Flatten
Lecture 10 Dense/Connected Layer
Lecture 11 SoftMax and Final Decision
Section 2: TensorFlow Lite For Microcontrollers Based Models
Lecture 12 TensorFlow Lite for Microcontrollers (TFLM) Based ML Models Generation Flow
Lecture 13 TensorFlow Main Framework Training Stage
Lecture 14 H5 to .TFLite File Conversion
Lecture 15 ARM NPU Vela Compilation Stage
Lecture 16 Summary of Supported TFLM Machine Learning Operations
Section 3: ARM NPU Vela Compiler
Lecture 17 Vela Compiler Use Case
Lecture 18 Vela Compiler Work Flow Overview
Lecture 19 Vela Compiler Installation Pre-requisites
Lecture 20 Vela Compiler Installation
Lecture 21 Vela Compiler Supported Command-Line Options Summary
Lecture 22 Vela Compiler System Configuration File
Lecture 23 Supported Memory Configuration Modes
Lecture 24 Vela Compiler Tuning Configuration File
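As a rough illustration of how the installation, command-line options, and configuration files covered in this section fit together, here is a hedged Python sketch of invoking Vela; the option names exist in the Vela tool, but the accelerator variant, config file name, and the system-config and memory-mode entries are placeholders that must match your own vela.ini.

import subprocess

# Vela is distributed as the "ethos-u-vela" Python package: pip install ethos-u-vela
subprocess.run(
    [
        "vela",
        "model_int8.tflite",                      # the quantized TFLite model to compile
        "--accelerator-config", "ethos-u55-128",  # NPU variant and MAC configuration
        "--config", "vela.ini",                   # system configuration file
        "--system-config", "My_Sys_Config",       # a section defined inside vela.ini
        "--memory-mode", "Shared_Sram",           # one of the supported memory modes
        "--output-dir", "vela_out",
    ],
    check=True,
)
# Vela typically writes the optimized flatbuffer as vela_out/model_int8_vela.tflite,
# along with a summary of estimated performance and memory usage.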
Section 4: TFLM Based Machine Learning FlatBuffer for ARM NPU Based Hardware
Lecture 25 .TFLite Vela Compiled Output File Generation Summary
Lecture 26 Flat Buffer File Format
Lecture 27 Dump the Vela Output .TFLite Flat Buffer to JSON File Format Representation
Lecture 28 Flat Buffer File JSON Representation Example
Lecture 29 Flat Buffer Elements: Opcode Table Section
Lecture 30 Flat Buffer Elements: Buffers Section
Lecture 31 Flat Buffer Elements: SubGraph Section
Lecture 32 Flat Buffer Elements: Machine Learning Operators Opcodes Indexing
Lecture 33 Flat Buffer Elements: Tensors Indexing
Lecture 34 Flat Buffer Elements: Buffers Indexing
Lecture 35 NPU Custom Operation ETHOSU_CONV_2D
Lecture 36 ETHOSU_CONV_2D NPU Operation Data Stream
Lecture 37 ETHOSU_CONV_2D NPU Operation Embedded Memory Information
Lecture 38 .TFLite Flat Buffer File Metadata Information Part1
Lecture 39 .TFLite Flat Buffer File Metadata Information Part2 (Memory Usage)
Lecture 40 .TFLite Flat Buffer File Metadata Information Part3 (Architecture Config)
Lecture 41 .TFLite Flat Buffer File Metadata Example Summary
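One way to reproduce the JSON view of the Vela output that this section walks through is to run the FlatBuffers flatc tool against the TFLite schema; this sketch assumes flatc is on your PATH and that schema.fbs has been downloaded from the TensorFlow source tree, and the file names are illustrative.

import subprocess

# Convert the Vela-compiled flatbuffer into JSON so the opcode table, buffers,
# subgraph, tensors, and metadata sections can be inspected in a text editor.
subprocess.run(
    [
        "flatc",
        "-t",                # emit text (JSON) output instead of generated code
        "--strict-json",
        "--defaults-json",
        "-o", "json_out",
        "schema.fbs",        # the TFLite flatbuffer schema
        "--",
        "vela_out/model_int8_vela.tflite",
    ],
    check=True,
)
# Produces json_out/model_int8_vela.json, containing the operator_codes, buffers,
# subgraphs, and metadata entries discussed in the lectures above.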
Section 5: ARM Ethos-U NPU Input Data Stream
Lecture 42 ETHOSU_CONV_2D Operator Input Data Stream Overview
Lecture 43 DMA Transfer Commands
Lecture 44 DMA Transfer Commands Summary
Lecture 45 Memory Regions Information
Lecture 46 Looping and Tiling Logic Overview
Lecture 47 Tiling Logic
Lecture 48 NPU Micro-Operations
Section 6: ARM ETHOS-U/N NPU (Embedded AI Hardware Accelerators) Families
Lecture 49 High Performance Vs Low Power ARM ETHOS NPUs (ETHOS-U Vs ETHOS-N)
Lecture 50 ARM ETHOS-U Low Power NPUs Usage
Lecture 51 ARM Cortex-M55 + Ethos-U Hardware System Integration Overview
Lecture 52 ARM Cortex-M55 + Ethos-U Hardware System Integration Example
Lecture 53 ARM Ethos-U & Cortex-M/A System Integration Topologies
Lecture 54 ARM NPU Hardware Block Diagram Overview
Lecture 55 NPU Functional Block Diagram
Section 7: TensorFlow Lite for Microcontrollers (TFLM) C++ Runtime Library
Lecture 56 TFLM Top Level Flow
Lecture 57 TFLM Interpreter Initialization Stage
Lecture 58 TFLM Initialization: Arena Memory Allocation
Lecture 59 TFLM Initialization: Interpreter Instantiation Part1
Lecture 60 TFLM Initialization: Interpreter Instantiation Part2
Lecture 61 TFLM Nodes Allocation: Tensors Allocation Part1
Lecture 62 TFLM Nodes Allocation: Tensors Allocation Part2
Lecture 63 TFLM Nodes Allocation: Tensors Allocation Part3
Lecture 64 TFLM Nodes Allocation: Tensors Allocation Part4
Lecture 65 TFLM Operators Invocation Overview
Lecture 66 TFLM Operators Invocation: Kernel & DSP Based Operators
Lecture 67 TFLM Operators Invocation: Custom NPU Ethos-U Based Operator
Section 8: ARM CMSIS-NN (Neural Network) Library
Lecture 68 ARM CMSIS-NN Role in Machine Learning Execution Flow
Lecture 69 CMSIS-NN as a Neural Network Library for Cortex-M Microprocessors
Lecture 70 CMSIS-NN Software Architecture Design
Lecture 71 CMSIS-NN Compile Time Feature Flags Selection
Lecture 72 CMSIS-NN APIs Summary
Lecture 73 CMSIS-NN Processors Targets Specific Implementation
Section 9: Alif E7 Board For Embedded Based Machine Learning Use Cases
Lecture 74 Alif E7 System on Chip Hardware Block Diagram
Lecture 75 Alif E7: High Performance (HP) vs High Efficiency (HE) ETHOS-U55 ARM NPUs
Lecture 76 Alif E7 Development Kit
Lecture 77 Alif E7 Development Kit Schematic Overview
Lecture 78 Alif E7 Development Kit Jumpers Configuration
Lecture 79 Alif E7 Development Kit Presentation
Lecture 80 Booting Alif E7 Development Kit
Lecture 81 Live Booting of Alif E7 ML Development Kit
Lecture 82 Alif E7 Development Kit Builtin Example
Section 10: Alif E7 Examples & Setup Environment Guide
Lecture 83 Setup Guide Slides and Repository Links
Lecture 84 Alif E7 Examples & Demos Repository
Lecture 85 Alif E7 Windows Machine Configuration
Lecture 86 Alif E7 Required Tools Installation on Windows Machine
Lecture 87 Python > 3.10 Package Installation
Lecture 88 Tools & Example Project Repositories: Cloning and Setup
Lecture 89 How to Configure & Build the Keyword Spotting (KWS) Example
Lecture 90 How to Configure & Build the Keyword Spotting (KWS) Example: Configure Stage
Lecture 91 How to Configure & Build the Keyword Spotting (KWS) Example: Ninja Build Stage
Lecture 92 Inspect the Output KWS Example .axf File
Lecture 93 Image Classification Use Case
Lecture 94 Prepare to Execute our Machine Learning Examples (KWS & Image Classification)
Lecture 95 Running the Keyword Spotting (KWS) Example on the Alif E7 ML Board
Lecture 96 Running the Image Classification (image_class) Example on the Alif E7 ML Board
Who this course is for:
Embedded systems developers who want to start learning machine learning for embedded devices