Machine Learning For Embedded Systems With ARM Ethos-U NPU

Posted By: ELK1nG

Published 9/2025
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 kHz
Language: English | Size: 1.70 GB | Duration: 4h 43m

Deploy CNNs and AI models on ARM-based embedded devices with Ethos-U NPU, TensorFlow Lite Micro, and Alif E7 ML kit

What you'll learn

Learn the full workflow of a tiny machine learning (TinyML) model on embedded devices

Understand how the TensorFlow Lite for Microcontrollers (TFLM) library parses and runs machine learning model inference on your embedded device

Understand the limitations of standard machine learning models on embedded systems and the need for a different, optimized flow for resource-constrained devices

Learn how ARM has helped create and define dedicated hardware, architectures, and a compiler to enable machine learning model inference on embedded devices

Get to know the ARM-based machine learning hardware accelerator families (Ethos-U) and the system-on-chip design integration of those accelerators

Requirements

You should have some understanding of embedded systems and their limitations

Some basic understanding of ARM-based architectures and system integration

Description

Are you ready to bring the power of machine learning to the world of embedded systems? This course gives you a complete, hands-on journey into how modern AI models, such as CNNs for vision and audio tasks, can be deployed efficiently on ARM-based platforms with dedicated NPUs. Unlike most machine learning courses that stop at training, here you will go end-to-end, from model design all the way to running inference on real embedded hardware.

What you'll learn

Core ML theory for embedded devices
Understand the key stages of a neural network execution pipeline.
Learn the roles of convolution, flattening, activation functions, and softmax in CNNs.
Build a strong foundation in how ML operations are optimized for resource-constrained devices.

Model preparation workflow
Train your model in TensorFlow.
Convert it to a lightweight .tflite model.
Optimize and compile it with the ARM Vela compiler to generate instructions for the Ethos-U NPU.

Running inference on embedded devices
See how the TensorFlow Lite Micro (TFLM) runtime executes models in C++.
Understand how ML operations are dispatched to CMSIS-NN kernels and the Ethos-U hardware accelerator for maximum efficiency.
Get a clear picture of the full inference path from model to silicon.

Hands-on with real hardware
Work with the Alif E7 ML development kit to put theory into practice.
Step through board setup and boot.
Explore the Alif E7 block diagram to understand its ML-capable architecture.
Clone, build, and deploy Keyword Spotting and Image Classification demos.
Run the models on the board and observe real-time outputs.

Why this course is unique
Bridges the gap between machine learning theory and embedded deployment.
Covers the complete workflow from training to NPU execution, not just pieces in isolation.
Demonstrates everything on a real ARM-based platform with AI acceleration.
Practical, hardware-driven approach using the Alif E7 ML dev kit with projects you can reproduce on a Windows machine.

Whether you are an embedded engineer looking to break into AI, or a machine learning practitioner curious about deploying on hardware accelerators, this course will give you the knowledge and practical skills to run ML models efficiently on modern embedded systems. Enroll now and start your journey into embedded machine learning with ARM Ethos-U!
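To make the model preparation workflow above concrete, here is a minimal sketch of the conversion and Vela compilation steps, assuming a trained Keras model saved as kws_model.h5, an illustrative 49x10x1 input shape, and an Ethos-U55 target with 128 MAC units; the file names, input shape, and accelerator configuration are placeholders rather than values taken from the course, and the tensorflow and ethos-u-vela Python packages are assumed to be installed.

```python
import subprocess
import numpy as np
import tensorflow as tf

# Load the trained Keras model (hypothetical file name).
model = tf.keras.models.load_model("kws_model.h5")

# Representative dataset used to calibrate full-integer (int8) quantization.
# A real project would yield preprocessed samples from the training set;
# random data with a made-up input shape is used here only to keep the
# sketch self-contained.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 49, 10, 1).astype(np.float32)]

# Convert the Keras model to a fully int8-quantized .tflite FlatBuffer,
# the format expected by the Ethos-U NPU and the CMSIS-NN kernels.
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
converter.inference_input_type = tf.int8
converter.inference_output_type = tf.int8
with open("kws_int8.tflite", "wb") as f:
    f.write(converter.convert())

# Compile the quantized model with the ARM Vela compiler so that supported
# operators are folded into the ETHOSU custom operator; the accelerator
# configuration (here an Ethos-U55 with 128 MACs) must match the target SoC.
subprocess.run(
    ["vela", "kws_int8.tflite",
     "--accelerator-config", "ethos-u55-128",
     "--output-dir", "vela_out"],
    check=True,
)
```

The Vela output (a *_vela.tflite FlatBuffer in the output directory) is what gets embedded into the firmware: the TFLM C++ runtime parses it, dispatches standard operators to CMSIS-NN kernels, and hands the ETHOSU custom operator to the NPU driver, which is the path covered in the sections below.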

Overview

Section 1: Machine Learning For Embedded Devices Architecture Overview

Lecture 1 Tiny Machine Learning Model Development Flow

Lecture 2 Different Machine Learning Models Overview

Lecture 3 Standard Trained ML Models Challenge on Embedded Devices

Lecture 4 CNN Model As Use Case for Embedded Devices

Lecture 5 Convolution Stage

Lecture 6 Activation (ReLU) Stage

Lecture 7 Pooling (Optional) Stage

Lecture 8 Stacking Multiple Layers

Lecture 9 Flatten

Lecture 10 Dense/Connected Layer

Lecture 11 SoftMax and Final Decision

Section 2: TensorFlow Lite for Microcontrollers Based Models

Lecture 12 TensorFlow Lite for Microcontrollers (TFLM) Based ML Model Generation Flow

Lecture 13 TensorFlow Main Framework Training Stage

Lecture 14 H5 to .TFLite File Conversion

Lecture 15 ARM NPU Vela Compilation Stage

Lecture 16 Summary of Supported TFLM Machine Learning Operations

Section 3: ARM NPU Vela Compiler

Lecture 17 Vela Compiler Use Case

Lecture 18 Vela Compiler Work Flow Overview

Lecture 19 Vela Compiler Installation Pre-requisites

Lecture 20 Vela Compiler Installation

Lecture 21 Vela Compiler Supported Command-Line Options Summary

Lecture 22 Vela Compiler System Configuration File

Lecture 23 Supported Memory Configuration Modes

Lecture 24 Vela Compiler Tuning Configuration File

Section 4: TFLM Based Machine Learning FlatBuffer for ARM NPU Based Hardware

Lecture 25 .TFLite Vela Compiled Output File Generation Summary

Lecture 26 Flat Buffer File Format

Lecture 27 Dump the Vela Output .TFLite Flat Buffer to JSON File Format Representation

Lecture 28 Flat Buffer File JSON Representation Example

Lecture 29 Flat Buffer Elements: Opcode Table Section

Lecture 30 Flat Buffer Elements: Buffers Section

Lecture 31 Flat Buffer Elements: SubGraph Section

Lecture 32 Flat Buffer Elements: Machine Learning Operators Opcodes Indexing

Lecture 33 Flat Buffer Elements: Tensors Indexing

Lecture 34 Flat Buffer Elements: Buffers Indexing

Lecture 35 NPU Custom Operation ETHOSU_CONV_2D

Lecture 36 ETHOSU_CONV_2D NPU Operation Data Stream

Lecture 37 ETHOSU_CONV_2D NPU Operation Embedded Memory Information

Lecture 38 .TFLite Flat Buffer File Metadata Information Part1

Lecture 39 .TFLite Flat Buffer File Metadata Information Part2 (Memory Usage)

Lecture 40 .TFLite Flat Buffer File Metadata Information Part3 (Architecture Config)

Lecture 41 .TFLite Flat Buffer File Metadata Example Summary

Section 5: ARM Ethos-U NPU Input Data Stream

Lecture 42 ETHOSU_CONV_2D Operator Input Data Stream Overview

Lecture 43 DMA Transfer Commands

Lecture 44 DMA Transfer Commands Summary

Lecture 45 Memory Regions Information

Lecture 46 Looping and Tiling Logic Overview

Lecture 47 Tiling Logic

Lecture 48 NPU Micro-Operations

Section 6: ARM ETHOS-U/N NPU (Embedded AI Hardware Accelerators) Families

Lecture 49 High Performance Vs Low Power ARM ETHOS NPUs (ETHOS-U Vs ETHOS-N)

Lecture 50 ARM ETHOS-U Low Power NPUs Usage

Lecture 51 ARM Cortex-M55 + Ethos-U Hardware System Integration Overview

Lecture 52 ARM Cortex-M55 + Ethos-U Hardware System Integration Example

Lecture 53 ARM Ethos-U & Cortex-M/A System Integration Topologies

Lecture 54 ARM NPU Hardware Block Diagram Overview

Lecture 55 NPU Functional Block Diagram

Section 7: TensorFlow Lite for Microcontrollers (TFLM) C++ Runtime Library

Lecture 56 TFLM Top Level Flow

Lecture 57 TFLM Interpreter Initialization Stage

Lecture 58 TFLM Initialization: Arena Memory Allocation

Lecture 59 TFLM Initialization: Interpreter Instantiation Part1

Lecture 60 TFLM Initialization: Interpreter Instantiation Part2

Lecture 61 TFLM Nodes Allocation: Tensors Allocation Part1

Lecture 62 TFLM Nodes Allocation: Tensors Allocation Part2

Lecture 63 TFLM Nodes Allocation: Tensors Allocation Part3

Lecture 64 TFLM Nodes Allocation: Tensors Allocation Part4

Lecture 65 TFLM Operators Invocation Overview

Lecture 66 TFLM Operators Invocation: Kernel & DSP Based Operators

Lecture 67 TFLM Operators Invocation: Custom NPU Ethos-U Based Operator

Section 8: ARM CMSIS-NN (Neural Network) Library

Lecture 68 ARM CMSIS-NN Role in Machine Learning Execution Flow

Lecture 69 CMSIS-NN as a Neural Network Library for Cortex-M Microprocessors

Lecture 70 CMSIS-NN Software Architecture Design

Lecture 71 CMSIS-NN Compile Time Feature Flags Selection

Lecture 72 CMSIS-NN APIs Summary

Lecture 73 CMSIS-NN Processor Target-Specific Implementations

Section 9: Alif E7 Board For Embedded Based Machine Learning Use Cases

Lecture 74 Alif E7 System on Chip Hardware Block Diagram

Lecture 75 Alif E7: High Performance (HP) vs High Efficiency (HE) ETHOS-U55 ARM NPUs

Lecture 76 Alif E7 Development Kit

Lecture 77 Alif E7 Development Kit Schematic Overview

Lecture 78 Alif E7 Development Kit Jumpers Configuration

Lecture 79 Alif E7 Development Kit Presentation

Lecture 80 Booting Alif E7 Development Kit

Lecture 81 Live Booting of Alif E7 ML Development Kit

Lecture 82 Alif E7 Development Kit Built-in Example

Section 10: Alif E7 Examples & Setup Environment Guide

Lecture 83 Setup Guide Slides and Repository Links

Lecture 84 Alif E7 Examples & Demos Repository

Lecture 85 Alif E7 Windows Machine Configuration

Lecture 86 Alif E7 Required Tools Installation on Windows Machine

Lecture 87 Python > 3.10 Package Installation

Lecture 88 Tools & Examples Projects Repositories Cloning and Setup

Lecture 89 How to Configure & Build the Keyword Spotting Example (KWS)

Lecture 90 How to Configure & Build the Keyword Spotting Example (KWS): Configure Stage

Lecture 91 How to Configure & Build the Keyword Spotting Example (KWS): Ninja Build Stage

Lecture 92 Inspect the Output KWS Example .axf File

Lecture 93 Image Classification Use Case

Lecture 94 Prepare to Execute our Machine Learning Examples (KWS & Image Classification)

Lecture 95 Running the Keyword Spotting (KWS) Example on Alif E7 ML Board

Lecture 96 Running the Image Classification (image_class) Example on Alif E7 ML Board

Embedded systems developers who want to start learning machine learning for embedded devices