    Transformers In Computer Vision - English Version

    Posted By: ELK1nG
    Published 1/2023
    MP4 | Video: h264, 1280x720 | Audio: AAC, 44.1 KHz
    Language: English | Size: 3.21 GB | Duration: 5h 31m

    What you'll learn

    What are transformer networks?

    State-of-the-art architectures for CV applications like Image Classification, Semantic Segmentation, Object Detection, and Video Processing

    Practical application of SoTA architectures like ViT, DETR, and SWIN with Huggingface vision transformers

    Attention mechanisms as a general Deep Learning idea (see the equation sketched after this list)

    Inductive Bias and the landscape of DL models in terms of modeling assumptions

    Transformer applications in NLP and Machine Translation

    Transformers in Computer Vision

    Different types of attention in Computer Vision
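
    The attention mechanism highlighted above reduces to a single matrix equation. For reference, here is a minimal sketch in standard notation, following the scaled dot-product formulation of the "Attention Is All You Need" paper rather than any course-specific variant:

        \mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

    where Q, K, and V are the query, key, and value matrices projected from the input tokens (or image patches, in the vision case), and d_k is the key dimension that scales the dot products before the softmax.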

    Requirements

    Practical Machine Learning course

    Practical Computer Vision course (ConvNets)

    Introduction to NLP course

    Description

    Transformer networks are the new trend in Deep Learning. Transformer models have taken the world of NLP by storm since 2017, and they have since become the mainstream model in almost all NLP tasks. Transformers in CV are still lagging behind, but they have been taking over since 2020.

    We will start by introducing attention and transformer networks. Since transformers were first introduced in NLP, they are easier to describe with an NLP example first. From there, we will understand the pros and cons of this architecture. We will also discuss the importance of unsupervised or semi-supervised pre-training for transformer architectures, briefly covering large language models (LLMs) like BERT and GPT.

    This will pave the way to introduce transformers in CV. Here we will extend the attention idea into the 2D spatial domain of the image. We will discuss how convolution can be generalized using self-attention, within the encoder-decoder meta-architecture. We will see how this generic architecture is almost the same for images as for text, which makes transformers a generic function approximator. We will discuss channel and spatial attention, and local vs. global attention, among other topics.

    In the next three modules, we will discuss the specific networks that solve the big problems in CV: classification, object detection, and segmentation. We will discuss the Vision Transformer (ViT) from Google, the Shifted Window Transformer (SWIN) from Microsoft, the Detection Transformer (DETR) from Facebook research, the Segmentation Transformer (SETR), and many others. Then we will discuss the application of transformers in video processing, through spatio-temporal transformers applied to moving object detection, along with a Multi-Task Learning setup.

    Finally, we will show how those pre-trained architectures can be easily applied in practice using the famous Huggingface library, through its Pipeline interface.
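
    To make that last point concrete, here is a minimal sketch of the Pipeline interface, assuming the transformers library is installed (plus Pillow for image handling, and timm for the DETR checkpoint). The model names are public checkpoints on the Huggingface Hub; the image paths are placeholders:

    from transformers import pipeline

    # Image classification with a pre-trained Vision Transformer (ViT)
    classifier = pipeline("image-classification", model="google/vit-base-patch16-224")
    print(classifier("cat.jpg"))    # placeholder path; returns top labels with scores

    # Object detection with DETR
    detector = pipeline("object-detection", model="facebook/detr-resnet-50")
    print(detector("street.jpg"))   # placeholder path; returns boxes, labels, and scores

    The same one-line pattern covers the "image-segmentation" task as well.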

    Overview

    Section 1: Introduction

    Lecture 1 Introduction

    Section 2: Overview of Transformer Networks

    Lecture 2 The Rise of Transformers

    Lecture 3 Inductive Bias in Deep Neural Network Models

    Lecture 4 Attention is a General DL idea

    Lecture 5 Attention in NLP

    Lecture 6 Attention is All You Need

    Lecture 7 Self Attention Mechanisms

    Lecture 8 Self Attention Matrix Equations

    Lecture 9 Multihead Attention

    Lecture 10 Encoder-Decoder Attention

    Lecture 11 Transformers Pros and Cons

    Lecture 12 Unsupervised Pre-training

    Section 3: Transformers in Computer Vision

    Lecture 13 Module roadmap

    Lecture 14 Encoder-Decoder Design Pattern

    Lecture 15 Convolutional Encoders

    Lecture 16 Self Attention vs. Convolution

    Lecture 17 Spatial vs. Channel vs. Temporal Attention

    Lecture 18 Generalization of self attention equations

    Lecture 19 Local vs. Global Attention

    Lecture 20 Pros and Cons of Attention in CV

    Section 4: Transformers in Image Classification

    Lecture 21 Transformers in image classification

    Lecture 22 Vision Transformers (ViT and DeiT)

    Lecture 23 Shifted Window Transformers (SWIN)

    Section 5: Transformers in Object Detection

    Lecture 24 Transformers in Object detection

    Lecture 25 Object Detection methods review

    Lecture 26 Object Detection with ConvNet - YOLO

    Lecture 27 DEtection TRansformers (DETR)

    Lecture 28 DETR vs. YOLOv5 use case

    Section 6: Transformers in Semantic Segmentation

    Lecture 29 Module roadmap

    Lecture 30 Image Segmentation using ConvNets

    Lecture 31 Image Segmentation using Transformers

    Section 7: Spatio-Temporal Transformers

    Lecture 32 Spatio-Temporal Transformers - Moving Object Detection and Multi-Task Learning

    Section 8: Huggingface Vision Transformers

    Lecture 33 Module roadmap

    Lecture 34 Huggingface Pipeline overview

    Lecture 35 Huggingface vision transformers

    Lecture 36 Huggingface Demo using Gradio

    Section 9: Conclusion

    Lecture 37 Course conclusion

    Section 10: Material

    Lecture 38 Slides

    Intermediate to Advanced CV Engineers and Researchers