Unstructured Data Preprocessing for RAG Apps & LLMs
Published 8/2024
MP4 | Video: h264, 1920x1080 | Audio: AAC, 44.1 KHz
Language: English | Size: 2.12 GB | Duration: 3h 1m
Master Unstructured Data with ViT, Metadata, Advanced Chunking, Hybrid Search, and RAG Techniques
What you'll learn
Master Unstructured Data Processing: Learn to efficiently extract, process, and normalize data from diverse document formats, including PDFs, PowerPoints, Word files, and HTML pages
Implement Advanced Metadata Enrichment: Understand how to enrich documents with comprehensive metadata, enabling more accurate and relevant data retrieval
Apply Vision Models and Chunking Techniques: Gain practical skills in applying vision models like ViT and advanced chunking methods to manage, analyze, and preprocess document content
Build and Deploy Hybrid Search Engines: Develop and deploy hybrid search engines that combine content-based retrieval with metadata-driven queries
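The chunking and metadata items above can be illustrated with a small sketch. It assumes each extracted element is a dict with `type`, `text`, and `metadata` keys, roughly the shape of the JSON the Unstructured Framework emits. The function `chunk_by_title_sketch` and its `max_chars` limit are hypothetical: this is a stdlib-only illustration of title-based chunking with section metadata, not the framework's own chunking implementation.

```python
from typing import Any

Element = dict[str, Any]


def chunk_by_title_sketch(elements: list[Element], max_chars: int = 500) -> list[dict[str, Any]]:
    """Hypothetical helper: start a new chunk at every Title and cap chunk size.

    Each chunk records the section title it belongs to as metadata, so a
    retriever can later filter or boost results by section.
    """
    chunks: list[dict[str, Any]] = []
    current_title = None
    buffer: list[str] = []

    def flush() -> None:
        # Emit the buffered text as one chunk tagged with the current section.
        if buffer:
            chunks.append({"text": "\n".join(buffer), "metadata": {"section": current_title}})
            buffer.clear()

    for el in elements:
        text = " ".join(el.get("text", "").split())  # normalize whitespace
        if not text:
            continue
        if el.get("type") == "Title":
            flush()  # a new section always starts a new chunk
            current_title = text
            continue
        # Start a new chunk if appending this element would exceed max_chars.
        # A single element longer than max_chars still becomes its own
        # oversized chunk: this sketch never splits an element's text.
        if buffer and len("\n".join(buffer + [text])) > max_chars:
            flush()
        buffer.append(text)
    flush()
    return chunks


if __name__ == "__main__":
    sample = [
        {"type": "Title", "text": "Introduction"},
        {"type": "NarrativeText", "text": "RAG  systems retrieve context."},
        {"type": "Title", "text": "Methods"},
        {"type": "ListItem", "text": "Chunk by title."},
        {"type": "ListItem", "text": "Attach metadata."},
    ]
    for chunk in chunk_by_title_sketch(sample):
        print(chunk)
```

Keeping the section title in each chunk's metadata, rather than only in its text, is what lets a later retrieval step filter on it.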
Requirements
Basic Programming Knowledge: Familiarity with programming concepts, particularly in Python and JavaScript, will help learners understand and apply the course content more effectively.
Familiarity with AI Concepts: A basic understanding of AI, LLMs, or machine learning will make it easier to grasp the data preprocessing and RAG concepts covered in the course.
Description
Unlock the power of unstructured data and elevate your AI-driven applications with this comprehensive course on transforming unstructured data into actionable insights using advanced techniques. Whether you're a developer, data scientist, or AI enthusiast, this course will equip you with the skills to extract, process, and normalize content from diverse document formats (PDFs, PowerPoints, Word files, HTML pages, tables, and images), making your data ready for sophisticated RAG systems and Large Language Models (LLMs).
In this hands-on course, you'll delve deep into the Unstructured Framework, a powerful tool for managing and normalizing unstructured data. You'll learn how to enrich your documents with metadata, apply advanced chunking techniques, and use hybrid search methods to improve retrieval and generation. With a focus on real-world applications, you'll gain practical experience preprocessing documents with vision models like ViT, extracting information with table transformers, and integrating these components into your RAG-powered applications.
What You'll Learn:
Master the Unstructured Framework: Understand how to use the Unstructured Framework to handle and normalize diverse data types, optimizing them for RAG systems and LLMs.
Advanced Metadata Extraction: Learn to enrich your documents with comprehensive metadata, improving search accuracy and relevance in AI-driven applications.
Implement Cutting-Edge Chunking Techniques: Apply advanced chunking methods to manage and process large datasets, ensuring efficient data handling and retrieval.
Harness Hybrid Search Capabilities: Explore hybrid search techniques that combine metadata and content-based retrieval, boosting the performance of your query engines.
Document Image Analysis with ViT: Use vision models like ViT and table transformers to analyze and preprocess document images, improving your ability to extract and use unstructured data.
Why This Course?
This course is designed for professionals who want to go beyond basic data processing and dive into advanced techniques for managing unstructured data in RAG systems. Through a series of practical projects, you'll gain the expertise to build and deploy robust, scalable data engines that can handle complex queries and generate contextually relevant responses. Whether you're looking to enhance your current skill set or explore new frontiers in AI-driven development, this course provides the knowledge and hands-on experience you need to succeed.
Join us and master the art of transforming unstructured data into powerful, structured insights for your RAG systems and LLM applications!
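Hybrid search, as described above, combines a metadata filter with content-based similarity. The sketch below shows that two-stage idea in plain Python under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `hybrid_search` with its `where` filter is a hypothetical helper, not the query API of any particular vector database.

```python
import math
import re
from collections import Counter
from typing import Any


def embed(text: str) -> Counter:
    """Toy stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def hybrid_search(docs: list[dict[str, Any]], query: str,
                  where: dict[str, Any], k: int = 3) -> list[dict[str, Any]]:
    """Hypothetical helper: metadata pre-filter, then content-based ranking."""
    q = embed(query)
    # Metadata stage: keep docs whose metadata matches every key/value in `where`.
    candidates = [d for d in docs
                  if all(d["metadata"].get(key) == val for key, val in where.items())]
    # Content stage: rank the surviving docs by similarity to the query.
    candidates.sort(key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    docs = [
        {"text": "Table transformers extract table structure", "metadata": {"filetype": "pdf"}},
        {"text": "Slides summarize table extraction results", "metadata": {"filetype": "pptx"}},
        {"text": "Chunking splits documents by title", "metadata": {"filetype": "pdf"}},
    ]
    for d in hybrid_search(docs, "extract table structure", where={"filetype": "pdf"}):
        print(d["metadata"], d["text"])
```

Filtering before ranking means the similarity step only ever scores documents that already satisfy the metadata constraints; in a real system, both stages would typically be delegated to the vector database.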
Overview
Section 1: Introduction
Lecture 1 Introductions and What the Course is About and Prerequisites
Lecture 2 Course Structure
Section 2: Download Source Code
Lecture 3 Source code
Lecture 4 Course Slides
Section 3: Development Environment Setup
Lecture 5 Development Environment Setup - Overview
Lecture 6 Set Up OpenAI API Account and API Key
Lecture 7 Set Up the Unstructured Account and FREE API Key
Lecture 8 Unstructured Framework Test Run
Section 4: Data Preprocessing for LLMs - Deep Dive
Lecture 9 Data Preprocessing Deep Dive - Overview
Lecture 10 Data Preprocessing for LLMs Overview - Why Data Preprocessing is Hard
Lecture 11 Challenges with Unstructured Data
Lecture 12 How Content Extraction Works - Cleaning and Data Normalization
Lecture 13 Chunking and Structuring Data and Workflow Orchestration
Lecture 14 The Unstructured Framework - The Whole Workflow and Overview
Section 5: Check in
Lecture 15 Check in
Section 6: Hands-on: The Unstructured Framework - Preprocessing HTML, PDFs & PPTX Documents
Lecture 16 Hands-on: Preprocessing a PDF File and Dissecting the Extracted JSON Data
Lecture 17 Hands-on: Preprocessing a PPTX (PowerPoint) File
Lecture 18 Hands-on: Preprocessing an HTML File
Lecture 19 Benefits of Normalizing Content - Summary
Section 7: Chunking and Metadata Extraction
Lecture 20 Content Chunking and Metadata Extraction - Overview
Lecture 21 Finding Elements Associated with Chapters - Hands-on
Lecture 22 Semantic Similarity - Hybrid Search and Saving Documents to Vector Database
Lecture 23 Code Restructuring - Avoid Multiple Document Preprocessing
Lecture 24 Semantic Similarity Challenges - Information Recency Criteria
Lecture 25 Chunking for Document Elements and Benefits - Full Overview
Lecture 26 Chunking Document Content - Hands-on
Lecture 27 Summary
Section 8: Preprocessing Complex Documents - PDFs and Images
Lecture 28 Preprocessing Complex Documents - PDFs and Images - Overview
Lecture 29 Document Image Analysis Methods: Document Layout Detector and Visual Transformer
Lecture 30 Advantages and Disadvantages of ViT and DLD
Lecture 31 Preprocessing HTML and PDF files - Fast
Lecture 32 Preprocessing with Document Layout Detection and Comparing the Results
Lecture 33 Table Content Extraction - Hands-on
Lecture 34 Summarizing the Table Data with LangChain - Hands-on
Section 9: Build a RAG System Using Learned Techniques - Full Use Case
Lecture 35 Put it All Together - Build a RAG System Using What You've Learned - Overview
Lecture 36 Preprocessing a PDF File and Showing Tabular Content as Well - Part 1
Lecture 37 Filtering out References and Headers from PDF - Part 2
Lecture 38 Preprocess PPTX & MD File and Save Document Elements to Vector Database - Part 3
Lecture 39 Chat with Your Own Documents - PDF - Part 4
Lecture 40 Chat with Your Own Documents - MD and PPTX Documents - Final
Section 10: Wrap up
Lecture 41 What's Next
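Lecture 37's step of filtering references and headers out of a PDF can be sketched as a single pass over the extracted elements. This is a hypothetical helper, not the course's code: it assumes element types named `Header`, `Footer`, and `PageNumber` (in the style of the Unstructured Framework's JSON output) and treats everything from a `References` or `Bibliography` title onward as the reference list, which will not hold for every document.

```python
from typing import Any

Element = dict[str, Any]

# Assumed element types that carry page furniture rather than content.
DROP_TYPES = {"Header", "Footer", "PageNumber"}
# Assumed section titles that mark the start of the reference list.
REFERENCE_TITLES = {"references", "bibliography"}


def drop_references_and_headers(elements: list[Element]) -> list[Element]:
    """Hypothetical helper: remove headers/footers and the trailing references.

    Simplification: assumes the reference section runs to the end of the document.
    """
    kept: list[Element] = []
    for el in elements:
        text = el.get("text", "").strip()
        if el.get("type") == "Title" and text.lower() in REFERENCE_TITLES:
            break  # everything from this title onward is treated as references
        if el.get("type") in DROP_TYPES:
            continue  # skip running headers, footers, and page numbers
        kept.append(el)
    return kept


if __name__ == "__main__":
    sample = [
        {"type": "Header", "text": "Journal of Examples"},
        {"type": "Title", "text": "Introduction"},
        {"type": "NarrativeText", "text": "Body text."},
        {"type": "Footer", "text": "Page 1"},
        {"type": "Title", "text": "References"},
        {"type": "ListItem", "text": "[1] Some paper."},
    ]
    print([el["text"] for el in drop_references_and_headers(sample)])
```

Running this filter before chunking and embedding keeps boilerplate such as running headers and citation lists out of the vector database, where it would otherwise compete with real content at retrieval time.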
Who this course is for
Developers and Programmers, Data Scientists, and AI Enthusiasts who are looking to expand their knowledge of unstructured data processing, metadata enrichment, and the creation of Retrieval-Augmented Generation (RAG) systems.
Technical Professionals working in fields where data normalization, chunking, and hybrid search are critical, and who wish to implement robust solutions using the Unstructured Framework and Vision Transformers (ViT).
AI and ML Practitioners who are interested in leveraging cutting-edge techniques to preprocess and manage diverse document formats, such as PDFs, PowerPoints, and HTML, for enhanced machine learning and LLM applications.