    Evaluating Large Language Models (LLMs)

    Posted By: IrGens

    ISBN: 9780135451922 | .MP4, AVC, 1280x720, 30 fps | English, AAC, 2 Ch | 7h 56m | 2.2 GB
    Instructor: Sinan Ozdemir

    The Sneak Peek program provides early access to Pearson video products and is available exclusively to subscribers. Content for titles in this program is released throughout the development cycle, so a product may be incomplete, unedited, or not yet finalized, and its video may not have gone through post-production editing.

    Introduction

    Evaluating Large Language Models (LLMs): Introduction

    Lesson 1: Foundations of LLM Evaluation

    Learning objectives
    1.1 Introduction to Evaluation: Why It Matters
    1.2 Generative versus Understanding Tasks
    1.3 Key Metrics for Common Tasks

    Lesson 2: Evaluating Generative Tasks

    Learning objectives
    2.1 Evaluating Multiple-Choice Tasks
    2.2 Evaluating Free Text Response Tasks
    2.3 AIs Supervising AIs: LLM as a Judge

    Lesson 3: Evaluating Understanding Tasks

    Learning objectives
    3.1 Evaluating Embedding Tasks
    3.2 Evaluating Classification Tasks
    3.3 Building an LLM Classifier with BERT and GPT

    Lesson 4: Using Benchmarks Effectively

    Learning objectives
    4.1 The Role of Benchmarks
    4.2 Interrogating Common Benchmarks
    4.3 Evaluating LLMs with Benchmarks

    Lesson 5: Probing LLMs for a World Model

    Learning objectives
    5.1 Probing LLMs for Knowledge
    5.2 Probing LLMs to Play Games

    Lesson 6: Evaluating LLM Fine-Tuning

    Learning objectives
    6.1 Fine-Tuning Objectives
    6.2 Metrics for Fine-Tuning Success
    6.3 Practical Demonstration: Evaluating Fine-Tuning
    6.4 Evaluating and Cleaning Data

    Lesson 7: Case Studies

    Learning objectives
    7.1 Evaluating AI Agents: Task Automation and Tool Integration
    7.2 Measuring Retrieval-Augmented Generation (RAG) Systems
    7.3 Building and Evaluating a Recommendation Engine Using LLMs
    7.4 Using Evaluation to Combat AI Drift
    7.5 Time-Series Regression

    Lesson 8: Summary of Evaluation and Looking Ahead

    Learning objectives
    8.1 When and How to Evaluate
    8.2 Looking Ahead: Trends in LLM Evaluation

    Summary

    Evaluating Large Language Models (LLMs): Summary

