Uploader: 高宏飞 · Shared on 2026-03-24

Author: Howard Huang, Eli Stevens, Luca Antiga, Thomas Viehmann

Everything you need to create neural networks with PyTorch, including large language and diffusion models. PyTorch core developer Howard Huang updates the bestselling original Deep Learning with PyTorch with new insights into the transformer architecture and generative AI models.

In Deep Learning with PyTorch, Second Edition you'll find:

• Deep learning fundamentals reinforced with hands-on projects
• Mastering PyTorch's flexible APIs for neural network development
• Implementing CNNs, transformers, and diffusion models
• Optimizing models for training and deployment
• Generative AI models to create images and text

Instantly familiar to anyone who knows PyData tools like NumPy, PyTorch simplifies deep learning without sacrificing advanced features. In this book you'll learn how to create your own neural networks and deep learning systems and take full advantage of PyTorch's built-in tools for automatic differentiation, hardware acceleration, distributed training, and more. You'll discover how easy PyTorch makes it to build your entire DL pipeline, including using the PyTorch Tensor API, loading data in Python, monitoring training, and visualizing results. Each new technique you learn is put into action with practical code examples in each chapter, culminating in building your own convolutional neural networks, transformers, and even a real-world medical image classifier.

About the technology
The powerful PyTorch library makes deep learning simple without sacrificing the features you need to create efficient neural networks, LLMs, and other ML models. Pythonic by design, it's instantly familiar to users of NumPy, Scikit-learn, and other ML frameworks. This thoroughly revised second edition covers the latest PyTorch innovations, including how to create and refine generative AI models.

About the reader
For Python programmers with a background in machine learning.
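The NumPy familiarity and built-in automatic differentiation the description mentions are easy to see in a few lines. A minimal sketch (assuming only that PyTorch is installed and importable as `torch`), showing NumPy-style tensor math plus autograd computing a gradient:

```python
import torch

# Tensors are created and manipulated much like NumPy arrays.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# A scalar "loss": y = sum(x^2) = 1 + 4 + 9 = 14
y = (x ** 2).sum()

# Autograd fills in dy/dx = 2x, with no manual calculus.
y.backward()

print(y.item())   # 14.0
print(x.grad)     # tensor([2., 4., 6.])
```

The same `requires_grad`/`backward()` mechanism scales from this toy example up to the training loops the book builds in later chapters.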

Tags
No tags
ISBN: 1633438856
Publish Year: 2026
Language: English
Pages: 546
File Format: PDF
File Size: 28.8 MB
Text Preview (First 20 pages)

MANNING
Howard Huang · Eli Stevens · Luca Antiga · Thomas Viehmann
Training and applying deep learning and generative AI models
SECOND EDITION
[Figure: An end-to-end machine learning pipeline from data to training to production. Data source, multiprocess data loading, sample tensors, batch tensor, untrained model, training loop (with distributed training on multiple servers/GPUs), trained model, and production deployment (ONNX, torch.compile, torch.export) to a production server or cloud.]
Licensed to THIAGO BANDEIRA <thiago@lar.ifce.edu.br>
Praise for the first edition

With this publication, we finally have a definitive treatise on PyTorch. It covers the basics and abstractions in great detail.
—From the Foreword by Soumith Chintala, Cocreator of PyTorch

Deep learning divided into digestible chunks with code samples that build up logically.
—Mathieu Zhang, NVIDIA

Timely, practical, and thorough. Don't put it on your bookshelf but next to your laptop.
—Philippe Van Bergen, P2 Consulting

Deep Learning with PyTorch offers a very pragmatic overview of deep learning. It is a didactical resource.
—Orlando Alejo Mendez Morales, Experian
Deep Learning with PyTorch
SECOND EDITION
Training and Applying Deep Learning and Generative AI Models
HOWARD HUANG · LUCA ANTIGA · ELI STEVENS · THOMAS VIEHMANN
MANNING, Shelter Island
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2026 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning's policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964

Development editor: Elesha Hyde
Technical editor: Fábio Vinicius Moreira Perez
Review editor: Angelina Lazukić
Production editor: Aleksandar Dragosavljević
Copy editor: Alisa Larson
Proofreader: Mike Beady
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781633438859
Printed in the United States of America
To Fred, Mary, Ward, and Edward, the best unpaid editors, critics, and supporters I could ask for
—Howard Huang

Same :-) But, really, this is for you, Alice and Luigi
—Luca Antiga

To my wife (this book would not have happened without her invaluable support and partnership), my parents (I would not have happened without them), and my children (this book would have happened a lot sooner but for them), thank you for being my home, my foundation, and my joy
—Eli Stevens

To Eva, Rebekka, Jonathan, and David
—Thomas Viehmann
contents

preface
acknowledgments
about this book
about the authors
about the cover illustration

PART 1 CORE PYTORCH

1 Introducing deep learning and the PyTorch library
  1.1 What is deep learning?
  1.2 The shift from machine learning to deep learning
  1.3 What to expect
  1.4 Why PyTorch?
      The deep learning competitive landscape
  1.5 How PyTorch supports deep learning projects
  1.6 Hardware and software requirements
      Using Jupyter Notebooks
  1.7 Exercises

2 Pretrained networks
  2.1 A pretrained network that recognizes the subject of an image
      Obtaining a pretrained network for image recognition ■ AlexNet ■ The Vision Transformer ■ Ready, set, almost run ■ Run!
  2.2 Generating and editing images
      The inpainting process ■ A network that turns horses into zebras
  2.3 Model Zoo: Hugging Face
  2.4 A pretrained network that describes scenes
      BLIP in action
  2.5 Conclusion
  2.6 Exercises

3 It starts with a tensor
  3.1 The world as floating-point numbers
  3.2 Tensors: Multidimensional arrays
      From Python lists to PyTorch tensors ■ Constructing our first tensors ■ The essence of tensors
  3.3 Indexing tensors
  3.4 Broadcasting
  3.5 Named tensors
  3.6 Tensor element types
      Specifying the numeric type with dtype ■ A dtype for every occasion ■ Managing a tensor's dtype attribute
  3.7 The tensor API
  3.8 Tensors: Scenic views of storage
      Indexing into storage ■ Modifying stored values: In-place operations
  3.9 Tensor metadata: Size, offset, and stride
      Views of another tensor's storage ■ Transposing without copying ■ Transposing in higher dimensions ■ Contiguous tensors
  3.10 Moving tensors to the GPU
      Managing a tensor's device attribute
  3.11 NumPy interoperability
  3.12 Generalized tensors are tensors, too
  3.13 Serializing tensors
      Serializing to HDF5 with h5py
  3.14 Conclusion
  3.15 Exercises

4 Real-world data representation using tensors
  4.1 Working with images
      Adding color channels ■ Loading an image file ■ Changing the layout ■ Normalizing the data
  4.2 3D images: Volumetric data
      Loading a specialized format
  4.3 Representing tabular data
      Using a real-world dataset ■ Loading a wine data tensor ■ Representing scores ■ One-hot encoding ■ When to categorize ■ Finding thresholds
  4.4 Working with time series
      Adding a time dimension ■ Shaping the data by time period ■ Ready for training
  4.5 Representing text
      Converting text to numbers ■ One-hot-encoding characters ■ One-hot encoding whole words ■ Text embeddings ■ Text embeddings as a blueprint
  4.6 Conclusion
  4.7 Exercises

5 The mechanics of learning
  5.1 A timeless lesson in modeling
  5.2 Learning is just parameter estimation
      A hot problem ■ Gathering some data ■ Visualizing the data ■ Choosing a linear model as a first try
  5.3 Less loss is what we want
      From problem back to PyTorch
  5.4 Down along the gradient
      Decreasing loss ■ Getting analytical ■ Iterating to fit the model ■ Normalizing inputs ■ Visualizing (again)
  5.5 PyTorch's autograd: Backpropagating all things
      Computing the gradient automatically ■ Optimizers à la carte ■ Training, validation, and overfitting ■ Training set ■ Autograd nits and switching it off
  5.6 Conclusion
  5.7 Exercises

6 Using a neural network to fit the data
  6.1 Artificial neurons
      Composing a multilayer network ■ Understanding the error function ■ Adding nonlinearity with activation functions ■ More activation functions ■ Choosing the best activation function ■ What learning means for a neural network
  6.2 The PyTorch nn module
      Using nn.Module as a callable ■ Returning to the linear model
  6.3 Finally, a neural network
      Replacing the linear model ■ Inspecting the parameters ■ Comparing to the linear model
  6.4 Conclusion
  6.5 Exercises

7 Telling birds from airplanes: Learning from images
  7.1 A dataset of tiny images
      Downloading CIFAR-10 ■ The Dataset class ■ Dataset transforms ■ Normalizing data
  7.2 Distinguishing birds from airplanes
      Building the dataset ■ A fully connected model ■ Output of a classifier ■ Representing the output as probabilities ■ Training the classifier ■ The limits of going fully connected
  7.3 Conclusion
  7.4 Exercises

8 Using convolutions to generalize
  8.1 The case for convolutions
      What convolutions do
  8.2 Convolutions in action
      Padding the boundary ■ Detecting features with convolutions ■ Looking further with depth and pooling ■ Putting it all together for our network
  8.3 Subclassing nn.Module
      Our network as an nn.Module ■ How PyTorch keeps track of parameters and submodules ■ The functional API
  8.4 Training our convolutional neural network
      Measuring accuracy ■ Saving and loading our model ■ Training on the GPU
  8.5 Model design
      Adding memory capacity: Width ■ Helping our model to converge and generalize: Regularization ■ Going deeper to learn more complex structures: Depth ■ Comparing the designs from this section ■ It's already outdated
  8.6 Conclusion
  8.7 Exercises

PART 2 PRACTICAL DEEP LEARNING APPLICATIONS

9 How transformers work
  9.1 A motivating example: Generating names character by character
  9.2 Self-supervised learning
      Limits of the bigram model
  9.3 Generating our training data
  9.4 Embeddings and linear layers
      Visualizing embeddings
  9.5 Attention
      Dot product self-attention ■ Scaled dot product causal self-attention
  9.6 Transformers
      The decoder
  9.7 Other Transformer architectures
      The encoder ■ The encoder-decoder
  9.8 Tokenization
      Generating sentences
  9.9 The Vision Transformer
  9.10 Conclusion
  9.11 Exercises

10 Diffusion models for images
  10.1 History of VAEs and GANs
  10.2 Motivator for diffusion models
  10.3 Diffusion in detail
  10.4 Setting up the data
  10.5 The forward process
  10.6 Training
      Loss
  10.7 Reversing diffusion (how to sample)
  10.8 Conclusion
  10.9 Exercises

11 Using PyTorch to fight cancer
  11.1 Introduction to the use case
  11.2 Preparing for a large-scale project
  11.3 What is a CT scan, exactly?
  11.4 The project: An end-to-end detector for lung cancer
      Why can't we just throw data at a neural network until it works? ■ Our data source: The LUNA Grand Challenge ■ Downloading the LUNA data
  11.5 Conclusion

12 Combining data sources into a unified dataset
  12.1 Raw CT data files
  12.2 Parsing LUNA's annotation data
      Training and validation sets ■ Unifying our annotation and candidate data
  12.3 Loading individual CT scans
      Hounsfield Units
  12.4 Locating a nodule using the patient coordinate system
      The patient coordinate system ■ CT scan shape and voxel sizes ■ Converting between millimeters and voxel addresses ■ Extracting a nodule from a CT scan
  12.5 Straightforward dataset implementation
      Caching candidate arrays with the getCtRawCandidate function ■ Constructing our dataset in LunaDataset.__init__ ■ A training/validation split ■ Rendering the data
  12.6 Conclusion
  12.7 Exercises

13 Training a classification model to detect suspected tumors
  13.1 A foundational model and training loop
  13.2 The main entry point for our application
  13.3 Pretraining setup and initialization
      Initializing the model and optimizer ■ Care and feeding of data loaders
  13.4 Our first-pass neural network design
      The core convolutions ■ The full model
  13.5 Training and validating the model
      The computeBatchLoss function ■ The validation loop is similar
  13.6 Outputting performance metrics
      The logMetrics function
  13.7 Running the training script
      Data needed for training ■ Interlude: The tqdm function
  13.8 Evaluating the model: Getting 99.7% correct means we're done, right?
  13.9 Graphing training metrics with TensorBoard
      Running TensorBoard ■ Adding TensorBoard support to the metrics logging function
  13.10 Why isn't the model learning to detect nodules?
  13.11 Conclusion
  13.12 Exercises

14 Improving training with metrics and augmentation
  14.1 High-level plan for improvement
  14.2 Good dogs vs. bad guys: False positives and false negatives
  14.3 Graphing the positives and negatives
      Recall is Chirpy's strength ■ Precision is Dozer's forte ■ Implementing precision and recall in logMetrics ■ Our ultimate performance metric: The F1 score ■ How does our model perform with our new metrics?
  14.4 What does an ideal dataset look like?
      Making the data look less like the actual and more like the "ideal" ■ Contrasting training with a balanced LunaDataset to previous runs ■ Recognizing the symptoms of overfitting
  14.5 Revisiting the problem of overfitting
      An overfit face-to-age prediction model
  14.6 Preventing overfitting with data augmentation
      Specific data augmentation techniques ■ Seeing the improvement from data augmentation
  14.7 Conclusion
  14.8 Exercises

15 Using segmentation to find suspected nodules
  15.1 Utilizing a second model in our project
  15.2 Various types of segmentation
  15.3 Semantic segmentation: Per-pixel classification
      The Segment Anything model (SAM)
  15.4 SAM architecture
      Trying out an off-the-shelf model for our project
  15.5 Using the SAM model directly
  15.6 Updating the dataset for segmentation
      Working around SAM's limitation on 2D data ■ Building the segmentation dataset ■ Training a model to flag potential candidates
  15.7 Updating our training for fine-tuning
      How to fine-tune a model ■ Using the AdamW optimizer ■ Designing our training loop ■ Saving our model
  15.8 Inference and results
  15.9 Conclusion
  15.10 Exercises

16 Training models on multiple GPUs
  16.1 Introduction to parallel programming
      Distributed computing terminology ■ Hardware requirements ■ Initializing a distributed program
  16.2 Collective communication
  16.3 Introduction to parallelisms
  16.4 Data parallelism
  16.5 Model parallelism
      Pipeline parallelism ■ Tensor parallelism ■ Deciding between pipeline and tensor parallelism
  16.6 n-dimensional parallelism
  16.7 Fully sharded data parallelism
  16.8 Large language model–specific parallelisms
      Context parallelism ■ Expert parallelism
  16.9 Tying all parallelisms together
  16.10 Conclusion
  16.11 Exercises

17 Deploying to production
  17.1 Serving PyTorch models
      Our model served by Gradio ■ Our model behind a FastAPI server ■ What we want from deployment ■ Request batching and streaming responses ■ How to make PyTorch models even faster
  17.2 Exporting models
      Interoperability beyond PyTorch with ONNX ■ PyTorch's own export: torch.export
  17.3 Expanding on torch.compile
      Full graph capture vs. disjoint graphs
  17.4 Understanding execution with torch.profiler
  17.5 Using PyTorch outside of Python
      LibTorch: PyTorch in C++
  17.6 Going mobile: ExecuTorch
  17.7 Conclusion
  17.8 Exercises

index
preface

When I first started exploring machine learning, the idea that a computer could learn the algorithms I needed instead of painstakingly coding them felt magical. Leaning into my laziness, I loved that I could define a model and let the system discover the internals for me. I had no idea then how far things would go. Today, models can even code themselves, and we're limited more by the clarity of our prompts than by the machinery itself. The pace of progress has been astounding, and part of what I want to do in this book is peel back that sense of mystery.

My first experience with AI was back in college. In 2016, I purchased a few technical books, much like this one, on machine learning and deep learning. At the time, I tried learning Scikit-learn and TensorFlow but found the learning curve quite steep. Whether due to the libraries themselves or my own inexperience, I struggled to get past the basics, barely managing to build simple models and often feeling stuck. Then PyTorch was released on January 18, 2017, and it immediately struck me as different. It was the first framework that hit the right balance between ease of use and power.

When I finally had the opportunity to contribute to PyTorch, I jumped at the chance. It took me a while to build the confidence to make contributions to such a large open source project—I worried that what I was doing was wrong, too small, or insignificant. But after working as a core contributor over the past several years, I can say that any contribution, whether big or small, is deeply appreciated and can truly make a difference.
Writing this book, I kept returning to my younger self: I wanted something that balanced theory and practice, nudged you forward without overwhelming you, and made hard ideas feel tractable. I hope these pages do that for you. While PyTorch has changed significantly over the years, its ethos has remained the same: to provide a deep learning library that's easy to use yet powerful enough to tackle cutting-edge problems. I hope this book stays true to that spirit. From tackling the basics to building real-world projects, PyTorch has been my go-to framework, and I'm excited to share it with you. I can't wait to see what you'll build with it.

—HOWARD HUANG