Programming PyTorch for Deep Learning
Creating and Deploying Deep Learning Applications
Ian Pointer
Boston • Farnham • Sebastopol • Tokyo • Beijing
Programming PyTorch for Deep Learning
by Ian Pointer

Copyright © 2019 Ian Pointer. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Jonathan Hassell
Development Editor: Melissa Potter
Production Editor: Katherine Tozer
Copyeditor: Sharon Wilkey
Proofreader: Christina Edwards
Indexer: WordCo Indexing Services, Inc.
Interior Designer: David Futato
Cover Designer: Susan Thompson
Illustrator: Rebecca Demarest

September 2019: First Edition

Revision History for the First Edition
2019-09-20: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492045359 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Programming PyTorch for Deep Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-492-04535-9 [LSI]
Table of Contents

Preface

1. Getting Started with PyTorch
    Building a Custom Deep Learning Machine
    GPU
    CPU/Motherboard
    RAM
    Storage
    Deep Learning in the Cloud
    Google Colaboratory
    Cloud Providers
    Which Cloud Provider Should I Use?
    Using Jupyter Notebook
    Installing PyTorch from Scratch
    Download CUDA
    Anaconda
    Finally, PyTorch! (and Jupyter Notebook)
    Tensors
    Tensor Operations
    Tensor Broadcasting
    Conclusion
    Further Reading

2. Image Classification with PyTorch
    Our Classification Problem
    Traditional Challenges
    But First, Data
    PyTorch and Data Loaders
    Building a Training Dataset
    Building Validation and Test Datasets
    Finally, a Neural Network!
    Activation Functions
    Creating a Network
    Loss Functions
    Optimizing
    Training
    Making It Work on the GPU
    Putting It All Together
    Making Predictions
    Model Saving
    Conclusion
    Further Reading

3. Convolutional Neural Networks
    Our First Convolutional Model
    Convolutions
    Pooling
    Dropout
    History of CNN Architectures
    AlexNet
    Inception/GoogLeNet
    VGG
    ResNet
    Other Architectures Are Available!
    Using Pretrained Models in PyTorch
    Examining a Model's Structure
    BatchNorm
    Which Model Should You Use?
    One-Stop Shopping for Models: PyTorch Hub
    Conclusion
    Further Reading

4. Transfer Learning and Other Tricks
    Transfer Learning with ResNet
    Finding That Learning Rate
    Differential Learning Rates
    Data Augmentation
    Torchvision Transforms
    Color Spaces and Lambda Transforms
    Custom Transform Classes
    Start Small and Get Bigger!
    Ensembles
    Conclusion
    Further Reading

5. Text Classification
    Recurrent Neural Networks
    Long Short-Term Memory Networks
    Gated Recurrent Units
    biLSTM
    Embeddings
    torchtext
    Getting Our Data: Tweets!
    Defining Fields
    Building a Vocabulary
    Creating Our Model
    Updating the Training Loop
    Classifying Tweets
    Data Augmentation
    Random Insertion
    Random Deletion
    Random Swap
    Back Translation
    Augmentation and torchtext
    Transfer Learning?
    Conclusion
    Further Reading

6. A Journey into Sound
    Sound
    The ESC-50 Dataset
    Obtaining the Dataset
    Playing Audio in Jupyter
    Exploring ESC-50
    SoX and LibROSA
    torchaudio
    Building an ESC-50 Dataset
    A CNN Model for ESC-50
    This Frequency Is My Universe
    Mel Spectrograms
    A New Dataset
    A Wild ResNet Appears
    Finding a Learning Rate
    Audio Data Augmentation
    torchaudio Transforms
    SoX Effect Chains
    SpecAugment
    Further Experiments
    Conclusion
    Further Reading

7. Debugging PyTorch Models
    It's 3 a.m. What Is Your Data Doing?
    TensorBoard
    Installing TensorBoard
    Sending Data to TensorBoard
    PyTorch Hooks
    Plotting Mean and Standard Deviation
    Class Activation Mapping
    Flame Graphs
    Installing py-spy
    Reading Flame Graphs
    Fixing a Slow Transformation
    Debugging GPU Issues
    Checking Your GPU
    Gradient Checkpointing
    Conclusion
    Further Reading

8. PyTorch in Production
    Model Serving
    Building a Flask Service
    Setting Up the Model Parameters
    Building the Docker Container
    Local Versus Cloud Storage
    Logging and Telemetry
    Deploying on Kubernetes
    Setting Up on Google Kubernetes Engine
    Creating a k8s Cluster
    Scaling Services
    Updates and Cleaning Up
    TorchScript
    Tracing
    Scripting
    TorchScript Limitations
    Working with libTorch
    Obtaining libTorch and Hello World
    Importing a TorchScript Model
    Conclusion
    Further Reading

9. PyTorch in the Wild
    Data Augmentation: Mixed and Smoothed
    mixup
    Label Smoothing
    Computer, Enhance!
    Introduction to Super-Resolution
    An Introduction to GANs
    The Forger and the Critic
    Training a GAN
    The Dangers of Mode Collapse
    ESRGAN
    Further Adventures in Image Detection
    Object Detection
    Faster R-CNN and Mask R-CNN
    Adversarial Samples
    Black-Box Attacks
    Defending Against Adversarial Attacks
    More Than Meets the Eye: The Transformer Architecture
    Paying Attention
    Attention Is All You Need
    BERT
    FastBERT
    GPT-2
    Generating Text with GPT-2
    ULMFiT
    What to Use?
    Conclusion
    Further Reading

Index
Preface

Deep Learning in the World Today

Hello and welcome! This book will introduce you to deep learning via PyTorch, an open source library released by Facebook in 2017. Unless you've had your head stuck in the ground in a very good impression of an ostrich the past few years, you can't have helped but notice that neural networks are everywhere these days. They've gone from being the really cool bit of computer science that people learn about and then do nothing with, to being carried around with us in our phones every day to improve our pictures or listen to our voice commands. Our email software reads our email and produces context-sensitive replies, our speakers listen out for us, cars drive by themselves, and the computer has finally bested humans at Go. We're also seeing the technology being used for more nefarious ends in authoritarian countries, where neural network–backed sentinels can pick faces out of crowds and make a decision on whether they should be apprehended.

And yet, despite the feeling that this has all happened so fast, the concepts of neural networks and deep learning go back a long way. The proof that such a network could approximate any mathematical function, which underpins the idea that neural networks can be trained for many different tasks, dates back to 1989,1 and convolutional neural networks were being used to recognize digits on checks in the late '90s. There's been a solid foundation building up all this time, so why does it feel like an explosion occurred in the last 10 years?

1 See "Approximation by Superpositions of Sigmoidal Functions," by George Cybenko (1989).

There are many reasons, but prime among them has to be the surge in graphics processing unit (GPU) performance and their increasing affordability. Designed originally for gaming, GPUs need to perform countless millions of matrix operations per second in order to render all the polygons for the driving or shooting game you're playing on your console or PC, operations that a standard CPU just isn't optimized for.
A 2009 paper, "Large-Scale Deep Unsupervised Learning Using Graphics Processors" by Rajat Raina et al., pointed out that training neural networks was also based on performing lots of matrix operations, and so these add-on graphics cards could be used to speed up training as well as make larger, deeper neural network architectures feasible for the first time. Other important techniques such as Dropout (which we will look at in Chapter 3) were also introduced in the last decade as ways to not just speed up training but make training more generalized (so that the network doesn't just learn to recognize the training data, a problem called overfitting that we'll encounter in the next chapter). In the last couple of years, companies have taken this GPU-based approach to the next level, with Google creating what it describes as tensor processing units (TPUs), devices custom-built for performing deep learning as fast as possible that are even available to the general public as part of its Google Cloud ecosystem.

Another way to chart deep learning's progress over the past decade is through the ImageNet competition. A massive database of over 14 million pictures, manually labeled into 20,000 categories, ImageNet is a treasure trove of labeled data for machine learning purposes. Since 2010, the yearly ImageNet Large Scale Visual Recognition Challenge has sought to test all comers against a 1,000-category subset of the database, and until 2012, error rates for tackling the challenge rested around 25%. That year, however, a deep convolutional neural network won the competition with an error of 16%, massively outperforming all other entrants. In the years that followed, that error rate got pushed down further and further, to the point that in 2015, the ResNet architecture obtained a result of 3.6%, which beat the average human performance on ImageNet (5%). We had been outclassed.

But What Is Deep Learning Exactly, and Do I Need a PhD to Understand It?

Deep learning's definition often is more confusing than enlightening. A way of defining it is to say that deep learning is a machine learning technique that uses multiple and numerous layers of nonlinear transforms to progressively extract features from raw input. Which is true, but it doesn't really help, does it? I prefer to describe it as a technique to solve problems by providing the inputs and desired outputs and letting the computer find the solution, normally using a neural network.

One thing about deep learning that scares off a lot of people is the mathematics. Look at just about any paper in the field and you'll be subjected to almost impenetrable amounts of notation with Greek letters all over the place, and you'll likely run screaming for the hills. Here's the thing: for the most part, you don't need to be a math genius to use deep learning techniques. In fact, for most day-to-day basic uses of the technology, you don't need to know much at all, and to really understand what's going on (as you'll see in Chapter 2), you only have to stretch a little to understand concepts that you probably learned in high school.
So don't be too scared about the math. By the end of Chapter 3, you'll be able to put together an image classifier that rivals what the best minds in 2015 could offer with just a few lines of code.

PyTorch

As I mentioned back at the start, PyTorch is an open source offering from Facebook that facilitates writing deep learning code in Python. It has two lineages. First, and perhaps not entirely surprisingly given its name, it derives many features and concepts from Torch, which was a Lua-based neural network library that dates back to 2002. Its other major parent is Chainer, created in Japan in 2015. Chainer was one of the first neural network libraries to offer an eager approach to differentiation instead of defining static graphs, allowing for greater flexibility in the way networks are created, trained, and operated. The combination of the Torch legacy plus the ideas from Chainer has made PyTorch popular over the past couple of years.2

2 Note that PyTorch borrows ideas from Chainer, but not actual code.

The library also comes with modules that help with manipulating text, images, and audio (torchtext, torchvision, and torchaudio), along with built-in variants of popular architectures such as ResNet (with weights that can be downloaded to provide assistance with techniques like transfer learning, which you'll see in Chapter 4).

Aside from Facebook, PyTorch has seen quick acceptance by industry, with companies such as Twitter, Salesforce, Uber, and NVIDIA using it in various ways for their deep learning work. Ah, but I sense a question coming….

What About TensorFlow?

Yes, let's address the rather large, Google-branded elephant in the corner. What does PyTorch offer that TensorFlow doesn't? Why should you learn PyTorch instead?

The answer is that traditional TensorFlow works in a different way than PyTorch, and that difference has major implications for code and debugging. In TensorFlow, you use the library to build up a graph representation of the neural network architecture and then you execute operations on that graph, which happens within the TensorFlow library. This method of declarative programming is somewhat at odds with Python's more imperative paradigm, meaning that Python TensorFlow programs can look and feel somewhat odd and difficult to understand. The other issue is that the static graph declaration can make dynamically altering the architecture during training and inference time a lot more complicated and stuffed with boilerplate than with PyTorch's approach.
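To make that difference concrete, here is a minimal sketch of my own (not one of this book's listings) of PyTorch's eager style: operations run immediately as the Python code executes, ordinary control flow decides what gets computed, and gradients are produced on demand with no separate graph-compilation step.

    import torch

    # Create a tensor and ask PyTorch to track operations performed on it
    x = torch.tensor([2.0, 3.0], requires_grad=True)

    # Ordinary Python control flow works; the "graph" is simply the code that ran
    if x.sum() > 1:
        y = (x ** 2).sum()
    else:
        y = x.sum()

    # Gradients are computed on demand, with no session and no compile step
    y.backward()
    print(x.grad)  # tensor([4., 6.]), i.e., dy/dx = 2x for the branch that executed

In a static-graph framework, the equivalent branching logic would have to be expressed as graph operations rather than as a plain Python if statement, which is exactly the boilerplate problem described above.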
For these reasons, PyTorch has become popular in research-oriented communities. The number of papers submitted to the International Conference on Learning Representations that mention PyTorch has jumped 200% in the past year, and the number of papers mentioning TensorFlow has increased almost equally. PyTorch is definitely here to stay.

However, things are changing in more recent versions of TensorFlow. A new feature called eager execution has recently been added to the library; it allows TensorFlow to work similarly to PyTorch and will be the paradigm promoted in TensorFlow 2.0. But as it's new, resources outside of Google that help you learn this method of working with TensorFlow are thin on the ground, and you'd still need to understand the older graph-based paradigm in order to get the most out of the years of existing work out there. But none of this should make you think poorly of TensorFlow; it remains an industry-proven library with support from one of the biggest companies on the planet.

PyTorch (backed, of course, by a different biggest company on the planet) is, I would say, a more streamlined and focused approach to deep learning and differential programming. Because it doesn't have to continue supporting older, crustier APIs, it is easier to teach and become productive in PyTorch than in TensorFlow.

Where does Keras fit in with this? So many good questions! Keras is a high-level deep learning library that originally supported Theano and TensorFlow, and now also supports certain other frameworks such as Apache MXNet. It provides features such as training, validation, and test loops that the lower-level frameworks leave as an exercise for the developer, as well as simple methods of building up neural network architectures. It has contributed hugely to the take-up of TensorFlow, and is now part of TensorFlow itself (as tf.keras) as well as continuing to be a separate project.

PyTorch, in comparison, is something of a middle ground between the low level of raw TensorFlow and Keras; we will have to write our own training and inference routines, but creating neural networks is almost as straightforward (and I would say that PyTorch's approach to making and reusing architectures is much more logical to a Python developer than some of Keras's magic). As you'll see in this book, although PyTorch is common in more research-oriented positions, with the advent of PyTorch 1.0, it's perfectly suited to production use cases.
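As a rough illustration of that middle ground (a sketch of my own with arbitrary, placeholder layer sizes, not an example from the chapters that follow), defining a network in PyTorch is just writing a Python class:

    import torch
    import torch.nn as nn

    class TinyClassifier(nn.Module):
        """A deliberately small feed-forward network; the sizes are placeholders."""
        def __init__(self, num_features=784, num_classes=10):
            super().__init__()
            self.fc1 = nn.Linear(num_features, 64)
            self.fc2 = nn.Linear(64, num_classes)

        def forward(self, x):
            x = torch.relu(self.fc1(x))
            return self.fc2(x)

    model = TinyClassifier()
    scores = model(torch.rand(8, 784))  # a batch of 8 made-up inputs
    print(scores.shape)                 # torch.Size([8, 10])

The training loop that feeds a model like this is ours to write, which is the trade-off described above; Chapter 2 builds one step by step.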
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a tip or suggestion.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (including code examples and exercises) is available for download at https://oreil.ly/pytorch-github.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission.
Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Programming PyTorch for Deep Learning by Ian Pointer (O'Reilly). Copyright 2019 Ian Pointer, 978-1-492-04535-9."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O'Reilly Online Learning

For almost 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed.

Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/prgrming-pytorch-for-dl.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.
For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia

Acknowledgments

A big thank you to my editor, Melissa Potter, my family, and Tammy Edlund for all their help in making this book possible. Thank you, also, to the technical reviewers who provided valuable feedback throughout the writing process, including Phil Rhodes, David Mertz, Charles Givre, Dominic Monn, Ankur Patel, and Sarah Nagy.
CHAPTER 1

Getting Started with PyTorch

In this chapter we set up all we need for working with PyTorch. Once we've done that, every chapter following will build on this initial foundation, so it's important that we get it right. This leads to our first fundamental question: should you build a custom deep learning computer or just use one of the many cloud-based resources available?

Building a Custom Deep Learning Machine

There is an urge when diving into deep learning to build yourself a monster for all your compute needs. You can spend days looking over different types of graphics cards, learning the memory lanes possible CPU selections will offer you, the best sort of memory to buy, and just how big an SSD drive you can purchase to make your disk access as fast as possible. I am not claiming any immunity from this; I spent a month a couple of years ago making a list of parts and building a new computer on my dining room table.

My advice, especially if you're new to deep learning, is this: don't do it. You can easily spend several thousand dollars on a machine that you may not use all that much. Instead, I recommend that you work through this book by using cloud resources (in either Amazon Web Services, Google Cloud, or Microsoft Azure) and only then start thinking about building your own machine if you feel that you require a single machine for 24/7 operation. You do not need to make a massive investment in hardware to run any of the code in this book.

You might not ever need to build a custom machine for yourself. There's something of a sweet spot, where it can be cheaper to build a custom rig if you know your calculations are always going to be restricted to a single machine (with at most a handful of GPUs). However, if your compute starts to require spanning multiple machines and GPUs, the cloud becomes appealing again.
Given the cost of putting a custom machine together, I'd think long and hard before diving in.

If I haven't managed to put you off from building your own, the following sections provide suggestions for what you would need to do so.

GPU

The heart of every deep learning box, the GPU, is what is going to power the majority of PyTorch's calculations, and it's likely going to be the most expensive component in your machine. In recent years, the prices of GPUs have increased, and the supplies have dwindled, because of their use in mining cryptocurrencies like Bitcoin. Thankfully, that bubble seems to be receding, and supplies of GPUs are back to being a little more plentiful.

At the time of this writing, I recommend obtaining the NVIDIA GeForce RTX 2080 Ti. For a cheaper option, feel free to go for the 1080 Ti (though if you are weighing the decision to get the 1080 Ti for budgetary reasons, I again suggest that you look at cloud options instead). Although AMD-manufactured GPU cards do exist, their support in PyTorch is currently not good enough to recommend anything other than an NVIDIA card. But keep a lookout for their ROCm technology, which should eventually make them a credible alternative in the GPU space.

CPU/Motherboard

You'll probably want to spring for a Z370 series motherboard. Many people will tell you that the CPU doesn't matter for deep learning and that you can get by with a lower-speed CPU as long as you have a powerful GPU. In my experience, you'll be surprised at how often the CPU can become a bottleneck, especially when working with augmented data.

RAM

More RAM is good, as it means you can keep more data inside without having to hit the much slower disk storage (especially important during your training stages). You should be looking at a minimum of 64GB DDR4 memory for your machine.

Storage

Storage for a custom rig should be installed in two classes: first, an M2-interface solid-state drive (SSD), as big as you can afford, for your hot data, to keep access as fast as possible when you're actively working on a project. For the second class of storage, add in a 4TB Serial ATA (SATA) drive for data that you're not actively working on, and transfer data between hot and cold storage as required.
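Whichever route you take, custom rig or cloud instance, it's worth confirming up front that PyTorch can actually see a CUDA-capable GPU. The following sanity check is a small sketch of my own rather than one of this book's listings:

    import torch

    if torch.cuda.is_available():
        device = torch.device("cuda")
        print("Found GPU:", torch.cuda.get_device_name(0))
        print("CUDA version PyTorch was built against:", torch.version.cuda)
    else:
        device = torch.device("cpu")
        print("No CUDA device detected; falling back to the CPU")

    # Tensors (and later, models) are moved onto the chosen device with .to()
    x = torch.rand(3, 3).to(device)
    print(x.device)

If this reports no CUDA device on a machine that definitely has an NVIDIA card, the usual culprit is a missing or mismatched driver/CUDA installation, which the installation section later in this chapter walks through.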