Praise for the First Edition

“The clearest explanation of deep learning I have come across . . . it was a joy to read.”
—Richard Tobias, Cephasonics

“Bridges the gap between the hype and a functioning deep-learning system.”
—Peter Rabinovitch, Akamai

“All major topics and concepts of deep learning are covered and well-explained, using code examples and diagrams instead of mathematical formulas.”
—Srdjan Santic, Springboard.com
Deep Learning with R
SECOND EDITION

FRANÇOIS CHOLLET
WITH TOMASZ KALINOWSKI AND J.J. ALLAIRE

MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2022 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Jennifer Stout
Review editor: Aleksandar Dragosavljević
Production editor: Andy Marinkovich
Copy editor: Pamela Hunt
Proofreader: Keri Hales
Technical proofreader: Ninoslav Cerkez
Typesetter: Gordan Salinovic
Cover designer: Marija Tudor

ISBN 9781633439849
Printed in the United States of America
contents

preface xii
acknowledgments xiv
about this book xv
about the authors xviii

1 What is deep learning? 1
1.1 Artificial intelligence, machine learning, and deep learning 2
    Artificial intelligence 2 ■ Machine learning 3 ■ Learning rules and representations from data 4 ■ The “deep” in “deep learning” 7 ■ Understanding how deep learning works, in three figures 8 ■ What deep learning has achieved so far 10 ■ Don’t believe the short-term hype 11 ■ The promise of AI 12
1.2 Before deep learning: A brief history of machine learning 13
    Probabilistic modeling 13 ■ Early neural networks 13 ■ Kernel methods 14 ■ Decision trees, random forests, and gradient-boosting machines 15 ■ Back to neural networks 16 ■ What makes deep learning different? 17 ■ The modern machine learning landscape 17
1.3 Why deep learning? Why now? 20
    Hardware 20 ■ Data 21 ■ Algorithms 22 ■ A new wave of investment 22 ■ The democratization of deep learning 23 ■ Will it last? 24

2 The mathematical building blocks of neural networks 26
2.1 A first look at a neural network 27
2.2 Data representations for neural networks 31
    Scalars (rank 0 tensors) 31 ■ Vectors (rank 1 tensors) 31 ■ Matrices (rank 2 tensors) 32 ■ Rank 3 and higher-rank tensors 32 ■ Key attributes 33 ■ Manipulating tensors in R 34 ■ The notion of data batches 35 ■ Real-world examples of data tensors 35 ■ Vector data 35 ■ Time-series data or sequence data 36 ■ Image data 36 ■ Video data 37
2.3 The gears of neural networks: Tensor operations 37
    Element-wise operations 38 ■ Broadcasting 40 ■ Tensor product 41 ■ Tensor reshaping 43 ■ Geometric interpretation of tensor operations 44 ■ A geometric interpretation of deep learning 47
2.4 The engine of neural networks: Gradient-based optimization 48
    What’s a derivative? 49 ■ Derivative of a tensor operation: The gradient 50 ■ Stochastic gradient descent 51 ■ Chaining derivatives: The backpropagation algorithm 54
2.5 Looking back at our first example 59
    Reimplementing our first example from scratch in TensorFlow 61 ■ Running one training step 63 ■ The full training loop 65 ■ Evaluating the model 66

3 Introduction to Keras and TensorFlow 68
3.1 What’s TensorFlow? 69
3.2 What’s Keras? 69
3.3 Keras and TensorFlow: A brief history 71
3.4 Python and R interfaces: A brief history 71
3.5 Setting up a deep learning workspace 72
    Installing Keras and TensorFlow 73
3.6 First steps with TensorFlow 74
    TensorFlow tensors 74
3.7 Tensor attributes 75
    Tensor shape and reshaping 77 ■ Tensor slicing 78 ■ Tensor broadcasting 79 ■ The tf module 80 ■ Constant tensors and variables 81 ■ Tensor operations: Doing math in TensorFlow 82 ■ A second look at the GradientTape API 83 ■ An end-to-end example: A linear classifier in pure TensorFlow 84
3.8 Anatomy of a neural network: Understanding core Keras APIs 89
    Layers: The building blocks of deep learning 89 ■ From layers to models 94 ■ The “compile” step: Configuring the learning process 95 ■ Picking a loss function 98 ■ Understanding the fit() method 99 ■ Monitoring loss and metrics on validation data 99 ■ Inference: Using a model after training 101

4 Getting started with neural networks: Classification and regression 103
4.1 Classifying movie reviews: A binary classification example 105
    The IMDB dataset 105 ■ Preparing the data 107 ■ Building your model 108 ■ Validating your approach 110 ■ Using a trained model to generate predictions on new data 113 ■ Further experiments 113 ■ Wrapping up 113
4.2 Classifying newswires: A multiclass classification example 114
    The Reuters dataset 114 ■ Preparing the data 116 ■ Building your model 116 ■ Validating your approach 117 ■ Generating predictions on new data 119 ■ A different way to handle the labels and the loss 120 ■ The importance of having sufficiently large intermediate layers 120 ■ Further experiments 121 ■ Wrapping up 121
4.3 Predicting house prices: A regression example 122
    The Boston housing price dataset 122 ■ Preparing the data 123 ■ Building your model 123 ■ Validating your approach using K-fold validation 124 ■ Generating predictions on new data 128 ■ Wrapping up 128

5 Fundamentals of machine learning 130
5.1 Generalization: The goal of machine learning 130
    Underfitting and overfitting 131 ■ The nature of generalization in deep learning 136
5.2 Evaluating machine learning models 142
    Training, validation, and test sets 142 ■ Beating a common-sense baseline 145 ■ Things to keep in mind about model evaluation 146
5.3 Improving model fit 146
    Tuning key gradient descent parameters 147 ■ Leveraging better architecture priors 149 ■ Increasing model capacity 150
5.4 Improving generalization 152
    Dataset curation 152 ■ Feature engineering 153 ■ Using early stopping 154 ■ Regularizing your model 155

6 The universal workflow of machine learning 166
6.1 Define the task 168
    Frame the problem 168 ■ Collect a dataset 169 ■ Understand your data 173 ■ Choose a measure of success 173
6.2 Develop a model 174
    Prepare the data 174 ■ Choose an evaluation protocol 175 ■ Beat a baseline 176 ■ Scale up: Develop a model that overfits 177 ■ Regularize and tune your model 177
6.3 Deploy the model 178
    Explain your work to stakeholders and set expectations 178 ■ Ship an inference model 179 ■ Monitor your model in the wild 182 ■ Maintain your model 183

7 Working with Keras: A deep dive 185
7.1 A spectrum of workflows 186
7.2 Different ways to build Keras models 186
    The Sequential model 187 ■ The Functional API 189 ■ Subclassing the Model class 196 ■ Mixing and matching different components 199 ■ Remember: Use the right tool for the job 200
7.3 Using built-in training and evaluation loops 201
    Writing your own metrics 202 ■ Using callbacks 204 ■ Writing your own callbacks 205 ■ Monitoring and visualization with TensorBoard 208
7.4 Writing your own training and evaluation loops 210
    Training vs. inference 210 ■ Low-level usage of metrics 211 ■ A complete training and evaluation loop 212 ■ Make it fast with tf_function() 215 ■ Leveraging fit() with a custom training loop 216

8 Introduction to deep learning for computer vision 220
8.1 Introduction to convnets 221
    The convolution operation 223 ■ The max-pooling operation 228
8.2 Training a convnet from scratch on a small dataset 230
    The relevance of deep learning for small data problems 230 ■ Downloading the data 231 ■ Building the model 234 ■ Data preprocessing 235 ■ Using data augmentation 241
8.3 Leveraging a pretrained model 245
    Feature extraction with a pretrained model 246 ■ Fine-tuning a pretrained model 254

9 Advanced deep learning for computer vision 258
9.1 Three essential computer vision tasks 259
9.2 An image segmentation example 260
9.3 Modern convnet architecture patterns 269
    Modularity, hierarchy, and reuse 269 ■ Residual connections 272 ■ Batch normalization 275 ■ Depthwise separable convolutions 278 ■ Putting it together: A mini Xception-like model 280
9.4 Interpreting what convnets learn 282
    Visualizing intermediate activations 283 ■ Visualizing convnet filters 289 ■ Visualizing heatmaps of class activation 294

10 Deep learning for time series 301
10.1 Different kinds of time-series tasks 301
10.2 A temperature-forecasting example 302
    Preparing the data 306 ■ A common-sense, non–machine learning baseline 310 ■ Let’s try a basic machine learning model 311 ■ Let’s try a 1D convolutional model 314 ■ A first recurrent baseline 316
10.3 Understanding recurrent neural networks 317
    A recurrent layer in Keras 320
10.4 Advanced use of recurrent neural networks 324
    Using recurrent dropout to fight overfitting 324 ■ Stacking recurrent layers 327 ■ Using bidirectional RNNs 329 ■ Going even further 332

11 Deep learning for text 334
11.1 Natural language processing: The bird’s-eye view 334
11.2 Preparing text data 336
    Text standardization 337 ■ Text splitting (tokenization) 338 ■ Vocabulary indexing 339 ■ Using layer_text_vectorization 340
11.3 Two approaches for representing groups of words: Sets and sequences 344
    Preparing the IMDB movie reviews data 345 ■ Processing words as a set: The bag-of-words approach 347 ■ Processing words as a sequence: The sequence model approach 355
11.4 The Transformer architecture 366
    Understanding self-attention 366 ■ Multi-head attention 371 ■ The Transformer encoder 372 ■ When to use sequence models over bag-of-words models 381
11.5 Beyond text classification: Sequence-to-sequence learning 382
    A machine translation example 383 ■ Sequence-to-sequence learning with RNNs 387 ■ Sequence-to-sequence learning with Transformer 392

12 Generative deep learning 399
12.1 Text generation 401
    A brief history of generative deep learning for sequence generation 401 ■ How do you generate sequence data? 402 ■ The importance of the sampling strategy 402 ■ Implementing text generation with Keras 404 ■ A text-generation callback with variable-temperature sampling 408 ■ Wrapping up 413
12.2 DeepDream 414
    Implementing DeepDream in Keras 415 ■ Wrapping up 421
12.3 Neural style transfer 422
    The content loss 423 ■ The style loss 424 ■ Neural style transfer in Keras 424 ■ Wrapping up 431
12.4 Generating images with variational autoencoders 432
    Sampling from latent spaces of images 432 ■ Concept vectors for image editing 433 ■ Variational autoencoders 434 ■ Implementing a VAE with Keras 436 ■ Wrapping up 442
12.5 Introduction to generative adversarial networks 442
    A schematic GAN implementation 443 ■ A bag of tricks 444 ■ Getting our hands on the CelebA dataset 445 ■ The discriminator 447 ■ The generator 447 ■ The adversarial network 448 ■ Wrapping up 452

13 Best practices for the real world 454
13.1 Getting the most out of your models 455
    Hyperparameter optimization 455 ■ Model ensembling 462
13.2 Scaling up model training 464
    Speeding up training on GPU with mixed precision 465 ■ Multi-GPU training 467 ■ TPU training 471

14 Conclusions 473
14.1 Key concepts in review 474
    Various approaches to AI 474 ■ What makes deep learning special within the field of machine learning 474 ■ How to think about deep learning 475 ■ Key enabling technologies 476 ■ The universal machine learning workflow 477 ■ Key network architectures 478 ■ The space of possibilities 482
14.2 The limitations of deep learning 484
    The risk of anthropomorphizing machine learning models 485 ■ Automatons vs. intelligent agents 487 ■ Local generalization vs. extreme generalization 488 ■ The purpose of intelligence 490 ■ Climbing the spectrum of generalization 491
14.3 Setting the course toward greater generality in AI 492
    On the importance of setting the right objective: The shortcut rule 492 ■ A new target 494
14.4 Implementing intelligence: The missing ingredients 495
    Intelligence as sensitivity to abstract analogies 496 ■ The two poles of abstraction 497 ■ The missing half of the picture 500
14.5 The future of deep learning 501
    Models as programs 502 ■ Machine learning vs. program synthesis 503 ■ Blending together deep learning and program synthesis 503 ■ Lifelong learning and modular subroutine reuse 505 ■ The long-term vision 506
14.6 Staying up-to-date in a fast-moving field 507
    Practice on real-world problems using Kaggle 508 ■ Read about the latest developments on arXiv 508 ■ Explore the Keras ecosystem 508
14.7 Final words 509

appendix Python primer for R users 511
index 535
preface

If you’ve picked up this book, you’re probably aware of the extraordinary progress that deep learning has represented for the field of artificial intelligence in the recent past. We went from near-unusable computer vision and natural language processing to highly performant systems deployed at scale in products you use every day. The consequences of this sudden progress extend to almost every industry. We’re already applying deep learning to an amazing range of important problems across domains as different as medical imaging, agriculture, autonomous driving, education, disaster prevention, and manufacturing.

Yet, I believe deep learning is still in its early days. It has realized only a small fraction of its potential so far. Over time, it will make its way to every problem where it can help—a transformation that will take place over multiple decades.

To begin deploying deep learning technology to every problem that it could solve, we need to make it accessible to as many people as possible, including non-experts—people who aren’t researchers or graduate students. For deep learning to reach its full potential, we need to radically democratize it. And today, I believe that we’re at the cusp of a historical transition, where deep learning is moving out of academic labs and the R&D departments of large tech companies to become a ubiquitous part of the toolbox of every developer out there—not unlike the trajectory of web development in the late 1990s. Almost anyone can now build a website or web app for their business or community of a kind that would have required a small team of specialist engineers in 1998. In the not-so-distant future, anyone with an idea and basic coding skills will be able to build smart applications that learn from data.
When I released the first version of the Keras deep learning framework in March 2015, the democratization of AI wasn’t what I had in mind. I had been doing research in machine learning for several years and had built Keras to help me with my own experiments. But since 2015, hundreds of thousands of newcomers have entered the field of deep learning; many of them picked up Keras as their tool of choice. As I watched scores of smart people use Keras in unexpected, powerful ways, I came to care deeply about the accessibility and democratization of AI. I realized that the further we spread these technologies, the more useful and valuable they become. Accessibility quickly became an explicit goal in the development of Keras, and over a few short years, the Keras developer community has made fantastic achievements on this front. We’ve put deep learning into the hands of hundreds of thousands of people, who in turn are using it to solve problems that were until recently thought to be unsolvable.

The book you’re holding is another step on the way to making deep learning available to as many people as possible. Keras had always needed a companion course to simultaneously cover the fundamentals of deep learning, deep learning best practices, and Keras usage patterns. In 2016 and 2017, I did my best to produce such a course, which became the first edition of this book, released in December 2017. It quickly became a machine learning best seller that sold over 50,000 copies and was translated into 12 languages.

However, the field of deep learning advances fast. Since the release of the first edition, many important developments have taken place—the release of TensorFlow 2, the growing popularity of the Transformer architecture, and more. And so, in late 2019, I set out to update my book. I originally thought, quite naively, that it would feature about 50% new content and would end up being roughly the same length as the first edition. In practice, after two years of work, it turned out to be over a third longer, with about 75% novel content. More than a refresh, it is a whole new book.

I wrote it with a focus on making the concepts behind deep learning, and their implementation, as approachable as possible. Doing so didn’t require me to dumb down anything—I strongly believe that there are no difficult ideas in deep learning. I hope you’ll find this book valuable and that it will enable you to begin building intelligent applications and solve the problems that matter to you.
acknowledgments

First, I’d like to thank the Keras community for making this book possible. Over the past six years, Keras has grown to have hundreds of open source contributors and more than one million users. Your contributions and feedback have turned Keras into what it is today.

On a more personal note, I’d like to thank my wife for her endless support during the development of Keras and the writing of this book. I’d also like to thank Google for backing the Keras project. It has been fantastic to see Keras adopted as TensorFlow’s high-level API. A smooth integration between Keras and TensorFlow greatly benefits both TensorFlow users and Keras users and makes deep learning accessible to most.

I want to thank the people at Manning who made this book possible: publisher Marjan Bace and everyone on the editorial and production teams, including Michael Stephens, Jennifer Stout, Aleksandar Dragosavljević, Andy Marinkovich, Pamela Hunt, Susan Honeywell, Keri Hales, Paul Wells, and many others who worked behind the scenes.

Many thanks go to all the reviewers: Arnaldo Ayala Meyer, Davide Cremonesi, Dhinakaran Venkat, Edward Lee, Fernando García Sedano, Joel Kotarski, Marcio Nicolau, Michael Petrey, Peter Henstock, Shahnawaz Ali, Sourav Biswas, Thiago Britto Borges, Tony Dubitsky, Vlad Navitski, and all the other people who sent us feedback. Your suggestions helped make this a better book.

And on the technical side, special thanks go to Ninoslav Cerkez, who served as the book’s technical proofreader.
about this book

This book was written for anyone who wishes to explore deep learning from scratch or broaden their understanding of deep learning. Whether you’re a practicing machine learning engineer, a data scientist, or a college student, you’ll find value in these pages.

You’ll explore deep learning in an approachable way—starting simply, then working up to state-of-the-art techniques. You’ll find that this book strikes a balance between intuition, theory, and hands-on practice. It avoids mathematical notation, preferring instead to explain the core ideas of machine learning and deep learning via detailed code snippets and intuitive mental models. You’ll learn from abundant code examples that include extensive commentary, practical recommendations, and simple high-level explanations of everything you need to know to start using deep learning to solve concrete problems.

The code examples use the deep learning framework Keras, with TensorFlow 2 as its numerical engine. They demonstrate modern Keras and TensorFlow 2 best practices as of 2022.

After reading this book, you’ll have a solid understanding of what deep learning is, when it’s applicable, and what its limitations are. You’ll be familiar with the standard workflow for approaching and solving machine learning problems, and you’ll know how to address commonly encountered issues. You’ll be able to use Keras to tackle real-world problems ranging from computer vision to natural language processing: image classification, image segmentation, time-series forecasting, text classification, machine translation, text generation, and more.
Who should read this book?

This book is written for people with R programming experience who want to get started with machine learning and deep learning. But this book can also be valuable to many different types of readers:

■ If you’re a data scientist familiar with machine learning, this book will provide you with a solid, practical introduction to deep learning, the fastest-growing and most significant subfield of machine learning.
■ If you’re a deep learning researcher or practitioner looking to get started with the Keras framework, you’ll find this book to be the ideal Keras crash course.
■ If you’re a graduate student studying deep learning in a formal setting, you’ll find this book to be a practical complement to your education, helping you build intuition around the behavior of deep neural networks and familiarizing you with key best practices.

Even technically minded people who don’t code regularly will find this book useful as an introduction to both basic and advanced deep learning concepts.

To understand the code examples, you’ll need reasonable R proficiency. You don’t need previous experience with machine learning or deep learning: this book covers, from scratch, all the necessary basics. You don’t need an advanced mathematics background, either—high school–level mathematics should suffice to follow along.

About the code

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Output from running code is similarly formatted in fixed-width font, but is also adorned with a vertical gray bar on the left. Throughout the book you’ll find code and code outputs interleaved like this:

print("R is awesome!")
[1] "R is awesome!"

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/deep-learning-with-r-second-edition/, and as R scripts on GitHub at https://github.com/t-kalinowski/deep-learning-with-R-2nd-edition-code.
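To give you a sense of what the book’s listings look like in practice, here is a minimal sketch of defining and compiling a small Keras model in R. The layer sizes and settings below are illustrative choices made for this preview, not a listing taken from the book’s chapters; the steps themselves are unpacked in detail in chapters 2 and 3.

library(keras)

# Define a small feed-forward network: a stack of two dense layers.
model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 10, activation = "softmax")

# Configure the learning process: optimizer, loss function, and metrics.
model %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = "accuracy"
)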
liveBook discussion forum

Purchase of Deep Learning with R, Second Edition, includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/deep-learning-with-r-second-edition/. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.

About the cover illustration

The figure on the cover of Deep Learning with R, Second Edition, “Habit of a Chinese Lady in 1700,” is taken from a book by Thomas Jefferys, published between 1757 and 1772. In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
about the authors

FRANÇOIS CHOLLET is the creator of Keras, one of the most widely used deep learning frameworks. He is currently a software engineer at Google, where he leads the Keras team. In addition, he does research on abstraction, reasoning, and how to achieve greater generality in artificial intelligence.

TOMASZ KALINOWSKI is a software engineer at RStudio, where he serves as maintainer of the TensorFlow and Keras R packages. In prior roles, he worked as a scientist and engineer, applying machine learning to a wide variety of datasets and domains.

J.J. ALLAIRE is the founder of RStudio and the creator of the RStudio IDE. J.J. is the author of the R interfaces to TensorFlow and Keras.
1 What is deep learning?

This chapter covers
■ High-level definitions of fundamental concepts
■ Timeline of the development of machine learning
■ Key factors behind deep learning’s rising popularity and future potential

In the past few years, artificial intelligence (AI) has been a subject of intense media hype. Machine learning, deep learning, and AI come up in countless articles, often outside of technology-minded publications. We’re promised a future of intelligent chatbots, self-driving cars, and virtual assistants—a future sometimes painted in a grim light and other times as utopian, where human jobs will be scarce and most economic activity will be handled by robots or AI agents. For a future or current practitioner of machine learning, it’s important to be able to recognize the signal amid the noise, so that you can tell world-changing developments from overhyped press releases. Our future is at stake, and it’s a future in which you have an active role to play: after reading this book, you’ll be one of those who develop those AI systems. So let’s tackle these questions: What has deep learning achieved so far? How significant is it? Where are we headed next? Should you believe the hype?

This chapter provides essential context around artificial intelligence, machine learning, and deep learning.