Uploader: 高宏飞
Shared on: 2025-11-23

Author: David Foster

Uploader's note: OUTDATED! Get the 2nd edition, just uploaded to zlib. A LOT happened in deep learning in the last three years.

Generative modeling is one of the hottest topics in artificial intelligence. Recent advances in the field have shown how it's possible to teach a machine to excel at human endeavors, such as drawing, composing music, and completing tasks, by generating an understanding of how its actions affect its environment. With this practical book, machine learning engineers and data scientists will learn how to recreate some of the most famous examples of generative deep learning models, such as variational autoencoders and generative adversarial networks (GANs). You'll also learn how to apply the techniques to your own datasets.

David Foster, cofounder of Applied Data Science, demonstrates the inner workings of each technique, starting with the basics of deep learning before advancing to the most cutting-edge algorithms in the field. Through tips and tricks, you'll learn how to make your models learn more efficiently and become more creative.

- Get a fundamental overview of generative modeling
- Learn how to use the Keras and TensorFlow libraries for deep learning
- Discover how variational autoencoders (VAEs) work
- Get practical examples of generative adversarial networks (GANs)
- Understand how to build generative models that learn how to paint, write, and compose
- Apply generative models within a reinforcement learning setting to accomplish tasks

ISBN: 1492041947
Publisher: O'Reilly Media
Publish Year: 2019
Language: English
Pages: 300
File Format: PDF
File Size: 29.2 MB
Text Preview (First 20 pages)

Generative Deep Learning
Teaching Machines to Paint, Write, Compose, and Play

David Foster

Beijing · Boston · Farnham · Sebastopol · Tokyo
978-1-492-04194-8 [LSI]

Generative Deep Learning
by David Foster

Copyright © 2019 Applied Data Science Partners Ltd. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Development Editor: Michele Cronin
Acquisitions Editor: Jonathan Hassell
Production Editor: Katherine Tozer
Copyeditor: Rachel Head
Proofreader: Charles Roumeliotis
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Rebecca Demarest

July 2019: First Edition

Revision History for the First Edition
2019-06-26: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492041948 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Generative Deep Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

The views expressed in this work are those of the author, and do not represent the publisher's views. While the publisher and the author have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the author disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.
Table of Contents

Preface

Part I. Introduction to Generative Deep Learning

1. Generative Modeling
   What Is Generative Modeling?
   Generative Versus Discriminative Modeling
   Advances in Machine Learning
   The Rise of Generative Modeling
   The Generative Modeling Framework
   Probabilistic Generative Models
   Hello Wrodl!
   Your First Probabilistic Generative Model
   Naive Bayes
   Hello Wrodl! Continued
   The Challenges of Generative Modeling
   Representation Learning
   Setting Up Your Environment
   Summary

2. Deep Learning
   Structured and Unstructured Data
   Deep Neural Networks
   Keras and TensorFlow
   Your First Deep Neural Network
   Loading the Data
   Building the Model
   Compiling the Model
   Training the Model
   Evaluating the Model
   Improving the Model
   Convolutional Layers
   Batch Normalization
   Dropout Layers
   Putting It All Together
   Summary

3. Variational Autoencoders
   The Art Exhibition
   Autoencoders
   Your First Autoencoder
   The Encoder
   The Decoder
   Joining the Encoder to the Decoder
   Analysis of the Autoencoder
   The Variational Art Exhibition
   Building a Variational Autoencoder
   The Encoder
   The Loss Function
   Analysis of the Variational Autoencoder
   Using VAEs to Generate Faces
   Training the VAE
   Analysis of the VAE
   Generating New Faces
   Latent Space Arithmetic
   Morphing Between Faces
   Summary

4. Generative Adversarial Networks
   Ganimals
   Introduction to GANs
   Your First GAN
   The Discriminator
   The Generator
   Training the GAN
   GAN Challenges
   Oscillating Loss
   Mode Collapse
   Uninformative Loss
   Hyperparameters
   Tackling the GAN Challenges
   Wasserstein GAN
   Wasserstein Loss
   The Lipschitz Constraint
   Weight Clipping
   Training the WGAN
   Analysis of the WGAN
   WGAN-GP
   The Gradient Penalty Loss
   Analysis of WGAN-GP
   Summary

Part II. Teaching Machines to Paint, Write, Compose, and Play

5. Paint
   Apples and Organges
   CycleGAN
   Your First CycleGAN
   Overview
   The Generators (U-Net)
   The Discriminators
   Compiling the CycleGAN
   Training the CycleGAN
   Analysis of the CycleGAN
   Creating a CycleGAN to Paint Like Monet
   The Generators (ResNet)
   Analysis of the CycleGAN
   Neural Style Transfer
   Content Loss
   Style Loss
   Total Variance Loss
   Running the Neural Style Transfer
   Analysis of the Neural Style Transfer Model
   Summary

6. Write
   The Literary Society for Troublesome Miscreants
   Long Short-Term Memory Networks
   Your First LSTM Network
   Tokenization
   Building the Dataset
   The LSTM Architecture
   The Embedding Layer
   The LSTM Layer
   The LSTM Cell
   Generating New Text
   RNN Extensions
   Stacked Recurrent Networks
   Gated Recurrent Units
   Bidirectional Cells
   Encoder–Decoder Models
   A Question and Answer Generator
   A Question-Answer Dataset
   Model Architecture
   Inference
   Model Results
   Summary

7. Compose
   Preliminaries
   Musical Notation
   Your First Music-Generating RNN
   Attention
   Building an Attention Mechanism in Keras
   Analysis of the RNN with Attention
   Attention in Encoder–Decoder Networks
   Generating Polyphonic Music
   The Musical Organ
   Your First MuseGAN
   The MuseGAN Generator
   Chords, Style, Melody, and Groove
   The Bar Generator
   Putting It All Together
   The Critic
   Analysis of the MuseGAN
   Summary

8. Play
   Reinforcement Learning
   OpenAI Gym
   World Model Architecture
   The Variational Autoencoder
   The MDN-RNN
   The Controller
   Setup
   Training Process Overview
   Collecting Random Rollout Data
   Training the VAE
   The VAE Architecture
   Exploring the VAE
   Collecting Data to Train the RNN
   Training the MDN-RNN
   The MDN-RNN Architecture
   Sampling the Next z and Reward from the MDN-RNN
   The MDN-RNN Loss Function
   Training the Controller
   The Controller Architecture
   CMA-ES
   Parallelizing CMA-ES
   Output from the Controller Training
   In-Dream Training
   In-Dream Training the Controller
   Challenges of In-Dream Training
   Summary

9. The Future of Generative Modeling
   Five Years of Progress
   The Transformer
   Positional Encoding
   Multihead Attention
   The Decoder
   Analysis of the Transformer
   BERT
   GPT-2
   MuseNet
   Advances in Image Generation
   ProGAN
   Self-Attention GAN (SAGAN)
   BigGAN
   StyleGAN
   Applications of Generative Modeling
   AI Art
   AI Music

10. Conclusion

Index
Preface

What I cannot create, I do not understand.
—Richard Feynman

An undeniable part of the human condition is our ability to create. Since our earliest days as cave people, we have sought opportunities to generate original and beautiful creations. For early man, this took the form of cave paintings depicting wild animals and abstract patterns, created with pigments placed carefully and methodically onto rock. The Romantic Era gave us the mastery of Tchaikovsky symphonies, with their ability to inspire feelings of triumph and tragedy through sound waves, woven together to form beautiful melodies and harmonies. And in recent times, we have found ourselves rushing to bookshops at midnight to buy stories about a fictional wizard, because the combination of letters creates a narrative that wills us to turn the page and find out what happens to our hero.

It is therefore not surprising that humanity has started to ask the ultimate question of creativity: can we create something that is in itself creative?

This is the question that generative modeling aims to answer. With recent advances in methodology and technology, we are now able to build machines that can paint original artwork in a given style, write coherent paragraphs with long-term structure, compose music that is pleasant to listen to, and develop winning strategies for complex games by generating imaginary future scenarios. This is just the start of a generative revolution that will leave us with no choice but to find answers to some of the biggest questions about the mechanics of creativity, and ultimately, what it means to be human.

In short, there has never been a better time to learn about generative modeling—so let's get started!
Objective and Approach

This book covers the key techniques that have dominated the generative modeling landscape in recent years and have allowed us to make impressive progress in creative tasks. As well as covering core generative modeling theory, we will be building full working examples of some of the key models from the literature and walking through the codebase for each, step by step.

Throughout the book, you will find short, allegorical stories that help explain the mechanics of some of the models we will be building. I believe that one of the best ways to teach a new abstract theory is to first convert it into something that isn't quite so abstract, such as a story, before diving into the technical explanation. The individual steps of the theory are clearer within this context because they involve people, actions, and emotions, all of which are well understood, rather than neural networks, backpropagation, and loss functions, which are abstract constructs. The story and the model explanation are just the same mechanics explained in two different domains. You might therefore find it useful to refer back to the relevant story while learning about each model. If you are already familiar with a particular technique, then have fun finding the parallels of each model element within the story!

In Part I of this book I shall introduce the key techniques that we will be using to build generative models, including an overview of deep learning, variational autoencoders, and generative adversarial networks. In Part II, we will be building on these techniques to tackle several creative tasks, such as painting, writing, and composing music, through models such as CycleGAN, encoder–decoder models, and MuseGAN. In addition, we shall see how generative modeling can be used to optimize playing strategy for a game (World Models) and take a look at the most cutting-edge generative architectures available today, such as StyleGAN, BigGAN, BERT, GPT-2, and MuseNet.

Prerequisites

This book assumes that you have experience coding in Python. If you are not familiar with Python, the best place to start is through LearningPython.org. There are many free resources online that will allow you to develop enough Python knowledge to work with the examples in this book.

Also, since some of the models are described using mathematical notation, it will be useful to have a solid understanding of linear algebra (for example, matrix multiplication, etc.) and general probability theory.

Finally, you will need an environment in which to run the code examples from the book's GitHub repository. I have deliberately ensured that all of the examples in this book do not require prohibitively large amounts of computational resources to train.
There is a myth that you need a GPU in order to start training deep learning models—while this is of course helpful and will speed up training, it is not essential. In fact, if you are new to deep learning, I encourage you to first get to grips with the essentials by experimenting with small examples on your laptop, before spending money and time researching hardware to speed up training.

Other Resources

Two books I highly recommend as a general introduction to machine learning and deep learning are as follows:

• Hands-on Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems by Aurélien Géron (O'Reilly)
• Deep Learning with Python by François Chollet (Manning)

Most of the papers in this book are sourced through arXiv, a free repository of scientific research papers. It is now common for authors to post papers to arXiv before they are fully peer-reviewed. Reviewing the recent submissions is a great way to keep on top of the most cutting-edge developments in the field.

I also highly recommend the website Papers with Code, where you can find the latest state-of-the-art results in a variety of machine learning tasks, alongside links to the papers and official GitHub repositories. It is an excellent resource for anyone wanting to quickly understand which techniques are currently achieving the highest scores in a range of tasks and has certainly helped me to decide which techniques to cover in this book.

Finally, a useful resource for training deep learning models on accelerated hardware is Google Colaboratory. This is a free Jupyter Notebook environment that requires no setup and runs entirely in the cloud. You can tell the notebook to run on a GPU that is provided for free, for up to 12 hours of runtime. While it is not essential to run the examples in this book on a GPU, it may help to speed up the training process. Either way, Colab is a great way to access GPU resources for free.
Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

Constant width bold
    Shows commands or other text that should be typed literally by the user.

Constant width italic
    Shows text that should be replaced with user-supplied values or by values determined by context.

This element signifies a general note.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/davidADSP/GDL_code.

This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing a CD-ROM of examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Generative Deep Learning by David Foster (O'Reilly). Copyright 2019 Applied Data Science Partners Ltd., 978-1-492-04194-8."
If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O'Reilly Online Learning

For almost 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, conferences, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, please visit http://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)

We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/generative-dl.

To comment or ask technical questions about this book, send email to bookquestions@oreilly.com.

For more information about our books, courses, conferences, and news, see our website at http://www.oreilly.com.

Find us on Facebook: http://facebook.com/oreilly
Follow us on Twitter: http://twitter.com/oreillymedia
Watch us on YouTube: http://www.youtube.com/oreillymedia
Acknowledgments

There are so many people I would like to thank for helping me write this book.

First, I would like to thank everyone who has taken time to technically review the book—in particular, Luba Elliott, Darren Richardson, Eric George, Chris Schon, Sigurður Skúli Sigurgeirsson, Hao-Wen Dong, David Ha, and Lorna Barclay.

Also, a huge thanks to my colleagues at Applied Data Science Partners, Ross Witeszczak, Chris Schon, Daniel Sharp, and Amy Bull. Your patience with me while I have taken time to finish the book is hugely appreciated, and I am greatly looking forward to all the machine learning projects we will complete together in the future! Particular thanks to Ross—had we not decided to start a business together, this book might never have taken shape, so thank you for believing in me as your business partner!

I also want to thank anyone who has ever taught me anything mathematical—I was extremely fortunate to have fantastic math teachers at school, who developed my interest in the subject and encouraged me to pursue it further at university. I would like to thank you for your commitment and for going out of your way to share your knowledge of the subject with me.

A huge thank you goes to the staff at O'Reilly for guiding me through the process of writing this book. A special thanks goes to Michele Cronin, who has been there at each step, providing useful feedback and sending me friendly reminders to keep completing chapters! Also to Katie Tozer, Rachel Head, and Melanie Yarbrough for getting the book into production, and Mike Loukides for first reaching out to ask if I'd be interested in writing a book. You have all been so supportive of this project from the start, and I want to thank you for providing me with a platform on which to write about something that I love.

Throughout the writing process, my family has been a constant source of encouragement and support. A huge thank you goes to my mum, Gillian Foster, for checking every single line of text for typos and for teaching me how to add up in the first place! Your attention to detail has been extremely helpful while proofreading this book, and I'm really grateful for all the opportunities that both you and dad have given me. My dad, Clive Foster, originally taught me how to program a computer—this book is full of practical examples, and that's thanks to his early patience while I fumbled around in BASIC trying to make football games as a teenager. My brother, Rob Foster, is the most modest genius you will ever find, particularly within linguistics—chatting with him about AI and the future of text-based machine learning has been amazingly helpful. Last, I would like to thank my Nana, who is a constant source of inspiration and fun for all of us. Her love of literature is one of the reasons I first decided that writing a book would be an exciting thing to do.
Finally, I would like to thank my fiancée (and soon to be wife) Lorna Barclay. As well as technically reviewing every word of this book, she has provided endless support to me throughout the writing process, making me tea, bringing me various snacks, and generally helping me to make this a better guide to generative modeling through her meticulous attention to detail and deep expertise in statistics and machine learning. I certainly couldn't have completed this project without you, and I'm grateful for the time you have invested in helping me restructure and expand parts of the book that needed more explanation. I promise I won't talk about generative modeling at the dinner table for at least a few weeks after it is published.
PART I
Introduction to Generative Deep Learning

The first four chapters of this book aim to introduce the core techniques that you'll need to start building generative deep learning models.

In Chapter 1, we shall first take a broad look at the field of generative modeling and consider the type of problem that we are trying to solve from a probabilistic perspective. We will then explore our first example of a basic probabilistic generative model and analyze why deep learning techniques may need to be deployed as the complexity of the generative task grows.

Chapter 2 provides a guide to the deep learning tools and techniques that you will need to start building more complex generative models. This is intended to be a practical guide to deep learning rather than a theoretical analysis of the field. In particular, I will introduce Keras, a framework for building neural networks that can be used to construct and train some of the most cutting-edge deep neural network architectures published in the literature.

In Chapter 3, we shall take a look at our first generative deep learning model, the variational autoencoder. This powerful technique will allow us to not only generate realistic faces, but also alter existing images—for example, by adding a smile or changing the color of someone's hair.
Chapter 4 explores one of the most successful generative modeling techniques of recent years, the generative adversarial network. This elegant framework for structuring a generative modeling problem is the underlying engine behind most state-of-the-art generative models. We shall see the ways that it has been fine-tuned and adapted to continually push the boundaries of what generative modeling is able to achieve.
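To give a flavor of the define/compile/train Keras workflow that Chapter 2 builds up, here is a minimal sketch. It is not code from the book or its repository: the tensorflow.keras import path, the layer sizes, and the synthetic stand-in data are all illustrative assumptions.

```python
# A minimal sketch of the Keras workflow (not taken from the book's codebase).
# Layer sizes and the random stand-in data below are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

# Synthetic stand-in for 1,000 flattened 28x28 images with 10 class labels
x_train = np.random.rand(1000, 784).astype("float32")
y_train = np.random.randint(0, 10, size=(1000,))

# Define a small fully connected network
model = models.Sequential([
    layers.Input(shape=(784,)),
    layers.Dense(200, activation="relu"),
    layers.Dense(150, activation="relu"),
    layers.Dense(10, activation="softmax"),
])

# Compile with a loss function and optimizer, then train
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
model.fit(x_train, y_train, batch_size=32, epochs=2)
```

Swapping the random arrays for a real dataset would turn this into a genuine classifier; the book's Chapter 2 walks through each of these steps in detail before they are reused inside the generative models of later chapters.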