Author: Stevens E.

No description

Tags
No tags
Publish Year: 2024
Language: English
File Format: PDF
File Size: 17.4 MB
Deep Learning with PyTorch: Essential Excerpts
Eli Stevens and Luca Antiga
Manning Author Picks
Copyright 2019 Manning Publications
To pre-order or learn more about this book, go to www.manning.com
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on these books when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964
Email: Erin Twohey, corp-sales@manning.com

©2019 by Manning Publications Co. All rights reserved.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Cover designer: Leslie Haimes

ISBN: 9781617297120
Printed in the United States of America
1 2 3 4 5 6 7 8 9 10 - EBM - 24 23 22 21 20 19
contents

1 Introducing deep learning and the PyTorch library
    1.1 What is PyTorch?
    1.2 What is this book?
    1.3 Why PyTorch?
    1.4 PyTorch has the batteries included

2 It starts with a tensor
    2.1 Tensor fundamentals
    2.2 Tensors and storages
    2.3 Size, storage offset, and strides
    2.4 Numeric types
    2.5 Indexing tensors
    2.6 NumPy interoperability
    2.7 Serializing tensors
    2.8 Moving tensors to the GPU
    2.9 The tensor API

3 Real-world data representation with tensors
    3.1 Tabular data
    3.2 Time series
    3.3 Text
    3.4 Images
    3.5 Volumetric data

4 The mechanics of learning
    4.1 Learning is parameter estimation
    4.2 PyTorch’s autograd: Backpropagate all things

5 Using a neural network to fit your data
    5.1 Artificial neurons
    5.2 The PyTorch nn module
    5.3 Subclassing nn.Module

index
about the authors

Eli Stevens has worked in Silicon Valley for the past 15 years as a software engineer, and the past 7 years as Chief Technical Officer of a startup making medical device software.

Luca Antiga is co-founder and CEO of an AI engineering company located in Bergamo, Italy, and a regular contributor to PyTorch.
Save 50% on the full book - eBook, pBook, and MEAP. Enter ebstevens50 in the Promotional Code box when you checkout. Only at manning.com. Deep Learning with PyTorch by Eli Stevens and Luca Antiga ISBN 9781617295263 400 pages (estimated) $49.99 Publication in Winter, 2019 (estimated)
Introducing deep learning and the PyTorch library

This chapter covers
■ What this book will teach you
■ PyTorch’s role as a library for building deep learning projects
■ The strengths and weaknesses of PyTorch
■ The hardware you’ll need to follow along with the examples

We’re living through exciting times. The landscape of what computers can do is changing by the week. Tasks that only a few years ago were thought to require higher cognition are getting solved by machines at near-superhuman levels of performance. Tasks such as describing a photographic image with a sentence in idiomatic English, playing complex strategy games, and diagnosing a tumor from a radiological scan are all approachable now by a computer. Even more impressively, computers acquire the ability to solve such tasks through examples, rather than through human-encoded or handcrafted rules.

It would be disingenuous to assert that machines are learning to “think” in any human sense of the word. Rather, a general class of algorithms is able to approximate complicated, nonlinear processes extremely effectively. In a way, we’re learning that intelligence, as we subjectively perceive it, is a notion that’s often conflated with self-awareness, and self-awareness definitely isn’t required to solve or carry out these kinds of problems. In the end, the question of computer intelligence may not even be important. As pioneering computer scientist Edsger W. Dijkstra said in “The Threats to Computing Science,”

    Alan M. Turing thought about . . . the question of whether Machines Can Think, a question . . . about as relevant as the question of whether Submarines Can Swim.

That general class of algorithms we’re talking about falls under the category of deep learning, which deals with training mathematical entities named deep neural networks on the basis of examples. Deep learning leverages large amounts of data to approximate complex functions whose inputs and outputs are far apart, such as an image (input) and a line of text describing the input (output); a written script (input) and a natural-sounding voice reciting the script (output); or, even more simply, associating an image of a golden retriever with a flag that indicates that a golden retriever is present. This capability allows developers to create programs with functionality that until recently was the exclusive domain of human beings.

1.1 What is PyTorch?

PyTorch is a library for Python programs that facilitates building deep learning projects. It emphasizes flexibility and allows deep learning models to be expressed in idiomatic Python. This approachability and ease of use found early adopters in the research community, and in the years since the library’s release, it has grown into one of the most prominent deep learning tools for a broad range of applications.

PyTorch provides a core data structure, the Tensor, a multidimensional array that has many similarities with NumPy arrays. From that foundation, a laundry list of features was built to make it easy to get a project up and running, or to design, train, and investigate a new neural network architecture. Tensors accelerate mathematical operations (assuming that the appropriate combination of hardware and software is present), and PyTorch has packages for distributed training, worker processes for efficient data loading, and an extensive library of common deep learning functions.

As Python is for programming, PyTorch is both an excellent introduction to deep learning and a tool usable in professional contexts for real-world, high-level work.

We believe that PyTorch should be the first deep learning library you learn. Whether it should be the last is a decision that we’ll leave to you.

1.2 What is this book?

This book is intended to be a starting point for software engineers, data scientists, and motivated students who are fluent in Python and want to become comfortable using PyTorch to build deep learning projects. To that end, we take a hands-on approach; we encourage you to keep your computer at the ready so that you can play with the examples and take them a step further.
Though we stress the practical applications, we also believe that providing an accessible introduction to foundational deep learning tools like PyTorch is more than a way to facilitate the acquisition of new technical skills. It is also a step toward equipping a new generation of scientists, engineers, and practitioners from a wide range of disciplines with a working knowledge of the tools that will be the backbone of many software projects during the decades to come.

To get the most out of this book, you need two things:
■ Some experience programming in Python: We’re not going to pull any punches on that one. You’ll need to be up on Python data types, classes, floating-point numbers, and the like.
■ Willingness to dive in and get your hands dirty: It’ll be much easier for you to learn if you follow along with us.

Deep learning is a huge space. In this book, we’ll be covering a tiny part of that space: specifically, using PyTorch for smaller-scope projects. Most of the motivating examples use image processing of 2D and 3D data sets. We focus on practical PyTorch, with the aim of covering enough ground to allow you to solve realistic problems with deep learning or explore new models as they pop up in research literature. A great resource for the latest publications related to deep learning research is the arXiv public preprint repository, hosted at https://arxiv.org. We also recommend http://www.arxiv-sanity.com to help organize research papers of interest.

1.3 Why PyTorch?

As we’ve said, deep learning allows you to carry out a wide range of complicated tasks (such as performing machine translation, playing strategy games, and identifying objects in cluttered scenes) by exposing your model to illustrative examples. To do so in practice, you need tools that are flexible, so that they can be adapted to your specific problem, and efficient, so that training can run over large amounts of data in reasonable time. You also need the trained network to perform correctly in the presence of uncertainty in the inputs. In this section, we take a look at some of the reasons why we decided to use PyTorch.

PyTorch is easy to recommend because of its simplicity. Many researchers and practitioners find it easy to learn, use, extend, and debug. It’s Pythonic, and although (like any complicated domain) it has caveats and best practices, using the library generally feels familiar to developers who have used Python previously.

For users who are familiar with NumPy arrays, the PyTorch Tensor class will be immediately familiar. PyTorch feels like NumPy, but with GPU acceleration and automatic computation of gradients, which makes it suitable for calculating backward pass data automatically starting from a forward expression. The Tensor API is such that the additional features of the class relevant to deep learning are unobtrusive; the user is mostly free to pretend that those features don’t exist until the need for them arises.
A design driver for PyTorch is expressivity, allowing a developer to implement complicated models without undue complexity being imposed by the library. (The library isn’t a framework!) PyTorch arguably offers one of the most seamless translations of ideas into Python code in the deep learning landscape. For this reason, PyTorch has seen widespread adoption in research, as witnessed by the high citation counts in international conferences. [2]

PyTorch also has a compelling story for the transition from research and development to production. Although it initially focused on research workflows, PyTorch has been equipped with a high-performance C++ runtime that users can leverage to deploy models for inference without relying on Python, keeping most of the flexibility of PyTorch without paying the overhead of the Python runtime.

Claims of ease of use and high performance are trivial to make, of course. We hope that by the time you’re in the thick of this book, you’ll agree that our claims here are well founded.

1.3.1 The deep learning revolution

In this section, we take a step back and provide some context for where PyTorch fits into the current and historical landscape of deep learning tools.

Until the late 2000s, the broader class of systems that fell into the category “machine learning” relied heavily on feature engineering. Features are transformations of input data that yield numerical values which help a downstream algorithm, such as a classifier, produce correct outcomes on new data. Feature engineering aims to take the original data and come up with representations of the same data that can be fed to an algorithm to solve a problem. To tell ones from zeros in images of handwritten digits, for example, you’d come up with a set of filters to estimate the direction of edges over the image and then train a classifier to predict the correct digit, given a distribution of edge directions. Another useful feature could be the number of enclosed holes in a zero, an eight, or particularly loopy twos.

Deep learning, on the other hand, deals with finding such representations automatically, from raw data, to perform a task successfully. In the ones-versus-zeros example, filters would be refined during training by iteratively looking at pairs of examples and target labels. This isn’t to say that feature engineering has no place in deep learning; developers often need to inject some form of knowledge into a learning system. The ability of a neural network to ingest data and extract useful representations on the basis of examples, however, is what makes deep learning so powerful. The focus of deep learning practitioners is not so much on handcrafting those representations but on operating on a mathematical entity so that it discovers representations from the training data autonomously. Often, these automatically created features are better than those that are handcrafted! As in many disruptive technologies, this fact has led to a change in perspective.

[2] At ICLR 2019, PyTorch appeared as a citation in 252 papers, up from 87 the previous year and at the same level as TensorFlow, which appeared in 266 papers.
On the left side of figure 1.1, a practitioner is busy defining engineered features and feeding them to a learning algorithm. The results of the task will be as good as the features the practitioner engineers. On the right side of the figure, with deep learning, the raw data is fed to an algorithm that extracts hierarchical features automatically, based on optimizing the performance of the algorithm on the task. The results will be as good as the practitioner’s ability to drive the algorithm toward its goal.

Figure 1.1 The change in perspective brought by deep learning

1.3.2 Immediate versus deferred execution

One key differentiator for deep learning libraries is immediate versus deferred execution. Much of PyTorch’s ease of use is due to how it implements immediate execution, so we briefly cover that implementation here.

Consider the expression (a**2 + b**2) ** 0.5 that implements the Pythagorean theorem. If you want to execute this expression, you need to have an a and b handy, like so:

    >>> a = 3
    >>> b = 4
    >>> c = (a**2 + b**2) ** 0.5
    >>> c
    5.0

Immediate execution like this consumes inputs and produces an output value (c here). PyTorch, like Python in general, defaults to immediate execution (referred to as eager mode in the PyTorch documentation). Immediate execution is useful because if problems arise in executing the expression, the Python interpreter, debugger, and similar tools have direct access to the Python objects involved. Exceptions can be raised directly at the point where the issue occurred.

Alternatively, you could define the Pythagorean expression even before knowing what the inputs are and use that definition to produce the output when the inputs are available. That callable function that you define can be used later, repeatedly, with varied inputs:

    >>> p = lambda a, b: (a**2 + b**2) ** 0.5
    >>> p(1, 2)
    2.23606797749979
    >>> p(3, 4)
    5.0

In the second case, you defined a series of operations to perform, resulting in an output function (p in this case). You didn’t execute anything until later, when you passed in the inputs. This is an example of deferred execution. Deferred execution means that most exceptions are raised when the function is called, not when it’s defined. For normal Python (as you see here), that’s fine, because the interpreter and debuggers have full access to the Python state at the time when the error occurred.

Things get tricky when specialized classes that have heavy operator overloading are used, allowing what looks like immediate execution to be deferred under the hood. These classes can look like the following:

    >>> a = InputParameterPlaceholder()
    >>> b = InputParameterPlaceholder()
    >>> c = (a**2 + b**2) ** 0.5
    >>> callable(c)
    True
    >>> c(3, 4)
    5.0

Often in libraries that use this form of function definition, the operations of squaring a and b, adding, and taking the square root aren’t recorded as high-level Python byte code. Instead, the point usually is to compile the expression into a static computation graph (a graph of basic operations) that has some advantage over pure Python (such as compiling the math directly to machine code for performance reasons).
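To make the placeholder idea concrete, here is a minimal pure-Python sketch of how operator overloading can record an expression instead of running it. The names Node and placeholder are hypothetical and chosen only for illustration; this is not how any particular library implements its graph mode, and it evaluates the recorded graph in plain Python rather than compiling it.

```python
import operator


class Node:
    """A node in a tiny expression graph; nothing is computed when it is built."""

    def __init__(self, op=None, args=(), index=None):
        self.op = op        # callable applied at evaluation time, or None for an input
        self.args = args    # child Nodes or plain Python numbers
        self.index = index  # for inputs: position in the call arguments

    # The overloaded operators record the operation instead of executing it.
    def __add__(self, other):
        return Node(operator.add, (self, other))

    def __pow__(self, other):
        return Node(operator.pow, (self, other))

    def evaluate(self, inputs):
        if self.op is None:
            return inputs[self.index]
        values = [a.evaluate(inputs) if isinstance(a, Node) else a
                  for a in self.args]
        return self.op(*values)

    def __call__(self, *inputs):
        return self.evaluate(inputs)


def placeholder(index):
    """Create an input node that reads the index-th call argument."""
    return Node(index=index)


a = placeholder(0)
b = placeholder(1)
c = (a**2 + b**2) ** 0.5  # builds a graph of Nodes; no arithmetic happens here

print(callable(c))
print(c(3, 4))
```

Note where an error would surface in this sketch: calling c(3, "4") fails inside evaluate, several frames away from the line that wrote the expression, which is the debugging cost of deferred execution in miniature.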
The fact that the computation graph is built in one place and used in another makes debugging more difficult, because exceptions often lack specificity about what went wrong and Python debugging tools don’t have any visibility into the intermediate states of the data. Also, static graphs usually don’t mix well with standard Python flow control: they’re de facto domain-specific languages implemented on top of a host language (Python in this case).

Next, we take a more concrete look at the differences between immediate and deferred execution, specifically regarding issues that are relevant to neural networks. We won’t be teaching these concepts in any depth here, instead giving you a high-level introduction to the terminology and the relationships among these concepts. Understanding those concepts and relationships lays the groundwork for understanding how libraries like PyTorch that use immediate execution differ from deferred-execution frameworks, even though the underlying math is the same for both types.

The fundamental building block of a neural network is a neuron. Neurons are strung together in large numbers to form the network. You see a typical mathematical expression for a single neuron in the first row of figure 1.2: o = tanh(w * x + b). As we explain the execution modes in the following figures, keep these facts in mind:
■ x is the input to the single-neuron computation.
■ w and b are the parameters or weights of the neuron and can be changed as needed.
■ To update the parameters (to produce output that more closely matches what we desire), we assign error to each of the weights via backpropagation and then tweak the weights accordingly.
■ Backpropagation requires computing the gradient of the output with respect to the weights (among other things).
■ We use automatic differentiation to compute the gradient automatically, saving us the trouble of writing the calculations by hand.

In figure 1.2, the neuron gets compiled into a symbolic graph in which each node represents individual operations (second row), using placeholders for inputs and outputs. Then the graph is evaluated numerically (third row) when concrete numbers are plugged into the placeholders (in this case, the numbers are the values stored in w, x, and b). The gradient of the output with respect to the weights is constructed symbolically by automatic differentiation, which traverses the graph backward and multiplies the gradients at individual nodes (fourth row). The corresponding mathematical expression is shown in the fifth row.

Figure 1.2 Static graph for a simple computation corresponding to a single neuron

One of the major competing deep learning frameworks is TensorFlow, which has a graph mode that uses a similar kind of deferred execution. Graph mode is the default mode of operation in TensorFlow 1.0. By contrast, PyTorch sports a define-by-run dynamic graph engine in which the computation graph is built node by node as the code is eagerly evaluated.

The top half of figure 1.3 shows the same calculation running under a dynamic graph engine. The computation is broken into individual expressions, which are greedily evaluated as they’re encountered. The program has no advance notion of the interconnection between computations. The bottom half of the figure shows the behind-the-scenes construction of a dynamic computation graph for the same expression. The expression is still broken into individual operations, but here those operations are eagerly evaluated, and the graph is built incrementally. Automatic differentiation is achieved by traversing the resulting graph backward, similar to static computation graphs. Note that this does not mean dynamic graph libraries are inherently more capable than static graph libraries, just that it’s often easier to accomplish looping or conditional behavior with dynamic graphs.

Figure 1.3 Dynamic graph for a simple computation corresponding to a single neuron

Dynamic graphs can change during successive forward passes. Different nodes can be invoked according to conditions on the outputs of the preceding nodes, for example, without a need for such conditions to be represented in the graph itself. This is a distinct advantage over static graph approaches.

The major frameworks are converging toward supporting both modes of operation. PyTorch 1.0 gained the ability to record the execution of a model in a static computation graph or define it through a precompiled scripting language, with the goal of improved performance and ease of putting the model into production. TensorFlow has also gained “eager mode,” a new define-by-run API, increasing the library’s flexibility as we have discussed.

1.3.3 The deep learning competitive landscape

Although all analogies are flawed, it seems that the release of PyTorch 0.1 in January 2017 marked the transition from a Cambrian Explosion-like proliferation of deep learning libraries, wrappers, and data exchange formats to an era of consolidation and unification.

NOTE The deep learning landscape has been moving so quickly lately that by the time you read this book, some aspects may be out of date. If you’re unfamiliar with some of the libraries mentioned here, that’s fine.

At the time of PyTorch’s first beta release
■ Theano and TensorFlow were the premier low-level deferred-execution libraries.
■ Lasagne and Keras were high-level wrappers around Theano, with Keras wrapping TensorFlow and CNTK as well.
■ Caffe, Chainer, Dynet, Torch (the Lua-based precursor to PyTorch), mxnet, CNTK, DL4J, and others filled various niches in the ecosystem.

In the roughly two years that followed, the landscape changed dramatically.
The community has largely consolidated behind PyTorch or TensorFlow, with the adoption of other libraries dwindling or filling specific niches:
■ Theano, one of the first deep learning frameworks, has ceased active development.
■ TensorFlow
    - Consumed Keras, promoting it to a first-class API
    - Provided an immediate execution eager mode
    - Announced that TF 2.0 will enable eager mode by default
■ PyTorch
    - Consumed Caffe2 for its backend
    - Replaced most of the low-level code reused from the Lua-based Torch project
    - Added support for ONNX, a vendor-neutral model description and exchange format
    - Added a delayed execution graph mode runtime called TorchScript
    - Released version 1.0

TensorFlow has a robust pipeline to production, an extensive industrywide community, and massive mindshare. PyTorch has made huge inroads with the research and teaching community, thanks to its ease of use, and has picked up momentum as researchers and graduates train students and move to industry. Interestingly, with the advent of TorchScript and eager mode, both libraries have seen their feature sets start to converge.

1.4 PyTorch has the batteries included

We’ve already hinted at a few components of PyTorch. Now we’ll take some time to formalize a high-level map of the main components that form PyTorch.

First, PyTorch has the Py from Python, but there’s a lot of non-Python code in it. For performance reasons, most of PyTorch is written in C++ and CUDA (https://www.geforce.com/hardware/technology/cuda), a C++-like language from NVIDIA that can be compiled to run with massive parallelism on NVIDIA GPUs. There are ways to run PyTorch directly from C++. One of the main motivations for this capability is providing a reliable strategy for deploying models in production. Most of the time, however, you’ll interact with PyTorch from Python, building models, training them, and using the trained models to solve problems. Depending on a given use case’s requirements for performance and scale, a pure-Python solution can be sufficient to put models into production. It can be perfectly viable to use a Flask web server to wrap a PyTorch model using the Python API, for example.

Indeed, the Python API is where PyTorch shines in terms of usability and integration with the wider Python ecosystem. Next, we take a peek at the mental model of PyTorch.
At its core, PyTorch is a library that provides multidimensional arrays, called tensors in PyTorch parlance, together with an extensive library of operations on them, provided by the torch module. Both tensors and related operations can run on the CPU or GPU. Running on the GPU results in massive speedups compared with CPU (especially if you’re willing to pay for a top-end GPU), and with PyTorch, doing so requires no more than an additional function call or two.

The second core thing that PyTorch provides is the ability of tensors to keep track of the operations performed on them and to compute derivatives of an output with respect to any of its inputs analytically via backpropagation. This capability is provided natively by tensors and further refined in torch.autograd.

We could argue that by having tensors and the autograd-enabled tensor standard library, PyTorch could be used for more than neural networks, and we’d be correct: PyTorch can be used for physics, rendering, optimization, simulation, modeling, and so on. We’re likely to see PyTorch being used in creative ways across the spectrum of scientific applications.
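The idea of values that keep track of the operations performed on them can be sketched in a few lines of plain Python. The following toy applies it to the single neuron o = tanh(w * x + b) discussed earlier. The Value class and its methods are hypothetical names invented for this illustration; the sketch handles only scalars and three operations, and it is not PyTorch’s implementation.

```python
import math


class Value:
    """A scalar that remembers how it was computed (a toy stand-in for a tensor)."""

    def __init__(self, data, parents=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents  # pairs of (parent Value, local derivative)

    # Each operation runs immediately and records its inputs as it goes.
    def __mul__(self, other):
        return Value(self.data * other.data,
                     ((self, other.data), (other, self.data)))

    def __add__(self, other):
        return Value(self.data + other.data, ((self, 1.0), (other, 1.0)))

    def tanh(self):
        t = math.tanh(self.data)
        return Value(t, ((self, 1.0 - t * t),))

    def backward(self):
        # Post-order walk: every node is listed after the nodes it was built from.
        order, seen = [], set()

        def visit(v):
            if id(v) not in seen:
                seen.add(id(v))
                for parent, _ in v.parents:
                    visit(parent)
                order.append(v)

        visit(self)
        self.grad = 1.0
        # Reverse the list so the output comes first, then apply the chain rule.
        for v in reversed(order):
            for parent, local in v.parents:
                parent.grad += local * v.grad


w, x, b = Value(0.5), Value(2.0), Value(-0.3)
o = (w * x + b).tanh()  # evaluated eagerly; the graph is a side effect
o.backward()

print(o.data, w.grad, b.grad)
```

The gradients agree with the hand-derived expressions do/dw = (1 - o**2) * x and do/db = 1 - o**2, which is the calculation that automatic differentiation saves you from writing out.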
But PyTorch is first and foremost a deep learning library, and as such, it provides all the building blocks needed to build and train neural networks. Figure 1.4 shows a standard setup that loads data, trains a model, and then deploys that model to production.

The core PyTorch modules for building neural networks are located in torch.nn, which provides common neural network layers and other architectural components. Fully connected layers, convolutional layers, activation functions, and loss functions can all be found here. These components can be used to build and initialize the untrained model shown in the center of figure 1.4.

Figure 1.4 Basic high-level structure of a PyTorch project, with data loading, training, and deployment to production

To train this model, you need a few things (besides the loop itself, which can be a standard Python for loop): a source of training data, an optimizer to adapt the model to the training data, and a way to get the model and data to the hardware that will be performing the calculations needed for training the model.

Utilities for data loading and handling can be found in torch.utils.data. The two main classes you’ll work with are Dataset, which acts as the bridge between your custom data (in whatever format it might be in) and a standardized PyTorch Tensor, and DataLoader, which can spawn child processes to load data from a Dataset in the background so that it’s ready and waiting for the training loop as soon as the loop can use it.
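The division of labor between a dataset and a loader can be sketched without PyTorch. In the following pure-Python illustration, SquaresDataset and iterate_batches are hypothetical names: the first mimics the indexable, sized shape of a map-style dataset, and the second is a simplified, single-process stand-in for a loader. The real classes in torch.utils.data do more, including assembling samples into tensors and optionally loading in background worker processes.

```python
import random


class SquaresDataset:
    """Hypothetical dataset: item i is the pair (i, i squared) as floats."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        if not 0 <= i < self.n:
            raise IndexError(i)
        return float(i), float(i * i)


def iterate_batches(dataset, batch_size, shuffle=False, seed=0):
    """Yield lists of samples, batch_size at a time (the last batch may be short)."""
    indices = list(range(len(dataset)))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, len(indices), batch_size):
        yield [dataset[i] for i in indices[start:start + batch_size]]


dataset = SquaresDataset(10)
batches = list(iterate_batches(dataset, batch_size=4))
for batch in batches:
    print(batch)
```

The design point is that the dataset only knows how to produce one sample, while batching, ordering, and (in the real library) parallel loading live in a separate object, so a training loop can stay a plain for loop.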
In the simplest case, the model will be running the required calculations on the local CPU or on a single GPU, so when the training loop has the data, computation can start immediately. It’s more common, however, to want to use specialized hardware such as multiple GPUs or to have multiple machines contribute their resources to training the model. In those cases, torch.nn.DataParallel and torch.distributed can be employed to leverage the additional hardware available.

When you have results from running your model on the training data, torch.optim provides standard ways of updating the model so that the output starts to more closely resemble the answers specified in the training data.

As mentioned earlier, PyTorch defaults to an immediate execution model (eager mode). Whenever an instruction involving PyTorch is executed by the Python interpreter, the corresponding operation is immediately carried out by the underlying C++ or CUDA implementation. As more instructions operate on tensors, more operations are executed by the backend implementation. This process is as fast as it typically can be on the C++ side, but it incurs the cost of calling that implementation through Python. This cost is minute, but it adds up.

To bypass the cost of the Python interpreter and offer the opportunity to run models independently from a Python runtime, PyTorch also provides a deferred execution model named TorchScript. Using TorchScript, PyTorch can serialize a set of instructions that can be invoked independently from Python. You can think of this model as being a virtual machine with a limited instruction set specific to tensor operations. Besides not incurring the costs of calling into Python, this execution mode gives PyTorch the opportunity to transform sequences of known operations just in time (JIT) into more efficient fused operations. These features are the basis of the production deployment capabilities of PyTorch.

1.4.1 Hardware for deep learning

Running a pretrained network on new data is within the capabilities of any recent laptop or personal computer. Even retraining a small portion of a pretrained network to specialize it on a new data set doesn’t necessarily require specialized hardware. You can follow along with this book on a standard personal computer or laptop. We anticipate, however, that completing a full training run for more-advanced examples will require a CUDA-capable graphics processing unit (GPU), such as a GPU with 8GB of RAM (we suggest an NVIDIA GTX 1070 or better). But those parameters can be adjusted if your hardware has less RAM available.

To be clear: such hardware isn’t mandatory if you’re willing to wait, but running on a GPU cuts training time by at least an order of magnitude (and usually is 40 to 50 times faster). Taken individually, the operations required to compute parameter updates are fast (from fractions of a second to a few seconds) on modern hardware such as a typical laptop CPU. The issue is that training involves running these operations over and over, many times, incrementally updating the network parameters to minimize training error.

Moderately large networks can take hours to days to train from scratch on large, real-world data sets on workstations equipped with good GPUs. That time can be