MATH FOR DEEP LEARNING
What You Need to Know to Understand Neural Networks

by Ronald T. Kneusel

San Francisco
MATH FOR DEEP LEARNING. Copyright © 2022 by Ronald T. Kneusel.

All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-13: 978-1-7185-0190-4 (print)
ISBN-13: 978-1-7185-0191-1 (ebook)

Publisher: William Pollock
Production Manager: Rachel Monaghan
Production Editors: Dapinder Dosanjh and Katrina Taylor
Developmental Editor: Alex Freed
Cover Illustrator: James L. Barry
Cover and Interior Design: Octopod Studios
Technical Reviewer: David Gorodetzky
Copyeditor: Carl Quesnel
Proofreader: Emelie Battaglia

For information on book distributors or translations, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 415.863.9900; fax: 415.863.9950; info@nostarch.com; www.nostarch.com

Library of Congress Control Number: 2021939724

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
In memory of Tom “Fitz” Fitzpatrick (1944–2013), the best math teacher I ever had. And to all the math teachers out there—they receive far too little appreciation for all of their hard work.
About the Author

Ron Kneusel has been working with machine learning in industry since 2003 and earned a PhD in machine learning from the University of Colorado, Boulder, in 2016. He is the author of three other books: Practical Deep Learning: A Python-Based Introduction (No Starch Press), Numbers and Computers (Springer), and Random Numbers and Computers (Springer).
About the Technical Reviewer

David Gorodetzky is a research scientist who works at the intersection of remote sensing and machine learning. Since 2011 he has led a small research group within a large government-services engineering firm that develops deep learning solutions for a wide variety of problems in remote sensing. David began his career in planetary geology and geophysics, detoured into environmental consulting, then studied paleoclimate reconstruction from polar ice cores in graduate school, before settling into a career in satellite remote sensing. For more than 15 years he was a principal consultant for a software services group developing image analysis and signal processing algorithms for clients across diverse fields, including aerospace, precision agriculture, reconnaissance, biotech, and cosmetics.
BRIEF CONTENTS

Foreword
Acknowledgments
Introduction
Chapter 1: Setting the Stage
Chapter 2: Probability
Chapter 3: More Probability
Chapter 4: Statistics
Chapter 5: Linear Algebra
Chapter 6: More Linear Algebra
Chapter 7: Differential Calculus
Chapter 8: Matrix Calculus
Chapter 9: Data Flow in Neural Networks
Chapter 10: Backpropagation
Chapter 11: Gradient Descent
Appendix: Going Further
Index
CONTENTS IN DETAIL

FOREWORD

ACKNOWLEDGMENTS

INTRODUCTION
Who Is This Book For?
About This Book

1 SETTING THE STAGE
Installing the Toolkits
Linux
macOS
Windows
NumPy
Defining Arrays
Data Types
2D Arrays
Zeros and Ones
Advanced Indexing
Reading and Writing to Disk
SciPy
Matplotlib
Scikit-Learn
Summary

2 PROBABILITY
Basic Concepts
Sample Space and Events
Random Variables
Humans Are Bad at Probability
The Rules of Probability
Probability of an Event
Sum Rule
Product Rule
Sum Rule Revisited
The Birthday Paradox
Conditional Probability
Total Probability
Joint and Marginal Probability
Joint Probability Tables
Chain Rule for Probability
Summary

3 MORE PROBABILITY
Probability Distributions
Histograms and Probabilities
Discrete Probability Distributions
Continuous Probability Distributions
Central Limit Theorem
The Law of Large Numbers
Bayes’ Theorem
Cancer or Not Redux
Updating the Prior
Bayes’ Theorem in Machine Learning
Summary

4 STATISTICS
Types of Data
Nominal Data
Ordinal Data
Interval Data
Ratio Data
Using Nominal Data in Deep Learning
Summary Statistics
Means and Median
Measures of Variation
Quantiles and Box Plots
Missing Data
Correlation
Pearson Correlation
Spearman Correlation
Hypothesis Testing
Hypotheses
The t-test
The Mann-Whitney U Test
Summary

5 LINEAR ALGEBRA
Scalars, Vectors, Matrices, and Tensors
Scalars
Vectors
Matrices
Tensors
Arithmetic with Tensors
Array Operations
Vector Operations
Matrix Multiplication
Kronecker Product
Summary

6 MORE LINEAR ALGEBRA
Square Matrices
Why Square Matrices?
Transpose, Trace, and Powers
Special Square Matrices
The Identity Matrix
Determinants
Inverses
Symmetric, Orthogonal, and Unitary Matrices
Definiteness of a Symmetric Matrix
Eigenvectors and Eigenvalues
Finding Eigenvalues and Eigenvectors
Vector Norms and Distance Metrics
L-Norms and Distance Metrics
Covariance Matrices
Mahalanobis Distance
Kullback-Leibler Divergence
Principal Component Analysis
Singular Value Decomposition and Pseudoinverse
SVD in Action
Two Applications
Summary

7 DIFFERENTIAL CALCULUS
Slope
Derivatives
A Formal Definition
Basic Rules
Rules for Trigonometric Functions
Rules for Exponentials and Logarithms
Minima and Maxima of Functions
Partial Derivatives
Mixed Partial Derivatives
The Chain Rule for Partial Derivatives
Gradients
Calculating the Gradient
Visualizing the Gradient
Summary

8 MATRIX CALCULUS
The Formulas
A Vector Function by a Scalar Argument
A Scalar Function by a Vector Argument
A Vector Function by a Vector
A Matrix Function by a Scalar
A Scalar Function by a Matrix
The Identities
A Scalar Function by a Vector
A Vector Function by a Scalar
A Vector Function by a Vector
A Scalar Function by a Matrix
Jacobians and Hessians
Concerning Jacobians
Concerning Hessians
Some Examples of Matrix Calculus Derivatives
Derivative of Element-Wise Operations
Derivative of the Activation Function
Summary

9 DATA FLOW IN NEURAL NETWORKS
Representing Data
Traditional Neural Networks
Deep Convolutional Networks
Data Flow in Traditional Neural Networks
Data Flow in Convolutional Neural Networks
Convolution
Convolutional Layers
Pooling Layers
Fully Connected Layers
Data Flow Through a Convolutional Neural Network
Summary

10 BACKPROPAGATION
What Is Backpropagation?
Backpropagation by Hand
Calculating the Partial Derivatives
Translating into Python
Training and Testing the Model
Backpropagation for Fully Connected Networks
Backpropagating the Error
Calculating Partial Derivatives of the Weights and Biases
A Python Implementation
Using the Implementation
Computational Graphs
Summary

11 GRADIENT DESCENT
The Basic Idea
Gradient Descent in One Dimension
Gradient Descent in Two Dimensions
Stochastic Gradient Descent
Momentum
What Is Momentum?
Momentum in 1D
Momentum in 2D
Training Models with Momentum
Nesterov Momentum
Adaptive Gradient Descent
RMSprop
Adagrad and Adadelta
Adam
Some Thoughts About Optimizers
Summary
Epilogue

APPENDIX: GOING FURTHER
Probability and Statistics
Linear Algebra
Calculus
Deep Learning

INDEX
FOREWORD

Artificial intelligence (AI) is ubiquitous. You need look no further than the device in your pocket for evidence—your phone now offers facial recognition security, obeys simple voice commands, digitally blurs backgrounds in your selfies, and quietly learns your interests to give you a personalized experience. AI models are being used to analyze mountains of data to efficiently create vaccines, improve robotic manipulation, build autonomous vehicles, harness the power of quantum computing, and even adjust to your proficiency in online chess. Industry is adapting to ensure state-of-the-art AI capabilities can be integrated into its domain expertise, and academia is building curriculum that exposes concepts of artificial intelligence to each degree-based discipline. An age of machine-driven cognitive autonomy is upon us, and while we are all consumers of AI, those expressing an interest in its development need to understand what is responsible for its substantial growth over the past decade.

Deep learning, a subcategory of machine learning, leverages very deep neural networks to model complicated systems that have historically posed problems for traditional, analytical methods. A newfound practical use of these deep neural networks is directly responsible for this surge in development of AI, a concept that most would attribute to Alan Turing back in the 1950s. But if deep learning is the engine for AI, what is the engine for deep learning?

Deep learning draws on many important concepts from science, technology, engineering, and math (STEM) fields. Industry recruiters continue to seek a formal definition of its constituents as they try to attract top talent with more descriptive job requisitions. Similarly, academic program coordinators are tasked with developing the curriculum that builds this skill set as it permeates across disciplines. While inherently interdisciplinary in practice, deep learning is built on a foundation of core mathematical principles from probability and statistics, linear algebra, and calculus. The degree to which an individual must embrace and understand these principles depends on the level of intimacy one expects to have with deep learning technologies.

For the implementer, Math for Deep Learning acts as a troubleshooting guide for the inevitable challenges encountered in deep neural network implementation. This individual is typically concerned with efficient implementation of preexisting solutions with tasks including identification and procurement of open source code, setting up a suitable work environment,
running any available unit tests, and finally, retraining with relevant data for the application of interest. These deep neural networks may contain tens or hundreds of millions of learnable parameters, and assuming adequate user proficiency, successful optimization relies on sensitive hyperparameter selection and access to training data that sufficiently represents the population. The first (and second, and third) attempt at implementation often requires a daunting journey into neural network interrogation, which requires dissection into, and a higher-level understanding of, the mathematical drivers presented here.

At some point, the implementer usually becomes the integrator. This level of expertise requires some familiarity with the desired application domain and a lower-level understanding of the building blocks that enable deep learning. In addition to the challenges faced in basic implementation, the integrator needs to be able to generalize core concepts to mold a mathematical model to the desired domain. Disaster strikes again! Perhaps the individual experiences the exploding-gradient problem. Maybe the integrator desires a more representative loss function that may pose differentiability issues. Or maybe, during training, the individual recognizes that the selected optimization strategy is ineffective for the problem. Math for Deep Learning fills a void within the community by offering a coherent overview of the critical mathematical concepts that compose deep learning and helps overcome these obstacles.

The integrator becomes the innovator when comfort with the subject matter allows the individual to be truly creative. With innovation comes the need for information dissemination, often requiring time away from practical development for publication, presentation, and a fair amount of teaching. Math for Deep Learning serves as a handbook to the foundation that the innovator holds in high esteem, providing quick references and reminders of seeds that yield new developments in artificial intelligence.

Just as these roles build upon each other, deep learning creates its own hierarchy, one of nonintuitive concepts or features that solve a specific task. The sheer scope of the problem can be overwhelming without dedicated focus. Dr. Kneusel has over 15 years of industry experience applying machine learning and deep learning to image generation and exploitation problems, and he created Math for Deep Learning to consolidate and emphasize what matters most: the mathematical foundation from which all neural network solutions are made possible. No textbook is complete, and this one presents other resources that expound on the topics of statistics, linear algebra, and calculus. Math for Deep Learning is for the individual seeking a self-contained, concentrated overview of the components that build the mathematical engine for AI’s primary tool.
Derek J. Walvoord, PhD
ACKNOWLEDGMENTS

I am a Bear of Very Little Brain, and long words Bother me.
—Winnie the Pooh

This book isn’t just the result of my own efforts. Sincere thanks and acknowledgment are in order.

First, I want to thank all the excellent folks at No Starch Press for the opportunity to work with them again. They are all genuinely consummate professionals and a joy to interact with—and that goes double for my editor, Alex Freed. Once again, she has taken my rambling prose and finessed it into something clear and coherent.

I also want to thank my friend David Gorodetzky for his expert technical review. David’s suggestions, and subtle way of pointing out goofs, have made the book stronger. If any errors remain, they are entirely my fault for not being wise enough to listen to David’s sage advice.
INTRODUCTION

Math is essential to the modern world. Deep learning is also rapidly becoming essential. From the promise of self-driving cars to medical systems detecting fractures better than all but the very best physicians, to say nothing of increasingly capable, and possibly worrisome, voice-controlled assistants, deep learning is everywhere. This book covers the essential math for making deep learning comprehensible.

It’s true that you can learn the toolkits, set up the configuration files or Python code, format some data, and train a model, all without understanding what you’re doing, let alone the math behind it. And, because of the power of deep learning, you’ll often be successful. However, you won’t understand, and you shouldn’t be satisfied. To understand, you need some math. Not a lot of math, but some specific math. In particular, you’ll need working knowledge of topics in probability, statistics, linear algebra, and differential calculus. Fortunately, those are the very topics this book happens to address.

Who Is This Book For?

This is not an introductory deep learning book. It will not teach you the basics of deep learning. Instead, it’s meant as an adjunct to such a book. (See my book Practical Deep Learning: A Python-Based Introduction [No Starch Press, 2021].) I expect you to be familiar with deep learning, at least conceptually, though I’ll explain things along the way.

Additionally, I expect you to bring certain knowledge to the table. I expect you to know high school mathematics, in particular algebra. I also expect you to
be familiar with programming using Python, R, or a similar language. We’ll be using Python 3.x and some of its popular toolkits, such as NumPy, SciPy, and scikit-learn. I’ve attempted to keep other expectations to a minimum. After all, the point of the book is to give you what you need to be successful in deep learning.

About This Book

At its core, this is a math book. But instead of proofs and practice exercises, we’ll use code to illustrate the concepts. Deep learning is an applied discipline that you need to do to be able to understand. Therefore, we’ll use code to bridge the gap between pure mathematical knowledge and practice.

The chapters build one upon the other, with foundational chapters followed by more advanced math topics and, ultimately, deep learning algorithms that make use of everything covered in the earlier chapters. I recommend reading the book straight through and, if you wish, skipping topics you’re already familiar with as you encounter them.

Chapter 1: Setting the Stage
This chapter configures our working environment and the toolkits we’ll use, which are those used most often in deep learning.

Chapter 2: Probability
Probability affects almost all aspects of deep learning and is essential to understanding how neural networks learn. This chapter, the first of two on this subject, introduces fundamental topics in probability.

Chapter 3: More Probability
Probability is so important that one chapter isn’t enough. This chapter continues our exploration and includes key deep learning topics, like probability distributions and Bayes’ theorem.

Chapter 4: Statistics
Statistics make sense of data and are crucial for evaluating models. Statistics go hand in hand with probability, so we need to understand statistics to understand deep learning.

Chapter 5: Linear Algebra
Linear algebra is the world of vectors and matrices. Deep learning is, at its core, linear algebra–focused. Implementing neural networks is an exercise in vector and matrix mathematics, so it is essential to understand what these concepts represent and how to work with them.
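To make that last point concrete, here is a minimal sketch, written for this overview rather than taken from the book, of a single fully connected layer expressed as vector and matrix arithmetic in NumPy. The layer sizes, the random weights, and the sigmoid activation are illustrative assumptions, not anything the chapters prescribe.

import numpy as np

# Illustrative only: a fully connected layer is matrix-vector arithmetic,
# y = activation(W x + b). The sizes and values here are arbitrary.
rng = np.random.default_rng(42)

x = rng.normal(size=4)        # input vector with 4 features
W = rng.normal(size=(3, 4))   # weight matrix mapping 4 inputs to 3 outputs
b = np.zeros(3)               # bias vector, one entry per output

def sigmoid(z):
    # Logistic activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-z))

y = sigmoid(W @ x + b)        # the layer's output: a length-3 vector
print(y)

The goal of the chapters that follow is to make every piece of a line like this one, from the matrix-vector product and the bias to the nonlinearity and, eventually, its derivatives, fully understandable.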