Uploader: 高宏飞 (shared on 2025-11-22)

Authors: Nithin Buduma, Nikhil Buduma, and Joe Papa

We're in the midst of an AI research explosion. Deep learning has unlocked superhuman perception to power our push toward self-driving vehicles, defeat human experts at a variety of difficult games including Go and StarCraft, and even generate essays with shockingly coherent prose. But deciphering these breakthroughs often takes a Ph.D. education in machine learning and mathematics. This updated second edition describes the intuition behind these innovations without the jargon and complexity. By the end of this book, Python-proficient programmers, software engineering professionals, and computer science majors will be able to reimplement these breakthroughs on their own and reason about them with a level of sophistication that rivals some of the best in the field. New chapters cover recent advancements in generative modeling and interpretability. Code examples throughout the book are implemented in PyTorch.

ISBN: 149208218X
Publisher: O'Reilly Media
Publish Year: 2022
Language: English
Pages: 390
File Format: PDF
File Size: 15.9 MB
Text Preview (First 20 pages)
MACHINE LEARNING

"This book provides a great way to start off with deep learning, with plenty of examples and well-explained concepts. A perfect book for readers of all levels who are interested in the domain."
—Vishwesh Ravi Shrimali, ADAS Engineer

Fundamentals of Deep Learning

We're in the midst of an AI research explosion. Deep learning has unlocked superhuman perception to power our push toward creating self-driving vehicles, defeating human experts at a variety of difficult games including Go, and even generating essays with shockingly coherent prose. But deciphering these breakthroughs often takes a PhD in machine learning and mathematics. The updated second edition of this book describes the intuition behind these innovations without jargon or complexity. Python-proficient programmers, software engineering professionals, and computer science majors will be able to reimplement these breakthroughs on their own and reason about them with a level of sophistication that rivals some of the best developers in the field.

• Learn the mathematics behind machine learning jargon
• Examine the foundations of machine learning and neural networks
• Manage problems that arise as you begin to make networks deeper
• Build neural networks that analyze complex images
• Perform effective dimensionality reduction using autoencoders
• Dive deep into sequence analysis to examine language
• Explore methods in interpreting complex machine learning models
• Gain theoretical and practical knowledge on generative modeling
• Understand the fundamentals of reinforcement learning

Nithin Buduma is a machine learning scientist at Cresta, a leader in the contact center intelligence space.

Nikhil Buduma is cofounder and chief scientist of Ambience Healthcare, a San Francisco-based company that makes autonomous technologies for healthcare delivery.

Joe Papa, founder of TeachMe.AI, has over 25 years of experience in research and development. He's led AI research teams with PyTorch at Booz Allen and Perspecta Labs.

US $69.99 / CAN $87.99
ISBN: 978-1-492-08218-7
Twitter: @oreillymedia
linkedin.com/company/oreilly-media
youtube.com/oreillymedia
Fundamentals of Deep Learning
Designing Next-Generation Machine Intelligence Algorithms

SECOND EDITION

Nithin Buduma, Nikhil Buduma, and Joe Papa
with contributions by Nicholas Locascio

Beijing • Boston • Farnham • Sebastopol • Tokyo
Fundamentals of Deep Learning
by Nithin Buduma, Nikhil Buduma, and Joe Papa

Copyright © 2022 Nithin Buduma and Mobile Insights Technology Group, LLC. All rights reserved.
Printed in the United States of America.
Published by O'Reilly Media, Inc., 1005 Gravenstein Highway North, Sebastopol, CA 95472.

O'Reilly books may be purchased for educational, business, or sales promotional use. Online editions are also available for most titles (http://oreilly.com). For more information, contact our corporate/institutional sales department: 800-998-9938 or corporate@oreilly.com.

Acquisitions Editor: Rebecca Novack
Development Editor: Melissa Potter
Production Editor: Katherine Tozer
Copyeditor: Sonia Saruba
Proofreader: Stephanie English
Indexer: Judith McConville
Interior Designer: David Futato
Cover Designer: Karen Montgomery
Illustrator: Kate Dullea

June 2017: First Edition
May 2022: Second Edition

Revision History for the Second Edition
2022-05-16: First Release

See http://oreilly.com/catalog/errata.csp?isbn=9781492082187 for release details.

The O'Reilly logo is a registered trademark of O'Reilly Media, Inc. Fundamentals of Deep Learning, the cover image, and related trade dress are trademarks of O'Reilly Media, Inc.

While the publisher and the authors have used good faith efforts to ensure that the information and instructions contained in this work are accurate, the publisher and the authors disclaim all responsibility for errors or omissions, including without limitation responsibility for damages resulting from the use of or reliance on this work. Use of the information and instructions contained in this work is at your own risk. If any code samples or other technology this work contains or describes is subject to open source licenses or the intellectual property rights of others, it is your responsibility to ensure that your use thereof complies with such licenses and/or rights.

ISBN: 978-1-492-08218-7
[LSI]
Table of Contents

Preface  ix

1. Fundamentals of Linear Algebra for Deep Learning  1
    Data Structures and Operations  1
    Matrix Operations  3
    Vector Operations  6
    Matrix-Vector Multiplication  7
    The Fundamental Spaces  7
    The Column Space  7
    The Null Space  10
    Eigenvectors and Eigenvalues  13
    Summary  15

2. Fundamentals of Probability  17
    Events and Probability  17
    Conditional Probability  20
    Random Variables  22
    Expectation  24
    Variance  25
    Bayes' Theorem  27
    Entropy, Cross Entropy, and KL Divergence  29
    Continuous Probability Distributions  32
    Summary  36

3. The Neural Network  39
    Building Intelligent Machines  39
    The Limits of Traditional Computer Programs  40
    The Mechanics of Machine Learning  41
    The Neuron  45
    Expressing Linear Perceptrons as Neurons  47
    Feed-Forward Neural Networks  48
    Linear Neurons and Their Limitations  51
    Sigmoid, Tanh, and ReLU Neurons  51
    Softmax Output Layers  54
    Summary  54

4. Training Feed-Forward Neural Networks  55
    The Fast-Food Problem  55
    Gradient Descent  57
    The Delta Rule and Learning Rates  58
    Gradient Descent with Sigmoidal Neurons  60
    The Backpropagation Algorithm  61
    Stochastic and Minibatch Gradient Descent  63
    Test Sets, Validation Sets, and Overfitting  65
    Preventing Overfitting in Deep Neural Networks  71
    Summary  76

5. Implementing Neural Networks in PyTorch  77
    Introduction to PyTorch  77
    Installing PyTorch  77
    PyTorch Tensors  78
    Tensor Init  78
    Tensor Attributes  79
    Tensor Operations  80
    Gradients in PyTorch  83
    The PyTorch nn Module  84
    PyTorch Datasets and Dataloaders  87
    Building the MNIST Classifier in PyTorch  89
    Summary  93

6. Beyond Gradient Descent  95
    The Challenges with Gradient Descent  95
    Local Minima in the Error Surfaces of Deep Networks  96
    Model Identifiability  97
    How Pesky Are Spurious Local Minima in Deep Networks?  98
    Flat Regions in the Error Surface  101
    When the Gradient Points in the Wrong Direction  104
    Momentum-Based Optimization  106
    A Brief View of Second-Order Methods  109
    Learning Rate Adaptation  111
    AdaGrad—Accumulating Historical Gradients  111
    RMSProp—Exponentially Weighted Moving Average of Gradients  112
    Adam—Combining Momentum and RMSProp  113
    The Philosophy Behind Optimizer Selection  115
    Summary  116

7. Convolutional Neural Networks  117
    Neurons in Human Vision  117
    The Shortcomings of Feature Selection  118
    Vanilla Deep Neural Networks Don't Scale  121
    Filters and Feature Maps  122
    Full Description of the Convolutional Layer  127
    Max Pooling  131
    Full Architectural Description of Convolution Networks  132
    Closing the Loop on MNIST with Convolutional Networks  134
    Image Preprocessing Pipelines Enable More Robust Models  136
    Accelerating Training with Batch Normalization  137
    Group Normalization for Memory Constrained Learning Tasks  139
    Building a Convolutional Network for CIFAR-10  141
    Visualizing Learning in Convolutional Networks  143
    Residual Learning and Skip Connections for Very Deep Networks  147
    Building a Residual Network with Superhuman Vision  149
    Leveraging Convolutional Filters to Replicate Artistic Styles  152
    Learning Convolutional Filters for Other Problem Domains  154
    Summary  155

8. Embedding and Representation Learning  157
    Learning Lower-Dimensional Representations  157
    Principal Component Analysis  158
    Motivating the Autoencoder Architecture  160
    Implementing an Autoencoder in PyTorch  161
    Denoising to Force Robust Representations  171
    Sparsity in Autoencoders  174
    When Context Is More Informative than the Input Vector  177
    The Word2Vec Framework  179
    Implementing the Skip-Gram Architecture  182
    Summary  188

9. Models for Sequence Analysis  189
    Analyzing Variable-Length Inputs  189
    Tackling seq2seq with Neural N-Grams  190
    Implementing a Part-of-Speech Tagger  192
    Dependency Parsing and SyntaxNet  197
    Beam Search and Global Normalization  203
    A Case for Stateful Deep Learning Models  206
    Recurrent Neural Networks  207
    The Challenges with Vanishing Gradients  210
    Long Short-Term Memory Units  213
    PyTorch Primitives for RNN Models  218
    Implementing a Sentiment Analysis Model  219
    Solving seq2seq Tasks with Recurrent Neural Networks  224
    Augmenting Recurrent Networks with Attention  227
    Dissecting a Neural Translation Network  230
    Self-Attention and Transformers  239
    Summary  242

10. Generative Models  243
    Generative Adversarial Networks  244
    Variational Autoencoders  249
    Implementing a VAE  259
    Score-Based Generative Models  264
    Denoising Autoencoders and Score Matching  269
    Summary  274

11. Methods in Interpretability  275
    Overview  275
    Decision Trees and Tree-Based Algorithms  276
    Linear Regression  280
    Methods for Evaluating Feature Importance  281
    Permutation Feature Importance  281
    Partial Dependence Plots  282
    Extractive Rationalization  283
    LIME  288
    SHAP  292
    Summary  297

12. Memory Augmented Neural Networks  299
    Neural Turing Machines  299
    Attention-Based Memory Access  301
    NTM Memory Addressing Mechanisms  303
    Differentiable Neural Computers  307
    Interference-Free Writing in DNCs  309
    DNC Memory Reuse  310
    Temporal Linking of DNC Writes  311
    Understanding the DNC Read Head  312
    The DNC Controller Network  313
    Visualizing the DNC in Action  314
    Implementing the DNC in PyTorch  317
    Teaching a DNC to Read and Comprehend  321
    Summary  323

13. Deep Reinforcement Learning  325
    Deep Reinforcement Learning Masters Atari Games  325
    What Is Reinforcement Learning?  326
    Markov Decision Processes  328
    Policy  329
    Future Return  330
    Discounted Future Return  331
    Explore Versus Exploit  331
    ϵ-Greedy  333
    Annealed ϵ-Greedy  333
    Policy Versus Value Learning  334
    Pole-Cart with Policy Gradients  335
    OpenAI Gym  335
    Creating an Agent  335
    Building the Model and Optimizer  337
    Sampling Actions  337
    Keeping Track of History  337
    Policy Gradient Main Function  338
    PGAgent Performance on Pole-Cart  340
    Trust-Region Policy Optimization  341
    Proximal Policy Optimization  345
    Q-Learning and Deep Q-Networks  347
    The Bellman Equation  347
    Issues with Value Iteration  348
    Approximating the Q-Function  348
    Deep Q-Network  348
    Training DQN  349
    Learning Stability  349
    Target Q-Network  350
    Experience Replay  350
    From Q-Function to Policy  350
    DQN and the Markov Assumption  351
    DQN's Solution to the Markov Assumption  351
    Playing Breakout with DQN  351
    Building Our Architecture  354
    Stacking Frames  354
    Setting Up Training Operations  354
    Updating Our Target Q-Network  354
    Implementing Experience Replay  355
    DQN Main Loop  356
    DQNAgent Results on Breakout  358
    Improving and Moving Beyond DQN  358
    Deep Recurrent Q-Networks  359
    Asynchronous Advantage Actor-Critic Agent  359
    UNsupervised REinforcement and Auxiliary Learning  360
    Summary  361

Index  363
Preface

With the reinvigoration of neural networks in the 2000s, deep learning has become an extremely active area of research that is paving the way for modern machine learning. This book uses exposition and examples to help you understand major concepts in this complicated field. Large companies such as Google, Microsoft, and Facebook have taken notice and are actively growing in-house deep learning teams. For the rest of us, deep learning is still a pretty complex and difficult subject to grasp. Research papers are filled to the brim with jargon, and scattered online tutorials do little to help build a strong intuition for why and how deep learning practitioners approach problems. Our goal is to bridge this gap.

In this second edition, we provide more rigorous background sections in mathematics with the aim of better equipping you for the material in the rest of the book. In addition, we have updated chapters in sequence analysis, computer vision, and reinforcement learning with deep dives into the latest advancements in the fields. And finally, we have added new chapters in the fields of generative modeling and interpretability to provide you with a broader view of the field of deep learning. We hope that these updates inspire you to practice deep learning on your own and apply your learnings to solve meaningful problems in the real world.

Prerequisites and Objectives

This book is aimed at an audience with a basic operating understanding of calculus and Python programming. In this latest edition, we provide extensive mathematical background chapters, specifically in linear algebra and probability, to prepare you for the material that lies ahead. By the end of the book, we hope you will be left with an intuition for how to approach problems using deep learning, the historical context for modern deep learning approaches, and a familiarity with implementing deep learning algorithms using the PyTorch open source library.
How Is This Book Organized?

The first chapters of this book are dedicated to developing mathematical maturity via deep dives into linear algebra and probability, which are deeply embedded in the field of deep learning. The next several chapters discuss the structure of feed-forward neural networks, how to implement them in code, and how to train and evaluate them on real-world datasets. The rest of the book is dedicated to specific applications of deep learning and understanding the intuition behind the specialized learning techniques and neural network architectures developed for those applications. Although we cover advanced research in these latter sections, we hope to provide a breakdown of these techniques that is derived from first principles and digestible.

Conventions Used in This Book

The following typographical conventions are used in this book:

Italic
    Indicates new terms, URLs, email addresses, filenames, and file extensions.

Constant width
    Used for program listings, as well as within paragraphs to refer to program elements such as variable or function names, databases, data types, environment variables, statements, and keywords.

This element signifies a general note.

This element indicates a warning or caution.

Using Code Examples

Supplemental material (code examples, exercises, etc.) is available for download at https://github.com/darksigma/Fundamentals-of-Deep-Learning-Book.

If you have a technical question or a problem using the code examples, please email bookquestions@oreilly.com.
This book is here to help you get your job done. In general, if example code is offered with this book, you may use it in your programs and documentation. You do not need to contact us for permission unless you're reproducing a significant portion of the code. For example, writing a program that uses several chunks of code from this book does not require permission. Selling or distributing examples from O'Reilly books does require permission. Answering a question by citing this book and quoting example code does not require permission. Incorporating a significant amount of example code from this book into your product's documentation does require permission.

We appreciate, but do not require, attribution. An attribution usually includes the title, author, publisher, and ISBN. For example: "Fundamentals of Deep Learning by Nithin Buduma, Nikhil Buduma, and Joe Papa (O'Reilly). Copyright 2022 Nithin Buduma and Mobile Insights Technology Group, LLC, 978-1-492-08218-7."

If you feel your use of code examples falls outside fair use or the permission given above, feel free to contact us at permissions@oreilly.com.

O'Reilly Online Learning

For more than 40 years, O'Reilly Media has provided technology and business training, knowledge, and insight to help companies succeed. Our unique network of experts and innovators share their knowledge and expertise through books, articles, and our online learning platform. O'Reilly's online learning platform gives you on-demand access to live training courses, in-depth learning paths, interactive coding environments, and a vast collection of text and video from O'Reilly and 200+ other publishers. For more information, visit https://oreilly.com.

How to Contact Us

Please address comments and questions concerning this book to the publisher:

O'Reilly Media, Inc.
1005 Gravenstein Highway North
Sebastopol, CA 95472
800-998-9938 (in the United States or Canada)
707-829-0515 (international or local)
707-829-0104 (fax)
We have a web page for this book, where we list errata, examples, and any additional information. You can access this page at https://oreil.ly/fundamentals-of-deep-learning-2e.

Email bookquestions@oreilly.com to comment or ask technical questions about this book.

For news and information about our books and courses, visit https://oreilly.com.

Find us on LinkedIn: https://www.linkedin.com/company/oreilly-media.

Follow us on Twitter: https://twitter.com/oreillymedia.

Watch us on YouTube: https://www.youtube.com/oreillymedia.

Acknowledgements

We'd like to thank several people who have been instrumental in the completion of this text. We'd like to start by acknowledging Mostafa Samir and Surya Bhupatiraju, who contributed heavily to the content of Chapters 7 and 8. We also appreciate the contributions of Mohamed (Hassan) Kane and Anish Athalye, who worked on early versions of the code examples in this book's GitHub repository.

Nithin and Nikhil

This book would not have been possible without the never-ending support and expertise of our editor, Shannon Cutt. We also appreciate the commentary provided by our reviewers, Isaac Hodes, David Andrzejewski, Aaron Schumacher, Vishwesh Ravi Shrimali, Manjeet Dahiya, Ankur Patel, and Suneeta Mall, who provided thoughtful, in-depth, and technical commentary on the original drafts of the text. Finally, we are thankful for all of the insight provided by our friends and family members, including Jeff Dean, Venkat Buduma, William, and Jack, as we finalized the manuscript of the text.

Joe

Updating the code for this book with PyTorch has been an enjoyable and exciting experience. No endeavor like this can be achieved by one person alone. First, I would like to thank the PyTorch community and its 2,100+ contributors for continuing to grow and improve PyTorch and its deep learning capabilities. It is because of you that we can demonstrate the concepts described in this book. I am forever grateful to Rebecca Novack for bringing me into this project and for her confidence in me as an author. Many thanks to Melissa Potter and the O'Reilly production staff for making this updated version come to life.
For his encouragement and support, I'd like to thank Matt Kirk. He's been my rock through it all. Thank you for our countless chats full of ideas and resources. Special thanks to my kids, Savannah, Caroline, George, and Forrest, for being patient and understanding when Daddy had to work. And, most of all, thank you to my wife, Emily, who has always supported my dreams throughout life. While I diligently wrote code, she cared for our newborn through sleepless nights while ensuring the "big" kids had their needs met too. Without her, my contributions to this project would not be possible.
CHAPTER 1
Fundamentals of Linear Algebra for Deep Learning

In this chapter, we cover important prerequisite knowledge that will motivate our discussion of deep learning techniques in the main text and the optional sidebars at the end of select chapters. Deep learning has recently experienced a renaissance, both in academic research and in industry. It has pushed the limits of machine learning by leaps and bounds, revolutionizing fields such as computer vision and natural language processing. However, it is important to remember that deep learning is, at its core, a culmination of achievements in fields such as calculus, linear algebra, and probability. Although there are deeper connections to other fields of mathematics, we focus on the three listed here to help us broaden our perspective before diving into deep learning. These fields are key to unlocking both the big picture of deep learning and the intricate subtleties that make it as exciting as it is. In this first chapter on background, we cover the fundamentals of linear algebra.

Data Structures and Operations

The most important data structure in linear algebra (whenever we reference linear algebra in this text, we refer to its applied variety) is arguably the matrix, a 2D array of numbers where each entry can be indexed via its row and column. Think of an Excel spreadsheet, where you have offers from Company X and Company Y as two columns, and the rows represent some characteristic of each offer, such as starting salary, bonus, or position, as shown in Table 1-1.
Table 1-1. Excel spreadsheet

              Company X    Company Y
    Salary    $50,000      $40,000
    Bonus     $5,000       $7,500
    Position  Engineer     Data Scientist

The table format is especially suited to keep track of such data, where you can index by row and column to find, for example, Company X's starting position. Matrices, similarly, are a multipurpose tool for holding all kinds of data; the data we work with in this book is numerical. In deep learning, matrices are often used to represent both datasets and weights in a neural network. A dataset, for example, has many individual data points with any number of associated features. A lizard dataset might contain information on length, weight, speed, age, and other important attributes. We can represent this intuitively as a matrix or table, where each row represents an individual lizard, and each column represents a lizard feature, such as age. However, as opposed to Table 1-1, the matrix stores only the numbers and assumes that the user has kept track of which rows correspond to which data points, which columns correspond to which feature, and what the units are for each feature, as you can see in Figure 1-1.

Figure 1-1. A comparison of tables and matrices

On the right side, we have a matrix, where it's assumed, for example, that the age of each lizard is in years, and Komodo Ken weighs a whopping 50 kilograms! But why even work with matrices when tables clearly give the user more information? Well, in linear algebra and even deep learning, operations such as multiplication and addition are done on the tabular data itself, and such operations can only be computed efficiently when the data is in a purely numerical format. Much of the work in linear algebra centers on operations on these data structures and on the emergent properties of matrices, which are especially interesting when a matrix has certain base attributes. Vectors, which can be seen as a special type of matrix, are 1D arrays of numbers. This data structure can be used to represent an individual data point or the weights in a linear regression, for example. We cover properties of matrices and vectors as well as operations on both.
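As a small illustration (our own sketch, not code from the book), a lizard dataset like the one described above can be stored as a PyTorch tensor, with one row per lizard and one column per feature. The lizard names, feature names, and all numbers other than Komodo Ken's 50 kilograms are made up for the example; the point is that the row/column bookkeeping and the units live outside the tensor itself.

```python
import torch

# Bookkeeping kept by us, not by the tensor (names and units are hypothetical).
feature_names = ["length_cm", "weight_kg", "speed_kmh", "age_years"]
lizard_names = ["Gecko Gail", "Iguana Ivan", "Komodo Ken"]

# One row per lizard, one column per feature: a 3 x 4 matrix of pure numbers.
lizards = torch.tensor([
    [ 12.0,  0.05,  5.0,  2.0],
    [ 45.0,  4.00, 35.0,  6.0],
    [260.0, 50.00, 20.0, 30.0],
])

# Entries are indexed by (row, column), just like a matrix entry.
print(lizards[2, 1])   # Komodo Ken's weight: tensor(50.)
print(lizards[:, 3])   # the "age" column for every lizard
print(lizards.shape)   # torch.Size([3, 4])
```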
Matrix Operations

Matrices can be added, subtracted, and multiplied—there is no division of matrices, but there exists a similar concept called inversion. When indexing a matrix, we use a tuple, where the first index represents the row number and the second index represents the column number. To add two matrices A and B, one loops through each index (i,j) of the two matrices, sums the two entries at the current index, and places that result in the same index (i,j) of a new matrix C, as can be seen in Figure 1-2.

Figure 1-2. Matrix addition

This algorithm implies that we can't add two matrices of different shapes, since indices that exist in one matrix wouldn't exist in the other. It also implies that the final matrix C is of the same shape as A and B. In addition to adding matrices, we can multiply a matrix by a scalar. This involves simply taking the scalar and multiplying each of the entries of the matrix by it (the shape of the resultant matrix stays constant), as depicted in Figure 1-3.

Figure 1-3. Scalar-matrix multiplication

These two operations, addition of matrices and scalar-matrix multiplication, lead us directly to matrix subtraction, since computing A – B is the same as computing the matrix addition A + (–B), and computing –B is the product of the scalar –1 and the matrix B.

Multiplying two matrices starts to get interesting. For reasons beyond the scope of this text (motivations in a more theoretical flavor of linear algebra, where matrices represent linear transformations), we define the matrix product A · B as:

Equation 1-1. Matrix multiplication formula

(A \cdot B)_{i,j} = \sum_{k'=1}^{k} A_{i,k'} B_{k',j}
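To make these operations concrete, here is a minimal sketch (ours, not the book's own code) that implements matrix addition, scalar-matrix multiplication, and Equation 1-1 with explicit loops, then checks the loop-based product against PyTorch's built-in matrix multiplication. The matrices A and B are arbitrary examples chosen for illustration.

```python
import torch

def matmul_by_hand(A: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    """Direct translation of Equation 1-1: C[i, j] = sum over k' of A[i, k'] * B[k', j]."""
    m, k = A.shape
    k2, n = B.shape
    assert k == k2, "inner dimensions must match"
    C = torch.zeros(m, n)
    for i in range(m):
        for j in range(n):
            C[i, j] = sum(A[i, kp] * B[kp, j] for kp in range(k))
    return C

A = torch.tensor([[1., 2., 3.],
                  [4., 5., 6.]])   # 2 x 3
B = torch.tensor([[1., 0.],
                  [0., 1.],
                  [2., 2.]])       # 3 x 2

# Addition and scalar multiplication are entrywise (shapes must match for +).
print(A + A)   # each entry summed with its counterpart
print(3 * A)   # every entry multiplied by the scalar 3
print(A - A)   # equivalent to A + (-1) * A

# The loop version of Equation 1-1 agrees with the built-in operator.
print(torch.allclose(matmul_by_hand(A, B), A @ B))   # True
```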
In simpler terms, Equation 1-1 means that the value at index (i,j) of A · B is the sum of the products of the entries in the ith row of A with those of the jth column of B. Figure 1-4 is an example of matrix multiplication.

Figure 1-4. Matrix multiplication

It follows that the rows of A and the columns of B must have the same length, so two matrices can be multiplied only if the shapes align. We use the term dimension to formally represent what we have referred to so far as shape: i.e., A is of dimension m by k, meaning it has m rows and k columns, and B is of dimension k by n. If this weren't the case, the formula for matrix multiplication would give us an indexing error. The dimension of the product is m by n, signifying an entry for every pair of rows in A and columns in B. This is the computational way of thinking about matrix multiplication, and it doesn't lend itself well to theoretical interpretation. We'll call Equation 1-1 the dot product interpretation of matrix multiplication, which will make more sense after reading "Vector Operations" on page 6.

Note that matrix multiplication is not commutative, i.e., A · B ≠ B · A. Of course, if we were to take a matrix A that is 2 by 3 and a matrix B that is 3 by 5, for example, by the rules of matrix multiplication, B · A doesn't exist. However, even if the product were defined because both matrices are square, where square means that the matrix has an equal number of rows and columns, the two products will not in general be the same (this is an exercise for you to explore on your own). However, matrix multiplication is distributive over addition, i.e., A · (B + C) = A · B + A · C.

Let's delve into matrix multiplication a bit further. After some algebraic manipulation, we can see that another way to formulate matrix multiplication is:

(A \cdot B)_{:,j} = A \cdot B_{:,j}

This states that the jth column of the product A · B is the matrix product of A and the jth column of B, a vector. We'll call this the column vector interpretation of matrix multiplication, as can be seen in Figure 1-5.
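The properties above are easy to check numerically. The following sketch (again ours, not from the book) verifies on random square matrices that matrix multiplication is not commutative, that it distributes over addition, and that the jth column of A · B equals A times the jth column of B:

```python
import torch

torch.manual_seed(0)
A = torch.randn(3, 3)
B = torch.randn(3, 3)
C = torch.randn(3, 3)

# Not commutative: even for square matrices, A @ B and B @ A generally differ.
print(torch.allclose(A @ B, B @ A))                  # almost surely False

# Distributive over addition (up to floating-point rounding).
print(torch.allclose(A @ (B + C), A @ B + A @ C))    # True

# Column vector interpretation: the jth column of A @ B is A times the jth column of B.
j = 1
print(torch.allclose((A @ B)[:, j], A @ B[:, j]))    # True
```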