Deep Learning in Science

Author: Pierre Baldi

Category: Business

This is the first rigorous, self-contained treatment of the theory of deep learning. Starting with the foundations of the theory and building it up, this is essential reading for any scientists, instructors, and students interested in artificial intelligence and deep learning. It provides guidance on how to think about scientific questions, and leads readers through the history of the field and its fundamental connections to neuroscience. The author discusses many applications to beautiful problems in the natural sciences, in physics, chemistry, and biomedicine. Examples include the search for exotic particles and dark matter in experimental physics, the prediction of molecular properties and reaction outcomes in chemistry, and the prediction of protein structures and the diagnostic analysis of biomedical images in the natural sciences. The text is accompanied by a full set of exercises at different difficulty levels and encourages out-of-the-box thinking.

📄 File Format: PDF
💾 File Size: 9.2 MB

📄 Text Preview (First 20 pages)


📄 Page 2
Deep Learning in Science

This is the first rigorous, self-contained treatment of the theory of deep learning. Starting with the foundations of the theory and building it up, this is essential reading for any scientists, instructors, and students interested in artificial intelligence and deep learning. It provides guidance on how to think about scientific questions, and leads readers through the history of the field and its fundamental connections to neuroscience. The author discusses many applications to beautiful problems in the natural sciences, in physics, chemistry, and biomedicine. Examples include the search for exotic particles and dark matter in experimental physics, the prediction of molecular properties and reaction outcomes in chemistry, and the prediction of protein structures and the diagnostic analysis of biomedical images in the natural sciences. The text is accompanied by a full set of exercises at different difficulty levels and encourages out-of-the-box thinking.

Pierre Baldi is Distinguished Professor of Computer Science at University of California, Irvine. His main research interest is understanding intelligence in brains and machines. He has made seminal contributions to the theory of deep learning and its applications to the natural sciences, and has written four other books.
📄 Page 4
Deep Learning in Science
PIERRE BALDI
University of California, Irvine
📄 Page 5
University Printing House, Cambridge CB2 8BS, United Kingdom
One Liberty Plaza, 20th Floor, New York, NY 10006, USA
477 Williamstown Road, Port Melbourne, VIC 3207, Australia
314–321, 3rd Floor, Plot 3, Splendor Forum, Jasola District Centre, New Delhi – 110025, India
79 Anson Road, #06–04/06, Singapore 079906

Cambridge University Press is part of the University of Cambridge. It furthers the University’s mission by disseminating knowledge in the pursuit of education, learning, and research at the highest international levels of excellence.

www.cambridge.org
Information on this title: www.cambridge.org/9781108845359
DOI: 10.1017/9781108955652

© Pierre Baldi 2021

This publication is in copyright. Subject to statutory exception and to the provisions of relevant collective licensing agreements, no reproduction of any part may take place without the written permission of Cambridge University Press.

First published 2021
Printed in Singapore by Markono Print Media Pte Ltd

A catalogue record for this publication is available from the British Library.
ISBN 978-1-108-84535-9 Hardback

Cambridge University Press has no responsibility for the persistence or accuracy of URLs for external or third-party internet websites referred to in this publication and does not guarantee that any content on such websites is, or will remain, accurate or appropriate.
📄 Page 6
To Cristina, Julia Melody, and Marco Jazz
📄 Page 8
Contents

Preface  page xi
1 Introduction  1
1.1 Carbon-Based and Silicon-Based Computing  1
1.2 Early Beginnings Until the Late 1940s  4
1.3 From 1950 to 1980  10
1.4 From 1980 to Today  12
1.5 Roadmap  14
1.6 Exercises  15
2 Basic Concepts  16
2.1 Synapses  16
2.2 Units or Neurons  17
2.3 Activations  17
2.4 Transfer Functions  18
2.5 Discrete versus Continuous Time  22
2.6 Networks and Architectures  23
2.7 Functional and Cardinal Capacity of Architectures  26
2.8 The Bayesian Statistical Framework  28
2.9 Information Theory  31
2.10 Data and Learning Settings  33
2.11 Learning Rules  35
2.12 Computational Complexity Theory  36
2.13 Exercises  37
3 Shallow Networks and Shallow Learning  41
3.1 Supervised Shallow Networks and their Design  41
3.2 Capacity of Shallow Networks  46
3.3 Shallow Learning  52
3.4 Extensions of Shallow Learning  56
3.5 Exercises  58
📄 Page 9
4 Two-Layer Networks and Universal Approximation  63
4.1 Functional Capacity  63
4.2 Universal Approximation Properties  65
4.3 The Capacity of A(n, m, 1) Architectures  68
4.4 Exercises  69
5 Autoencoders  71
5.1 A General Autoencoder Framework  72
5.2 General Autoencoder Properties  73
5.3 Linear Autoencoders  75
5.4 Non-Linear Autoencoders: Unrestricted Boolean Case  84
5.5 Other Autoencoders and Autoencoder Properties  90
5.6 Exercises  96
6 Deep Networks and Backpropagation  99
6.1 Why Deep?  99
6.2 Functional Capacity: Deep Linear Case  101
6.3 Functional Capacity: Deep Unrestricted Boolean Case  102
6.4 Cardinal Capacity: Deep Feedforward Architectures  103
6.5 Other Notions of Capacity  104
6.6 Learning by Backpropagation  105
6.7 The Optimality of Backpropagation  109
6.8 Architecture Design  111
6.9 Practical Training Issues  114
6.10 The Bias–Variance Decomposition  118
6.11 Dropout  119
6.12 Model Compression/Distillation and Dark Knowledge  124
6.13 Multiplicative Interactions: Gating and Attention  125
6.14 Unsupervised Learning and Generative Models  127
6.15 Exercises  131
7 The Local Learning Principle  137
7.1 Virtualization and Learning in the Machine  137
7.2 The Neuronal View  138
7.3 The Synaptic View: the Local Learning Principle  139
7.4 Stratification of Learning Rules  141
7.5 Deep Local Learning and its Fundamental Limitations  142
7.6 Local Deep Learning: the Deep Learning Channel  144
7.7 Local Deep Learning and Deep Targets Equivalence  147
7.8 Exercises  149
8 The Deep Learning Channel  151
8.1 Random Backpropagation (RBP) and its Variations  152
8.2 Simulations of Random Backpropagation  154
📄 Page 10
8.3 Understanding Random Backpropagation  155
8.4 Mathematical Analysis of Random Backpropagation  157
8.5 Further Remarks About Learning Channels  162
8.6 Circular Autoencoders  164
8.7 Recirculation: Locality in Both Space and Time  165
8.8 Simulations of Recirculation  167
8.9 Recirculation is Random Backpropagation  168
8.10 Mathematical Analysis of Recirculation  170
8.11 Exercises  173
9 Recurrent Networks  177
9.1 Recurrent Networks  177
9.2 Cardinal Capacity of Recurrent Networks  178
9.3 Symmetric Connections: The Hopfield Model  179
9.4 Symmetric Connections: Boltzmann Machines  182
9.5 Exercises  185
10 Recursive Networks  189
10.1 Variable-Size Structured Data  189
10.2 Recursive Networks and Design  190
10.3 Relationships between Inner and Outer Approaches  199
10.4 Exercises  201
11 Applications in Physics  204
11.1 Deep Learning in the Physical Sciences  204
11.2 Antimatter Physics  208
11.3 High Energy Collider Physics  214
11.4 Neutrino Physics  224
11.5 Dark Matter Physics  228
11.6 Cosmology and Astrophysics  230
11.7 Climate Physics  233
11.8 Incorporating Physics Knowledge and Constraints  235
11.9 Conclusion: Theoretical Physics  237
12 Applications in Chemistry  239
12.1 Chemical Data and Chemical Space  240
12.2 Prediction of Small Molecule Properties  242
12.3 Prediction of Chemical Reactions  245
13 Applications in Biology and Medicine  257
13.1 Biomedical Data  257
13.2 Life in a Nutshell  258
13.3 Deep Learning in Proteomics  261
13.4 Deep Learning in Genomics and Transcriptomics  268
📄 Page 11
13.5 Deep Learning in Biomedical Imaging  270
13.6 Deep Learning in Health Care  273
14 Conclusion  275
14.1 Explainability and the Black-Box Question  275
14.2 ANNs versus BNNs  277
Appendix A Reinforcement Learning and Deep Reinforcement Learning  282
A.1 Brief History and Background  282
A.2 Main Algorithmic Approaches  287
A.3 Limitations and Open Problems  298
A.4 Other Directions of Research  302
A.5 Deep Reinforcement Learning  303
A.6 Exercises  306
Appendix B Hints and Remarks for Selected Exercises  308
References  313
Index  365
📄 Page 12
Preface

By and large, this book grew out of research conducted in my group as well as classes and lectures given at the University of California, Irvine (UCI) and elsewhere over the years. It can be used as a textbook for an undergraduate or graduate course in machine learning, or as an introduction to the topic for scientists from other fields. Basic prerequisites for understanding the material include college-level algebra, calculus, and probability. Familiarity with information theory, statistics, coding theory, and computational complexity at an elementary level is also helpful. I have striven to focus primarily on fundamental principles and provide a treatment that is both self-contained and rigorous, sometimes referring to the literature for well-known technical results, or to the exercises, which are an integral part of the book.

In writing this book, one of my goals has been to provide a rigorous treatment from first principles, as much as possible, in a still rapidly evolving field. This is one of the meanings of “in science” in the title. In this regard, the flow of the book is dictated primarily by complexity issues, going from shallow networks in their different forms, to deep feedforward networks, to recurrent and recursive networks. Two-layer networks, of which autoencoders are the prototypical example, provide the hinge between shallow and deep learning. For each kind of network, it is useful to consider special “hardware” cases, such as networks of linear units. Contrary to widespread belief, the linear case is often interesting and far from trivial. But this is not the only case where using a particular hardware model is helpful. Another example is the use of unrestricted Boolean units, another model that may seem trivial at first sight, but which leads to useful insights for both autoencoders and deep architectures. Yet another important example is provided by networks of linear or polynomial threshold gates.

A second characteristic of this book is its connection to biology. Neural networks, deep learning, and the entire field of AI are deeply rooted in biology, in trying to understand how the brain works and the space of possible strategies to replicate and surpass its capabilities. This is evident in Turing’s foundational work on Turing machines, guided by the fundamental intuition of a brain capable of having only a finite number of states [736], and in the vocabulary of computer science, which is full of words clearly rooted in biology such as AI, machine learning, memory, computer vision, computer virus, genetic algorithms, and so forth. It is regrettable to see young students and practitioners of machine learning misled to believe that artificial neural networks have little to do with biology, or that machine learning is the set of techniques used to maximize engineering or business goals, such as advertising revenues for search engines. In addition, not only are computers and neural networks inspired by biology, but they are of course also being
📄 Page 13
successfully used to analyze biological data, for instance high-throughput omic data, and through one of these surprising self-recursions only mankind seems to have produced, the results of these bioinformatics and systems biology analyses are progressively informing our understanding of the brain, helping to reveal for instance key gene expression and protein mechanisms involved in synaptic formation and biological memory.

A third characteristic of this book is precisely in the applications. The second meaning of “in science” in the title is “for science”. I have focused on applications of deep learning to the natural sciences – primarily physics, chemistry, and biology – for the past three decades or so. These applications are expanding rapidly today, but were almost nonexistent in the 1980s. Plenty of textbooks and other material can be found dealing with applications of neural networks to problems in engineering and other related areas.

A fourth characteristic is the emphasis placed on storage, specifically on the neural style of information storage, in fundamental contrast to the Turing style of information storage, ironically introduced by Turing precisely while thinking about the brain. This theme goes together with the importance of recognizing the virtualization process hidden behind most of today’s neural network applications. In most applications of neural networks today, there are no neurons and no synapses, only their digital mirage. This comes at a price that can only be understood by thinking about “learning in the machine”, as opposed to machine learning. In a physical neural system, learning rules must be local both in space and time. Among other things, this locality principle helps clarify the relationship between Hebbian learning and backpropagation and explains why Hebbian learning applied to feedforward convolutional architectures has never worked. It also naturally leads to random backpropagation and recirculation algorithms, important topics that are poorly known because they are not particularly useful for current applications.

For readers primarily interested in applications, or for courses with tight time limitations, I recommend using the abbreviated sequence of chapters: 2, 3, 6, and 10, covering most of the practical aspects.

Finally, the field of neural networks has been polluted by fads and a significant amount of cronyism and collusion over the past few decades, which a fragmented, multigenerational, and often unaware community could do little to stop. Cronyism and collusion are nothing new in human affairs, but they have distorted and slowed down the development of the field through the subtle control and manipulation of conferences, publications, academic and corporate research departments, and other avenues of power and dissemination. Readers should read more widely, check what has been published – where and when – and decide for themselves which results are supported by mathematical proofs or sound simulations, and which are not. In the end, towering over human affairs, all that matters are the beauty of deep learning and the underlying mysteries it is intimately connected to: from whether silicon can be conscious to the fundamental nature of the universe.

About the Exercises

The exercises vary substantially in difficulty. Should you become frustrated at trying to solve one of them, remind yourself that it is only when you are struggling with a problem that your brain is really learning something.
📄 Page 14
In order to solve some of the problems in the book, or more broadly to think about scientific and other questions, I recommend that my students systematically try at least four different approaches. The first, of course, is to simplify. When a question seems too difficult at first, look for special or simpler cases. When trying to understand a theorem, look at the case of “small n”, or fix the values of certain parameters, or switch to the linear case, or try to interpolate. The second is the opposite way of thinking: generalize, abstract, or extrapolate. Are there other situations that bear some similarity to the current problem? How can a result be applied to more general cases? Can the conditions under which a theorem is true be relaxed? The third way of thinking is “to take the limit”, to look at what happens at the boundaries of a certain domain, under extreme conditions, to let n go to zero, or to infinity. And finally, the fourth way is always to invert, to look at things somehow from an opposite perspective. Thus, for example, when thinking about an autoencoder, one may want first to simplify it by studying how to solve the top layer given the lower layer, which is usually an easier problem; and then to invert this approach by studying how the lower layer can be solved given the top layer, which is usually a harder problem.

Of course these four principles are not a panacea for every situation and, for instance, identifying the right form of “inversion” in a given situation may not be obvious. However, the discipline of trying to apply these four principles in a systematic manner can be helpful and, incidentally, remains a major challenge for Artificial Intelligence (AI).

Acknowledgments

The number of people I am indebted to keeps growing every year, and I can only mention a few of them.

• As a graduate student at Caltech (1983–1986) and visiting lecturer at UCSD (1986–1988), I was fortunate to be able to participate and contribute to the early beginnings of neural networks in the 1980s. Being at those two universities, which were the hotbeds of neural networks research at the time, resulted to a large extent from a series of chance encounters with several individuals, of whom I can only mention two: Brian Ekin and Gill Williamson. Brian, whom I met by chance in Paris, told me to apply to Caltech, a name I had never heard before. And while bartending for an alumni reunion in the basement of the Caltech Athenaeum, I met Gill Williamson, who was a Professor at UCSD and, while still sober, offered me my first academic job.

• From those early times, I wish also to acknowledge two kind mentors – Edward Posner and Walter Heiligenberg – both of whom died prematurely in tragic transportation accidents; as well as some of my early collaborators and friends including: Amir Atiya, Eric Baum, Joachim Buhmann, Yves Chauvin, Paolo Frasconi, Kurt Hornik, Ron Meir, Fernando Pineda, Yosi Rinott, and Santosh Venkatesh.

• From more recent times, I wish to acknowledge past and present faculty colleagues at UCI in AI and machine learning, including: Rina Dechter, Charless Fowlkes, Roy Fox, Richard Granger, Alexander Ihler, Dennis Kibler, Richard Lathrop, Stephan
📄 Page 15
Mandt, Eric Mjolsness, Ioannis Panageas, Michael Pazzani, Deva Ramanan, Sameer Singh, Padhraic Smyth, Erik Sudderth, Max Welling, and Xiaohui Xie.

• Successful applications of deep learning to the sciences vitally require interdisciplinary collaborations. I am deeply grateful to all my collaborators from the natural sciences. Among the current active ones at UCI, I only have space to mention: Roman Vershynin (Mathematics) and Babak Shahbaba (Statistics), Daniel Whiteson and Jianming Bian, together with Simona Murgia, Franklin Dollar, and Steven Barwick (Physics), Michael Pritchard (Earth Sciences), David Van Vranken and Ann Marie Carlton (Chemistry), and Paolo Sassone-Corsi, Marcelo Wood, and Amal Alachkar (Biology). As I was about to send the final draft to Cambridge University Press, Paolo unexpectedly died and I wish to honor his memory here. He was an outstanding scientist, friend, and collaborator.

• Together with my external collaborators, I am also deeply grateful to current and past students and postdoctoral fellows in my research group, who have contributed in so many ways to this book over the years, including: Forest Agostinelli, Alessio Andronico, Chloe Azencott, Kevin Bache, Pierre-François Baisnée, Ryan Benz, Vincenzo Bonnici, Martin Brandon, Andrew Brethorst, Jocelyne Bruand, Francesco Ceccarelli, Nicholas Ceglia, Ivan Chang, Jonathan Chen, Siwei Chen, Jianlin Cheng, Davide Chicco, Julian Collado, Kenneth Daily, Ekaterina Deyneka, Pietro Di Lena, Yimeng Dou, David Fooshee, Clovis Galliez, Steven Hampson, Lars Hertel, Qian-Nan Hu, Raja Jurdak, Matt Kayala, John B. Lanier, Christine Lee, Lingge Li, Erik Linstead, Junze Liu, Yadong Lu, Alessandro Lusci, Christophe Magnan, Antonio Maratea, Stephen McAleer, Ken Nagata, Francesco Napolitano, Ramzi Nasr, Jordan Ott, Vishal Patel, Gianluca Pollastri, Liva Ralaivola, Arlo Randall, Paul Rigor, Alex Sadovsky, Peter Sadowski, Muntaha Samad, Hiroto Saigo, Siyu Shao, Alexander Shmakov, Suman Sundaresh, S. Joshua Swamidass, Mike Sweredoski, Amin Tavakoli, Gregor Urban, Alessandro Vullo, Eric Wang, Lin Wu, Yu Liu, and Michael Zeller.

• I am equally grateful to Janet Ko, who has assisted me and other faculty for so many years with her loyalty and outstanding administrative skills, always shielding scientific research from bureaucracy.

• As far as the book itself is directly concerned, two chapters and the appendix reuse material from three previously published articles [72, 634, 262] and I thank the publishers – Springer and Annual Reviews – for their permissions. I wish also to thank Annie Vogel-Cierna for providing me with two of the microscopy images in the last chapter.

• This book was finished during the COVID-19 pandemic. It has been a real pleasure to work with the staff of Cambridge University Press, and in particular with David Tranah. I thank them for their outstanding support and professionalism, and their deep understanding of academic publishing.

• Last but not least, I am deeply grateful to my close friends and to my family.
📄 Page 16
1 Introduction

“Cerebral gymnastics cannot improve the organization of the brain by increasing the number of cells, for, as is known, the nerve elements have lost since the embryonic period the property of proliferating; but it can be admitted as highly probable that mental exercise induces, in the brain regions most exercised, a greater development of the protoplasmic apparatus and of the system of nerve collaterals. In this way, associations already created between certain groups of cells would be noticeably reinforced through the multiplication of the terminal branchlets of the protoplasmic appendages and of the nerve collaterals; moreover, entirely new intercellular connections could be established thanks to the new formation of collaterals and protoplasmic expansions.” [Santiago Ramón y Cajal [165]]

The long-term research goal behind this book is to understand intelligence in brains and machines. Intelligence, like consciousness, is one of those words that: (1) was coined a long time ago, when our scientific knowledge of the world was still fairly primitive; (2) is not well defined, but has been and remains very useful both in everyday communication and scientific research; and (3) for which seeking a precise definition today is premature, and thus not particularly productive. Thus, rather than trying to define intelligence, we may try to gain a broader perspective on intelligent systems, by asking which systems are “intelligent”, and how they came about on planet Earth. For this purpose, imagine an alien from an advanced civilization in a distant galaxy charged with reporting to her alien colleagues on the state of intelligent systems on planet Earth. How would she summarize her main findings?

1.1 Carbon-Based and Silicon-Based Computing

At a fundamental level, intelligent systems must be able to both compute and store information, and thus it is likely that the alien would organize her summary along these two axes. Along the computing axis, the first main finding she would have to report is that currently there are two computing technologies that are dominant on Earth: carbon-based computing implemented in all living systems, and silicon-based computing
📄 Page 17
implemented in a growing number of devices ranging from sensors, to cellphones, to laptops, to computer clusters and clouds. Carbon-based computing has a 3.8 billion-year-long history, driven by evolution. In contrast, silicon-based computing is less than 100 years old, with a history driven by human (hence carbon-based) design rather than evolution. Other computing technologies, from DNA computing to quantum computing, currently play minor roles, although quantum computing can be expected to significantly expand in the coming two decades.

Along the storage axis, the main finding the alien would have to report is that there are at least two different styles of storage: the digital/Turing-tape style, and the neural style which is at the center of this book (Figure 1.1). In the digital style, information is stored neatly at different discrete locations, or memory addresses, of a physical substrate. In the neural style of computing, information is stored in a messy way, through some kind of holographic process, which distributes information across a large number of synapses. Think of how you may store your telephone number in a computer as opposed to your brain. In Turing machines, storage and processing are physically separate and information must be transferred from the storage unit to the computing unit for processing. In neural machines, storage and processing are intimately intertwined. In the digital style, storage tends to be transparent and lossless. In the neural style, storage tends to be opaque and lossy.

Remarkably, carbon-based computing discovered both ways of storing information. It first discovered the digital style of storage, using chemical processes, by storing information using DNA and RNA molecules which, to a first degree of approximation, can be viewed as finite tapes containing symbols from a four-letter alphabet at each position. Indeed, biological systems store genetic information, primarily about genes and their control, at precise addresses along their DNA/RNA genome. And every cell can be viewed as a formidable computer which, among other things, continuously measures and adjusts the concentration of thousands of different molecules.

It took roughly 3.3 billion years of evolution of carbon-based digital computing for it to begin to discover the neural style of information processing, by developing the first primitive nervous circuits and brains, using tiny electrical signals to communicate information between neurons. Thus, about 500 million years ago it also began to discover the neural style of information storage, distributing information across synapses. In time, this evolutionary process led to the human brain in the last million years or so, and to language in the last few hundred thousand years. It is only over the very last 100 years, using precisely these tiny electrical signals and synapses, that the human brain invented silicon-based computing which, perhaps not too surprisingly, also uses tiny electrical signals to process information.

In some sense, the evolution of storage in silicon-based computing is an accelerated recapitulation of the evolution of storage in carbon-based computing. Silicon-based computing rapidly adopted the digital Turing style of storage and computing we are so familiar with. As an aside, it is, ironically, striking that the notion of tape storage was introduced by Turing precisely while thinking about modeling the brain, which uses a different style of storage.
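To make the contrast between the two storage styles concrete, here is a minimal illustrative sketch in Python (my own toy example, not code from the book): a phone number stored digitally at addressable locations, versus the same digits spread across the weights of a tiny linear network trained by gradient descent, where no individual weight corresponds to any single digit and recall is only approximate.

import numpy as np

# Toy example contrasting Turing-style and neural-style storage.
phone = np.array([5, 5, 5, 0, 1, 2, 3], dtype=float)

# Turing style: each digit sits at its own discrete address.
memory = phone.copy()
print(memory[3])  # exact retrieval by address -> 0.0

# Neural style: a tiny linear "network" learns to reproduce the digits
# from fixed random input cues; the information ends up distributed
# across all the weights rather than stored at any single location.
rng = np.random.default_rng(1)
inputs = rng.normal(size=(7, 20))   # one fixed random cue per digit
weights = np.zeros(20)

for _ in range(2000):               # plain gradient descent on squared error
    pred = inputs @ weights
    grad = inputs.T @ (pred - phone) / len(phone)
    weights -= 0.05 * grad

print(np.round(inputs @ weights, 2))  # approximate recall of all digits
print(weights[:5])                    # no individual weight "is" a digit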
Finally, in the last seven decades or so, human brains started trying to simulate on digital computers, or implement in neuromorphic chips, the neural style of
📄 Page 18
computing using silicon-based hardware, beginning the process of building intelligent machines (Figure 1.1). While true neuromorphic computing in silicon substrate is an active area of research, it must be stressed that the overwhelming majority of neural network implementations today are produced by a process of virtualization, simulating the neural style of computing and storage on digital, silicon-based, machines. Thus, for most of these neural networks, there are no neurons or synapses, but only fantasies of these objects stored in well-organized digital memory arrays. Silicon computing is fast enough that we often forget that we are running a neural fantasy. As we shall see later in this book, thinking about this virtualization and about computing in native neural systems, rather than their digital simulations, will be key to better understand neural information processing.

Figure 1.1 Evolution of computing and intelligence on planet Earth with approximate time scales. Computing on Earth can be organized along two axes: processing (carbon-based vs. silicon-based) and storage style (Turing vs. neural). Evolution began with carbon-based processing and Turing-style storage approximately 3.8B years ago. Primitive neurons and brains began emerging 500M years ago. Primate brains are a few million years old and human language is a few hundred thousand years old. Over the last 100 years or so, human brains developed silicon-based computing and computers rooted in the idea of Turing machines. AI and ANNs (artificial neural networks) have been developed over the last 70 years or so (red arrow). Most neural networks used today are virtual, in the sense that they are implemented in digital machines using the Turing style of storage. Neuromorphic chips, with mostly Turing-style but occasionally also neural-style storage, have been in development for the past 40 years. Likewise, over the past 40 years, digital computers and artificial neural networks have been applied to biology, from molecular biology and evolution to neuroscience, to better understand carbon-based computing (arrow not shown).

Today, the carbon-based and silicon-based computing technologies are vastly different, and carbon-based computing is still in many ways far more sophisticated. The differences are at all levels: physical sizes, time scales, energy requirements, and overall architectures. For instance, the human brain occupies slightly less than two liters of space and uses on the order of 20–40 W of power, roughly the equivalent of a light bulb, to effortlessly pass the Turing test of human conversation. In comparison, some of our supercomputers with their basketball-court size use three to four orders of magnitude more energy – something on the order of 100,000 W – to match, or slightly outperform,
📄 Page 19
humans on a single task like the game of Jeopardy or GO, while miserably failing at passing the Turing test. This huge difference in energy consumption has a lot to do with the separation of storage and computing in silicon computers, versus their intimate and inextricable intertwining in the brain.

In spite of these differences, in the quest for intelligence these two computing technologies have converged on two key ideas, not unlike the well-known analogy in the quest for flight, where birds and airplanes have converged on the idea of using wings. In addition to using tiny electrical signals, both carbon-based and silicon-based intelligent systems have converged on the use of learning, including evolutionary learning and lifetime learning, in order to build systems that can deliver intelligent behavior and adapt to variations and changes in their environments. Thus it should not be too surprising that machine learning is today one of the key and most successful areas of artificial intelligence, and has been so for at least four decades.

As we have seen, on the silicon side humans are learning how to emulate the neural style of storing information. As an aside, and as an exercise in inversion, one may wonder whether evolution discovered how to emulate the Turing style of storage, for a second time, in brains. There is some evidence of that in our symbolic processing in general, in the discovery of individuals with superior autobiographical memory, or hyperthymesia [441, 644], who tend to index their life by dates, and in “enfants savants” and other individuals with superior arithmetic and other related capabilities, often connected to autism spectrum disorders (e.g. [383, 268, 370]). Unfortunately, we still know too little about information storage in the brain to really address this question, which touches on some of the main challenges for AI today.

1.2 Early Beginnings Until the Late 1940s

We now turn to a brief history of neural networks and deep learning. The goal here is not to be comprehensive, but simply to connect some of the most salient historical points in order to gain a useful perspective on the field. Additional pointers can be found, for instance, in [653]. Although one can trace the beginnings of artificial intelligence back to the Greek philosophers and even more ancient times, a more precise beginning that is relevant for this book can be identified by considering shallow learning as the precursor of deep learning. And shallow learning began with the discovery of linear regression in the late 1700s.

1.2.1 Linear Regression

The discovery of linear regression in the late 1700s resulted from the work of Carl Friedrich Gauss (1777–1855) and Adrien-Marie Legendre (1752–1833). Many if not most of the features of machine learning and deep learning are already present in the basic linear regression framework (Figure 1.2), such as having: (1) an initial set of data points;
📄 Page 20
Figure 1.2 Linear regression in two dimensions.

(2) a class of possible models; (3) a problem of model fitting; (4) a problem of prediction using fitted models; (5) a problem of model comparison; and so forth. All these features are present in the deep learning framework of today, and by and large a cynic could say that most of deep learning today is akin to linear regression on steroids. However, there are two fundamental points where linear regression is misleading.

First, there is a closed mathematical formula for the coefficients of the “best” model, which will be reviewed in a coming chapter. In deep learning the models are more complex and there is no analytical formula; the parameters of the model must be learnt progressively through a process of optimization. Second, and especially in two or three dimensions, the linear model is easily interpretable and our visual system can see the data points and the model. In deep learning one is typically fitting non-linear surfaces in high-dimensional spaces, a process that cannot be visualized directly, and the parameters of the models tend to be more opaque. However, even in the simple case of a linear model, or a linear neuron, opacity is already present due to the neural style of storage: information about the data is stored, through a shattering process, in the coefficients of the linear model. This storage process is irreversible: one cannot retrieve the original points from the coefficients. Only some of their basic statistical properties, such as means and covariances (the sufficient statistics), are retained.

Of course, in addition to using non-linear models, deep learning today is often applied in situations characterized by very large numbers of data points in high-dimensional spaces (e.g. 1 billion humans in genome/DNA space or in photo/pixel space), where traditional linear regression has rarely ventured, for obvious historical and technical reasons.
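As a concrete illustration of the closed-form solution mentioned above, here is a minimal sketch in Python/NumPy (an illustrative toy example on synthetic data, not code from the book). It fits a linear model with the normal equations, w = (X^T X)^(-1) X^T y, checks the result against NumPy's least-squares routine, and notes that the fitted coefficients depend on the data only through summary quantities such as X^T X and X^T y, echoing the point about irreversibility.

import numpy as np

# Synthetic data: y = 2*x1 - 3*x2 + 1 + noise (made-up coefficients).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 2 * X[:, 0] - 3 * X[:, 1] + 1 + 0.1 * rng.normal(size=100)

# Append a column of ones so the model includes an intercept.
Xb = np.hstack([X, np.ones((X.shape[0], 1))])

# Closed-form "best" coefficients from the normal equations.
w_closed = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)

# Same answer from the library's least-squares routine, for comparison.
w_lstsq, *_ = np.linalg.lstsq(Xb, y, rcond=None)

print(np.round(w_closed, 2))            # approximately [ 2., -3.,  1.]
print(np.allclose(w_closed, w_lstsq))   # True

# The solution uses the data only through Xb.T @ Xb and Xb.T @ y:
# the individual data points cannot be recovered from the fitted
# coefficients, as discussed in the text.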
The above is a preview of the first 20 pages.
