Uploader: 高宏飞 · Shared on 2025-12-28

Author: Stephan Raaijmakers

Humans do a great job of reading text, identifying key ideas, summarizing, making connections, and performing other tasks that require comprehension and context. Recent advances in deep learning make it possible for computer systems to achieve similar results. Deep Learning for Natural Language Processing teaches you to apply deep learning methods to natural language processing (NLP) to interpret and use text effectively. In this insightful book, NLP expert Stephan Raaijmakers distills his extensive knowledge of the latest state-of-the-art developments in this rapidly emerging field. Explore the most challenging issues of natural language processing, and learn how to solve them with cutting-edge deep learning!

Inside Deep Learning for Natural Language Processing you’ll find a wealth of NLP insights, including:
• An overview of NLP and deep learning
• One-hot text representations
• Word embeddings
• Models for textual similarity
• Sequential NLP
• Semantic role labeling
• Deep memory-based NLP
• Linguistic structure
• Hyperparameters for deep NLP
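As a quick, illustrative taste of two of the topics listed above, the following sketch (not taken from the book) contrasts a one-hot text representation with a dense word embedding; the vocabulary, indices, and embedding values are made up purely for illustration.

vocab = ["deep", "learning", "for", "natural", "language", "processing"]

def one_hot(word):
    # One-hot vector: all zeros except a single 1 at the word's vocabulary index.
    vec = [0] * len(vocab)
    vec[vocab.index(word)] = 1
    return vec

print(one_hot("language"))  # [0, 0, 0, 0, 1, 0]

# A word embedding instead maps each word to a short dense vector of learned
# weights; the numbers below are invented for illustration only.
embedding = {
    "language":   [0.21, -0.07, 0.53],
    "processing": [0.18, -0.11, 0.49],
}
print(embedding["language"])  # [0.21, -0.07, 0.53]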

Tags
No tags
ISBN: 1617295442
Publisher: Manning Publications
Publish Year: 2022
Language: English
Pages: 296
File Format: PDF
File Size: 8.5 MB
Text Preview (First 20 pages)

Deep Learning for Natural Language Processing
STEPHAN RAAIJMAKERS
MANNING
SHELTER ISLAND
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact

Special Sales Department
Manning Publications Co.
20 Baldwin Road, PO Box 761
Shelter Island, NY 11964
Email: orders@manning.com

©2022 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The author and publisher have made every effort to ensure that the information in this book was correct at press time. The author and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road, PO Box 761
Shelter Island, NY 11964

Development editor: Dustin Archibald
Technical development editors: Michiel Trimpe and Al Krinker
Review editor: Ivan Martinović
Production editor: Keri Hales
Copy editor: Tiffany Taylor
Proofreader: Katie Tennant
Technical proofreader: Mayur Patil
Typesetter and cover designer: Marija Tudor

ISBN 9781617295447
Printed in the United States of America
brief contents

PART 1 INTRODUCTION 1
1 ■ Deep learning for NLP 3
2 ■ Deep learning and language: The basics 31
3 ■ Text embeddings 52

PART 2 DEEP NLP 87
4 ■ Textual similarity 89
5 ■ Sequential NLP 112
6 ■ Episodic memory for NLP 140

PART 3 ADVANCED TOPICS 161
7 ■ Attention 163
8 ■ Multitask learning 190
9 ■ Transformers 219
10 ■ Applications of Transformers: Hands-on with BERT 243
contents

preface x
acknowledgments xii
about this book xiii
about the author xvi
about the cover illustration xvii

PART 1 INTRODUCTION 1

1 Deep learning for NLP 3
1.1 A selection of machine learning methods for NLP 4
  The perceptron 6 ■ Support vector machines 9 ■ Memory-based learning 12
1.2 Deep learning 13
1.3 Vector representations of language 21
  Representational vectors 22 ■ Operational vectors 25
1.4 Vector sanitization 28
  The hashing trick 28 ■ Vector normalization 29

2 Deep learning and language: The basics 31
2.1 Basic architectures of deep learning 31
  Deep multilayer perceptrons 32 ■ Two basic operators: Spatial and temporal 35
2.2 Deep learning and NLP: A new paradigm 50

3 Text embeddings 52
3.1 Embeddings 52
  Embedding by direct computation: Representational embeddings 53 ■ Learning to embed: Procedural embeddings 55
3.2 From words to vectors: Word2Vec 64
3.3 From documents to vectors: Doc2Vec 76

PART 2 DEEP NLP 87

4 Textual similarity 89
4.1 The problem 90
4.2 The data 90
  Authorship attribution and verification data 91
4.3 Data representation 92
  Segmenting documents 93 ■ Word-level information 94 ■ Subword-level information 98
4.4 Models for measuring similarity 100
  Authorship attribution 101 ■ Verifying authorship 106

5 Sequential NLP 112
5.1 Memory and language 113
  The problem: Question Answering 113
5.2 Data and data processing 114
5.3 Question Answering with sequential models 120
  RNNs for Question Answering 121 ■ LSTMs for Question Answering 127 ■ End-to-end memory networks for Question Answering 132

6 Episodic memory for NLP 140
6.1 Memory networks for sequential NLP 140
6.2 Data and data processing 143
  PP-attachment data 144 ■ Dutch diminutive data 145 ■ Spanish part-of-speech data 147
6.3 Strongly supervised memory networks: Experiments and results 149
  PP-attachment 149 ■ Dutch diminutives 150 ■ Spanish part-of-speech tagging 150
6.4 Semi-supervised memory networks 151
  Semi-supervised memory networks: Experiments and results 157

PART 3 ADVANCED TOPICS 161

7 Attention 163
7.1 Neural attention 163
7.2 Data 167
7.3 Static attention: MLP 168
7.4 Temporal attention: LSTM 174
7.5 Experiments 183
  MLP 184 ■ LSTM 187

8 Multitask learning 190
8.1 Introduction to multitask learning 190
8.2 Multitask learning 192
8.3 Multitask learning for consumer reviews: Yelp and Amazon 193
  Data handling 194 ■ Hard parameter sharing 197 ■ Soft parameter sharing 199 ■ Mixed parameter sharing 201
8.4 Multitask learning for Reuters topic classification 202
  Data handling 203 ■ Hard parameter sharing 206 ■ Soft parameter sharing 207 ■ Mixed parameter sharing 208
8.5 Multitask learning for part-of-speech tagging and named-entity recognition 209
  Data handling 210 ■ Hard parameter sharing 214 ■ Soft parameter sharing 215 ■ Mixed parameter sharing 216

9 Transformers 219
9.1 BERT up close: Transformers 220
9.2 Transformer encoders 223
  Positional encoding 226
9.3 Transformer decoders 231
9.4 BERT: Masked language modeling 234
  Training BERT 235 ■ Fine-tuning BERT 238 ■ Beyond BERT 239

10 Applications of Transformers: Hands-on with BERT 243
10.1 Introduction: Working with BERT in practice 244
10.2 A BERT layer 245
10.3 Training BERT on your data 248
10.4 Fine-tuning BERT 255
10.5 Inspecting BERT 258
  Homonyms in BERT 259
10.6 Applying BERT 262

bibliography 265
index 269
preface

Computers have been trying hard to make sense of language in recent decades. Supported by disciplines like linguistics, computer science, statistics, and machine learning, the field of computational linguistics or natural language processing (NLP) has come into full bloom, supported by numerous scientific journals, conferences, and active industry participation. Big tech companies like Google, Facebook, IBM, and Microsoft appear to have prioritized their efforts in natural language analysis and understanding, and progressively offer datasets and helpful open source software for the natural language processing community. Currently, deep learning is increasingly dominating the NLP field.

To someone who is eager to join this exciting field, the high pace at which new developments take place in the deep learning–oriented NLP community may seem daunting. There seems to be a large gap between descriptive, statistical, and more traditional machine learning approaches to NLP on the one hand, and the highly technical, procedural approach of deep learning neural networks on the other hand. This book aims to bridge this gap a bit, through a gentle introduction to deep learning for NLP. It targets students, linguists, computer scientists, practitioners, and all other people interested in artificial intelligence. Let’s refer to these groups of people as NLP engineers. When I was a student, lacking a systematic computational linguistics program in those days, I pretty much pieced together a personal—and necessarily incomplete—NLP curriculum. It was a tough job. My motivation for writing this book has been to make this journey a bit easier for aspiring NLP engineers, and to give you a head start by introducing you to the fundamentals of deep learning–based NLP.

I sincerely believe that to become an NLP engineer with the ambition to produce innovative solutions, you need to possess advanced software development and
machine learning skills. You need to fiddle with algorithms and come up with new variants yourself. Much like the 17th-century Dutch scientist Antonie van Leeuwenhoek, who designed and produced his own microscopes for experimentation, the modern-day NLP engineer creates their own digital instruments for studying and analyzing language. Whenever an NLP engineer succeeds in building a model of natural language that “adheres to the facts,” that is, is observationally adequate, not only industrial (that is, practical) but also scientific progress has been made. I invite you to adopt this mindset, to continuously keep taking a good look at how humans process language, and to contribute to the wonderful field of NLP, where, in spite of algorithmic progress, so many topics are still open!
acknowledgments

I wish to thank my employer, TNO (The Netherlands Organisation for Applied Scientific Research) for supporting the realization of this book. My thanks go to students from the faculties of Humanities and Science from Leiden University and assorted readers of the book for your feedback on the various MEAP versions, including correcting typos and other errors. I would also like to thank the Manning staff—in particular, development editor Dustin Archibald, production editor Keri Hales, and proofreader Katie Tennant, for their enduring support, encouragement and, above all, patience.

At my request, Manning transfers all author fees to UNICEF. Through your purchase of this book, you contribute to a better future for children in need, and that need is even more acute in 2022. “UNICEF is committed to ensuring special protection for the most disadvantaged children—victims of war, disasters, extreme poverty, all forms of violence and exploitation, and those with disabilities” (www.unicef.org/about-us/mission-statement). Many thanks for your help.

To all the reviewers: Alejandro Alcalde Barros, Amlan Chatterjee, Chetan Mehra, Deborah Mesquita, Eremey Vladimirovich Valetov, Erik Sapper, Giuliano Araujo Bertoti, Grzegorz Mika, Harald Kuhn, Jagan Mohan, Jorge Ezequiel Bo, Kelum Senanayake, Ken W. Alger, Kim Falk Jørgensen, Manish Jain, Mike F. Cuddy, Mortaza Doulaty, Ninoslav Čerkez, Philippe Van Bergen, Prabhuti Prakash, Ritwik Dubey, Rohit Agarwal, Shashank Polasa Venkata, Sowmya Vajjala, Thomas Peklak, Vamsi Sistla, and Vlad Navitski, thank you—your suggestions helped make this a better book.
about this book

This book will give you a thorough introduction to deep learning applied to a variety of language analysis tasks, supported by actual hands-on code. Explicitly linking the evergreens of computational linguistics (such as part-of-speech tagging, textual similarity, topic labeling, and Question Answering) to deep learning will help you become proficient in deep learning for natural language processing (NLP). Beyond this, the book covers state-of-the-art approaches to challenging new problems.

Who should read this book

The intended audience for this book is anyone working in NLP: computational linguists, software engineers, and students. The field of machine learning–based NLP is vast and comprises a daunting number of formalisms and approaches. With deep learning entering the stage, many are eager to get their feet wet but may shy away from the highly technical nature of deep learning and the fast pace of this field—new approaches, software, and papers emerge on a daily basis. This book will bring you up to speed.

This book is not for those who wish to become proficient in deep learning in a general manner, readers in need of an introduction to NLP, or anyone desiring to master Keras, the deep learning Python library we use. Manning offers two books that fill these gaps and can be read as companions to this book: Natural Language Processing in Action (Hobson Lane, Cole Howard, and Hannes Hapke, 2019; www.manning.com/books/natural-language-processing-in-action) and Deep Learning with Python (François Chollet, 2021; www.manning.com/books/deep-learning-with-python-second-edition). If you want a quick and thorough introduction to Keras, visit https://keras.io/getting_started/intro_to_keras_for_engineers.
How this book is organized: A road map

Part 1, consisting of chapters 1, 2, and 3, introduces the history of deep learning, the basic architectures of deep learning for NLP and their implementation in Keras, and how to represent text for deep learning using embeddings and popular embedding strategies.

Part 2, consisting of chapters 4, 5, and 6, focuses on assessing textual similarity with deep learning, processing long sequences with memory-equipped models for Question Answering, and then applying such memory models to other NLP tasks.

Part 3, consisting of chapters 7, 8, 9, and 10, starts by introducing neural attention, then moves on to the concept of multitask learning, using Transformers, and finally getting hands-on with BERT and inspecting the embeddings it produces.

About the code

The code we develop in this book is somewhat generic. Keras is a dynamic library, and while I was writing the book, some things changed, including the now-exclusive dependency of Keras on TensorFlow as a backend (a Keras backend is low-level code for performing efficient neural network computations). The changes are limited, but occasionally you may need to adapt the syntax of your code if you're using the latest Keras version (version 2.0 and above); a short sketch of this kind of adaptation appears at the end of this section.

In the book, we draw pragmatic inspiration from public domain, open source code and reuse code snippets that are handy. Specific sources include the following:

• The Keras source code base, which contains many examples addressing NLP
• The code accompanying the companion book Deep Learning with Python
• Popular and excellent open source websites like https://adventuresinmachinelearning.com and https://machinelearningmastery.com
• Blogs like http://karpathy.github.io
• Coder communities like Stack Overflow

The emphasis of the book is more on outlining algorithms and code and less on achieving academic state-of-the-art results. However, starting from the basic solutions and approaches outlined throughout the book, and backed up by the many practical code examples, you will be empowered to reach better results.
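The following is a minimal, illustrative sketch (not taken from the book's listings) of the kind of adaptation mentioned above: older standalone-Keras code imported directly from the keras package, whereas with Keras bundled into TensorFlow the same model is built through the tensorflow.keras modules. The toy text-classification model and its dimensions are made up purely for illustration.

from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Embedding, GlobalAveragePooling1D

# Older standalone-Keras code typically imported "from keras.models import Sequential";
# with TensorFlow as the sole backend, the tensorflow.keras paths above are used instead.

model = Sequential([
    Input(shape=(100,)),                        # 100 token ids per document (toy value)
    Embedding(input_dim=10000, output_dim=32),  # 10,000-word vocabulary, 32-dim word vectors
    GlobalAveragePooling1D(),                   # average word vectors into one document vector
    Dense(16, activation="relu"),
    Dense(1, activation="sigmoid"),             # binary label, e.g., positive/negative sentiment
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()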
This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In some cases, even this was not enough, and listings include line-continuation markers (➥). Code annotations accompany many of the listings, highlighting important concepts.

You can get executable snippets of code from the liveBook (online) version of this book at https://livebook.manning.com/book/deep-learning-for-natural-language-processing. The complete code for the examples in the book is available for download from the Manning website at https://www.manning.com/books/deep-learning-for-natural-language-processing, and from GitHub at https://github.com/stephanraaijmakers/deeplearningfornlp.

liveBook discussion forum

Purchase of Deep Learning for Natural Language Processing includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/deep-learning-for-natural-language-processing/discussion. You can also learn more about Manning's forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking him some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
about the author

STEPHAN RAAIJMAKERS received his education as a computational linguist at Leiden University, the Netherlands. He obtained his PhD on machine learning–based NLP from Tilburg University. He has been working since 2000 at TNO, The Netherlands Organisation for Applied Scientific Research, an independent organization founded by law in 1932, aimed at enabling business and government to apply scientific knowledge, contributing to industrial innovation and societal welfare. Within TNO, he has worked on many machine learning–intensive projects dealing with language. Stephan is also a professor of communicative AI at Leiden University (LUCL, Leiden University Centre for Linguistics). His chair focuses on deep learning–based approaches to human-machine dialogue.
about the cover illustration

The figure on the cover of Deep Learning for Natural Language Processing, titled “Paisan de dalecarlie,” or “Peasant, Dalecarlia,” is from an image held by the New York Public Library in the Miriam and Ira D. Wallach Division of Art, Prints and Photographs: Picture Collection. Each illustration is finely drawn and colored by hand.

In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
Part 1

Introduction

Part 1 introduces the history of deep learning, relating it to other forms of machine learning–based natural language processing (NLP; chapter 1). Chapter 2 discusses the basic architectures of deep learning for NLP and their implementation in Keras. Chapter 3 discusses how to represent text for deep learning using embeddings and focuses on Word2Vec and Doc2Vec, two popular embedding strategies.