Transfer Learning for Natural Language Processing

Author: Paul Azunre

Transfer Learning for Natural Language Processing gets you up to speed with the relevant ML concepts before diving into the cutting-edge advances that are defining the future of NLP. Building and training deep learning models from scratch is costly, time-consuming, and requires massive amounts of data. To address this concern, cutting-edge transfer learning techniques enable you to start with pretrained models you can tweak to meet your exact needs. In Transfer Learning for Natural Language Processing, you'll go hands-on with customizing these open source resources for your own NLP architectures and learn how to adapt existing state-of-the-art models into real-world applications.
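As a small, concrete taste of what "starting with a pretrained model" looks like in practice, the snippet below uses the Hugging Face transformers library (introduced in chapter 7 of the book) to download a default pretrained sentiment classifier and apply it to a sentence. This is an illustrative sketch, not one of the book's listings, and it assumes the transformers package is installed.

    # Illustrative only: load a default pretrained sentiment-analysis pipeline
    # and score a sentence. Requires the transformers package (pip install transformers).
    from transformers import pipeline

    classifier = pipeline("sentiment-analysis")
    print(classifier("Transfer learning makes modern NLP far more accessible."))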

📄 File Format: PDF
💾 File Size: 6.8 MB

📄 Text Preview (First 20 pages)

📄 Page 1
MANNING
Paul Azunre
📄 Page 2
Key concept coverage by chapter

Chapter 1: NLP Transfer Learning
Chapter 2: Generalized Linear Models
Chapter 3: Decision trees, random forests, gradient-boosting machines
Chapter 4: word2vec, sent2vec, fastText, multitask learning, domain adaptation
Chapter 5: Fake news detection, column-type classification
Chapter 6: ELMo, SIMOn
Chapter 7: Transformer, GPT, chatbot
Chapter 8: BERT, mBERT, NSP, fine-tuning transformers, cross-lingual transfer
Chapter 9: ULMFiT, DistilBERT, knowledge distillation, discriminative fine-tuning
Chapter 10: ALBERT, GLUE, sequential adaptation
Chapter 11: RoBERTa, GPT-3, XLNet, Longformer, BART, T5, XLM
📄 Page 3
Transfer Learning for Natural Language Processing
PAUL AZUNRE
MANNING
SHELTER ISLAND
📄 Page 4
For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department, Manning Publications Co., 20 Baldwin Road, PO Box 761, Shelter Island, NY 11964. Email: orders@manning.com

©2021 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher.

Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps.

Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Susan Ethridge
Technical development editor: Al Krinker
Review editor: Aleksandar Dragosavljević
Production editor: Keri Hales
Copy editor: Pamela Hunt
Proofreader: Melody Dolab
Technical proofreader: Ariel Gamiño
Typesetter: Dennis Dalinnik
Cover designer: Marija Tudor

ISBN: 9781617297267
Printed in the United States of America
📄 Page 5
This book is dedicated to my wife, Diana, son, Khaya, and puppy, Lana, who shared the journey of writing it with me.
📄 Page 6
(This page has no text content)
📄 Page 7
contents

preface xi
acknowledgments xiii
about this book xv
about the author xix
about the cover illustration xx

PART 1 INTRODUCTION AND OVERVIEW 1

1 What is transfer learning? 3
1.1 Overview of representative NLP tasks 5
1.2 Understanding NLP in the context of AI 7
    Artificial intelligence (AI) 8 ■ Machine learning 8 ■ Natural language processing (NLP) 12
1.3 A brief history of NLP advances 14
    General overview 14 ■ Recent transfer learning advances 16
1.4 Transfer learning in computer vision 18
    General overview 18 ■ Pretrained ImageNet models 19 ■ Fine-tuning pretrained ImageNet models 20
1.5 Why is NLP transfer learning an exciting topic to study now? 21
📄 Page 8
2 Getting started with baselines: Data preprocessing 24
2.1 Preprocessing email spam classification example data 27
    Loading and visualizing the Enron corpus 28 ■ Loading and visualizing the fraudulent email corpus 30 ■ Converting the email text into numbers 34
2.2 Preprocessing movie sentiment classification example data 37
2.3 Generalized linear models 39
    Logistic regression 40 ■ Support vector machines (SVMs) 42

3 Getting started with baselines: Benchmarking and optimization 44
3.1 Decision-tree-based models 45
    Random forests (RFs) 45 ■ Gradient-boosting machines (GBMs) 46
3.2 Neural network models 50
    Embeddings from Language Models (ELMo) 51 ■ Bidirectional Encoder Representations from Transformers (BERT) 56
3.3 Optimizing performance 59
    Manual hyperparameter tuning 60 ■ Systematic hyperparameter tuning 61

PART 2 SHALLOW TRANSFER LEARNING AND DEEP TRANSFER LEARNING WITH RECURRENT NEURAL NETWORKS (RNNS) 65

4 Shallow transfer learning for NLP 67
4.1 Semisupervised learning with pretrained word embeddings 70
4.2 Semisupervised learning with higher-level representations 75
4.3 Multitask learning 76
    Problem setup and a shallow neural single-task baseline 78 ■ Dual-task experiment 80
4.4 Domain adaptation 81
📄 Page 9
5 Preprocessing data for recurrent neural network deep transfer learning experiments 86
5.1 Preprocessing tabular column-type classification data 89
    Obtaining and visualizing tabular data 90 ■ Preprocessing tabular data 93 ■ Encoding preprocessed data as numbers 95
5.2 Preprocessing fact-checking example data 96
    Special problem considerations 96 ■ Loading and visualizing fact-checking data 97

6 Deep transfer learning for NLP with recurrent neural networks 99
6.1 Semantic Inference for the Modeling of Ontologies (SIMOn) 100
    General neural architecture overview 101 ■ Modeling tabular data 102 ■ Application of SIMOn to tabular column-type classification data 102
6.2 Embeddings from Language Models (ELMo) 110
    ELMo bidirectional language modeling 111 ■ Application to fake news detection 112
6.3 Universal Language Model Fine-Tuning (ULMFiT) 114
    Target task language model fine-tuning 115 ■ Target task classifier fine-tuning 116

PART 3 DEEP TRANSFER LEARNING WITH TRANSFORMERS AND ADAPTATION STRATEGIES 119

7 Deep transfer learning for NLP with the transformer and GPT 121
7.1 The transformer 123
    An introduction to the transformers library and attention visualization 126 ■ Self-attention 128 ■ Residual connections, encoder-decoder attention, and positional encoding 132 ■ Application of pretrained encoder-decoder to translation 134
7.2 The Generative Pretrained Transformer 136
    Architecture overview 137 ■ Transformers pipelines introduction and application to text generation 140 ■ Application to chatbots 141
📄 Page 10
8 Deep transfer learning for NLP with BERT and multilingual BERT 145
8.1 Bidirectional Encoder Representations from Transformers (BERT) 146
    Model architecture 148 ■ Application to question answering 151 ■ Application to fill in the blanks and next-sentence prediction tasks 154
8.2 Cross-lingual learning with multilingual BERT (mBERT) 156
    Brief JW300 dataset overview 157 ■ Transfer mBERT to monolingual Twi data with the pretrained tokenizer 158 ■ mBERT and tokenizer trained from scratch on monolingual Twi data 160

9 ULMFiT and knowledge distillation adaptation strategies 162
9.1 Gradual unfreezing and discriminative fine-tuning 163
    Pretrained language model fine-tuning 165 ■ Target task classifier fine-tuning 168
9.2 Knowledge distillation 170
    Transfer DistilmBERT to monolingual Twi data with pretrained tokenizer 172

10 ALBERT, adapters, and multitask adaptation strategies 177
10.1 Embedding factorization and cross-layer parameter sharing 179
    Fine-tuning pretrained ALBERT on MDSD book reviews 180
10.2 Multitask fine-tuning 183
    General Language Understanding Dataset (GLUE) 184 ■ Fine-tuning on a single GLUE task 186 ■ Sequential adaptation 188
10.3 Adapters 191

11 Conclusions 195
11.1 Overview of key concepts 196
11.2 Other emerging research trends 203
    RoBERTa 203 ■ GPT-3 203 ■ XLNet 205 ■ BigBird 206 ■ Longformer 206 ■ Reformer 206 ■ T5 207 ■ BART 208 ■ XLM 209 ■ TAPAS 209
📄 Page 11
11.3 Future of transfer learning in NLP 210
11.4 Ethical and environmental considerations 212
11.5 Staying up-to-date 214
    Kaggle and Zindi competitions 214 ■ arXiv 215 ■ News and social media (Twitter) 215
11.6 Final words 216

appendix A Kaggle primer 218
appendix B Introduction to fundamental deep learning tools 228

index 237
📄 Page 12
(This page has no text content)
📄 Page 13
preface

Over the past couple of years, it has become increasingly difficult to ignore the breakneck speed at which the field of natural language processing (NLP) has been progressing. Over this period, you have likely been bombarded with news articles about trending NLP models such as ELMo, BERT, and more recently GPT-3. The excitement around this technology is warranted, because these models have enabled NLP applications we couldn’t imagine would be practical just three years prior, such as writing production code from a mere description of it, or the automatic generation of believable poetry and blogging.

A large driver behind this advance has been the focus on increasingly sophisticated transfer learning techniques for NLP models. Transfer learning is an increasingly popular and exciting paradigm in NLP because it enables you to adapt or transfer the knowledge acquired from one scenario to a different scenario, such as a different language or task. It is a big step forward for the democratization of NLP and, more widely, artificial intelligence (AI), allowing knowledge to be reused in new settings at a fraction of the previously required resources.

As a citizen of the West African nation of Ghana, where many budding entrepreneurs and inventors do not have access to vast computing resources and where so many fundamental NLP problems remain to be solved, this topic is particularly personal to me. This paradigm empowers engineers in such settings to build potentially life-saving NLP technologies, which would simply not be possible otherwise.

I first encountered these ideas in 2017, while working on open source automatic machine learning technologies within the US Defense Advanced Research Projects
📄 Page 14
Agency (DARPA) ecosystem. We used transfer learning to reduce the requirement for labeled data by training NLP systems on simulated data first and then transferring the model to a small set of real labeled data. The breakthrough model ELMo emerged shortly after and inspired me to learn more about the topic and explore how I could leverage these ideas further in my software projects.

Naturally, I discovered that a comprehensive practical introduction to the topic did not exist, due to the sheer novelty of these ideas and the speed at which the field is moving. When an opportunity to write a practical introduction to the topic presented itself in 2019, I didn’t think twice. You are holding in your hands the product of approximately two years of effort toward this purpose. This book will quickly bring you up to speed on key recent NLP models in the space and provide executable code you will be able to modify and reuse directly in your own projects. Although it would be impossible to cover every single architecture and use case, we strategically cover architectures and examples that we believe will arm you with fundamental skills for further exploration and staying up-to-date in this burgeoning field on your own.

You made a good decision when you decided to learn more about this topic. Opportunities for novel theories, algorithmic methodologies, and breakthrough applications abound. I look forward to hearing about the transformational positive impact you make on the society around you with it.
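The workflow just described, pretraining on plentiful simulated data and then transferring to a small set of real labeled data, can be sketched in a few lines. The sketch below is illustrative only (random placeholder data and a tiny Keras model), not the actual DARPA-era system, and it assumes TensorFlow is installed.

    # Illustrative sketch of pretrain-then-transfer: train on abundant "simulated"
    # data, then freeze the encoder and fine-tune the head on a small "real" set.
    import numpy as np
    import tensorflow as tf

    # Placeholder simulated data: 10,000 examples of 100-dimensional features.
    X_sim = np.random.rand(10000, 100).astype("float32")
    y_sim = np.random.randint(0, 2, size=(10000,))

    # Placeholder real data: only 200 labeled examples.
    X_real = np.random.rand(200, 100).astype("float32")
    y_real = np.random.randint(0, 2, size=(200,))

    # Small feed-forward model: an encoder followed by a classification head.
    encoder = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
    ])
    model = tf.keras.Sequential([encoder, tf.keras.layers.Dense(1, activation="sigmoid")])
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

    # Step 1: pretrain on the large simulated dataset.
    model.fit(X_sim, y_sim, epochs=3, batch_size=64, verbose=0)

    # Step 2: freeze the encoder and fine-tune only the head on the small real dataset.
    encoder.trainable = False
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(X_real, y_real, epochs=10, batch_size=16, verbose=0)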
📄 Page 15
acknowledgments

I am grateful to members of the NLP Ghana open source community, where I have had the privilege to learn more about this important topic. The feedback from members of the group and users of our tools has served to underscore my understanding of how transformational this technology truly is. This has inspired and motivated me to push this book across the finish line.

I would like to thank my Manning development editor, Susan Ethridge, for the uncountable hours spent reading the manuscript, providing feedback, and guiding me through the many challenges. I am thankful for all the time and effort my technical development editor, Al Krinker, put in to help me improve the technical dimension of my writing.

I am grateful to all members of the editorial board, the marketing professionals, and other members of the production team that worked hard to make this book a reality. In no particular order, these include Rebecca Rinehart, Bert Bates, Nicole Butterfield, Rejhana Markanovic, Aleksandar Dragosavljević, Melissa Ice, Branko Latincic, Christopher Kaufmann, Candace Gillhoolley, Becky Whitney, Pamela Hunt, and Radmila Ercegovac.

The technical peer reviewers provided invaluable feedback at several junctures during this project, and the book would not be nearly as good without them. I am very grateful for their input. These include Andres Sacco, Angelo Simone Scotto, Ariel Gamiño, Austin Poor, Clifford Thurber, Jaume López, Marc-Anthony Taylor, Mathijs Affourtit, Matthew Sarmiento, Michael Wall, Nikos Kanakaris, Ninoslav Cerkez, Or Golan, Rani Sharim, Sayak Paul, Sebastián Palma, Sergio Govoni, Todd Cook, and
📄 Page 16
Vamsi Sistla. I am thankful to the technical proofreader, Ariel Gamiño, for catching many typos and other errors during the proofreading process. I am grateful for all the excellent comments from book forum participants that further helped improve the book.

I am extremely grateful to my wife, Diana, for supporting and encouraging this work. I am grateful to my Mom and my siblings—Richard, Gideon, and Gifty—for continuing to motivate me.
📄 Page 17
about this book

This book is an attempt to produce a comprehensive practical introduction to the important topic of transfer learning for NLP. Rather than focusing on theory, we stress building intuition via representative code and examples. Our code is written to facilitate quickly modifying and repurposing it to solve your own practical problems and challenges.

Who should read this book?

To get the most out of this book, you should have some experience with Python, as well as some intermediate machine learning skills, such as an understanding of basic classification and regression concepts. It would also help to have some basic data manipulation and preprocessing skills with libraries such as Pandas and NumPy. That said, I wrote the book in a way that allows you to pick up these skills with a bit of extra work. The first three chapters will rapidly bring you up to speed on everything you need to know to grasp the transfer learning for NLP concepts sufficiently to apply them in your own projects. Subsequently, following the included curated references on your own will solidify your prerequisite background skills, if that is something you feel that you need.

Road map

The book is divided into three parts. You will get the most out of it by progressing through them in the order of appearance.

Part 1 reviews key concepts in machine learning, presents a historical overview of advances in machine learning that have enabled the recent progress in transfer learning
📄 Page 18
for NLP, and provides the motivation for studying the subject. It also walks through a pair of examples that serve to both review your knowledge of more traditional NLP methods and get your hands dirty with some key modern transfer learning for NLP approaches. A chapter-level breakdown of covered concepts in this part of the book follows:

■ Chapter 1 covers what exactly transfer learning is, both generally in AI and in the context of NLP. It also looks at the historical progression of technological advances that enabled it.
■ Chapter 2 introduces a pair of representative example natural language processing (NLP) problems and shows how to obtain and preprocess data for them. It also establishes baselines for them using the traditional linear machine learning methods of logistic regression and support vector machines.
■ Chapter 3 continues baselining the pair of problems from chapter 2 with the traditional tree-based machine learning methods—random forests and gradient-boosting machines. It also baselines them using key modern transfer learning techniques, ELMo and BERT.

Part 2 dives deeper into some important transfer learning NLP approaches based on shallow neural networks, that is, neural networks with relatively few layers. It also begins to explore deep transfer learning in more detail via representative techniques, such as ELMo, that employ recurrent neural networks (RNNs) for key functions. A chapter-level breakdown of covered concepts in this part of the book follows:

■ Chapter 4 applies shallow word and sentence embedding techniques, such as word2vec and sent2vec, to further explore some of our illustrative examples from part 1 of the book. It also introduces the important transfer learning concepts of domain adaptation and multitask learning.
■ Chapter 5 introduces a set of deep transfer learning NLP methods that rely on RNNs, as well as a fresh pair of illustrative example datasets that will be used to study them.
■ Chapter 6 discusses the methods introduced in chapter 5 in more detail and applies them to the datasets introduced in the same chapter.

Part 3 covers arguably the most important subfield in this space, namely, deep transfer learning techniques relying on transformer neural networks for key functions, such as BERT and GPT. This model architecture class is proving to be the most influential on recent applications, partly due to better scalability on parallel computing architectures than equivalent prior methods. This part also digs deeper into various adaptation strategies for making the transfer learning process more efficient. A chapter-level breakdown of covered concepts in this part of the book follows:

■ Chapter 7 describes the fundamental transformer architecture and uses an important variant of it—GPT—for some text generation and a basic chatbot.
📄 Page 19
■ Chapter 8 covers the important transformer architecture BERT and applies it to a number of use cases, including question answering, filling in the blanks, and cross-lingual transfer to a low-resource language.
■ Chapter 9 introduces some adaptation strategies meant to make the transfer learning process more efficient. This includes the strategies of discriminative fine-tuning and gradual unfreezing from the method ULMFiT, as well as knowledge distillation.
■ Chapter 10 introduces additional adaptation strategies, including embedding factorization and parameter sharing—strategies behind the ALBERT method. The chapter also covers adapters and sequential multitask adaptation.
■ Chapter 11 concludes the book by reviewing important topics and briefly discussing emerging research topics and directions, such as the need to think about and mitigate potential negative impacts of the technology. These include biased predictions on different parts of the population and the environmental impact of training these large models.

Software requirements

Kaggle notebooks are the recommended way of executing these methods, because they allow you to get moving right away without any setup delays. Moreover, the free GPU resources provided by this service at the time of writing expand the accessibility of all these methods to people who may not have access to powerful GPUs locally, which is consistent with the “democratization of AI” agenda that excites so many people about NLP transfer learning. Appendix A provides a Kaggle quick start guide and a number of the author’s personal tips on how to maximize the platform’s usefulness. However, we anticipate that most readers should find it pretty self-explanatory to get started. We have hosted all notebooks publicly on Kaggle with all required data attached to enable you to start executing code in a few clicks. However, please remember to “copy and edit” (fork) notebooks—instead of copying and pasting into a new Kaggle notebook—because this will ensure that the resulting libraries in the environment match those that we wrote the code for.

About the code

This book contains many examples of source code both in numbered listings and in line with normal text. In both cases, source code is formatted in a fixed-width font like this to separate it from ordinary text. Sometimes code is also in bold to highlight code that has changed from previous steps in the chapter, such as when a new feature adds to an existing line of code.

In many cases, the original source code has been reformatted; we’ve added line breaks and reworked indentation to accommodate the available page space in the book. In rare cases, even this was not enough, and listings include line-continuation markers (➥). Additionally, comments in the source code have often been removed
📄 Page 20
from the listings when the code is described in the text. Code annotations accompany many of the listings, highlighting important concepts.

The code for the examples in this book is available for download from the Manning website at http://www.manning.com/downloads/2116 and from GitHub at https://github.com/azunre/transfer-learning-for-nlp.

liveBook discussion forum

Purchase of Transfer Learning for Natural Language Processing includes free access to a private web forum run by Manning Publications where you can make comments about the book, ask technical questions, and receive help from the author and from other users. To access the forum, go to https://livebook.manning.com/#!/book/transfer-learning-for-natural-language-processing/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/#!/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest his interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website as long as the book is in print.
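If you run the downloaded code outside of the forked Kaggle notebooks, it can be worth confirming that your environment roughly matches the one the notebooks were written for (see "Software requirements" above). The following is a minimal, illustrative Python sketch, not one of the book's listings; the package names are examples only, and it assumes TensorFlow is installed, as it is in standard Kaggle notebook images.

    # Illustrative environment check: report visible GPUs and a few installed
    # library versions for comparison against the notebooks' original environment.
    import importlib.metadata as metadata
    import tensorflow as tf

    print("GPUs visible:", tf.config.list_physical_devices("GPU"))

    for pkg in ["tensorflow", "torch", "transformers"]:  # example packages only
        try:
            print(f"{pkg}: {metadata.version(pkg)}")
        except metadata.PackageNotFoundError:
            print(f"{pkg}: not installed")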
The above is a preview of the first 20 pages.
