Machine Learning and Deep Learning in Natural Language Processing

Natural Language Processing (NLP) is a sub-field of Artificial Intelligence, linguistics, and computer science and is concerned with the generation, recognition, and understanding of human languages, both written and spoken. NLP systems examine the grammatical structure of sentences as well as the specific meanings of words, and then they utilize algorithms to extract meaning and produce results.

Machine Learning and Deep Learning in Natural Language Processing aims at providing a review of current Neural Network techniques in the NLP field, in particular about Conversational Agents (chatbots), Text-to-Speech, management of non-literal content – like emotions, but also satirical expressions – and applications in the healthcare field.

NLP has the potential to be a disruptive technology in various healthcare fields, but so far little attention has been devoted to that goal. This book aims at providing some examples of NLP techniques that can, for example, restore speech, detect Parkinson's disease, or help psychotherapists.

This book is intended for a wide audience. Beginners will find useful chapters providing a general introduction to NLP techniques, while experienced professionals will appreciate the chapters about advanced management of emotion, empathy, and non-literal content.
Machine Learning and Deep Learning in Natural Language Processing

Edited By

Anitha S. Pillai
Hindustan Institute of Technology and Science, Chennai, India

Roberto Tedesco
Scuola universitaria professionale della Svizzera italiana (SUPSI), Lugano-Viganello, Switzerland
Designed cover image: ©ShutterStock Images

First Edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton, FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 selection and editorial matter, Anitha S. Pillai and Roberto Tedesco; individual chapters, the contributors

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

ISBN: 978-1-032-26463-9 (hbk)
ISBN: 978-1-032-28287-9 (pbk)
ISBN: 978-1-003-29612-6 (ebk)

DOI: 10.1201/9781003296126

Typeset in Minion by KnowledgeWorks Global Ltd.
Contents

Preface, vii
Editors, xiii
Contributors, xiv

Part I  Introduction

Chapter 1 ◾ Introduction to Machine Learning, Deep Learning, and Natural Language Processing  3
Anitha S. Pillai and Roberto Tedesco

Part II  Overview of Conversational Agents

Chapter 2 ◾ Conversational Agents and Chatbots: Current Trends  17
Alwin Joseph and Naived George Eapen

Chapter 3 ◾ Unsupervised Hierarchical Model for Deep Empathetic Conversational Agents  53
Vincenzo Scotti

Part III  Sentiment and Emotions

Chapter 4 ◾ EMOTRON: An Expressive Text-to-Speech  77
Cristian Regna, Licia Sbattella, Vincenzo Scotti, Alexander Sukhov, and Roberto Tedesco
Part IV  Fake News and Satire

Chapter 5 ◾ Distinguishing Satirical and Fake News  97
Anna Giovannacci and Mark J. Carman

Chapter 6 ◾ Automated Techniques for Identifying Claims and Assisting Fact Checkers  125
Stefano Agresti and Mark J. Carman

Part V  Applications in Healthcare

Chapter 7 ◾ Whisper Restoration Combining Real- and Source-Model Filtered Speech for Clinical and Forensic Applications  149
Francesco Roberto Dani, Sonia Cenceschi, Alice Albanesi, Elisa Colletti, and Alessandro Trivilini

Chapter 8 ◾ Analysis of Features for Machine Learning Approaches to Parkinson's Disease Detection  169
Claudio Ferrante, Licia Sbattella, Vincenzo Scotti, Bindu Menon, and Anitha S. Pillai

Chapter 9 ◾ Conversational Agents, Natural Language Processing, and Machine Learning for Psychotherapy  184
Licia Sbattella

Index, 224
Preface

NATURAL LANGUAGE PROCESSING

Machine Learning and Deep Learning in Natural Language Processing aims at providing a review of current techniques for extracting information from human language, with a special focus on paralinguistic aspects. Such techniques represent an important part of the Artificial Intelligence (AI) research field. In fact, especially after the advent of very powerful conversational agents able to simulate a human being and interact with the user in a very convincing way, AI and the historical field of Natural Language Processing have almost become synonymous (think of the abilities of GPT-3-derived models; for example, ChatGPT1). But let's start with a brief discussion about AI.

BRIEF INTRODUCTION TO AI

AI is the ability of machines to perform tasks that would normally require human intelligence; in particular, AI focuses on three cognitive processes: learning, reasoning, and self-correction. Historically, AI methods have been divided into two broad categories: model-driven and data-driven. The former approach is based on a model of the task to be performed, derived by human experts looking at data; an algorithm is then devised, based on the model. In the latter approach, instead, the model is directly computed from data. In the following, we will focus on the latter approach.

Machine Learning (ML) is a sub-field of AI and refers to data-driven methods where systems can learn on their own, from (possibly annotated) data, without much human intervention. Using ML models, computer scientists train a machine by feeding it large amounts of data, the so-called datasets. In the so-called supervised approach, such datasets are annotated by human experts (e.g., think of a set of speech recordings annotated with the related transcriptions), and thus the machine tries to find the correlations between input (e.g., speech recording) and output (e.g., provided transcription). Once trained, the machine is able to perform the task on new, unknown data (e.g., new speech recordings). Another popular approach is called unsupervised, where the machine is trained on a dataset that does not have any labels; the goal is to discover patterns or relationships in the data. Once trained, the machine is able to apply the patterns to new data; clustering is a typical application of the unsupervised approach. A semi-supervised approach is used when there is a combination of both labelled and unlabelled data, with labelled data much scarcer than unlabelled data. Learning problems of this type can use neither purely supervised nor purely unsupervised learning algorithms, and hence they are challenging.

ML, in general, requires the developer to define the set of features (useful information extracted from "raw" data) that the model will leverage. For example, in automatic speech recognition, Mel Frequency Cepstral Coefficients (MFCCs) are the set of spectral and energy characteristics, extracted from the raw input audio samples, that classic ML models employed as input information. Selecting the right set of features is one of the most complex steps in the development of a ML system, and has elicited much research effort on feature selection, aggregation, etc.

As an evolution of ML methodologies, the Deep Learning (DL) approach starts from raw data, leaving to the model the effort of discovering useful features that describe the data in an efficient and effective way. Data thus go through a set of layers that extract a more and more abstract description of them; the remaining parts of the model then perform the required task. This approach is useful in two ways: (1) developers do not need to choose the features, and (2) the description found by the model is usually far better than the set of pre-defined features developers employ.
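The supervised approach described above (learn from labelled examples, then predict labels for new, unseen data) can be sketched with a toy nearest-centroid classifier. The two-dimensional "feature vectors" and the class labels below are invented purely for illustration; real systems would use features such as the MFCCs mentioned above.

```python
# Toy illustration of supervised learning: compute one mean feature
# vector (centroid) per class from labelled data, then assign a new
# sample to the class of its nearest centroid.

def train_centroids(samples, labels):
    """Compute the mean feature vector (centroid) of each label."""
    sums, counts = {}, {}
    for x, y in zip(samples, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, x):
    """Assign x to the label of the nearest centroid (squared Euclidean distance)."""
    def dist2(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return min(centroids, key=lambda y: dist2(centroids[y], x))

# Labelled training set: invented feature vectors with their class labels.
X = [[1.0, 1.2], [0.9, 1.0], [4.0, 4.2], [4.1, 3.9]]
y = ["quiet", "quiet", "loud", "loud"]

model = train_centroids(X, y)
print(predict(model, [1.1, 0.8]))  # a new, unseen sample -> "quiet"
```

The "training" step here is trivial on purpose; the point is the workflow shared by all supervised methods: fit a model on annotated data, then apply it to unknown data.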
A drawback of DL is the complexity of the resulting models and the need for a huge amount of data. In theory, many ML approaches can be "augmented" with DL, but in practice the models that are becoming the de-facto standard are based on Deep Neural Networks (DNNs, often simply written as NNs). A NN is (loosely) inspired by the structure and function of the human brain; it is composed of a large number of interconnected nodes (neurons), which are usually organized into layers (many of them, in the case of a DNN). Many different architectures (i.e., organization structures) have been defined so far, and probably many more will follow, permitting NNs to cope with basically any data typology (numbers, text, images, audio, etc.) and execute any conceivable task. DNNs proved to be so effective that they often re-defined entire research fields (think of, for example, image recognition).

Natural Language Processing (NLP) is a subset of AI, linguistics, and computer science, and it is concerned with the generation, recognition, and understanding of human language, both written and spoken. NLP systems examine the grammatical structure of sentences as well as the specific meanings of words, and then they utilize algorithms to extract meaning and produce results. In other words, NLP permits machines to understand human language so that they can accomplish various activities automatically. NLP started in the 1950s as the intersection of AI and linguistics, and at present it is a combination of various diverse fields. NLP tasks are categorized into two types: syntax analysis and semantic analysis. Syntax analysis deals with understanding the structure of words, sentences, and documents; some of the tasks under this category include morphological segmentation, word segmentation, Part-of-Speech (POS) tagging, and parsing. Semantic analysis, on the other hand, deals with the meanings of words, sentences, and their combinations, and includes named entity recognition, sentiment analysis, machine translation, question answering, etc.

NLP MULTIMODAL DATA: TEXT, SPEECH, NON-VERBAL SIGNALS FOR ANALYSIS AND SYNTHESIS

Data is available across a wide range of modalities. Language data is multimodal: it comes in the form of text, speech, audio, gestures, facial expressions, head nods, and acoustics, and so, in an ideal human–machine conversational system, machines should be able to understand and interpret this multimodal language. Words are fundamental constructs in natural language, and when arranged sequentially, such as in phrases or sentences, meaning emerges. NLP operations involve processing words or sequences of words appropriately.
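The layered structure of neural networks described earlier can be illustrated with a minimal forward pass: each layer multiplies its input by a weight matrix, adds a bias, and applies a non-linearity. The weights below are fixed, invented numbers (real networks learn them from data), and the sigmoid activation is just one common choice among many.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer with a sigmoid activation."""
    out = []
    for row, b in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + b
        out.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid squashes to (0, 1)
    return out

def forward(x, layers):
    """Pass input x through a stack of layers: each output feeds the next layer."""
    for weights, biases in layers:
        x = dense(x, weights, biases)
    return x

# A 2-input network with one hidden layer of 3 neurons and 1 output neuron.
# All weights and biases here are arbitrary illustrative values.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),  # hidden layer
    ([[1.0, -1.0, 0.5]], [0.2]),                                  # output layer
]
y = forward([1.0, 2.0], layers)
print(y)  # a single value between 0 and 1
```

Deep networks simply stack many such layers, so that each successive layer computes a more abstract description of the input, as discussed above.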
Identifying and extracting names of persons, places, objects, organizations, etc., from natural language text is called the Named Entity Recognition (NER) task. Humans find this identification relatively easy, as proper nouns begin with capital letters. NER plays a major role in solving many NLP problems, such as Question Answering, Summarization Systems, Information Retrieval, Machine Translation, Video Annotation, Semantic Web Search, and Bioinformatics. The Sixth Message Understanding Conference (MUC6) introduced the NER challenge, which includes recognition of entity names (people and organizations), location names, temporal expressions, and numerical expressions.

Semantics refers to the meaning being communicated, while syntax refers to the grammatical form of the text. Syntax is the set of rules needed to ensure a sentence is grammatically correct; semantics is how one's lexicon, grammatical structure, tone, and other elements of a sentence combine to communicate its meaning. The meaning of a word in natural language can vary depending on its usage in sentences and the context of the text. Word Sense Disambiguation (WSD) is the process of interpreting the meaning of a word based on its context in a text. For example, the word "bark" can refer to either a dog's bark or the outermost layer of a tree. Similarly, the word "rock" can mean a "stone" or a "type of music", with the precise meaning of the word being highly dependent on its context and usage in the text. Thus, WSD refers to a machine's ability to overcome the ambiguity involved in determining the meaning of a word based on its usage and context.

Historically, NLP approaches took inspiration from two very different research fields: linguistics and computer science. In particular, linguistics was adopted to provide the theoretical basis on which to develop algorithms, trying to transfer the insights of the theory to practical tasks. Unfortunately, this process proved to be quite difficult, as theories were typically too abstract to be implemented as effective algorithms. On the other hand, computer science provided plenty of approaches, from the AI and Formal Languages fields. Researchers took inspiration from practically any methodology defined in such fields, with mixed results. The outcome was a plethora of very different approaches, often tailored to very specific tasks, that proved difficult to generalize and often not very effective.
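The WSD idea sketched above with the "bark" example can be illustrated with a simplified Lesk-style algorithm: choose the sense whose dictionary gloss shares the most words with the context surrounding the ambiguous word. The glosses below are hand-written toys (real systems typically draw glosses from a lexical resource such as WordNet), and bag-of-words overlap is the crudest possible context model.

```python
# Simplified Lesk-style word sense disambiguation: pick the sense whose
# gloss has the largest word overlap with the sentence containing the
# ambiguous word. Glosses here are invented for the example.

SENSES = {
    "bark": {
        "dog_sound": "the sound a dog makes when it barks loudly",
        "tree_layer": "the tough outer layer covering the trunk of a tree",
    },
}

def lesk(word, context_sentence):
    """Return the sense of `word` whose gloss best overlaps the context."""
    context = set(context_sentence.lower().split())
    best_sense, best_overlap = None, -1
    for sense, gloss in SENSES[word].items():
        overlap = len(context & set(gloss.lower().split()))
        if overlap > best_overlap:
            best_sense, best_overlap = sense, overlap
    return best_sense

print(lesk("bark", "the dog began to bark at the stranger"))  # dog_sound
print(lesk("bark", "moss grew on the bark of the old tree"))  # tree_layer
```

Even this crude overlap count picks the right sense in the two examples; modern approaches replace the word-set overlap with learned contextual representations, which is precisely where the DL methods discussed in this book come in.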
However, in a seminal paper published in 2003, Bengio and colleagues proposed an effective language model based on DNNs (Bengio et al., 2003). Moreover, in 2011, Collobert and colleagues proved that many NLP tasks could be greatly improved by adopting DNNs (Collobert et al., 2011). Since then, ML, and in particular DL and DNNs, emerged as fundamental tools able to significantly improve the results obtained in many NLP tasks.

One of the most difficult tasks that classical NLP methodologies struggled to cope with is the recognition of any kind of content "hidden" in the language, such as emotions, empathy, and in general any non-literal content (irony, satirical content, etc.). As DL promises to improve on those areas, in this book we will focus on the richness of human affective interactions and dialogues (from both the textual and vocal points of view). We will consider different application fields, paying particular attention to social and critical interactions and communication, and to clinics.

HOW THIS BOOK IS ORGANIZED

We organized the chapters into five parts: I. Introduction, II. Overview of Conversational Agents, III. Sentiment and Emotions, IV. Fake News and Satire, and V. Applications in Healthcare.

In Part I, the editors introduce ML, DL, and NLP and the advancement of NLP applications using these technologies.

Part II provides an overview of current methodologies for Conversational Agents and Chatbots. Chapter 2 focuses on the applications of Chatbots and Conversational Agents (CAs): the authors highlight how various AI techniques have helped in the development of intelligent CAs, and compare the different state-of-the-art NLP-based chatbot architectures. Chapter 3 presents the architecture of an open-domain empathetic CA designed for social conversations and trained in two steps. First, the agent learns the relevant high-level structures of the conversation, leveraging a mixture of unsupervised and supervised learning; in the second step, the agent is refined through supervised and reinforcement learning to learn to elicit positive sentiments in the user by selecting the most appropriate high-level aspects of the desired response.

Part III focuses on methodologies for sentiment and emotion detection, and for the production of Conversational Agent output that is augmented with emotions. In Chapter 4 the authors present EMOTRON, a model for the conditioned generation of emotional speech, trained with a combination of a spectrogram regression loss, to enforce synthesis, and an emotional classification style loss, to induce the conditioning.

Part IV presents methodologies for coping with fake news and satirical texts.
Chapter 5 highlights how DL models can be trained to effectively distinguish satirical content from non-satirical content. In Chapter 6 the authors present the development of a prototype to assist journalists with their fact-checking activities by retrieving passages from news articles that may provide evidence for supporting or rebutting the claims.

Finally, Part V shows some implementations of CAs in the field of healthcare. Chapter 7 focuses on the structure and development of the algorithmic components of VocalHUM, a smart system aiming to enhance the intelligibility of patients' whispered speech in real time, based on audio, in order to minimize the muscular and respiratory effort necessary to achieve adequate voice intelligibility and the physical movements required to speak at a normal intensity. Chapter 8 identifies the features essential for early detection of Parkinson's disease using a ML approach, and Chapter 9 explains how CAs, NLP, and ML help in psychotherapy.

NOTE

1. https://openai.com/blog/chatgpt/

REFERENCES

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin, A Neural Probabilistic Language Model, Journal of Machine Learning Research, 3 (2003), pp. 1137–1155.

Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa, Natural Language Processing (Almost) from Scratch, Journal of Machine Learning Research, 12 (2011), pp. 2493–2537.
Editors

Anitha S. Pillai is a professor in the School of Computing Sciences, Hindustan Institute of Technology and Science, India. She earned a Ph.D. in Natural Language Processing and has three decades of teaching and research experience. She has authored and co-authored several papers in national and international conferences and journals. She is also the co-founder of AtINeu – Artificial Intelligence in Neurology – focusing on the applications of AI in neurological disorders.

Roberto Tedesco earned a Ph.D. in Computer Science in 2006 at Politecnico di Milano in Milan, Italy, where he was contract professor for the Natural Language Processing and the Accessibility courses. He is now a researcher at the Scuola universitaria professionale della Svizzera italiana (SUPSI) in Lugano, Switzerland. His research interests are NLP, assistive technologies, and HCI.
Contributors

Stefano Agresti
Politecnico di Milano
Milan, Italy

Alice Albanesi
Scuola universitaria professionale della Svizzera italiana (SUPSI)
Lugano-Viganello, Switzerland

Mark J. Carman
Politecnico di Milano
Milan, Italy

Sonia Cenceschi
Scuola universitaria professionale della Svizzera italiana (SUPSI)
Lugano-Viganello, Switzerland

Elisa Colletti
Scuola universitaria professionale della Svizzera italiana (SUPSI)
Lugano-Viganello, Switzerland

Francesco Roberto Dani
Scuola universitaria professionale della Svizzera italiana (SUPSI)
Lugano-Viganello, Switzerland

Naived George Eapen
Christ University
Pune, India

Claudio Ferrante
Politecnico di Milano
Milan, Italy

Anna Giovannacci
Politecnico di Milano
Milan, Italy

Alwin Joseph
Christ University
Pune, India

Bindu Menon
Apollo Specialty Hospitals
Nellore, India

Cristian Regna
Politecnico di Milano
Milan, Italy

Licia Sbattella
Politecnico di Milano
Milan, Italy
Vincenzo Scotti
Politecnico di Milano
Milan, Italy

Alexander Sukhov
Politecnico di Milano
Milan, Italy

Alessandro Trivilini
Scuola universitaria professionale della Svizzera italiana (SUPSI)
Lugano-Viganello, Switzerland
Part I  Introduction
DOI: 10.1201/9781003296126-2

Chapter 1

Introduction to Machine Learning, Deep Learning, and Natural Language Processing

Anitha S. Pillai
Hindustan Institute of Technology and Science, Tamil Nadu, India

Roberto Tedesco
Scuola universitaria professionale della Svizzera italiana (SUPSI), Lugano-Viganello, Switzerland

1.1 ARTIFICIAL INTELLIGENCE FOR NATURAL LANGUAGE PROCESSING

Natural Language Processing (NLP) is a sub-field of computer science, information engineering, and Artificial Intelligence (AI) that deals with the computational processing and comprehension of human languages. NLP started in the 1950s as the intersection of AI and linguistics, and at present it is a combination of various diverse fields (Nadkarni et al., 2011; Otter et al., 2021). An ample volume of text is generated daily by various social media platforms and web applications, making it difficult to process and discover the knowledge or information hidden in it, especially within given time limits. This paved the way for automation using AI techniques and tools to analyze and extract information from documents, trying to emulate what human beings are capable of doing with a limited volume of text data. Moreover, NLP also aims to teach machines to interact with