Uploader: 高宏飞
Shared on 2026-02-26

Author: Erik Cambria


Tags
No tags
ISBN: 3031739736
Publisher: Springer
Publish Year: 2025
Language: English
Pages: 518
File Format: PDF
File Size: 18.1 MB
Text Preview (First 20 pages)

Understanding Natural Language Understanding
Erik Cambria
Erik Cambria
Nanyang Technological University
College of Computing and Data Science
Singapore, Singapore

ISBN 978-3-031-73973-6
ISBN 978-3-031-73974-3 (eBook)
https://doi.org/10.1007/978-3-031-73974-3

© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature Switzerland AG 2025

This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed.

The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use.

The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, expressed or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Cover illustration: Cover illustration created by the author using Adobe Firefly

This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

If disposing of this product, please recycle the paper.
To Marvin Minsky, A wonderful mentor and a visionary pioneer whose brilliance and curiosity have paved the way for generations of dreamers and thinkers. Your groundbreaking work in artificial intelligence has not only expanded the horizons of human knowledge but also inspired us to imagine a future where machines understand and enrich our lives. In gratitude for your relentless pursuit of understanding and your unwavering belief in the potential of the human mind. MIT Media Lab, Cambridge, MA, United States, 2008
Preface

About half a century ago, artificial intelligence (AI) pioneers like Marvin Minsky embarked on the ambitious project of emulating how the human mind encodes and decodes meaning. While today we have a better understanding of the brain thanks to neuroscience, we are still far from unlocking the secrets of the mind. Especially when it comes to language, the prime example of human intelligence, we face enormous difficulties in replicating how the human mind processes it. “Understanding natural language understanding”, i.e., understanding how the mind encodes and decodes meaning through language, is a significant milestone in our journey towards creating machines that genuinely comprehend human language.

Large language models (LLMs), such as GPT-4, have astounded us with their ability to generate coherent, contextually relevant text, seemingly bridging the gap between human and machine communication. Yet, despite their impressive capabilities, these models operate on statistical patterns rather than true comprehension. This textbook delves into the nuanced differences between these two paradigms and explores the future of AI as we strive to achieve true natural language understanding (NLU).

LLMs excel at identifying and replicating patterns within vast datasets, producing responses that appear intelligent and meaningful. They can generate text that mimics human writing styles, provide summaries of complex documents, and even engage in extended dialogues with users. However, their limitations become evident when they encounter tasks that require deeper understanding, reasoning, and contextual knowledge. LLMs can produce plausible-sounding but incorrect answers, struggle with ambiguous queries, and often lack the ability to generalize knowledge across different domains effectively. Instead, an NLU system that deconstructs meaning leveraging linguistics and semiotics (on top of statistical analysis) represents a more profound level of language comprehension. It involves understanding context in a manner similar to human cognition, discerning subtle meanings, implications, and nuances that current LLMs might miss or misinterpret. NLU grasps the semantics behind words and sentences, comprehending synonyms, metaphors, idioms, and abstract concepts with precision. This deeper comprehension allows for consistent accuracy and robustness, enabling AI systems to handle ambiguous, incomplete, or novel queries more effectively.
One of the key advantages of cognitive-inspired NLU systems is their ability to learn and adapt from interactions meaningfully. Unlike LLMs, which remain static once trained, NLU systems can incorporate new knowledge dynamically, correcting misunderstandings and adapting to new contexts. This adaptability is crucial in fields where information is rapidly evolving, such as technology and medicine. Moreover, NLU systems can generalize knowledge from one domain to another, providing more reliable and versatile applications across interdisciplinary fields. Ethical considerations also play a significant role in the development of NLU systems, as these are more transparent and can better recognize and mitigate biases, ensuring fair and ethical responses. They understand the potential impact of their responses, avoiding harmful or inappropriate content more reliably. This understanding of ethical implications and societal norms is crucial for building trustworthy AI systems in sensitive areas like mental health, education, and decision-making.

Achieving true NLU involves advanced knowledge representation, such as incorporating symbolic reasoning and structured knowledge bases. This includes ontologies, semantic networks, and rule-based systems that explicitly encode relationships and rules. Combining symbolic AI with machine learning creates hybrid systems that leverage both structured knowledge and the pattern recognition strengths of LLMs. Neurosymbolic integration, which merges neural networks with symbolic reasoning systems, is a promising approach to understanding and generating more accurate and contextually appropriate responses. Embedding real-world knowledge and commonsense reasoning into AI systems is also crucial for NLU.
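The hybrid idea above can be sketched in miniature: a symbolic rule layer post-processing the output of a statistical scorer. This is a toy illustration only, not the book's method; the lexicon weights and the negation rule below are invented for the example.

```python
# Toy hybrid sentiment pipeline: a "statistical" keyword scorer combined
# with an explicitly encoded symbolic rule (negation flipping). All
# lexicon entries and rules are illustrative assumptions.

# Statistical layer: per-token polarity weights, standing in for a
# learned model's scores.
LEXICON = {"good": 1.0, "great": 1.5, "bad": -1.0, "awful": -1.5}

# Symbolic layer: a hand-written rule, not learned from data.
NEGATORS = {"not", "never", "no"}

def statistical_score(tokens):
    """Sum token polarities, as a purely pattern-based scorer might."""
    return sum(LEXICON.get(t, 0.0) for t in tokens)

def hybrid_score(tokens):
    """Apply the symbolic rule: a negator flips the polarity of the
    next sentiment-bearing token it precedes."""
    score, negate = 0.0, False
    for t in tokens:
        if t in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(t, 0.0)
        if polarity and negate:
            polarity = -polarity
            negate = False
        score += polarity
    return score

tokens = "the movie was not good".split()
print(statistical_score(tokens))  # prints 1.0: the bag-of-words layer misses the negation
print(hybrid_score(tokens))       # prints -1.0: the symbolic rule corrects the polarity
```

The division of labor mirrors the text: the statistical layer supplies broad coverage, while the symbolic layer encodes a relationship ("not X" inverts X) that no amount of keyword counting captures.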
Training on diverse data sources and integrating world models that simulate real-world scenarios enable these systems to maintain context over long conversations, understand intents and sentiments, and manage turn-taking effectively. Furthermore, NLU systems should emulate human-like learning and adaptation, continuously learning from human interaction, feedback, and correction.

In this textbook, we explore the current state of LLMs, their capabilities, and limitations, and contrast them with the aspirational goals of NLU. We delve into the technical foundations required for achieving true NLU, including advanced knowledge representation, hybrid AI systems, and neurosymbolic integration. We also examine some ethical implications and societal impacts of developing AI systems that genuinely understand human language. The textbook features multiple exercises at the end of each chapter, along with a group assignment and a final quiz. It evolved out of the author’s own teaching course SC4021 (NTU CCDS course on information retrieval) but it can be used for many other courses, e.g., natural language processing (NLP), AI, data analytics, data mining, etc., at NTU and universities worldwide. We will explore different ways of encoding and decoding meaning, which can be used for knowledge representation and reasoning in several downstream applications.

As we embark on this exploration, we are reminded of the pioneering work of visionaries like Marvin Minsky, whose relentless pursuit of understanding and belief in the potential of the human mind have inspired us to dream of a future where machines not only mimic human language but truly comprehend it. This textbook is dedicated to those who continue to push the boundaries of AI, striving to create systems that enrich our lives with genuine understanding and insight.
Acknowledgements

This work would have never been possible without the help of my wonderful research group, the Sentic Team (https://sentic.net/team), who have helped me translate my silly ideas into concrete research works over the last ten years. Special thanks go to my awesome postdocs Drs. Rui Mao, Qian Liu, and Xulang Zhang, who were instrumental in organizing and refining the materials of this textbook. Last but not least, I thank my beautiful wife, Jocelyn Choong, for often forcing me to focus on finishing this book despite the many distractions life has to offer.

Sentic Team, NTU CCDS, 2024
Contents

1 Natural Language Understanding & AI
  1.1 Introduction
  1.2 Towards More Reliable AI
    1.2.1 Multidisciplinarity
    1.2.2 Task Decomposition
    1.2.3 Parallel Analogy
    1.2.4 Symbol Grounding
    1.2.5 Similarity Measure
    1.2.6 Intention Awareness
    1.2.7 Trustworthiness
  1.3 Towards More Responsible AI
    1.3.1 Impact of AI on Economy and Humans
    1.3.2 Towards Human-like Recommender Systems
    1.3.3 Responsible Recommender Systems
    1.3.4 What Next
    1.3.5 Summary
  1.4 Towards More Personalized AI
    1.4.1 Literature Review
    1.4.2 Methodology
    1.4.3 Experiment
    1.4.4 Summary
  1.5 Conclusion
  1.6 Learning Resources

2 Syntactics Processing
  2.1 Introduction
  2.2 Microtext Normalization
    2.2.1 Linguistic Approach
    2.2.2 Statistical Approach
    2.2.3 Neural Network Approach
    2.2.4 Summary
  2.3 Sentence Boundary Disambiguation
    2.3.1 Word-based Approach
    2.3.2 Syntax-based Approach
    2.3.3 Prosody-based Approach
    2.3.4 Summary
  2.4 POS Tagging
    2.4.1 Tagging Schemas
    2.4.2 Feature Engineering Approach
    2.4.3 Deep Learning Approach
    2.4.4 Semi-Supervised Approach
    2.4.5 Summary
  2.5 Text Chunking
    2.5.1 Tagging Schemas
    2.5.2 Feature Engineering Approach
    2.5.3 Deep Learning Approach
    2.5.4 Summary
  2.6 Lemmatization
    2.6.1 Transformation Approach
    2.6.2 Statistical Transduction Approach
    2.6.3 Neural Transduction Approach
    2.6.4 Summary
  2.7 Conclusion
  2.8 Learning Resources

3 Semantics Processing
  3.1 Introduction
  3.2 Word Sense Disambiguation
    3.2.1 Theoretical Research
    3.2.2 Annotation Schemes
    3.2.3 Datasets
    3.2.4 Knowledge Bases
    3.2.5 Evaluation Metrics
    3.2.6 Annotation Tools
    3.2.7 Methods
    3.2.8 Downstream Applications
    3.2.9 Summary
  3.3 Named Entity Recognition
    3.3.1 Theoretical Research
    3.3.2 Annotation Schemes
    3.3.3 Datasets
    3.3.4 Knowledge Bases
    3.3.5 Evaluation Metrics
    3.3.6 Methods
    3.3.7 Downstream Applications
    3.3.8 Summary
  3.4 Concept Extraction
    3.4.1 Theoretical Research
    3.4.2 Annotation Schemes
    3.4.3 Datasets
    3.4.4 Knowledge Bases
    3.4.5 Evaluation Metrics
    3.4.6 Annotation Tools
    3.4.7 Methods
    3.4.8 Downstream Applications
    3.4.9 Summary
  3.5 Anaphora Resolution
    3.5.1 Theoretical Research
    3.5.2 Annotation Schemes
    3.5.3 Datasets
    3.5.4 Knowledge Bases
    3.5.5 Evaluation Metrics
    3.5.6 Annotation Tools
    3.5.7 Methods
    3.5.8 Downstream Applications
    3.5.9 Summary
  3.6 Subjectivity Detection
    3.6.1 Theoretical Research
    3.6.2 Annotation Schemes
    3.6.3 Datasets
    3.6.4 Knowledge Bases
    3.6.5 Evaluation Metrics
    3.6.6 Annotation Tools
    3.6.7 Methods
    3.6.8 Downstream Applications
    3.6.9 Summary
  3.7 Conclusion
  3.8 Learning Resources

4 Pragmatics Processing
  4.1 Introduction
  4.2 Metaphor Understanding
    4.2.1 Theoretical Research
    4.2.2 Annotation Schemes
    4.2.3 Datasets
    4.2.4 Knowledge Bases
    4.2.5 Evaluation Metrics
    4.2.6 Methods
    4.2.7 Downstream Applications
    4.2.8 Summary
  4.3 Sarcasm Detection
    4.3.1 Theoretical Research
    4.3.2 Annotation Schemes
    4.3.3 Datasets
    4.3.4 Evaluation Metrics
    4.3.5 Methods
    4.3.6 Downstream Applications
    4.3.7 Summary
  4.4 Personality Recognition
    4.4.1 Theoretical Research
    4.4.2 Annotation Schemes
    4.4.3 Datasets
    4.4.4 Evaluation Metrics
    4.4.5 Methods
    4.4.6 Downstream Applications
    4.4.7 Summary
  4.5 Aspect Extraction
    4.5.1 Theoretical Research
    4.5.2 Annotation Schemes
    4.5.3 Datasets
    4.5.4 Knowledge Bases
    4.5.5 Evaluation Metrics
    4.5.6 Methods
    4.5.7 Downstream Applications
    4.5.8 Summary
  4.6 Downstream Task
    4.6.1 Theoretical Research
    4.6.2 Annotation Schemes
    4.6.3 Datasets
    4.6.4 Knowledge Bases
    4.6.5 Evaluation Metrics
    4.6.6 Methods
    4.6.7 Downstream Applications
    4.6.8 Summary
  4.7 Conclusion
  4.8 Learning Resources

5 Knowledge Representation & Reasoning
  5.1 Introduction
  5.2 Background
    5.2.1 Theory of Conceptual Primitives
    5.2.2 Challenges
    5.2.3 Related Works
  5.3 Overall Framework
    5.3.1 Task Definition
    5.3.2 Overall Framework of PrimeNet Construction
  5.4 Knowledge Graph Construction
    5.4.1 Commonsense Knowledge Acquisition
    5.4.2 Knowledge Integration
    5.4.3 Graph Construction
    5.4.4 Exploration
  5.5 Concept Detection
    5.5.1 Preliminaries
    5.5.2 Conceptualization
  5.6 Primitive Discovery
    5.6.1 Concept Clustering
    5.6.2 Primitive Detection
  5.7 Experiments
    5.7.1 Statistics and Analysis
    5.7.2 Case Studies
  5.8 Future Directions
    5.8.1 Logical Reasoning
    5.8.2 Implicit Reasoning
    5.8.3 Neurosymbolic AI
  5.9 Conclusion
  5.10 Learning Resources

6 Conclusion
  6.1 Learning Resources
    6.1.1 Assignment
    6.1.2 Quiz

References
Acronyms

ABOM  Aspect-Based Opinion Mining
ABSA  Aspect-Based Sentiment Analysis
ACE  Automatic Content Extraction
AI  Artificial Intelligence
ALM  Anomalous Language Modeling
AMT  Amazon Mechanical Turk
ASR  Automatic Speech Recognition
AZP  Anaphoric Zero Pronoun
AZPR  Anaphoric Zero Pronoun Resolution
BERT  Bidirectional Encoder Representations from Transformer
BGRNN  Bidirectional Gated Recurrent Neural Network
BLEU  Bi-Lingual Evaluation Understudy
CEG  Commonsense Explanation Generation
CNN  Convolutional Neural Network
CRF  Conditional Random Field
CMT  Conceptual Metaphor Theory
DAG  Directed Acyclic Graph
DRL  Deep Reinforcement Learning
FFM  Five Factor Model
FOL  First-Order Logic
GBN  Gaussian Bayesian Network
GCN  Graph Convolutional Network
GRU  Gated Recurrent Unit
GPT  Generative Pretrained Transformer
HAN  Hierarchical Attention Network
HCI  Human-Computer Interaction
HMM  Hidden Markov Model
IPA  International Phonetic Alphabet
JERE  Joint Entity and Relation Extraction
KGC  Knowledge Graph Construction
LCS  Longest Common Subsequence
LDA  Latent Dirichlet Allocation
LLM  Large Language Model
LSTM  Long Short-Term Memory
MAE  Mean Absolute Error
MAP  Mean Average Precision
MBTI  Myers-Briggs Type Indicator
MLP  Multi-Layer Perceptron
MRC  Machine-Reading Comprehension
MRD  Machine-Readable Dictionary
MRF  Markov Random Field
MSE  Mean Squared Error
MTL  Multi-Task Learning
MUC  Message Understanding Conference
MWE  Multi-Word Expression
NER  Named Entity Recognition
NLP  Natural Language Processing
NLU  Natural Language Understanding
NMF  Non-negative Matrix Factorization
NMT  Neural Machine Translation
NTN  Neural Tensor Network
OCR  Optical Character Recognition
OMCS  Open Mind Common Sense
OOV  Out Of Vocabulary
PCA  Principal Component Analysis
POS  Part Of Speech
PLM  Pretrained Language Model
PMI  Pointwise Mutual Information
PSA  Position-aware Self Attention
QA  Question Answering
RDR  Ripple Down Rule
RNN  Recurrent Neural Network
ROC  Receiver Operation Characteristic
RRS  Responsible Recommender System
SBD  Sentence Boundary Disambiguation
SES  Shortest Edit Script
SMO  Sequential Minimal Optimization
SMS  Short Message Service
SMT  Statistical Machine Translation
SRL  Semantic Role Labeling
SSL  Semi Supervised Learning
STM  Syntactic Tree Matching
SVD  Singular Value Decomposition
SVM  Support Vector Machine
UGC  User-Generated Content
WSD  Word Sense Disambiguation
Chapter 1
Natural Language Understanding & AI

Abstract In this chapter, we delve into the critical role that natural language understanding (NLU) plays in shaping the future of artificial intelligence (AI). To set the stage, we begin by defining what constitutes an NLU system. Next, we explore how NLU can drive the evolution of next-generation AI systems, which promise to be more reliable, responsible, and personalized. To this end, we introduce the seven pillars for the future of AI, which represent the foundational elements necessary to advance AI technology in a way that is more transparent and reliable. Next, we propose the concept of responsible recommender systems, which incorporate ethical guidelines and user-centric principles to ensure recommendations are not only relevant but also fair, unbiased, and respectful of user privacy. Lastly, we present a framework for personalized sentiment analysis, which aims at making AI systems more responsive and attuned to the needs and emotions of each user.

Key words: Natural Language Understanding, Reliable AI, Responsible AI, Personalized AI

1.1 Introduction

We define an NLU system as a brain-inspired modular framework that deconstructs meaning by explicitly modeling the cognitive processes that the human mind leverages to encode and decode language. While large language models (LLMs) like GPT-4 have demonstrated significant advancements in generating coherent and contextually relevant text, they fundamentally lack genuine comprehension. This distinction between statistical pattern recognition and actual understanding marks the critical difference between natural language processing (NLP) and the aspirational goal of NLU.

© The Author(s), under exclusive license to Springer Nature Switzerland AG 2025
E. Cambria, Understanding Natural Language Understanding, https://doi.org/10.1007/978-3-031-73974-3_1
NLU involves an understanding of context that mirrors human comprehension. It discerns subtle meanings, implications, and nuances that LLMs might miss or misinterpret. For example, understanding the sentence "The bank is on the river bank" requires recognizing that "bank" refers to two different concepts based on context. While LLMs use statistical patterns to guess meanings, NLU comprehends these distinctions inherently. This comprehension extends to more complex constructs like sarcasm, irony, and humor, which often elude LLMs. Moreover, NLU goes beyond mere word associations to grasp the semantics behind sentences. This includes understanding synonyms, metaphors, idioms, and abstract concepts. For instance, interpreting a metaphor like "Time is a thief" involves recognizing the abstract concept that time, like a thief, can take things away from us. LLMs might recognize this phrase as common but would not truly understand the conceptual comparison without extensive training data on similar metaphors. Similarly, idiomatic expressions like "kick the bucket" (meaning "to die") require an understanding of cultural context and figurative language that NLU can provide.

NLU ensures consistent accuracy by providing precise and reliable responses. This capability stems from an understanding that enables the system to handle ambiguous, incomplete, or novel queries effectively through reasoning. LLMs, in contrast, sometimes generate plausible-sounding but incorrect or nonsensical answers due to their reliance on probabilistic models. For instance, when asked a complex question involving multiple steps of logic, an LLM might provide an answer that fits part of the question but does not fully resolve the complexity. An example might be a multi-part medical diagnosis where the system needs to integrate symptoms, patient history, and current medical knowledge to provide an accurate assessment.
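The "river bank" ambiguity above is the classic word sense disambiguation (WSD) problem. As a rough, purely illustrative sketch of the idea behind gloss-overlap disambiguation (in the spirit of the Lesk algorithm), consider the toy example below; the two-sense inventory and its gloss words are hand-made assumptions, not a real lexicon.

```python
# Toy Lesk-style word sense disambiguation for the "bank" ambiguity.
# The sense inventory below is a hand-made illustration, not a real lexicon.

SENSES = {
    "bank/finance": {"money", "deposit", "loan", "account", "teller"},
    "bank/river": {"river", "water", "shore", "slope", "edge"},
}

def disambiguate(word, context):
    """Pick the sense whose gloss words overlap most with the context."""
    tokens = set(context.lower().split())
    return max(SENSES, key=lambda sense: len(SENSES[sense] & tokens))

print(disambiguate("bank", "deposit money at the bank"))       # bank/finance
print(disambiguate("bank", "the bank is on the river shore"))  # bank/river
```

A real system would of course use a full sense inventory such as WordNet and richer context modeling; the point is only that sense selection is an explicit, inspectable decision rather than an implicit statistical guess.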
The robustness of NLU means it can interpret and respond accurately even when the input is vague or requires additional contextual knowledge. For example, if a user says, "I need a place to stay near the event," NLU can infer that the user is looking for accommodations close to a specific location, while an LLM might provide generic information about lodging without understanding the specific requirement. NLU can also handle evolving conversations where the context shifts, maintaining coherence and relevance in its responses.

NLU systems are capable of generalizing knowledge from one domain to another effectively. While LLMs are proficient at recognizing patterns within their training data, they often struggle to apply knowledge across vastly different contexts. NLU, however, can transfer learning and apply relevant information dynamically. For instance, an NLU system could use its understanding of medical terminology to assist in a legal context where medical information is relevant. This cross-domain generalization is crucial for applications in interdisciplinary fields such as bioinformatics, where knowledge of both biology and data science is necessary. Furthermore, NLU systems learn and adapt from interactions in a meaningful way. They can incorporate new knowledge and correct misunderstandings dynamically, similar to human learning processes. This contrasts with LLMs, which are static once trained. For example, if an NLU system encounters a new scientific discovery, it can integrate this information into its knowledge base and apply it in future interactions.
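The contrast between a model frozen at training time and a system that integrates new knowledge dynamically can be made concrete with a minimal sketch. Everything here (the class name, the subject-relation-value fact schema) is an illustrative assumption, not a description of any actual NLU architecture.

```python
# Minimal sketch of a knowledge base that absorbs new facts at
# interaction time, unlike an LLM whose parameters are fixed after training.
# The fact schema and method names are illustrative assumptions.

class KnowledgeBase:
    def __init__(self):
        self.facts = {}  # subject -> {relation: value}

    def integrate(self, subject, relation, value):
        """Add a new fact; corrections simply overwrite stale knowledge."""
        self.facts.setdefault(subject, {})[relation] = value

    def query(self, subject, relation):
        return self.facts.get(subject, {}).get(relation)

kb = KnowledgeBase()
kb.integrate("aspirin", "treats", "headache")
print(kb.query("aspirin", "treats"))  # headache

# A new finding arrives and is immediately usable in future interactions:
kb.integrate("aspirin", "treats", "headache and inflammation")
print(kb.query("aspirin", "treats"))  # headache and inflammation
```

The dictionary update stands in for what would, in a real system, be a structured knowledge graph with provenance and consistency checking; the essential property is that learning continues after deployment.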
NLU can better recognize and avoid biases, leading to fairer and more ethical responses. LLMs can inadvertently reinforce biases present in their training data because they lack the deeper understanding necessary to critically evaluate and mitigate such biases. For example, an LLM trained on biased data might perpetuate stereotypes, whereas NLU would recognize and avoid such biases. This capability is essential in applications such as hiring processes, where unbiased decision-making is crucial for fairness. Moreover, NLU understands the potential impact of its responses, avoiding harmful or inappropriate content more reliably. It comprehends the ethical implications and societal norms guiding human interactions, ensuring safer and more responsible AI behavior. For instance, an NLU system would avoid making insensitive comments about sensitive topics, understanding the context and potential repercussions. This understanding helps in creating AI systems that can be trusted in sensitive applications like mental health support and education.

Achieving true NLU involves advanced knowledge representation, such as incorporating symbolic reasoning and structured knowledge bases. This includes ontologies, semantic networks, and rule-based systems that explicitly encode relationships and rules. For example, an NLU system could use an ontology to understand the relationship between different medical conditions and treatments. This structured approach allows the system to make logical inferences and provide reasoned answers based on a deep understanding of the subject matter. Combining symbolic AI with machine learning creates hybrid systems that leverage both structured knowledge and the pattern recognition strengths of LLMs. Neurosymbolic integration, which merges neural networks with symbolic reasoning systems, helps in understanding and generating more accurate and contextually appropriate responses.
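To illustrate the kind of logical inference an ontology enables, here is a deliberately tiny forward-chaining sketch over a hand-made fact set with a single rule. The relations and the "facts" are toy assumptions for illustration only, not clinical knowledge or a real ontology language.

```python
# Toy forward-chaining inference over a hand-made ontology fragment.
# Facts are (subject, relation, object) triples; one illustrative rule:
# if X is_a Y and Y treated_by Z, then infer X treated_by Z.

facts = {
    ("flu", "is_a", "viral_infection"),
    ("viral_infection", "treated_by", "rest"),
}

def forward_chain(facts):
    """Apply the is_a/treated_by rule until no new triples appear."""
    inferred = set(facts)
    changed = True
    while changed:
        changed = False
        for (x, r1, y) in list(inferred):
            for (y2, r2, z) in list(inferred):
                if r1 == "is_a" and r2 == "treated_by" and y == y2:
                    new = (x, "treated_by", z)
                    if new not in inferred:
                        inferred.add(new)
                        changed = True
    return inferred

closure = forward_chain(facts)
print(("flu", "treated_by", "rest") in closure)  # True
```

Each inferred triple is explicitly derivable, so the system can show *why* it answered as it did, which is precisely the transparency that purely statistical models lack.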
For instance, a neurosymbolic system might use neural networks to process natural language input and symbolic reasoning to deduce the appropriate response based on an internal knowledge base. This hybrid approach allows for more sophisticated and reliable AI systems that can handle complex queries and tasks. NLU also requires embedding real-world knowledge and commonsense reasoning into AI systems. This involves training on diverse data sources and integrating world models that simulate real-world scenarios. Advanced dialogue systems can maintain context over long conversations, understand intents and sentiments, and manage turn-taking effectively. For example, a customer service chatbot with NLU would handle a multi-step customer query seamlessly, maintaining context and providing accurate solutions throughout the interaction. This capability is essential for creating AI systems that can engage in meaningful and productive conversations with users. Techniques like meta-learning and analogical reasoning enable systems to adapt quickly to new information and contexts, transferring knowledge from known situations to new, similar ones. This continuous learning and adaptation make AI systems more resilient and effective in dynamic environments.

In summary, NLU systems go beyond the mere statistical analysis of language and, hence, have the potential to be the enablers of next-generation AI systems that are reliable, responsible, and personalized. We discuss this in more detail in the next three sections.
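The context maintenance described above for dialogue systems amounts to tracking a state that accumulates across turns, so that a later utterance like "near the event" can be resolved against earlier ones. The sketch below is a bare-bones illustration of that idea; the slot names and keyword heuristics are invented for the example and stand in for real intent and entity recognition.

```python
# Minimal sketch of dialogue state tracking: each turn updates a shared
# state so later turns are interpreted against accumulated context.
# Slot names and keyword matching are illustrative assumptions only.

def update_state(state, utterance):
    """Naive keyword-based slot filling; a real system would use NLU models."""
    text = utterance.lower()
    if "concert" in text:
        state["event"] = "concert"
    if "place to stay" in text or "hotel" in text:
        state["intent"] = "find_accommodation"
    return state

state = {}
state = update_state(state, "I'm going to the concert downtown.")
state = update_state(state, "I need a place to stay near the event.")
print(state)  # {'event': 'concert', 'intent': 'find_accommodation'}
```

Because the second turn is interpreted with the first turn's state available, "the event" can be grounded to the concert rather than answered generically.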
1.2 Towards More Reliable AI

In 2022, the world was stunned by ChatGPT, a chatbot that relies on an LLM built by means of generative pretrained transformers (GPT). The performance capabilities of GPT-based LLMs enable chatbots to generate detailed, original, and plausible responses to prompts. GPT-4 and other LLMs are pretrained on a large dataset (self-supervised and at scale) before being adapted to a variety of downstream tasks through fine-tuning. Pretraining is time-intensive and typically performed only once, whereas fine-tuning is conducted regularly. The behavior of GPT-based chatbots arises through fine-tuning. The performance capabilities of LLMs have been attributed to at least two factors: pretraining and scale (Bommasani et al., 2021). Pretraining, an instance of transfer learning in which LLMs use knowledge acquired from one task and transfer it to another, makes LLMs possible. Scale, including better computer hardware, the transformer architecture, and the availability of more and higher-quality training data, makes LLMs powerful. Although these capabilities are not insubstantial, they do not yet rise to the level of NLU (Bender and Koller, 2020; Amin et al., 2024, 2023). In addition, LLMs are prone to hallucination: ChatGPT may produce linguistic responses that, though syntactically and semantically fine and credible-sounding, are ultimately incorrect (Shen et al., 2023b). Furthermore, we may distinguish between the capabilities of LLMs (acquired through pretraining) and the behavior of LLMs (affected by fine-tuning, which happens after pretraining). Fine-tuning can have unintended effects, including behavioral drift on certain tasks. ChatGPT, in fact, seems prone to the 'short blanket dilemma': while trying to improve its accuracy on some tasks, OpenAI researchers inadvertently made ChatGPT worse at tasks in which it previously excelled (Chen et al., 2023a).
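The pretrain-then-fine-tune recipe can be sketched in miniature with a one-parameter model: an expensive fit on a large "pretraining" task produces weights that a brief, cheap fit on a small related task then adapts. The data, learning rates, and step counts below are made up purely to illustrate the economics of the recipe, not to model any actual LLM.

```python
# Toy illustration of pretraining followed by fine-tuning.
# A one-parameter linear model y = w * x is fit by gradient descent
# on mean squared error; all numbers are invented for illustration.

def fit(data, w=0.0, lr=0.01, steps=500):
    """Gradient descent on MSE for the model y = w * x."""
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

pretrain_data = [(float(x), 2.0 * x) for x in range(1, 11)]  # "large" task: y = 2x
finetune_data = [(1.0, 2.2), (2.0, 4.4)]                     # "small" related task: y = 2.2x

w_pre = fit(pretrain_data)                            # costly, done once
w_ft = fit(finetune_data, w=w_pre, lr=0.01, steps=50) # cheap, repeated as needed
print(w_pre, w_ft)
```

Starting fine-tuning from `w_pre` rather than from zero is the transfer: the small task needs only a few steps because most of the needed knowledge was already acquired during pretraining. The same sketch also hints at the 'short blanket dilemma': further fitting on the small task moves the weight away from the pretraining optimum.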
AI research has slowly been drifting away from what its forefathers envisioned back in the 1960s. Instead of evolving towards the emulation of human intelligence, AI research has, in the past decade or so, regressed into the mimicking of intelligent behavior. The main goal of most tech companies is not designing the building blocks of intelligence but simply creating products that existing and potential customers deem intelligent. In this context, instead of labeling it as 'artificial' intelligence, it may be more apt to characterize such research as 'pareidoliac' intelligence. This term highlights the development of expert systems while raising questions about their claim to possess genuine intelligence. We feel there is a need for an AI refocus on humanity, an Anti-Copernican revolution of sorts: just as Copernicus demoted humans from their privileged spot at the center of the universe, deep learning has removed humans from the equation of learning. In traditional neural networks, especially those with a shallow architecture (few hidden layers), humans were at the center of the technological universe, as they had to carefully design the input features, select appropriate hyperparameters, adjust learning rates, etc. Instead, due to their increased complexity and capacity to automatically learn features from data, deep neural networks do not require manual feature engineering and, hence, have effectively removed humans from the loop of learning. While this is good in terms of cost, time, and effectiveness, it is bad for several other reasons, including transparency, accountability, and bias.
In the deep learning era, humans no longer have control over how the learning process takes place. To save on cost and time, we delegated the important task of selecting which features are important for classification to deep neural networks. These, however, are mathematical models with no commonsense whatsoever: they do not know how to properly choose features. For example, in selecting candidates for a job opening, deep neural networks may decide that gender is an important feature to take into account simply because more men are present in the training data as positive samples. The issue is not only that deep nets may accidentally choose unimportant or even wrong features, but that we have no way of knowing this because of their black-box nature (Yeo et al., 2024b). In other words, not only have humans been taken out of the picture, but they have also been blindfolded. For these reasons, we feel there is a need to bring human-centered capabilities back to the center of AI, e.g., by having human-in-the-loop or human-in-command systems that ensure AI outputs and reasoning steps are human-readable and human-editable. To this end, we propose seven pillars for the future of AI (Cambria et al., 2023), namely: Multidisciplinarity, Task Decomposition, Parallel Analogy, Symbol Grounding, Similarity Measure, Intention Awareness, and Trustworthiness (Fig. 1.1).

Fig. 1.1: Seven pillars for the future of AI.