Author: Emmanuel Maggiori

Everything you need to know about AI to survive--and thrive--as an engineer.

Haven't you heard? AI can instantly generate code, find and track network intrusions, parse your observability data, and even write a Medium article about it all. If AI is making you sweat about your future as an engineer, don't worry. Your job has never been safer! The AI Pocket Book tells you everything you need to surf the AI wave instead of drowning in it.

In The AI Pocket Book you'll get:

* Deciphering AI jargon (there's lots of it!)
* Where AI fits within your field of engineering
* Why AI hallucinates--and what to do about it
* What to do when AI comes for your job
* Balancing skepticism with unrealistic expectations

The AI Pocket Book gives you Emmanuel Maggiori's unvarnished and opinionated take on where AI can be useful, and where it still kind of sucks. Whatever your tech field, this short-and-sweet guide delivers the facts and techniques you'll need in the workplace of the present. Purchase of the print book includes a free eBook in PDF and ePub formats from Manning Publications.

About the book

The AI Pocket Book crams everything engineers need to know about AI into one short volume you can fit into your pocket. You'll take a peek inside the AI black box for an overview of transformers, LLMs, hallucinations, tokens, and embeddings, along with the modern ecosystem of AI models and tools. You'll find out when putting AI first fails your customers, understand how to get from "almost good enough" to "excellent," and pick up some tips for dealing with the inevitable, potentially expensive, screw-ups.

About the reader

For engineers in all fields, from software to security.

About the author

Emmanuel Maggiori, PhD, is a 10-year AI industry insider who specializes in machine learning and scientific computing. He has developed AI for everything from processing satellite images to packaging deals for holiday travelers. Emmanuel Maggiori is the author of Smart Until It's Dumb and Siliconned.

ISBN: 163343575X
Publisher: Manning Publications / Simon and Schuster
Publish Year: 2025
Language: English
Pages: 198
File Format: PDF
File Size: 3.1 MB
Text Preview (First 20 pages)
The AI Pocket Book

Emmanuel Maggiori

To comment go to livebook.

Manning
Shelter Island

For more information on this and other Manning titles go to manning.com.
copyright For online information and ordering of this and other Manning books, please visit www.manning.com. The publisher offers discounts on this book when ordered in quantity. For more information, please contact Special Sales Department Manning Publications Co. 20 Baldwin Road PO Box 761 Shelter Island, NY 11964 Email: orders@manning.com ©2025 by Manning Publications Co. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by means electronic, mechanical, photocopying, or otherwise, without prior written permission of the publisher. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in the book, and Manning Publications was aware of a trademark claim, the designations have been printed in initial caps or all caps. Recognizing the importance of preserving what has been written, it is Manning’s policy to have the books we publish
printed on acid-free paper, and we exert our best efforts to that end. Recognizing also our responsibility to conserve the resources of our planet, Manning books are printed on paper that is at least 15 percent recycled and processed without the use of elemental chlorine.

The authors and publisher have made every effort to ensure that the information in this book was correct at press time. The authors and publisher do not assume and hereby disclaim any liability to any party for any loss, damage, or disruption caused by errors or omissions, whether such errors or omissions result from negligence, accident, or any other cause, or from any usage of the information herein.

Manning Publications Co.
20 Baldwin Road
PO Box 761
Shelter Island, NY 11964

Development editor: Ian Hough
Technical editor: Artur Guja
Review editor: Radmila Ercegovac
Production editor: Andy Marinkovich
Copy editor: Lana Todorovic-Arndt
Proofreader: Keri Hales
Typesetter: Bojan Stojanović and Tamara Švelić Sabljić
Cover designer: Marija Tudor

ISBN: 9781633435759
Printed in the United States of America
contents

preface
acknowledgments
about this book
about the author
about the cover illustration
1 How AI works
2 Hallucinations
3 Selecting and evaluating AI tools
4 When to use (and not to use) AI
5 How AI will affect jobs and how to stay ahead
6 The fine print
appendix A Catalog of generative AI tools
index
preface

In the 2010s, a methodology known as machine learning became extremely popular. The novelty of machine learning was that, instead of writing every detail of a computer program by hand, some parts were determined automatically by having a computer analyze data. While machine learning wasn’t new, it rose to prominence during this period thanks to increased computing power and an unprecedented amount of data ready to be exploited.

Machine learning soon became the favorite methodology of artificial intelligence, which is a more general research field that tries to have computers perform tasks similarly to humans. Notably, AI researchers used machine learning to reach record performance in automated analysis of images, video, and text. They also used machine learning to build the famous game-playing software AlphaGo, which beat a human player at the difficult game of Go. Machine learning also boomed in the business world. For example, companies started using it to automatically analyze online shoppers’ data and generate personalized product recommendations. Due to machine learning’s success and wide adoption in the AI field, people soon started using the terms “machine learning” and “AI” interchangeably.

The business world became highly enthusiastic about AI’s prospects and made big promises. However, while AI expanded steadily in academia and business, it was not massively adopted by the general public. This was probably because general-purpose AI tools weren’t all that useful yet (think of Alexa and Siri) and because AI was still not that great at analyzing natural language. But in the late 2010s and early 2020s, a series of methodological innovations made AI much better at analyzing written language and generating new content. This led to a race to build AI tools that could be used as high-performing assistants by the general public.
AI exploded in 2022, with the launch of a number of remarkable customer-facing AI apps. One of them was ChatGPT, which reached a hundred million users in three months. Another one was Midjourney, a powerful tool for creating realistic images from a textual description. Enthusiasm about AI soared and so did dramatic predictions about its effects. Some people predicted extreme productivity gains. Others predicted massive unemployment due to AI replacing people’s jobs. In particular, many people argued that software engineers would become obsolete.

I’m a software engineer who specializes in AI. I did my PhD in AI and have been involved in the field for over a decade. Early in my career, while I was impressed by AI, I became a bit frustrated by the amount of hype around it—I kept stumbling upon failed AI projects that were swept under the rug, and I had the impression that AI’s limitations were often overlooked. In 2023, I published a book on the subject, titled Smart Until It’s Dumb (Applied Maths Ltd, 2023). As opposed to other books on AI, which were either very positive or negative about it, I wanted to share a more nuanced view. As the title implies, I think AI can be really cool sometimes, but it can be less cool other times—think of those pesky hallucinations that AI often suffers from.

After I wrote that book, people started asking me questions about all things AI related. For example, they asked me whether I thought machines would become conscious or whether self-driving cars would soon roam every street. But the most common topic was the future of work. Specifically, aspiring software engineers seemed particularly concerned about their future careers. People asked me, “Is it even worth becoming a software engineer, now that AI can code?” A teacher told me a few of her students had dropped out because they thought AI would make their skills irrelevant.
In addition, numerous software engineers started to use AI at work and build AI-based products, but they often told me they couldn’t make it work as intended. For example, they said AI often generated inconsistent outputs, and users didn’t appreciate it. This book is intended to help you ride the AI revolution, both in terms of using AI effectively and making sure your job stays ahead of what AI can do. The book is based on my own experience in the AI field and also on the
numerous conversations I’ve had with people about it. You’ll read stories, reflections, and general advice, which I hope you’ll find useful. After you finish the book, I hope you’ll feel that you understand AI better, including its limitations, and that you’ll discover new ways to use AI effectively and to future-proof your career against it.
acknowledgments

The most difficult thing about writing a book is not putting words together or thinking about grammar (which AI is quite good at). Instead, the most difficult thing is writing a book whose content resonates with the target audience. That’s why my biggest thank you goes to the humans who went through this book’s draft and shared useful advice to improve it. This includes my developmental editor at Manning, Ian Hough, and my technical editor, Artur Guja, a risk manager, computer scientist, systems developer, and financial markets professional with over 20 years of experience in the banking sector. I’d also like to thank my acquisitions editor, Andy Waldron, and the wider Manning team who’ve been extremely helpful throughout the process.

Finally, many thanks to all the reviewers from the software industry who read the draft early on and shared their thoughts: Aarohi Tripathi, Aayush Bhutani, Aeshna Kapoor, Ajay Tanikonda, An Nadein, Anil Kumar Moka, Annie Taylor Chen, Anupam Mehta, Arpankumar Patel, Arpit Chaudhary, Ashish Anil Pawar, Batul Bohara, Devendra Singh Parmar, Divakar Verma, Gajendra Babu Thokala, Harsh Daiya, Karthik Rajashekaran, Lalit Chourey, Maksym Prokhorenko, Manohar Sai Jasti, Martin Knudsen, Meghana Puvvadi, Mohit Palriwal, Naresh Dulam, Natapong Sornrpom, Nilesh Charankar, Nupur Baghel, Prachit Kurani, Prakash Reddy Putta, Prasann Pradeep Patil, Premkumar Reddy, Raghav Hrishikeshan Mukundan, Radhika Kanubaddhi, Rajeev Reddy Vishaka, Rajesh Daruvuri, Ram Kumar Nimmakayala, Riddhi Shah, Ruchi Agarwal, Sai Chiligireddy, Shivendra Srivastava, Siddharth Parakh, Subba Rao Katragadda, Sudheer Kumar Lagisetty, Sumit Dahiya, Sudharshan Tumkunta, and Vishnu Challagulla. Your feedback helped improve this book. Thank you all!
about this book

This book will help you navigate the AI revolution, using AI effectively in your work and making sure your job won’t be replaced by AI. The book was primarily written for software engineers, but its content was designed to be accessible to other audiences, too. So, there are no prerequisites to read this book, and anyone should be able to understand it. It is helpful, however, to know the basics of coding and math to fully understand all the examples. The book starts with a plain-English overview of how AI works. It then covers a wide range of timely and controversial AI-related topics such as hallucinations, the future of work, and copyright.

Who should read this book?

Two main groups of people should read this book. The first one is software engineers—aspiring, novice, and seasoned ones—who want to understand the effects of AI on their careers and prepare for it. The second group includes people related to or interested in the software industry, even if they’re not engineers themselves. For example, these are product managers and startup entrepreneurs. One of this book’s reviewers said he thought the book would be useful not just for software engineers but also for “software sympathizers,” which I thought was a good way to put it.
How this book is organized: A road map

The book is divided into six chapters:

Chapter 1: How AI works—This chapter explains how large language models and other types of AI work and how AI is built.

Chapter 2: Hallucinations—This chapter explains the reasons for AI’s pesky mistakes (known as hallucinations), whether they will be fixed soon, and what we can do about them.

Chapter 3: Selecting and evaluating AI tools—This chapter explains a method to select and compare different AI tools and avoid common biases in your evaluation.

Chapter 4: When to use (and not to use) AI—This chapter is a checklist that will help you decide whether it is a good idea to use AI to assist you with a certain task or as the building block of a customer-facing product.

Chapter 5: How AI will affect jobs and how to stay ahead—This chapter explains three characteristics of jobs that will help them resist AI advancements and how software engineers can stay relevant in the AI era.

Chapter 6: The fine print—This chapter covers the less flattering side of AI, such as exaggeration, copyright disputes, and dubious comparisons of AI models with the human brain. It is meant to help you get up to speed with some of the bigger questions around AI.
liveBook discussion forum

Purchase of The AI Pocket Book includes free access to liveBook, Manning’s online reading platform. Using liveBook’s exclusive discussion features, you can attach comments to the book globally or to specific sections or paragraphs. It’s a snap to make notes for yourself, ask and answer technical questions, and receive help from the author and other users. To access the forum, go to https://livebook.manning.com/book/the-ai-pocketbook/discussion. You can also learn more about Manning’s forums and the rules of conduct at https://livebook.manning.com/discussion.

Manning’s commitment to our readers is to provide a venue where a meaningful dialogue between individual readers and between readers and the author can take place. It is not a commitment to any specific amount of participation on the part of the author, whose contribution to the forum remains voluntary (and unpaid). We suggest you try asking the author some challenging questions lest their interest stray! The forum and the archives of previous discussions will be accessible from the publisher’s website for as long as the book is in print.
about the author

Emmanuel Maggiori, PhD, has been an AI industry insider for 10 years. He has developed AI for various applications, from processing satellite images to packaging deals for holiday travelers. He is the author of the books Smart Until It’s Dumb, which analyzes the AI industry, and Siliconned, which analyzes the wider tech industry.
about the cover illustration

The figure on the cover of The AI Pocketbook, captioned “Allay tzaoussou (alay chavushu), ou inspecteur aux parades,” or “Allay tzaoussou (alay chavushu), or parade inspector,” is from the George Arents Collection, courtesy of the New York Public Library (1808–1826). In those days, it was easy to identify where people lived and what their trade or station in life was just by their dress. Manning celebrates the inventiveness and initiative of the computer business with book covers based on the rich diversity of regional culture centuries ago, brought back to life by pictures from collections such as this one.
1 How AI works

This chapter covers

* The way LLMs process inputs and generate outputs
* The transformer architecture that powers LLMs
* Different types of machine learning
* How LLMs and other AI models learn from data
* How convolutional neural networks are used to process different types of media with AI
* Combining different types of data (e.g., producing images from text)

This chapter clarifies how AI works, discussing many foundational AI topics. Since the latest AI boom, many of these topics (e.g., “embeddings” and “temperature”) are now widely discussed, not just by AI practitioners but also by businesspeople and the general public. This chapter demystifies them. Instead of just piling up definitions and writing textbook explanations, this chapter is a bit more opinionated. It points out common AI problems, misconceptions, and limitations based on my experience working in the field, as well as discussing some interesting insights you might not be aware of. For example, we’ll discuss why language generation is more expensive in French than in English and how OpenAI hires armies of human workers to manually help train ChatGPT. So, even if you are already familiar with all the topics covered in this chapter, reading it might provide you with a different perspective.

The first part of this chapter is a high-level explanation of how large language models (LLMs) such as ChatGPT work. Its sections are ordered to roughly mimic how LLMs themselves turn inputs into outputs one step at a time.
The middle part of this chapter discusses machine learning, which is the technique that makes computers learn from data to create LLMs and other types of AI. Note that AI and machine learning don’t mean the same thing. AI is a research field that tries to create computer programs to perform tasks in a way similar to humans. Machine learning may or may not be used for that goal. However, machine learning has been the preferred methodology in AI for at least two decades. So, you might hear people use the terms AI and machine learning interchangeably. When I speak of AI in this book, I mean current AI methods, and these methods involve the use of machine learning.

The last third of this chapter discusses how AI works outside language generation. Specifically, I give an overview of how AI analyzes and generates images or combinations of text and images. We also comment on current developments in AI-based video generation. Enjoy the ride!

How LLMs work

Language models are computer programs that try to represent the structure of human language. A large language model, or LLM, is a language model on steroids. Its sheer size lets the LLM perform complex analyses of sentences and generate new text with impressive performance. Examples of LLMs are OpenAI’s GPT-4o, Meta’s Llama 3, Anthropic’s Claude 3.5 Sonnet, Google’s Gemini 1.5 Pro, and Mistral AI’s Mixtral 8x7B.

Current LLMs are designed to perform one specific task—guess the next word given an input sentence. The input sentence is known as the prompt. Suppose I asked you to predict the word that comes after the incomplete sentence “The Eiffel.” You’re very likely to suggest that “Tower” is the most logical choice. This is the exact job LLMs are designed to do. So, we can think of LLMs as sophisticated autocomplete programs. Officially, we say that LLMs are autoregressive, which means that they’re designed to produce a single extra piece of content based on previous content.
The autocomplete task may seem simple at first, but it is far-reaching. Consider the following prompt: “How much is 2 + 5? It is...”
Autocompleting this kind of sentence requires knowing how to perform arithmetic operations. So, the task of performing arithmetic operations is included in the autocomplete task. Now, consider the following prompt: “How do you say ‘umbrella’ in French?” To accurately autocomplete this kind of sentence, you’d need to be capable of translating English to French. So, at least in theory, the autocomplete task encompasses all sorts of tasks.

LLMs are created using machine learning, a process in which a computer analyzes a huge amount of data—pretty much a snapshot of the entire public internet—to automatically put the LLM together. The resulting LLM is a self-contained piece of software, meaning that it doesn’t access any external information to generate its outputs. For example, it doesn’t browse the web to make its next-word predictions. In addition, the LLM is static, so it must be periodically updated with new data if we want it to speak about recent events.

When we interact with LLMs, we don’t usually do so directly. Instead, we use an intermediary piece of software that processes our requests and manages the underlying LLM. Let’s call it the LLM wrapper. The wrapper uses tricks to provide further functionality to the user than just guessing the next word like the bare LLM would do. For example, the wrapper generates entire sentences, responds in a chatty way, and answers with real-time information, such as the current date.

An example of an LLM wrapper is ChatGPT, which is OpenAI’s customer-facing application. This application manages our interactions with the underlying LLM, such as GPT-4 and GPT-4o. Note that it is common to just use the term LLM to refer to the whole AI system, including the wrapper. The next few sections discuss examples of how LLM wrappers use tricks to enhance the capabilities of their underlying, next-word-guessing LLMs.

Text generation

We typically use LLMs to output entire sentences instead of just guessing a single word.
The LLM wrapper achieves this through a simple trick: it
makes the LLM eat its own output repeatedly. Suppose we give an LLM the prompt “The Eiffel.” The LLM guesses the most likely continuation of the sentence: “Tower.” The LLM wrapper then attaches this word to the initial prompt, which leads to the new prompt: “The Eiffel Tower.” It then uses the LLM to guess the following word, say, “is” and attaches it to the prompt again. The process is repeated (see figure 1.1) to generate entire sentences such as “The Eiffel Tower is in Paris.”

Figure 1.1 To generate full sentences, the LLM wrapper used the LLM to generate one word, then attached that word to the initial prompt, then used the LLM again to generate one more word, and so on.

End of text

In addition to outputting regular words, LLMs are designed to output a handful of special code words. One of them, often stylized as “<|end of text|>” in the literature, is a code word that signals the end of the text. When the LLM is built, it is exposed to examples of sentences containing this special code word to indicate their end. So, the LLM gains the capability of guessing that the next best word is actually an indication of its ending. When the LLM wrapper encounters this special code word, it stops the process of having the LLM eat its own output to generate more text, as explained in the previous section.
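The wrapper's generation loop can be sketched in a few lines of Python. The lookup table standing in for the LLM is purely illustrative (a real LLM assigns a score to every possible next token), and the names TOY_LLM and generate are assumptions made for this sketch, not part of any real library:

```python
# A minimal sketch of the LLM wrapper's generation loop, using a toy
# stand-in for the LLM. A real LLM scores every possible next token;
# here the "model" is just a lookup table, which is purely illustrative.

END_OF_TEXT = "<|end of text|>"  # special code word that signals the end

# Toy "LLM": given the prompt so far, return the most likely next word.
TOY_LLM = {
    "The Eiffel": "Tower",
    "The Eiffel Tower": "is",
    "The Eiffel Tower is": "in",
    "The Eiffel Tower is in": "Paris.",
    "The Eiffel Tower is in Paris.": END_OF_TEXT,
}

def generate(prompt: str, max_words: int = 50) -> str:
    """Make the LLM 'eat its own output' until it emits the end code word."""
    for _ in range(max_words):
        next_word = TOY_LLM.get(prompt, END_OF_TEXT)
        if next_word == END_OF_TEXT:
            break  # the wrapper stops the loop here
        prompt = prompt + " " + next_word  # attach the guess to the prompt
    return prompt

print(generate("The Eiffel"))  # prints: The Eiffel Tower is in Paris.
```

Swapping the lookup table for a call to a real model is essentially all that separates this sketch from an actual wrapper's decoding loop, aside from details such as sampling and token limits.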
Chat

Have a look at what happened when I inputted the prompt “The Eiffel” to ChatGPT:

It seems that your sentence was cut off. Were you asking about the Eiffel Tower? The Eiffel Tower is one of the most iconic landmarks in the world, located in Paris, France. It was designed by Gustave Eiffel and completed in 1889 for the Exposition Universelle, a world’s fair held to celebrate the 100th anniversary of the French Revolution. Originally criticized by some of Paris's leading artists and intellectuals for its design, the tower has become a global cultural icon of France and one of the most recognizable structures in the world.

By the looks of it, the LLM is not just autocompleting my input, or else it would have continued with “Tower” instead of telling me that my sentence was cut off. What happened here was that, behind the scenes, the LLM wrapper inserted a special code word to indicate the end of my prompt. This code word is often stylized as “<|end of prompt|>” in the literature. So, the actual input sent to the LLM by the wrapper was “The Eiffel<|end of prompt|>”. During its construction, the underlying LLM was fed examples of dialog ending with “<|end of prompt|>”, which is why it recognized that in my sentence, the code word was located at an unusual place. As we continue chatting back and forth with ChatGPT, the wrapper sends the LLM a history of the entire conversation each time, filled with “<|end of prompt|>” and “<|end of text|>” code words to clearly delimit the beginning and end of each interaction. This helps ChatGPT converse effectively.

OpenAI lets software developers use a more naked version of the LLM wrapper, which doesn’t insert the “<|end of prompt|>” code word after the
user’s prompt. This is available through the API (an API is a way to connect two pieces of software—in this case, your software with OpenAI’s software) or the online platform for software developers (see https://platform.openai.com) by using the completions mode of the API instead of the default chat mode.

NOTE OpenAI’s documentation contains a warning sign telling us that the completions mode is “now considered legacy,” but it is still available for use as of this writing.

Let’s see what happened when I inputted “The Eiffel” to the LLM using the completions mode instead:

Tower was inaugurated on March 31st in 1889. It was originally built as the entrance arch for the 1889 World’s Fair, and was only intended to stand for 20 years. However, its unique design and popularity among Parisians and tourists alike led to its preservation and it has become an iconic symbol of France and one of the most recognizable monuments in the world. The idea for the Eiffel Tower came from businessman Gustave Eiffel, who wanted to create a structure that would showcase the advancements in engineering and design during the Industrial Revolution. He and his team spent two years designing and constructing the tower, using over 18,000 individual pieces of wrought iron and 2.5 million rivets.

We can see that the API simply autocompleted my input with a sensible continuation, which is why it started with “ Tower” (the word Tower with a leading space) instead of telling me that my prompt was incomplete.

The system prompt

I asked, “What is today’s date?” The response was
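To make the wrapper's bookkeeping concrete, here is a sketch of how a chat wrapper might assemble the full input it sends to the underlying LLM on each turn. The function name and the exact delimiter format are illustrative assumptions; real chat templates vary from model to model:

```python
# A sketch of how a chat wrapper might assemble the full input it sends to
# the underlying LLM on each turn. The delimiter strings and the function
# name are illustrative assumptions; real chat templates vary by model.

END_OF_PROMPT = "<|end of prompt|>"  # marks the end of each user prompt
END_OF_TEXT = "<|end of text|>"      # marks the end of each LLM reply

def build_llm_input(history, new_prompt):
    """history: list of (user_prompt, llm_reply) pairs from earlier turns."""
    parts = []
    for user_prompt, llm_reply in history:
        parts.append(user_prompt + END_OF_PROMPT)
        parts.append(llm_reply + END_OF_TEXT)
    parts.append(new_prompt + END_OF_PROMPT)  # the turn to be completed
    return "".join(parts)

print(build_llm_input([("Hi", "Hello!")], "The Eiffel"))
# prints: Hi<|end of prompt|>Hello!<|end of text|>The Eiffel<|end of prompt|>
```

Note that the entire history is re-sent on every turn, which is why long conversations make each request to the LLM progressively larger.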