Bayesian Statistics the Fun Way: Understanding Statistics and Probability with Star Wars, LEGO, and Rubber Ducks
Author: Will Kurt
This book will give you a complete understanding of Bayesian statistics through simple explanations and un-boring examples. Find out the probability of UFOs landing in your garden, how likely Han Solo is to survive a flight through an asteroid shower, how to win an argument about conspiracy theories, and whether a burglary really was a burglary, to name a few examples.
BAYESIAN STATISTICS THE FUN WAY
Understanding Statistics and Probability with Star Wars®, LEGO®, and Rubber Ducks

by Will Kurt

San Francisco
BAYESIAN STATISTICS THE FUN WAY. Copyright © 2019 by Will Kurt. All rights reserved. No part of this work may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying, recording, or by any information storage or retrieval system, without the prior written permission of the copyright owner and the publisher.

ISBN-10: 1-59327-956-6
ISBN-13: 978-1-59327-956-1

Publisher: William Pollock
Production Editor: Laurel Chun
Cover Illustration: Josh Ellingson
Interior Design: Octopod Studios
Developmental Editor: Liz Chadwick
Technical Reviewer: Chelsea Parlett-Pelleriti
Copyeditor: Rachel Monaghan
Compositor: Danielle Foster
Proofreader: James Fraleigh
Indexer: Erica Orloff

For information on distribution, translations, or bulk sales, please contact No Starch Press, Inc. directly:
No Starch Press, Inc.
245 8th Street, San Francisco, CA 94103
phone: 1.415.863.9900; sales@nostarch.com
www.nostarch.com

A catalog record of this book is available from the Library of Congress.

No Starch Press and the No Starch Press logo are registered trademarks of No Starch Press, Inc. Other product and company names mentioned herein may be the trademarks of their respective owners. Rather than use a trademark symbol with every occurrence of a trademarked name, we are using the names only in an editorial fashion and to the benefit of the trademark owner, with no intention of infringement of the trademark.

The information in this book is distributed on an “As Is” basis, without warranty. While every precaution has been taken in the preparation of this work, neither the author nor No Starch Press, Inc. shall have any liability to any person or entity with respect to any loss or damage caused or alleged to be caused directly or indirectly by the information contained in it.
About the Author

Will Kurt currently works as a data scientist at Wayfair, and has been using Bayesian statistics to solve real business problems for over half a decade. He frequently blogs about probability on his website, CountBayesie.com. Kurt is the author of Get Programming with Haskell (Manning Publications) and lives in Boston, Massachusetts.

About the Technical Reviewer

Chelsea Parlett-Pelleriti is a PhD student in Computational and Data Science, and has a long-standing love of all things lighthearted and statistical. She is also a freelance statistics writer, contributing to projects including the YouTube series Crash Course Statistics and The Princeton Review’s Cracking the AP Statistics Exam. She currently lives in Southern California.
ACKNOWLEDGMENTS

Writing a book is really an incredible effort that involves the hard work of many people. Even with all the names following I can only touch on some of the many people who have made this book possible.

I would like to start by thanking my son, Archer, for always keeping me curious and inspiring me. The books published by No Starch have long been some of my favorite books to read, and it is a real honor to get to work with the amazing team there to produce this book. I give tremendous thanks to my editors, reviewers, and the incredible team at No Starch. Liz Chadwick originally approached me about creating this book and provided excellent editorial feedback and guidance through the entire process of writing it. Laurel Chun made sure the entire process of going from some messy R notebooks to a full-fledged book went incredibly smoothly. Chelsea Parlett-Pelleriti went well beyond the requirements of a technical reviewer and really helped to make this book the best it can be. Frances Saux added many insightful comments to the later chapters of the book. And of course thank you to Bill Pollock for creating such a delightful publishing company.

As an English literature major in undergrad, I never could have imagined writing a book on any mathematical subject. There are a few people who were really essential to helping me see the wonder of mathematics. I will forever be grateful to my college roommate, Greg Muller, who showed a crazy English major just how exciting and interesting the world of mathematics can be. Professor Anatoly Temkin at Boston University opened the doors to mathematical thinking for me by teaching me to always answer the question, “What does this mean?” And of course a huge thanks to Richard Kelley who, when I found myself in the desert for many years, provided an oasis of mathematical conversations and guidance.

I would also like to give a shoutout to the data science team at Bombora, especially Patrick Kelley, who provided so many wonderful questions and conversations, some of which found their way into this book. I will also be forever grateful to the readers of my blog, Count Bayesie, who have always provided wonderful questions and insights. Among these readers, I would especially like to thank the commenter Nevin, who helped correct some early misunderstandings I had.

Finally, I want to give thanks to some truly great authors in Bayesian statistics whose books have done a great deal to guide my own growth in the subject. John Kruschke’s Doing Bayesian Data Analysis and Bayesian Data Analysis by Andrew Gelman et al. are great books everyone should read. By far the most influential book on my own thinking is E.T. Jaynes’ phenomenal Probability Theory: The Logic of Science, and I’d like to add thanks to Aubrey Clayton for making a series of lectures on this challenging book, which really helped clarify it for me.
INTRODUCTION

Virtually everything in life is, to some extent, uncertain. This may seem like a bit of an exaggeration, but to see the truth of it you can try a quick experiment. At the start of the day, write down something you think will happen in the next half-hour, hour, three hours, and six hours. Then see how many of these things happen exactly like you imagined. You’ll quickly realize that your day is full of uncertainties. Even something as predictable as “I will brush my teeth” or “I’ll have a cup of coffee” may not, for some reason or another, happen as you expect.

For most of the uncertainties in life, we’re able to get by quite well by planning our day. For example, even though traffic might make your morning commute longer than usual, you can make a pretty good estimate about what time you need to leave home in order to get to work on time. If you have a super-important morning meeting, you might leave earlier to allow for delays. We all have an innate sense of how to deal with uncertain situations and reason about uncertainty. When you think this way, you’re starting to think probabilistically.

WHY LEARN STATISTICS?

The subject of this book, Bayesian statistics, helps us get better at reasoning about uncertainty, just as studying logic in school helps us to see the errors in everyday logical thinking. Given that virtually everyone deals with uncertainty in their daily life, as we just discussed, this makes the audience for this book pretty wide. Data scientists and researchers already using statistics will benefit from a deeper understanding and intuition for how these tools work. Engineers and programmers will learn a lot about how they can better quantify decisions they have to make (I’ve even used Bayesian analysis to identify causes of software bugs!). Marketers and salespeople can apply the ideas in this book when running A/B tests, trying to understand their audience, and better assessing the value of opportunities.
Anyone making high-level decisions should have at least a basic sense of probability so they can make quick back-of-the-envelope estimates about the costs and benefits of uncertain decisions. I wanted this book to be something a CEO could study on a flight and develop a solid enough foundation by the time they land to better assess choices that involve probabilities and uncertainty. I honestly believe that everyone will benefit from thinking about problems in a Bayesian way.

With Bayesian statistics, you can use mathematics to model that uncertainty so you can make better choices given limited information. For example, suppose you need to be on time for work for a particularly important meeting and there are two different routes you could take. The first route is usually faster, but has pretty regular traffic back-ups that can cause huge delays. The second route takes longer in general but is less prone to traffic. Which route should you take? What type of information would you need to decide this? And how certain can you be in your choice? Even just a small amount of added complexity requires some extra thought and technique.
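To make the route question concrete, here is a minimal sketch in R (the language used later in the book) of the kind of back-of-the-envelope comparison you might do. Every number in it is invented for illustration; none comes from the text.

```r
# Sketch with invented numbers: compare two commute routes by expected travel
# time. Route 1 is usually faster but jams badly 20% of the time (assumed);
# route 2 is steady.
p_jam <- 0.2                                       # assumed chance route 1 backs up
route1_expected <- (1 - p_jam) * 20 + p_jam * 60   # 20 min normally, 60 min jammed
route2_expected <- 35                              # assumed steady travel time
c(route1 = route1_expected, route2 = route2_expected)
```

Under these made-up numbers route 1 wins on average (28 versus 35 minutes), but its occasional 60-minute delay may still make route 2 the safer pick before an important meeting. Weighing that trade-off is exactly the kind of reasoning the rest of the book quantifies.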
Typically when people think of statistics, they think of scientists working on a new drug, economists following trends in the market, analysts predicting the next election, baseball managers trying to build the best team with fancy math, and so on. While all of these are certainly fascinating uses of statistics, understanding the basics of Bayesian reasoning can help you in far more areas in everyday life. If you’ve ever questioned some new finding reported in the news, stayed up late browsing the web wondering if you have a rare disease, or argued with a relative over their irrational beliefs about the world, learning Bayesian statistics will help you reason better.

WHAT IS “BAYESIAN” STATISTICS?

You may be wondering what all this “Bayesian” stuff is. If you’ve ever taken a statistics class, it was likely based on frequentist statistics. Frequentist statistics is founded on the idea that probability represents the frequency with which something happens. If the probability of getting heads in a single coin toss is 0.5, that means after a single coin toss we can expect to get one-half of a head of a coin (with two tosses we can expect to get one head, which makes more sense).

Bayesian statistics, on the other hand, is concerned with how probabilities represent how uncertain we are about a piece of information. In Bayesian terms, if the probability of getting heads in a coin toss is 0.5, that means we are equally unsure about whether we’ll get heads or tails. For problems like coin tosses, both frequentist and Bayesian approaches seem reasonable, but when you’re quantifying your belief that your favorite candidate will win the next election, the Bayesian interpretation makes much more sense. After all, there’s only one election, so speaking about how frequently your favorite candidate will win doesn’t make much sense. When doing Bayesian statistics, we’re just trying to accurately describe what we believe about the world given the information we have.
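The frequentist reading of “probability 0.5” can be checked with a quick simulation. This R sketch (mine, not the book’s) tosses a fair coin many times and watches the running proportion of heads settle toward 0.5:

```r
# Simulate fair-coin tosses and track the running proportion of heads.
# Over many tosses the proportion approaches 0.5 -- the frequentist reading.
set.seed(1337)                      # fixed seed so the run is reproducible
tosses <- sample(c("H", "T"), size = 10000, replace = TRUE)
running_prop <- cumsum(tosses == "H") / seq_along(tosses)
running_prop[c(10, 100, 10000)]     # proportion after 10, 100, and 10,000 tosses
```

After only 10 tosses the proportion can be far from 0.5; by 10,000 it hugs it closely, which is precisely what the frequency interpretation promises.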
One particularly nice thing about Bayesian statistics is that, because we can view it simply as reasoning about uncertain things, all of the tools and techniques of Bayesian statistics make intuitive sense. Bayesian statistics is about looking at a problem you face, figuring out how you want to describe it mathematically, and then using reason to solve it. There are no mysterious tests that give results that you aren’t quite sure of, no distributions you have to memorize, and no traditional experiment designs you must perfectly replicate. Whether you want to figure out the probability that a new web page design will bring you more customers, if your favorite sports team will win the next game, or if we really are alone in the universe, Bayesian statistics will allow you to start reasoning about these things mathematically using just a few simple rules and a new way of looking at problems.

WHAT’S IN THIS BOOK

Here’s a quick breakdown of what you’ll find in this book.

Part I: Introduction to Probability

Chapter 1: Bayesian Thinking and Everyday Reasoning
This first chapter introduces you to Bayesian thinking and shows you how similar it is to everyday methods of thinking critically about a situation. We’ll explore the probability that a bright light outside your window at night is a UFO based on what you already know and believe about the world.

Chapter 2: Measuring Uncertainty
In this chapter you’ll use coin toss examples to assign actual values to your uncertainty in the form of probabilities: a number from 0 to 1 that represents how certain you are in your belief about something.
Chapter 3: The Logic of Uncertainty
In logic we use AND, NOT, and OR operators to combine true or false facts. It turns out that probability has similar notions of these operators. We’ll investigate how to reason about the best mode of transport to get to an appointment, and the chances of you getting a traffic ticket.

Chapter 4: Creating a Binomial Probability Distribution
Using the rules of probability as logic, in this chapter, you’ll build your own probability distribution, the binomial distribution, which you can apply to many probability problems that share a similar structure. You’ll try to predict the probability of getting a specific famous statistician collectable card in a Gacha card game.

Chapter 5: The Beta Distribution
Here you’ll learn about your first continuous probability distribution and get an introduction to what makes statistics different from probability. The practice of statistics involves trying to figure out what unknown probabilities might be based on data. In this chapter’s example, we’ll investigate a mysterious coin-dispensing box and the chances of making more money than you lose.

Part II: Bayesian Probability and Prior Probabilities

Chapter 6: Conditional Probability
In this chapter, you’ll condition probabilities based on your existing information. For example, knowing whether someone is male or female tells us how likely they are to be color blind. You’ll also be introduced to Bayes’ theorem, which allows us to reverse conditional probabilities.

Chapter 7: Bayes’ Theorem with LEGO
Here you’ll gain a better intuition for Bayes’ theorem by reasoning about LEGO bricks! This chapter will give you a spatial sense of what Bayes’ theorem is doing mathematically.

Chapter 8: The Prior, Likelihood, and Posterior of Bayes’ Theorem
Bayes’ theorem is typically broken into three parts, each of which performs its own function in Bayesian reasoning. In this chapter, you’ll learn what they’re called and how to use them by investigating whether an apparent break-in was really a crime or just a series of coincidences.

Chapter 9: Bayesian Priors and Working with Probability Distributions
This chapter explores how we can use Bayes’ theorem to better understand the classic asteroid scene from Star Wars: The Empire Strikes Back, through which you’ll gain a stronger understanding of prior probabilities in Bayesian statistics. You’ll also see how you can use entire distributions as your prior.

Part III: Parameter Estimation

Chapter 10: Introduction to Averaging and Parameter Estimation
Parameter estimation is the method we use to formulate a best guess for an uncertain value. The most basic tool in parameter estimation is to simply average your observations. In this chapter we’ll see why this works by analyzing snowfall levels.

Chapter 11: Measuring the Spread of Our Data
Finding the mean is a useful first step in estimating parameters, but we also need a way to account for how spread out our observations are. Here you’ll be introduced to mean absolute deviation (MAD), variance, and standard deviation as ways to measure how spread out our observations are.

Chapter 12: The Normal Distribution
By combining our mean and standard deviation, we get a very useful distribution for making estimates: the normal distribution. In this chapter, you’ll learn how to use the normal distribution to not only estimate unknown values but also to know how certain you are in those estimates. You’ll use these new skills to time your escape during a bank heist.
Chapter 13: Tools of Parameter Estimation: The PDF, CDF, and Quantile Function
Here you’ll learn about the PDF, CDF, and quantile function to better understand the parameter estimations you’re making. You’ll estimate email conversion rates using these tools and see what insights each provides.

Chapter 14: Parameter Estimation with Prior Probabilities
The best way to improve our parameter estimates is to include a prior probability. In this chapter, you’ll see how adding prior information about the past success of email click-through rates can help us better estimate the true conversion rate for a new email.

Chapter 15: From Parameter Estimation to Hypothesis Testing: Building a Bayesian A/B Test
Now that we can estimate uncertain values, we need a way to compare two uncertain values in order to test a hypothesis. You’ll create an A/B test to determine how confident you are in a new method of email marketing.

Part IV: Hypothesis Testing: The Heart of Statistics

Chapter 16: Introduction to the Bayes Factor and Posterior Odds: The Competition of Ideas
Ever stay up late, browsing the web, wondering if you might have a super-rare disease? This chapter will introduce another approach to testing ideas that will help you determine how worried you should actually be!

Chapter 17: Bayesian Reasoning in The Twilight Zone
How much do you believe in psychic powers? In this chapter, you’ll develop your own mind-reading skills by analyzing a situation from a classic episode of The Twilight Zone.

Chapter 18: When Data Doesn’t Convince You
Sometimes data doesn’t seem to be enough to change someone’s mind about a belief or help you win an argument. Learn how you can change a friend’s mind about something you disagree on and why it’s not worth your time to argue with your belligerent uncle!

Chapter 19: From Hypothesis Testing to Parameter Estimation
Here we come full circle back to parameter estimation by looking at how to compare a range of hypotheses. You’ll derive your first example of statistics, the beta distribution, using the tools that we’ve covered for simple hypothesis tests to analyze the fairness of a particular fairground game.

Appendix A: A Quick Introduction to R
This quick appendix will teach you the basics of the R programming language.

Appendix B: Enough Calculus to Get By
Here we’ll cover just enough calculus to get you comfortable with the math used in the book.

BACKGROUND FOR READING THE BOOK

The only requirement of this book is basic high school algebra. If you flip forward, you’ll see a few instances of math, but nothing particularly onerous. We’ll be using a bit of code written in the R programming language, which I’ll provide and talk through, so there’s no need to have learned R beforehand. We’ll also touch on calculus, but again no prior experience is required, and the appendixes will give you enough information to cover what you’ll need.

In other words, this book aims to help you start thinking about problems in a mathematical way without requiring significant mathematical background. When you finish reading it, you may find yourself inadvertently writing down equations to describe problems you see in everyday life!

If you do happen to have a strong background in statistics (even Bayesian statistics), I believe you’ll still have a fun time reading through this book. I have always found that the best way to understand a field well is to revisit the fundamentals over and over again, each time in a different light. Even as
the author of this book, I found plenty of things that surprised me just in the course of the writing process!

NOW OFF ON YOUR ADVENTURE!

As you’ll soon see, aside from being very useful, Bayesian statistics can be a lot of fun! To help you learn Bayesian reasoning we’ll be taking a look at LEGO bricks, The Twilight Zone, Star Wars, and more. You’ll find that once you begin thinking probabilistically about problems, you’ll start using Bayesian statistics all over the place. This book is designed to be a pretty quick and enjoyable read, so turn the page and let’s begin our adventure in Bayesian statistics!
PART I INTRODUCTION TO PROBABILITY
1
BAYESIAN THINKING AND EVERYDAY REASONING

In this first chapter, I’ll give you an overview of Bayesian reasoning, the formal process we use to update our beliefs about the world once we’ve observed some data. We’ll work through a scenario and explore how we can map our everyday experience to Bayesian reasoning.

The good news is that you were already a Bayesian even before you picked up this book! Bayesian statistics is closely aligned with how people naturally use evidence to create new beliefs and reason about everyday problems; the tricky part is breaking down this natural thought process into a rigorous, mathematical one. In statistics, we use particular calculations and models to more accurately quantify probability. For now, though, we won’t use any math or models; we’ll just get you familiar with the basic concepts and use our intuition to determine probabilities. Then, in the next chapter, we’ll put exact numbers to probabilities. Throughout the rest of the book, you’ll learn how we can use rigorous mathematical techniques to formally model and reason about the concepts we’ll cover in this chapter.

REASONING ABOUT STRANGE EXPERIENCES

One night you are suddenly awakened by a bright light at your window. You jump up from bed and look out to see a large object in the sky that can only be described as saucer shaped. You are generally a skeptic and have never believed in alien encounters, but, completely perplexed by the scene outside, you find yourself thinking, Could this be a UFO?!

Bayesian reasoning involves stepping through your thought process when you’re confronted with a situation to recognize when you’re making probabilistic assumptions, and then using those assumptions to update your beliefs about the world. In the UFO scenario, you’ve already gone through a full Bayesian analysis because you:

1. Observed data
2. Formed a hypothesis
3. Updated your beliefs based on the data

This reasoning tends to happen so quickly that you don’t have any time to analyze your own thinking. You created a new belief without questioning it: whereas before you did not believe in the existence of UFOs, after the event you’ve updated your beliefs and now think you’ve seen a UFO. In this chapter, you’ll focus on structuring your beliefs and the process of creating them so you can examine it more formally, and we’ll look at quantifying this process in chapters to come.
Let’s look at each step of reasoning in turn, starting with observing data.

Observing Data

Founding your beliefs on data is a key component of Bayesian reasoning. Before you can draw any conclusions about the scene (such as claiming what you see is a UFO), you need to understand the data you’re observing, in this case:

• An extremely bright light outside your window
• A saucer-shaped object hovering in the air

Based on your past experience, you would describe what you saw out your window as “surprising.” In probabilistic terms, we could write this as:

P(bright light outside window, saucer-shaped object in sky) = very low

where P denotes probability and the two pieces of data are listed inside the parentheses. You would read this equation as: “The probability of observing bright lights outside the window and a saucer-shaped object in the sky is very low.” In probability theory, we use a comma to separate events when we’re looking at the combined probability of multiple events. Note that this data does not contain anything specific about UFOs; it’s simply made up of your observations—this will be important later.

We can also examine probabilities of single events, which would be written as:

P(rain) = likely

This equation is read as: “The probability of rain is likely.” For our UFO scenario, we’re determining the probability of both events occurring together. The probability of one of these two events occurring on its own would be entirely different. For example, the bright lights alone could easily be a passing car, so on its own the probability of this event is more likely than its probability coupled with seeing a saucer-shaped object (and the saucer-shaped object would still be surprising even on its own).

So how are we determining this probability? Right now we’re using our intuition—that is, our general sense of the likelihood of perceiving these events. In the next chapter, we’ll see how we can come up with exact numbers for our probabilities.
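Although the book saves exact numbers for the next chapter, the intuition that the joint event is rarer than either event alone can be sketched in R with made-up values, under the simplifying assumption that the two observations are independent:

```r
# Toy numbers (invented for illustration) for the two observations, assuming
# -- purely for simplicity -- that they occur independently, so the joint
# probability is the product of the individual probabilities.
p_bright_light <- 0.01     # assumed chance of a bright light (e.g., a passing car)
p_saucer_shape <- 0.0001   # assumed chance of a saucer-shaped object
p_both <- p_bright_light * p_saucer_shape   # joint probability under independence
p_both                     # about one in a million: rarer than either event alone
```

Whatever placeholder values you pick, the product is smaller than either factor, matching the intuition that seeing both things together is more surprising than seeing either one on its own.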
Holding Prior Beliefs and Conditioning Probabilities

You are able to wake up in the morning, make your coffee, and drive to work without doing a lot of analysis because you hold prior beliefs about how the world works. Our prior beliefs are collections of beliefs we’ve built up over a lifetime of experiences (that is, of observing data). You believe that the sun will rise because the sun has risen every day since you were born. Likewise, you might have a prior belief that when the light is red for oncoming traffic at an intersection, and your light is green, it’s safe to drive through the intersection. Without prior beliefs, we would go to bed terrified each night that the sun might not rise tomorrow, and stop at every intersection to carefully inspect oncoming traffic.

Our prior beliefs say that seeing bright lights outside the window at the same time as seeing a saucer-shaped object is a rare occurrence on Earth. However, if you lived on a distant planet populated by vast numbers of flying saucers, with frequent interstellar visitors, the probability of seeing lights and saucer-shaped objects in the sky would be much higher. In a formula we enter prior beliefs after our data, separated with a | like so:

P(bright light outside window, saucer-shaped object in sky | experience on Earth) = very low
We would read this equation as: “The probability of observing bright lights and a saucer-shaped object in the sky, given our experience on Earth, is very low.” The probability outcome is called a conditional probability because we are conditioning the probability of one event occurring on the existence of something else. In this case, we’re conditioning the probability of our observation on our prior experience.

In the same way we used P for probability, we typically use shorter variable names for events and conditions. If you’re unfamiliar with reading equations, they can seem too terse at first. After a while, though, you’ll find that shorter variable names aid readability and help you to see how equations generalize to larger classes of problems. We’ll assign all of our data to a single variable, D:

D = bright light outside window, saucer-shaped object in sky

So from now on when we refer to the probability of a set of data, we’ll simply say P(D). Likewise, we use the variable X to represent our prior belief, like so:

X = experience on Earth

We can now write this equation as P(D | X). This is much easier to write and doesn’t change the meaning.

Conditioning on Multiple Beliefs

We can add more than one piece of prior knowledge, too, if more than one variable is going to significantly affect the probability. Suppose that it’s July 4th and you live in the United States. From prior experience you know that fireworks are common on the Fourth of July. Given your experience on Earth and the fact that it’s July 4th, the probability of seeing lights in the sky is less unlikely, and even the saucer-shaped object could be related to some fireworks display. You could rewrite this equation as:

P(bright light outside window, saucer-shaped object in sky | July 4th, experience on Earth) = low

Taking both these experiences into account, our conditional probability changed from “very low” to “low.”

Assuming Prior Beliefs in Practice

In statistics, we don’t usually explicitly include a condition for all of our existing experiences, because it can be assumed. For that reason, in this book we won’t include a separate variable for this condition. However, in Bayesian analysis, it’s essential to keep in mind that our understanding of the world is always conditioned on our prior experience in the world. For the rest of this chapter, we’ll keep the “experience on Earth” variable around to remind us of this.

Forming a Hypothesis

So far we have our data, D (that we have seen a bright light and a saucer-shaped object), and our prior experience, X. In order to explain what you saw, you need to form some kind of hypothesis—a
model about how the world works that makes a prediction. Hypotheses can come in many forms. All of our basic beliefs about the world are hypotheses:

• If you believe the Earth rotates, you predict the sun will rise and set at certain times.
• If you believe that your favorite baseball team is the best, you predict they will win more than the other teams.
• If you believe in astrology, you predict that the alignment of the stars will describe people and events.

Hypotheses can also be more formal or sophisticated:

• A scientist may hypothesize that a certain treatment will slow the growth of cancer.
• A quantitative analyst in finance may have a model of how the market will behave.
• A deep neural network may predict which images are animals and which ones are plants.

All of these examples are hypotheses because they have some way of understanding the world and use that understanding to make a prediction about how the world will behave. When we think of hypotheses in Bayesian statistics, we are usually concerned with how well they predict the data we observe. When you see the evidence and think A UFO!, you are forming a hypothesis. The UFO hypothesis is likely based on countless movies and television shows you’ve seen in your prior experience. We would define our first hypothesis as:

H1 = A UFO is in my back yard!

But what is this hypothesis predicting? If we think of this situation backward, we might ask, “If there was a UFO in your back yard, what would you expect to see?” And you might answer, “Bright lights and a saucer-shaped object.” Because H1 predicts the data D, when we observe our data given our hypothesis, the probability of the data increases. Formally we write this as:

P(D | H1, X) >> P(D | X)

This equation says: “The probability of seeing bright lights and a saucer-shaped object in the sky, given my belief that this is a UFO and my prior experience, is much higher [indicated by the double greater-than sign >>] than just seeing bright lights and a saucer-shaped object in the sky without explanation.” Here we’ve used the language of probability to demonstrate that our hypothesis explains the data.

Spotting Hypotheses in Everyday Speech

It’s easy to see a relationship between our everyday language and probability. Saying something is “surprising,” for example, might be the same as saying it has low-probability data based on our prior experiences. Saying something “makes sense” might indicate we have high-probability data based on our prior experiences. This may seem obvious once pointed out, but the key to probabilistic reasoning is to think carefully about how you interpret data, create hypotheses, and change your beliefs, even in an ordinary, everyday scenario. Without H1, you’d be in a state of confusion because you have no explanation for the data you observed.

GATHERING MORE EVIDENCE AND UPDATING YOUR BELIEFS

Now you have your data and a hypothesis. However, given your prior experience as a skeptic, that hypothesis still seems pretty outlandish. In order to improve your state of knowledge and draw
more reliable conclusions, you need to collect more data. This is the next step in statistical reasoning, as well as in your own intuitive thinking. To collect more data, we need to make more observations. In our scenario, you look out your window to see what you can observe:

As you look toward the bright light outside, you notice more lights in the area. You also see that the large saucer-shaped object is held up by wires, and notice a camera crew. You hear a loud clap and someone call out "Cut!"

You have, very likely, instantly changed your mind about what you think is happening in this scene. Your inference before was that you might be witnessing a UFO. Now, with this new evidence, you realize it looks more like someone is shooting a movie nearby.

With this thought process, your brain has once again performed some sophisticated Bayesian analysis in an instant! Let's break down what happened in your head in order to reason about events more carefully. You started with your initial hypothesis:

H1 = A UFO has landed!

In isolation, this hypothesis, given your experience, is extremely unlikely:

P(H1 | X) = very, very low

However, it was the only useful explanation you could come up with given the data you had available. When you observed additional data, you immediately realized that there's another possible hypothesis, that a movie is being filmed nearby:

H2 = A film is being made outside your window

In isolation, the probability of this hypothesis is also intuitively very low (unless you happen to live near a movie studio):

P(H2 | X) = very low

Notice that we set the probability of H1 as "very, very low" and the probability of H2 as just "very low." This corresponds to your intuition: if someone came up to you, without any data, and asked, "Which do you think is more likely, a UFO appearing at night in your neighborhood or a movie being filmed next door?" you would say the movie scenario is more likely than a UFO appearance.
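The qualitative priors above can be sketched numerically. The following Python snippet is purely illustrative: the book assigns no numbers at this point, and the probability values and hypothesis labels below are invented to show how "very, very low" and "very low" can be compared once they are made concrete.

```python
# Illustrative prior beliefs, P(H | X), as made-up numbers.
# "very, very low" and "very low" are qualitative in the text;
# these values exist only to demonstrate the comparison.
priors = {
    "H1: UFO in the back yard": 1e-7,    # very, very low
    "H2: film being made nearby": 1e-4,  # very low
}

# Even before any new data arrives, H2 is the more plausible hypothesis.
more_plausible = max(priors, key=priors.get)
print(more_plausible)  # H2: film being made nearby

ratio = priors["H2: film being made nearby"] / priors["H1: UFO in the back yard"]
print(f"H2 is {ratio:.0f}x more plausible a priori")
```

Whatever specific numbers you chose, only their ratio matters for the comparison, which is exactly the idea the chapter develops next.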
Now we just need a way to take our new data into account when changing our beliefs.

COMPARING HYPOTHESES

You first accepted the UFO hypothesis, despite it being unlikely, because you didn't initially have any other explanation. Now, however, there's another possible explanation (a movie being filmed), so you have formed an alternate hypothesis. Considering alternate hypotheses is the process of comparing multiple theories using the data you have. When you see the wires, film crew, and additional lights, your data changes. Your updated data are:

Dupdated = bright lights, a saucer-shaped object, wires, a camera crew, and additional lights
On observing this extra data, you change your conclusion about what was happening. Let's break this process down into Bayesian reasoning. Your first hypothesis, H1, gave you a way to explain your data and end your confusion, but with your additional observations H1 no longer explains the data well. We can write this as:

P(Dupdated | H1, X) = very, very low

You now have a new hypothesis, H2, which explains the data much better, written as follows:

P(Dupdated | H2, X) >> P(Dupdated | H1, X)

The key here is to understand that we're comparing how well each of these hypotheses explains the observed data. When we say, "The probability of the data, given the second hypothesis, is much greater than the first," we're saying that what we observed is better explained by the second hypothesis. This brings us to the true heart of Bayesian analysis: the test of your beliefs is how well they explain the world. We say that one belief is more accurate than another because it provides a better explanation of the world we observe. Mathematically, we express this idea as the ratio of the two probabilities:

P(Dupdated | H2, X) / P(Dupdated | H1, X)

When this ratio is a large number, say 1,000, it means "H2 explains the data 1,000 times better than H1." Because H2 explains the data many times better than H1, we update our beliefs from H1 to H2. This is exactly what happened when you changed your mind about the likely explanation for what you observed. You now believe that what you've seen is a movie being made outside your window, because this is a more likely explanation of all the data you observed.

DATA INFORMS BELIEF; BELIEF SHOULD NOT INFORM DATA

One final point worth stressing is that the only absolute in all these examples is your data. Your hypotheses change, and your experience in the world, X, may be different from someone else's, but the data, D, is shared by all. Consider the following two formulas.

The first is one we've used throughout this chapter:

P(D | H, X)

which we read as "The probability of the data given my hypotheses and experience in the world," or more plainly, "How well my beliefs explain what I observe." But there is a reversal of this, common in everyday thinking, which is:

P(H | D, X)

We read this as "The probability of my beliefs given the data and my experiences in the world," or "How well what I observe supports what I believe."

In the first case, we change our beliefs according to data we gather and observations we make about the world that describe it better. In the second case, we gather data to support our existing beliefs. Bayesian thinking is about changing your mind and updating how you understand the
world. The data we observe is all that is real, so our beliefs ultimately need to shift until they align with the data. In life, too, your beliefs should always be mutable.

As the film crew packs up, you notice that all the vans bear military insignia. The crew takes off their coats to reveal army fatigues and you overhear someone say, "Well, that should have fooled anyone who saw that . . . good thinking."

With this new evidence, your beliefs may shift again!

WRAPPING UP

Let's recap what you've learned. Your beliefs start with your existing experience of the world, X. When you observe data, D, it either aligns with your experience, P(D | X) = very high, or it surprises you, P(D | X) = very low. To understand the world, you rely on beliefs you have about what you observe, or hypotheses, H. Oftentimes a new hypothesis can help you explain the data that surprises you, P(D | H, X) >> P(D | X). When you gather new data or come up with new ideas, you can create more hypotheses, H1, H2, H3, . . . You update your beliefs when a new hypothesis explains your data much better than your old hypothesis:

P(D | H2, X) >> P(D | H1, X)

Finally, you should be far more concerned with data changing your beliefs than with ensuring data supports your beliefs, P(H | D, X). With these foundations set up, you're ready to start adding numbers into the mix. In the rest of Part I, you'll model your beliefs mathematically to precisely determine how and when you should change them.

EXERCISES

Try answering the following questions to see how well you understand Bayesian reasoning. The solutions can be found at https://nostarch.com/learnbayes/.

1. Rewrite the following statements as equations using the mathematical notation you learned in this chapter:

• The probability of rain is low.
• The probability of rain given that it is cloudy is high.
• The probability of you having an umbrella given it is raining is much greater than the probability of you having an umbrella in general.

2. Organize the data you observe in the following scenario into a mathematical notation, using the techniques we've covered in this chapter. Then come up with a hypothesis to explain this data:

You come home from work and notice that your front door is open and the side window is broken. As you walk inside, you immediately notice that your laptop is missing.

3. The following scenario adds data to the previous one. Demonstrate how this new information changes your beliefs and come up with a second hypothesis to explain the data, using the notation you've learned in this chapter.
A neighborhood child runs up to you and apologizes profusely for accidentally throwing a rock through your window. They claim that they saw the laptop and didn’t want it stolen so they opened the front door to grab it, and your laptop is safe at their house.
2
MEASURING UNCERTAINTY

In Chapter 1 we looked at some basic reasoning tools we use intuitively to understand how data informs our beliefs. We left a crucial issue unresolved: how can we quantify these tools? In probability theory, rather than describing beliefs with terms like very low and high, we need to assign real numbers to these beliefs. This allows us to create quantitative models of our understanding of the world. With these models, we can see just how much the evidence changes our beliefs, decide when we should change our thinking, and gain a solid understanding of our current state of knowledge. In this chapter, we will apply this concept to quantify the probability of an event.

WHAT IS A PROBABILITY?

The idea of probability is deeply ingrained in our everyday language. Whenever you say something such as "That seems unlikely!" or "I would be surprised if that's not the case" or "I'm not sure about that," you're making a claim about probability. Probability is a measurement of how strongly we believe things about the world. In the previous chapter we used abstract, qualitative terms to describe our beliefs. To really analyze how we develop and change beliefs, we need to define exactly what a probability is by more formally quantifying P(X), that is, how strongly we believe in X.

We can consider probability an extension of logic. In basic logic we have two values, true and false, which correspond to absolute beliefs. When we say something is true, it means that we are completely certain it is the case. While logic is useful for many problems, very rarely do we believe anything to be absolutely true or absolutely false; there is almost always some level of uncertainty in every decision we make. Probability allows us to extend logic to work with uncertain values between true and false. Computers commonly represent true as 1 and false as 0, and we can use this model with probability as well.
P(X) = 0 is the same as saying that X = false, and P(X) = 1 is the same as X = true. Between 0 and 1 we have an infinite range of possible values. A value closer to 0 means we are more certain that something is false, and a value closer to 1 means we're more certain something is true. It's worth noting that a value of 0.5 means that we are completely unsure whether something is true or false.

Another important part of logic is negation. When we say "not true" we mean false. Likewise, saying "not false" means true. We want probability to work the same way, so we make sure that the probability of X and the probability of its negation always sum to 1: P(X) + P(not X) = 1.
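To make the "probability extends logic" idea concrete, here is a minimal Python sketch. The function name `negate` is my own label, not notation from the book; it simply encodes the rule that the probability of "not X" is one minus the probability of X.

```python
# Probability as an extension of logic: true = 1, false = 0,
# and every value in between expresses partial certainty.
def negate(p: float) -> float:
    """The probability that X is false, given P(X) = p."""
    return 1.0 - p

p_true = 1.0    # logically true: complete certainty
p_false = 0.0   # logically false: complete certainty of falsehood
p_unsure = 0.5  # completely unsure either way

# At the extremes, negation behaves exactly like logical NOT...
print(negate(p_true))   # 0.0, "not true" means false
print(negate(p_false))  # 1.0, "not false" means true
# ...and total uncertainty stays total under negation.
print(negate(p_unsure)) # 0.5
```

Note that a belief and its negation always sum to 1, so being less sure that X is true automatically makes you more sure that X is false.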