R All-in-One For Dummies (Joseph Schmuller)

Author: Joseph Schmuller

Non-fiction

A deep dive into the programming language of choice for statistics and data

With R All-in-One For Dummies, you get five mini-books in one, offering a complete and thorough resource on the R programming language and a road map for making sense of the sea of data we're all swimming in. Maybe you're pursuing a career in data science, maybe you're looking to infuse a little statistics know-how into your existing career, or maybe you're just R-curious. This book has your back. Along with providing an overview of coding in R and how to work with the language, this book delves into the types of projects and applications R programmers tend to tackle the most. You'll find coverage of statistical analysis, machine learning, and data management with R.

- Grasp the basics of the R programming language and write your first lines of code
- Understand how R programmers use code to analyze data and perform statistical analysis
- Use R to create data visualizations and machine learning programs
- Work through sample projects to hone your R coding skill

This is an excellent all-in-one resource for beginning coders who'd like to move into the data space by knowing more about R.

📄 File Format: PDF
💾 File Size: 11.8 MB

📄 Text Preview (First 20 pages)

📄 Page 3
R ALL-IN-ONE by Joseph Schmuller
📄 Page 4
R All-in-One For Dummies®

Published by: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030-5774, www.wiley.com

Copyright © 2023 by John Wiley & Sons, Inc., Hoboken, New Jersey

Published simultaneously in Canada

No part of this publication may be reproduced, stored in a retrieval system or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning or otherwise, except as permitted under Sections 107 or 108 of the 1976 United States Copyright Act, without the prior written permission of the Publisher. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Trademarks: Wiley, For Dummies, the Dummies Man logo, Dummies.com, Making Everything Easier, and related trade dress are trademarks or registered trademarks of John Wiley & Sons, Inc. and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

LIMIT OF LIABILITY/DISCLAIMER OF WARRANTY: WHILE THE PUBLISHER AND AUTHORS HAVE USED THEIR BEST EFFORTS IN PREPARING THIS WORK, THEY MAKE NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE ACCURACY OR COMPLETENESS OF THE CONTENTS OF THIS WORK AND SPECIFICALLY DISCLAIM ALL WARRANTIES, INCLUDING WITHOUT LIMITATION ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. NO WARRANTY MAY BE CREATED OR EXTENDED BY SALES REPRESENTATIVES, WRITTEN SALES MATERIALS OR PROMOTIONAL STATEMENTS FOR THIS WORK. THE FACT THAT AN ORGANIZATION, WEBSITE, OR PRODUCT IS REFERRED TO IN THIS WORK AS A CITATION AND/OR POTENTIAL SOURCE OF FURTHER INFORMATION DOES NOT MEAN THAT THE PUBLISHER AND AUTHORS ENDORSE THE INFORMATION OR SERVICES THE ORGANIZATION, WEBSITE, OR PRODUCT MAY PROVIDE OR RECOMMENDATIONS IT MAY MAKE. THIS WORK IS SOLD WITH THE UNDERSTANDING THAT THE PUBLISHER IS NOT ENGAGED IN RENDERING PROFESSIONAL SERVICES. THE ADVICE AND STRATEGIES CONTAINED HEREIN MAY NOT BE SUITABLE FOR YOUR SITUATION. YOU SHOULD CONSULT WITH A SPECIALIST WHERE APPROPRIATE. FURTHER, READERS SHOULD BE AWARE THAT WEBSITES LISTED IN THIS WORK MAY HAVE CHANGED OR DISAPPEARED BETWEEN WHEN THIS WORK WAS WRITTEN AND WHEN IT IS READ. NEITHER THE PUBLISHER NOR AUTHORS SHALL BE LIABLE FOR ANY LOSS OF PROFIT OR ANY OTHER COMMERCIAL DAMAGES, INCLUDING BUT NOT LIMITED TO SPECIAL, INCIDENTAL, CONSEQUENTIAL, OR OTHER DAMAGES.

For general information on our other products and services, please contact our Customer Care Department within the U.S. at 877-762-2974, outside the U.S. at 317-572-3993, or fax 317-572-4002. For technical support, please visit https://hub.wiley.com/community/support/dummies.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Control Number: 2022950749

ISBN: 978-1-119-98369-9 (pbk); 978-1-119-98370-5 (ebk); 978-1-119-98371-2 (ebk)
📄 Page 5
Contents at a Glance

Introduction
Book 1: Introducing R
  CHAPTER 1: R: What It Does and How It Does It
  CHAPTER 2: Working with Packages, Importing, and Exporting
Book 2: Describing Data
  CHAPTER 1: Getting Graphic
  CHAPTER 2: Finding Your Center
  CHAPTER 3: Deviating from the Average
  CHAPTER 4: Meeting Standards and Standings
  CHAPTER 5: Summarizing It All
  CHAPTER 6: What’s Normal?
Book 3: Analyzing Data
  CHAPTER 1: The Confidence Game: Estimation
  CHAPTER 2: One-Sample Hypothesis Testing
  CHAPTER 3: Two-Sample Hypothesis Testing
  CHAPTER 4: Testing More than Two Samples
  CHAPTER 5: More Complicated Testing
  CHAPTER 6: Regression: Linear, Multiple, and the General Linear Model
  CHAPTER 7: Correlation: The Rise and Fall of Relationships
  CHAPTER 8: Curvilinear Regression: When Relationships Get Complicated
  CHAPTER 9: In Due Time
  CHAPTER 10: Non-Parametric Statistics
  CHAPTER 11: Introducing Probability
  CHAPTER 12: Probability Meets Regression: Logistic Regression
Book 4: Learning from Data
  CHAPTER 1: Tools and Data for Machine Learning Projects
  CHAPTER 2: Decisions, Decisions, Decisions
  CHAPTER 3: Into the Forest, Randomly
  CHAPTER 4: Support Your Local Vector
  CHAPTER 5: K-Means Clustering
  CHAPTER 6: Neural Networks
  CHAPTER 7: Exploring Marketing
  CHAPTER 8: From the City That Never Sleeps
📄 Page 6
Book 5: Harnessing R: Some Projects to Keep You Busy
  CHAPTER 1: Working with a Browser
  CHAPTER 2: Dashboards — How Dashing!
Index
📄 Page 7
Table of Contents

INTRODUCTION
  About This All-in-One
    Book 1: Introducing R
    Book 2: Describing Data
    Book 3: Analyzing Data
    Book 4: Learning from Data
    Book 5: Harnessing R: Some Projects to Keep You Busy
  What You Can Safely Skip
  Foolish Assumptions
  Icons Used in This Book
  Beyond This Book
  Where to Go from Here
BOOK 1: INTRODUCING R
  CHAPTER 1: R: What It Does and How It Does It
    The Statistical (and Related) Ideas You Just Have to Know
      Samples and populations
      Variables: Dependent and independent
      Types of data
      A little probability
      Inferential statistics: Testing hypotheses
      Null and alternative hypotheses
      Two types of error
    Getting R
    Getting RStudio
    A Session with R
      The working directory
      Getting started
    R Functions
    User-Defined Functions
    Comments
    R Structures
      Vectors
      Numerical vectors
      Matrices
      Lists
      Data frames
    for Loops and if Statements
📄 Page 8
  CHAPTER 2: Working with Packages, Importing, and Exporting
    Installing Packages
    Examining Data
      Heads and tails
      Missing data
      Subsets
    R Formulas
    More Packages
    Exploring the tidyverse
    Importing and Exporting
      Spreadsheets
      CSV files
      Text files
BOOK 2: DESCRIBING DATA
  CHAPTER 1: Getting Graphic
    Finding Patterns
      Graphing a distribution
      Bar-hopping
      Slicing the pie
      The plot of scatter
      Of boxes and whiskers
    Doing the Basics: Base R Graphics, That Is
      Histograms
      Graph features
      Bar plots
      Pie graphs
      Dot charts
      Bar plots revisited
      Scatter plots
      Box plots
    Kicking It Up a Notch to ggplot2
      Histograms
      Bar plots
      Dot charts
      Bar plots re-revisited
      Scatter plots
      Box plots
    Putting a Bow On It
  CHAPTER 2: Finding Your Center
    Means: The Lure of Averages
    Calculating the Mean
📄 Page 9
    The Average in R: mean()
      What’s your condition?
      Eliminate $ signs forth with()
      Explore the data
      Outliers: The flaw of averages
    Medians: Caught in the Middle
      The Median in R: median()
    Statistics à la Mode
    The Mode in R
  CHAPTER 3: Deviating from the Average
    Measuring Variation
      Averaging squared deviations: Variance and how to calculate it
      Sample variance
      Variance in R
    Back to the Roots: Standard Deviation
      Population standard deviation
      Sample standard deviation
    Standard Deviation in R
      Conditions, conditions, conditions
  CHAPTER 4: Meeting Standards and Standings
    Catching Some Zs
      Characteristics of z-scores
      Bonds versus the Bambino
      Exam scores
    Standard Scores in R
    Where Do You Stand?
      Ranking in R
      Tied scores
      Nth smallest, Nth largest
      Percentiles
      Percent ranks
    Summarizing
  CHAPTER 5: Summarizing It All
    How Many?
    The High and the Low
    Living in the Moments
      A teachable moment
      Back to descriptives
      Skewness
      Kurtosis
📄 Page 10
    Tuning in the Frequency
      Nominal variables: table() et al.
      Numerical variables: hist()
      Numerical variables: stem()
    Summarizing a Data Frame
  CHAPTER 6: What’s Normal?
    Hitting the Curve
      Digging deeper
      Parameters of a normal distribution
    Working with Normal Distributions
      Distributions in R
      Normal density function
      Cumulative density function
      Quantiles of normal distributions
      Random sampling
    Meeting a Distinguished Member of the Family
      The standard normal distribution in R
      Plotting the standard normal distribution
BOOK 3: ANALYZING DATA
  CHAPTER 1: The Confidence Game: Estimation
    Understanding Sampling Distributions
    An EXTREMELY Important Idea: The Central Limit Theorem
      (Approximately) simulating the central limit theorem
      Predictions of the central limit theorem
    Confidence: It Has Its Limits!
      Finding confidence limits for a mean
      Using R to find the confidence limits for a mean
    Fit to a t
  CHAPTER 2: One-Sample Hypothesis Testing
    Hypotheses, Tests, and Errors
    Hypothesis Tests and Sampling Distributions
    Catching Some Z’s Again
    Z Testing in R
    t for One
    t Testing in R
    Working with t-Distributions
    Visualizing t-Distributions
      Plotting t in base R graphics
      Plotting t in ggplot2
      One more thing about ggplot2
📄 Page 11
    Testing a Variance
      Manufacturing an Example
      Testing in R
    Working with Chi-Square Distributions
    Visualizing Chi-Square Distributions
      Plotting chi-square in base R graphics
      Plotting chi-square in ggplot2
  CHAPTER 3: Two-Sample Hypothesis Testing
    Hypotheses Built for Two
    Sampling Distributions Revisited
      Applying the central limit theorem
      Zs once more
      Z-testing for two samples in R
    t for Two
    Like Peas in a Pod: Equal Variances
    t-Testing in R
      Working with two vectors
      Working with a data frame and a formula
      Visualizing the results
      Like ps and qs: Unequal variances
    A Matched Set: Hypothesis Testing for Paired Samples
    Paired Sample t-testing in R
    Testing Two Variances
      F testing in R
      F in conjunction with t
    Working with F Distributions
    Visualizing F Distributions
  CHAPTER 4: Testing More than Two Samples
    Testing More than Two
      A thorny problem
      A solution
      Meaningful relationships
    ANOVA in R
      Plotting a boxplot to visualize the data
      After the ANOVA
      Contrasts in R
      Unplanned comparisons
    Another Kind of Hypothesis, Another Kind of Test
      Working with repeated measures ANOVA
      Repeated measures ANOVA in R
      Visualizing the results
    Getting Trendy
    Trend Analysis in R
📄 Page 12
  CHAPTER 5: More Complicated Testing
    Cracking the Combinations
      Interactions
      The analysis
    Two-Way ANOVA in R
      Visualizing the two-way results
    Two Kinds of Variables . . . at Once
      Mixed ANOVA in R
      Visualizing the mixed ANOVA results
    After the Analysis
    Multivariate Analysis of Variance
      MANOVA in R
      Visualizing the MANOVA results
      After the MANOVA
  CHAPTER 6: Regression: Linear, Multiple, and the General Linear Model
    The Plot of Scatter
    Graphing Lines
    Regression: What a Line!
      Using regression for forecasting
      Variation around the regression line
      Testing hypotheses about regression
    Linear Regression in R
      Features of the linear model
      Making predictions
      Visualizing the scatterplot and regression line
      Plotting the residuals
    Juggling Many Relationships at Once: Multiple Regression
      Multiple regression in R
      Making predictions
      Visualizing the 3d scatterplot and regression plane
    ANOVA: Another Look
    Analysis of Covariance: The Final Component of the GLM
    But Wait — There’s More
  CHAPTER 7: Correlation: The Rise and Fall of Relationships
    Understanding Correlation
    Correlation and Regression
    Testing Hypotheses about Correlation
      Is a correlation coefficient greater than zero?
      Do two correlation coefficients differ?
📄 Page 13
Correlation in R ... 324
Calculating a correlation coefficient ... 324
Testing a correlation coefficient ... 324
Testing the difference between two correlation coefficients ... 325
Calculating a correlation matrix ... 326
Visualizing correlation matrices ... 326
Multiple Correlation ... 328
Multiple correlation in R ... 329
Adjusting R-squared ... 330
Partial Correlation ... 331
Partial Correlation in R ... 332
Semipartial Correlation ... 333
Semipartial Correlation in R ... 333

CHAPTER 8: Curvilinear Regression: When Relationships Get Complicated ... 335
What Is a Logarithm? ... 336
What Is e? ... 338
Power Regression ... 341
Exponential Regression ... 346
Logarithmic Regression ... 351
Polynomial Regression: A Higher Power ... 354
Which Model Should You Use? ... 357

CHAPTER 9: In Due Time ... 359
A Time Series and Its Components ... 359
Forecasting: A Moving Experience ... 363
Forecasting: Another Way ... 366
Working with Real Data ... 368

CHAPTER 10: Non-Parametric Statistics ... 371
Independent Samples ... 372
Two samples: Wilcoxon rank-sum test ... 372
More than two samples: Kruskal-Wallis One-Way ANOVA ... 376
Matched Samples ... 378
Two samples: Wilcoxon matched-pairs signed ranks ... 379
More than two samples: Friedman two-way ANOVA ... 380
More than two samples: Cochran’s Q ... 383
Correlation: Spearman’s rS ... 386
Correlation: Kendall’s Tau ... 388
A Heads-Up ... 391
📄 Page 14
CHAPTER 11: Introducing Probability ... 393
What Is Probability? ... 393
Experiments, trials, events, and sample spaces ... 394
Sample spaces and probability ... 394
Compound Events ... 395
Union and intersection ... 395
Intersection, again ... 396
Conditional Probability ... 397
Working with the probabilities ... 398
The foundation of hypothesis testing ... 398
Large Sample Spaces ... 398
Permutations ... 399
Combinations ... 400
R Functions for Counting Rules ... 401
Random Variables: Discrete and Continuous ... 403
Probability Distributions and Density Functions ... 403
The Binomial Distribution ... 406
The Binomial and Negative Binomial in R ... 407
Binomial distribution ... 407
Negative binomial distribution ... 409
Hypothesis Testing with the Binomial Distribution ... 410
More on Hypothesis Testing: R versus Tradition ... 412

CHAPTER 12: Probability Meets Regression: Logistic Regression ... 415
Getting the Data ... 418
Doing the Analysis ... 418
Visualizing the Results ... 421

BOOK 4: LEARNING FROM DATA ... 423

CHAPTER 1: Tools and Data for Machine Learning Projects ... 425
The UCI (University of California-Irvine) ML Repository ... 426
Working with a UCI dataset ... 426
Cleaning up the data ... 429
Exploring the data ... 431
Exploring relationships in the data ... 432
Introducing the Rattle package ... 438
Using Rattle with iris ... 442
Getting and (further) exploring the data ... 442
Finding clusters in the data ... 445
📄 Page 15
CHAPTER 2: Decisions, Decisions, Decisions ... 449
Decision Tree Components ... 449
Roots and leaves ... 450
Tree construction ... 451
Decision Trees in R ... 451
Growing the tree in R ... 452
Drawing the tree in R ... 453
Decision Trees in Rattle ... 455
Creating the tree ... 456
Drawing the tree ... 457
Evaluating the tree ... 458
Project: A More Complex Decision Tree ... 459
The data: Car evaluation ... 459
Data exploration ... 461
Building and drawing the tree ... 462
Evaluating the tree ... 463
Quick suggested project: Understanding the complexity parameter ... 464
Suggested Project: Titanic ... 465

CHAPTER 3: Into the Forest, Randomly ... 467
Growing a Random Forest ... 467
Random Forests in R ... 469
Building the forest ... 469
Evaluating the forest ... 470
A closer look ... 471
Plotting error ... 473
Plotting importance ... 475
Project: Identifying Glass ... 476
The data ... 476
Getting the data into Rattle ... 477
Exploring the data ... 478
Growing the random forest ... 480
Visualizing the results ... 480
Suggested Project: Identifying Mushrooms ... 482

CHAPTER 4: Support Your Local Vector ... 483
Some Data to Work With ... 483
Using a subset ... 484
Defining a boundary ... 484
Understanding support vectors ... 485
Separability: It’s Usually Nonlinear ... 486
Support Vector Machines in R ... 489
Working with e1071 ... 489
Working with kernlab ... 494
📄 Page 16
Project: House Parties ... 496
Reading in the data ... 497
Exploring the data ... 499
Creating the SVM ... 500
Evaluating the SVM ... 502

CHAPTER 5: K-Means Clustering ... 503
How It Works ... 503
K-Means Clustering in R ... 505
Setting up and analyzing the data ... 505
Understanding the output ... 506
Visualizing the clusters ... 508
Finding the optimum number of clusters ... 508
Quick suggested project: Adding the sepals ... 513
Project: Glass Clusters ... 514
The data ... 514
Starting Rattle and exploring the data ... 515
Preparing to cluster ... 516
Doing the clustering ... 516
Going beyond Rattle ... 517

CHAPTER 6: Neural Networks ... 519
Networks in the Nervous System ... 519
Artificial Neural Networks ... 520
Overview ... 520
Input layer and hidden layer ... 521
Output layer ... 522
How it all works ... 523
Neural Networks in R ... 523
Building a neural network for the iris data frame ... 523
Plotting the network ... 525
Evaluating the network ... 526
Quick suggested project: Those sepals ... 527
Project: Banknotes ... 527
The data ... 527
Taking a quick look ahead ... 528
Setting up Rattle ... 529
Evaluating the network ... 531
Going beyond Rattle: Visualizing the network ... 531
Suggested Projects: Rattling Around ... 533

CHAPTER 7: Exploring Marketing ... 537
Analyzing Retail Data ... 537
The data ... 538
RFM in R ... 539
📄 Page 17
Enter Machine Learning ... 546
Working with k-means clustering ... 547
Working with Rattle ... 548
Digging into the clusters ... 550
The clusters and the classes ... 552
Quick suggested project ... 553
Suggested Project: Another Data Set ... 553

CHAPTER 8: From the City That Never Sleeps ... 557
Examining the Data Set ... 557
Warming Up ... 558
Glimpsing and viewing ... 558
Piping, filtering, and grouping ... 559
Visualizing ... 561
Joining ... 562
Quick Suggested Project: Airline Names ... 565
Suggested Project: Departure Delays ... 565
Adding a variable: weekday ... 565
Quick Suggested Project: Analyze Weekday Differences ... 566
Delay, weekday, and airport ... 566
Delay and flight duration ... 570
Suggested Project: Delay and Weather ... 572

BOOK 5: HARNESSING R: SOME PROJECTS TO KEEP YOU BUSY ... 573

CHAPTER 1: Working with a Browser ... 575
Getting Your Shine On ... 575
Creating Your First shiny Project ... 576
The user interface ... 579
The server ... 580
Final steps ... 581
Getting reactive ... 582
Working with ggplot ... 585
Changing the server ... 586
A few more changes ... 588
Getting reactive with ggplot ... 590
Another shiny Project ... 592
The base R version ... 593
The ggplot version ... 600
Suggested Project ... 602
📄 Page 18
CHAPTER 2: Dashboards — How Dashing! ... 603
The shinydashboard Package ... 603
Exploring Dashboard Layouts ... 604
Getting started with the user interface ... 605
Building the user interface: Boxes, boxes, boxes ... 605
Lining up in columns ... 613
A nice trick: Keeping tabs ... 616
Suggested project: Add statistics ... 620
Suggested project: Place valueBoxes in tabPanels ... 621
Working with the Sidebar ... 622
The user interface ... 624
The server ... 626
Suggested project: Relocate the slider ... 629
Interacting with Graphics ... 630
Clicks, double-clicks, and brushes — oh, my! ... 630
Why bother with all this? ... 634
Suggested project: Experiment with airquality ... 636

INDEX ... 639
📄 Page 19
Introduction

In this book, I’ve brought together all the information you need to hit the ground running with R. It’s heavy on statistics, of course, because R’s creators built this language to analyze data. So it’s necessary that you learn the foundations of statistics.

Let me tell you at the outset: This All-in-One is not a cookbook. I’ve never taught statistics that way and I never will. Before I show you how to use R to work with a statistical concept, I give you a strong grounding in what that concept is all about.

In fact, Books 2 and 3 of this 5-book compendium are something like an introductory statistics text that happens to use R as a way of explaining statistical ideas.

Book 4 follows that path by teaching the ideas behind machine learning before you learn how to use R to implement them.

Book 5 gives you a set of projects that give you a chance to exercise your newly minted R skill set.

Want some more details? Read on.

About This All-in-One

The volume you’re holding (or the e-book you’re viewing) consists of five books that cover a lot of the length and breadth of R.

Book 1: Introducing R

As I said earlier in this introduction, R is a language that deals with statistics. Accordingly, Book 1 introduces you to the fundamental concepts of statistics that you just have to know in order to progress with R. You then learn about R and RStudio, a widely used development environment for working with R. I begin by describing the rudiments of R code, and I discuss R functions and structures.
📄 Page 20
R truly comes alive when you use its specialized packages, which you learn about early on.

Book 2: Describing Data

Part of working with statistics is to summarize data in meaningful ways. In Book 2, you find out how to do just that. Most people know about averages and how to compute them. But that’s not the whole story. In Book 2, I tell you about additional descriptive statistics that fill in the gaps, and I show you how to use R to calculate and work with those statistics. You also learn to create graphics that visualize the data descriptions and analyses you encounter in Books 2 and 3.

Book 3: Analyzing Data

Book 3 addresses the fundamental aim of statistical analysis: to go beyond the data and help you make decisions. Usually, the data are measurements of a sample taken from a large population. The goal is to use these data to figure out what’s going on in the population. This opens a wide range of questions: What does an average mean? What does the difference between two averages mean? Are two things associated? These are only a few of the questions I address in Book 3, and you learn to use the R tools that help you answer them.

Book 4: Learning from Data

Effective machine learning model creation comes with experience. Accordingly, in Book 4 you gain experience by completing machine learning projects. In addition to the projects you complete along with me, I suggest additional projects for you to try on your own.

I begin by telling you about the University of California-Irvine Machine Learning Repository, which provides the data sets for most of the projects you encounter in Book 4.

To give you a gentle on-ramp into the field, I show you the Rattle package for creating machine learning applications. It’s a friendly interface to R’s machine learning functionality. I like Rattle a lot, and I think you will, too. You use it to learn about and work with decision trees, random forests, support vector machines, k-means clustering, and neural networks.
