Regression Analysis in R (Jocelyn H. Bolin)


Regression Analysis in R

Regression Analysis in R: A Comprehensive View for the Social Sciences covers the basic applications of multiple linear regression all the way through to more complex regression applications and extensions. Written for graduate-level students in social science disciplines, this book walks readers through bivariate correlation, giving them a solid framework from which to expand into more complicated regression models. Concepts are demonstrated using real data examples and R software, without assuming prior familiarity with R.

• Comprehensive treatment of the most common multiple regression applications for researchers in the social sciences.
• Application-based presentation of R code.
• Brief primer on R for the unfamiliar user, complete with tips for troubleshooting.
• End-of-chapter exercises to check your understanding.

Jocelyn H. Bolin is a professor in the Department of Educational Psychology at Ball State University, where she teaches courses on introductory and intermediate statistics, multiple regression analysis, and multilevel modeling to graduate students in social science disciplines. She earned a PhD in educational psychology from Indiana University Bloomington. Her research interests include statistical methods for classification and clustering and the use of multilevel modeling in the social sciences.

Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences Series

Series Editors: Jeff Gill, Steven Heeringa, Wim J. van der Linden, Tom Snijders

Recently Published Titles

Big Data and Social Science: Data Science Methods and Tools for Research and Practice, Second Edition
  Ian Foster, Rayid Ghani, Ron S. Jarmin, Frauke Kreuter and Julia Lane
Understanding Elections through Statistics: Polling, Prediction, and Testing
  Ole J. Forsberg
Analyzing Spatial Models of Choice and Judgment, Second Edition
  David A. Armstrong II, Ryan Bakker, Royce Carroll, Christopher Hare, Keith T. Poole and Howard Rosenthal
Introduction to R for Social Scientists: A Tidy Programming Approach
  Ryan Kennedy and Philip Waggoner
Linear Regression Models: Applications in R
  John P. Hoffmann
Mixed-Mode Surveys: Design and Analysis
  Jan van den Brakel, Bart Buelens, Madelon Cremers, Annemieke Luiten, Vivian Meertens, Barry Schouten and Rachel Vis-Visschers
Applied Regularization Methods for the Social Sciences
  Holmes Finch
An Introduction to the Rasch Model with Examples in R
  Rudolf Debelak, Carolin Strobl and Matthew D. Zeigenfuse
Regression Analysis in R: A Comprehensive View for the Social Sciences
  Jocelyn H. Bolin
Analysis of Intra-Individual Variation: Systems Approaches to Human Process Analysis
  Kathleen M. Gates, Sy-Miin Chow, and Peter C. M. Molenaar
Applied Regression Modeling: Bayesian and Frequentist Analysis of Categorical and Limited Response Variables with R and Stan
  Jun Xu

For more information about this series, please visit: https://www.routledge.com/Chapman--HallCRC-Statistics-in-the-Social-and-Behavioral-Sciences/book-series/CHSTSOBESCI

Regression Analysis in R
A Comprehensive View for the Social Sciences
Jocelyn H. Bolin

First edition published 2023
by CRC Press
6000 Broken Sound Parkway NW, Suite 300, Boca Raton, FL 33487-2742

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2023 Taylor & Francis Group, LLC

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data
Names: Bolin, Jocelyn H., author.
Title: Regression analysis in R : a comprehensive view for the social sciences / Jocelyn H. Bolin.
Description: First edition. | Boca Raton : CRC Press, [2022] | Series: Chapman & Hall CRC statistics in social and behavioral sciences | Includes bibliographical references and index.
Identifiers: LCCN 2022003683 (print) | LCCN 2022003684 (ebook) | ISBN 9780367272586 (pbk) | ISBN 9781032257754 (hbk) | ISBN 9780429295843 (ebk)
Subjects: LCSH: Regression analysis. | Social sciences--Statistics.
Classification: LCC HA31.3 .B54 2022 (print) | LCC HA31.3 (ebook) | DDC 519.5/36--dc23/eng/20220414
LC record available at https://lccn.loc.gov/2022003683
LC ebook record available at https://lccn.loc.gov/2022003684

ISBN: 978-1-032-25775-4 (hbk)
ISBN: 978-0-367-27258-6 (pbk)
ISBN: 978-0-429-29584-3 (ebk)

DOI: 10.1201/9780429295843

Typeset in Minion by Deanta Global Publishing Services, Chennai, India

Contents

Acknowledgments

Chapter 1 ◾ Introduction
  CONTEXTUALIZING CORRELATION AND REGRESSION ANALYSIS
  REGRESSION AS PREDICTION
  REGRESSION AS EXPLANATION
  CORRELATION, REGRESSION, AND CAUSATION
  OVERVIEW OF THIS BOOK
  REFERENCE

Chapter 2 ◾ Correlation
  VISUALIZING RELATIONSHIPS
  UNDERSTANDING COVARIATION
  SIMPLE LINEAR RELATIONSHIPS: THE PEARSON PRODUCT MOMENT CORRELATION COEFFICIENT
  SIGNIFICANCE TESTING FOR THE PEARSON R
  ASSUMPTIONS OF THE PEARSON R
  ALTERNATIVE CORRELATIONS: KENDALL TAU AND SPEARMAN RHO
    The Spearman Rho
    The Kendall Tau
  CORRELATION USING R
    Correlation Using {stats} Package
    Correlation Using {Hmisc} Package
    Matrix Scatterplots Using {PerformanceAnalytics} Package
  CHAPTER SUMMARY
  REFERENCES
  CHAPTER 2: END OF CHAPTER EXERCISES

Chapter 3 ◾ Simple and Multiple Regression
  SIMPLE LINEAR REGRESSION
    Ordinary Least Squares (OLS) Regression
    The Linear Regression Equation
    Regression Model Fit
    Multiple R
    R² and Adjusted R²
    Standard Error of the Estimate
    Multiple Regression Analysis
    OLS Regression Using lm()
  SUMMARY
  CHAPTER 3: END OF CHAPTER EXERCISES

Chapter 4 ◾ Assumptions of Multiple Regression
  STATISTICAL ASSUMPTIONS OF MULTIPLE REGRESSION
  THEORETICAL ASSUMPTIONS OR ‘INTERPRETATIONAL CONSIDERATIONS’
    The Regression Model Is Theoretically Sound
    Restriction of Range
    Absence of Multicollinearity
  CHECKING ASSUMPTIONS OF MULTIPLE REGRESSION USING R SOFTWARE
  CHAPTER 4: END OF CHAPTER EXERCISES

Chapter 5 ◾ Dummy Variables and Interactions
  CATEGORICAL VARIABLES IN REGRESSION
  DUMMY VARIABLES
    A Note on the ‘0 0’ Category
  USING/INTERPRETING DUMMY VARIABLES IN A REGRESSION MODEL
  INTERACTION EFFECTS IN REGRESSION MODELS
  A NOTE ON INCLUDING MAIN EFFECTS AND CENTERING FOR PRODUCTS
    Centering Predictors Using R
  CHAPTER SUMMARY
  CHAPTER 5: END OF CHAPTER EXERCISES

Chapter 6 ◾ Regression vs. ANOVA?
  ANALYSIS OF VARIANCE
  ANOVA AS REGRESSION
  ANOVA OR REGRESSION?

Chapter 7 ◾ Model Comparisons and Hierarchical Regression
  WHY COMPARE MODELS?
  WHAT DOES IT MEAN FOR MODELS TO BE NESTED?
  MODEL COMPARISONS FOR NESTED AND NON-NESTED MODELS
    Comparisons of Non-Nested Models
    R Example of Non-Nested Model Comparison
  COMPARISONS OF NESTED MODELS
    Types of Nested Model Comparison
  CHAPTER SUMMARY
  CHAPTER 7: END OF CHAPTER EXERCISES

Chapter 8 ◾ Moderation/Mediation and Regression Discontinuity
  EXTENSION 1: MODERATION
  EXTENSION 2: REGRESSION DISCONTINUITY
    Motivating Example
    Interpreting Treatment Effects in Regression Discontinuity Design
    *A Note on the Terminology
  EXTENSION 3: MEDIATION
    Baron and Kenny (1986) Requirements for Testing Mediation
    Tests of Significance for the Indirect Effect
  END OF CHAPTER SUMMARY
  RECOMMENDED RESOURCES
  CHAPTER 8: END OF CHAPTER EXERCISES

Chapter 9 ◾ Non-Linearity and Cross-Validation
  EXTENSION 4: NON-LINEARITY
    Variable Transformations for Non-Linearity
    Transformation Selection
    What to Do with Negative Values?
    Pros and Cons to the Transformation Approach
    Use of Non-Linear Terms
    Watch out for Multicollinearity!
    Pros and Cons to the Use of Non-Linear Terms
  EXTENSION 5: CROSS-VALIDATION
    Cross-Validation Samples
    Cross-Validation Procedures
  END OF CHAPTER SUMMARY
  CHAPTER 9: END OF CHAPTER EXERCISES

Chapter 10 ◾ Nested Data
  FIXED EFFECTS MODELING
  HIERARCHICAL LINEAR MODELING
    Random Effects and the Tau Matrix
    HLM Using R Software
  CONCLUDING COMMENTS ON HIERARCHICAL LINEAR MODELING
  SUMMARY
  RECOMMENDED RESOURCES
  CHAPTER 10: END OF CHAPTER EXERCISES

APPENDIX A: INTRODUCTION TO R
APPENDIX B: NON-PARAMETRIC ANALYSIS BASED ON RANKS
APPENDIX C: R FUNCTION AND PACKAGE INDEX
APPENDIX D: END OF CHAPTER EXERCISE SCRIPT FILE SOLUTIONS
APPENDIX E: GLOSSARY
INDEX


Acknowledgments

To my amazing graduate assistants: The students really have surpassed the teacher. Without you I could never have finished this book! To my two beautiful sons: You are my world! But without you this book would have been finished much sooner ☺


Chapter 1
Introduction
DOI: 10.1201/9780429295843-1

In the most general of terms, the purpose of most academic research is to better understand the world. Sometimes the goal is to determine whether certain factors can be manipulated in order to produce desirable outcomes. This is generally the goal of experimental research; the classic example is the randomized controlled drug trial used to determine whether a new medication should be mass produced. Often, however, the goal is to understand the relationships between factors that already exist in the world. In these cases, manipulation of and control over the factors are usually not possible. Instead, the goal is to explain relationships between quantities and characteristics as they naturally occur, to better understand the world, and then potentially to use these relationships to predict future performance. Consider the following scenarios.

Study 1: Correlates of Academic Test Anxiety (TestAnxiety)
Academic test anxiety has been found to significantly impact academic performance. A study of 363 undergraduate students examined the relationship between academic test anxiety, perfectionism, and academic performance. Variables under study included demographic characteristics (age, gender, minority status), undergraduate GPA, mathematics and verbal GRE scores, physical responses to stress, perceived test threat, study skills, and a four-factor measure of perfectionism. It was hoped that a better understanding of the relations between these constructs would suggest solutions to help students with academic anxiety.

Study 2: Mask Attitudes (Mask)
During the COVID-19 pandemic, mask wearing became a very central issue. The CDC advised mask wearing to help end the health crisis and to protect especially vulnerable populations, yet many Americans resisted this advice. A study of 156 undergraduate students aimed to better understand people's attitudes toward mask wearing and how personality characteristics and health diagnoses may affect these attitudes. With a better understanding of the predictors of mask attitudes and mask anxiety, better recommendations could potentially be made to increase mask compliance in future situations.

Study 3: Understanding Academic Dishonesty (Cheating)
This study aimed to disentangle serious, planned cheating offenses from cheating due to momentary panic. Data were collected on a sample of 155 undergraduate students majoring in Business at a large public university. Particular attention was given to the frequency and severity of, and justification for, cheating. The researchers also wondered whether the severity of a cheating offense might differ depending on whether the cheating was planned or due to panic. The goal of this project was to better understand the motives for cheating and also to better categorize cheating offenses in order to inform remediation and consequences.

So, what do these studies have in common? Although these three studies come from very different research areas, their goals are closely aligned. All three aim to explain an outcome of interest through a particular lens or theoretical framework. In a best-case scenario, each of these studies would also be able to predict the outcome from the characteristics measured. Wouldn't it be great if we could reliably predict academic achievement from demographic characteristics and knowledge of the level of test anxiety a student experiences? And wouldn't it be great if we could reliably predict an individual's attitude toward mask wearing from their age, personality characteristics, and whether they have hearing or vision loss? In a perfect world, these are the goals of using statistical methods to assess relationships.

CONTEXTUALIZING CORRELATION AND REGRESSION ANALYSIS
Correlation and regression methodology allows the researcher to assess relationships between variables. This book begins with correlation analysis, the simplest way of assessing the relationship between two continuous variables. For example, we could use a simple correlation to assess the degree and type of relationship between cognitive test anxiety and academic performance. Correlation analysis will provide the jumping-off point for our introduction to regression analysis. Regression analysis is a natural extension of correlation that uses the relationships between variables to predict one variable from another variable (or variables). In this way, regression analysis is not just an analysis of relationships but also a tool for prediction and explanation.

REGRESSION AS PREDICTION
As will be detailed in Chapter 3, regression is a predictive analysis. Using regression analysis, a researcher creates an equation that can be used to predict the desired outcome from a set of measured variables. For example, a regression analysis could be used to create a model predicting math SAT scores from a student's level of test anxiety, gender, and level of study skills. This model could then be used to predict math SAT scores for any future student whose gender, study skills, and test anxiety are known. Although regression analysis can easily provide the information needed to make such predictions, in the social sciences it is somewhat rare for regression to be used for a truly predictive purpose. For a predictive model to be useful in making actual predictions, the predictions it provides need to be relatively accurate.
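A small worked example makes the link between correlation and prediction concrete. The book's own analyses are done in R (for instance, base R's `cor()` for correlations and `lm()` for regression models). The sketch below instead spells out the same arithmetic in plain Python so that each step is visible. The five students and their scores are hypothetical, invented for illustration, and are not drawn from the TestAnxiety data. The code computes the Pearson r, the least-squares intercept and slope, a prediction for a new case, and the standard error of the estimate as one rough gauge of how accurate such predictions are.

```python
import math


def centered_sums(x, y):
    """Return (sxy, sxx, syy): sums of cross-products and squares about the means."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("x and y must have the same length (at least 3)")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation of x and y
    sxx = sum((a - mx) ** 2 for a in x)                   # variation in x
    syy = sum((b - my) ** 2 for b in y)                   # variation in y
    return sxy, sxx, syy


def pearson_r(x, y):
    """Pearson product-moment correlation: sxy / sqrt(sxx * syy)."""
    sxy, sxx, syy = centered_sums(x, y)
    return sxy / math.sqrt(sxx * syy)


def fit_simple_ols(x, y):
    """Least-squares line y_hat = a + b*x; returns (intercept a, slope b)."""
    sxy, sxx, _ = centered_sums(x, y)
    slope = sxy / sxx
    intercept = sum(y) / len(y) - slope * (sum(x) / len(x))
    return intercept, slope


# Hypothetical scores for five students (illustrative only).
test_anxiety = [10, 20, 30, 40, 50]
math_score = [700, 650, 640, 560, 550]

r = pearson_r(test_anxiety, math_score)
a, b = fit_simple_ols(test_anxiety, math_score)
predicted = a + b * 25  # predicted math score for a new student with anxiety = 25

# Standard error of the estimate: typical size of a prediction error (df = n - 2).
residuals = [yi - (a + b * xi) for xi, yi in zip(test_anxiety, math_score)]
se_estimate = math.sqrt(sum(e ** 2 for e in residuals) / (len(residuals) - 2))

print(f"r = {r:.3f}")
print(f"y_hat = {a:.1f} + ({b:.1f}) * anxiety")
print(f"prediction at anxiety = 25: {predicted:.1f}")
print(f"standard error of estimate: {se_estimate:.1f}")
```

With these made-up numbers, the script prints r = -0.969 and the equation y_hat = 737.0 + (-3.9) * anxiety. It predicts 639.5 for an anxiety score of 25, with a standard error of estimate of about 18.2 points. On Python 3.10 or later, `statistics.correlation` and `statistics.linear_regression` return the same r, slope, and intercept.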
Unfortunately, in the social sciences, because measurement is difficult and constructs are complex, it is often hard to create a regression model with high enough predictive accuracy to be useful. This is not to say that it cannot be done, but rather that it cannot be done often.

REGRESSION AS EXPLANATION
In the social sciences, it is far more common to use regression for a second purpose: construct explanation. As mentioned in the previous section, regression models are not often accurate enough for use as actual predictive models. That is not to say, however, that models

of lower predictive accuracy cannot be informative. Even models with reasonably low predictive accuracy can provide useful information regarding the theoretical structure of construct relations. Given the limitations of data collection and the complexities of social constructs, however, it is extremely important to remember that an explanatory regression model is only as strong as the theory it is based on. It is entirely possible to create a regression model that has a high degree of predictive accuracy and appears to have strong explanatory power, yet is not meaningful in the context of theory and real-world explanation.

CORRELATION, REGRESSION, AND CAUSATION
Generally speaking, correlation and regression analyses are not often capable of supporting causal statements. A common cry heard when first learning correlation and regression analysis is 'Correlation does not mean causation!' This is generally very good advice. Correlation and regression are generally thought of as quasi-experimental designs, or designs in which random assignment to levels of the independent variable is not possible.* Random assignment is a necessary condition for causal statements to be made. Random assignment helps ensure that independent variable groups begin on an even playing field and are not confounded with nuisance or confounding variables. Some variables, however, naturally cannot be randomly assigned. Take self-esteem as an example. A researcher cannot simply collect a sample of participants and then tell each participant what their level of self-esteem will be. Rather, the researcher must work with each participant's existing level of self-esteem. This opens the study design up to internal validity threats that make interpretation more difficult. When a third variable (or more) is uncontrolled, it is impossible to disentangle whether the observed relationship is due to a true relationship between the two variables of interest or to the third variable.

Also of concern is the directionality of the relationship between variables. Demonstrating that two variables are related does not imply a causal direction between them. If, for example, a relationship is found between motivation and school absences, is it a lack of motivation that caused the absences? Or did repeated absences cause a lack of motivation? Similarly, if a relationship is found between the age at which a student took algebra and SAT scores, this does not imply that all students should be put in algebra as young as possible.

The take-home point here is that correlational research can be extremely informative, but the researcher needs to be careful not to extend the results beyond what they can actually support. Correlation and regression indicate when relationships are present and allow the researcher to describe them. If further speculation is to be made regarding the mechanisms behind these relationships, the researcher must turn to theory and potentially to more advanced statistical techniques (see the final chapters of this text).

* It should be noted that it is sometimes possible to use a randomly assigned independent variable in a regression context. If this is the case, then yes, causal statements may be possible. Since this is the exception rather than the rule in correlation and regression analyses, it is best to proceed from a more conservative stance on causal statements.

OVERVIEW OF THIS BOOK
This book will begin the conversation about relationships with Chapter 2 on bivariate correlation. This provides a good starting point, as the study of relationships generally extends out of this basic framework. Chapter 3 will extend the concept of relationships into prediction and explanation by introducing simple linear regression and multiple regression. This will allow us to get our feet wet with regression concepts and interpretation before extending the discussion to interpretational issues. Chapter 4 will continue the discussion with the assumptions of multiple regression and the diagnostic statistics encountered in multiple regression. At this point, readers will have learned the basic multiple regression framework. The rest of the book is devoted to customization options and related methodologies. Chapter 5 extends into the use of categorical predictors and interaction effects, allowing more complex designs to be investigated. Following this discussion, Chapter 6 will briefly move away from regression methodology for a discussion of when to use regression versus when to use ANOVA.
Chapter 7 will return to the regression framework and introduce comparisons among regression models, using hierarchical regression for systematic comparison. The last three chapters will combine the knowledge from the previous chapters to illustrate several extensions of the multiple regression model to different types of research questions and data types. Mediation, moderation, regression discontinuity designs, cross-validation, and methods for nested data will all be discussed. Throughout the text, examples and syntax walkthroughs will be provided using R software. Examples and syntax are presented for the introductory or casual user of R; no prior familiarity with R is assumed. A primer on basic R use is included in Appendix A for the interested reader.


Chapter 2
Correlation
DOI: 10.1201/9780429295843-2

Chapter 1 described a study about understanding variables affecting academic test anxiety. A researcher examining these variables might ask questions such as the following. Is there a relationship between test anxiety and academic performance? What kind of relationships exist between physical symptoms of anxiety and academic performance? Does higher GPA lead to a lower fear of exam taking? Each of these questions can be addressed through the use of correlations. Correlation is a measure of the relationship between two variables. This chapter describes several common methods for assessing relationships between variables and provides the tools necessary to analyze relationships using R software.

VISUALIZING RELATIONSHIPS
When considering the relationship between two variables, the easiest way to begin is to look at the relationship visually, which can be done with a scatterplot. A scatterplot places one variable on the x axis and the other on the y axis, and each point represents a specific case. This allows the researcher to see the general trend of what happens to one variable as the other variable increases or decreases. For example, Figure 2.1a shows a positive relationship: as one variable, X, increases, the other variable, Y, also increases. Examples of positive relationships might include the relationship between math SAT scores and verbal SAT scores (the higher the math SAT, the higher the verbal SAT also tends to be) and the correlation between ambition and stress tolerance (the more ambitious the individual, the more likely they are to be
