Multilevel Modeling Using R, Third Edition
W. Holmes Finch and Jocelyn E. Bolin
Multilevel Modeling Using R

Like its bestselling predecessor, Multilevel Modeling Using R, Third Edition provides the reader with a helpful guide to conducting multilevel data modeling using the R software environment. After reviewing standard linear models, the authors present the basics of multilevel models and explain how to fit these models using R. They then show how to employ multilevel modeling with longitudinal data and demonstrate the valuable graphical options in R. The book also describes models for categorical dependent variables in both single-level and multilevel data.

The third edition includes several new topics that were not present in the second edition. Specifically, a new chapter has been included, focusing on fitting multilevel latent variable models in the R environment. With R, it is possible to fit a variety of latent variable models in the multilevel context, including factor analysis, structural models, item response theory, and latent class models. The third edition also includes new sections in Chapter 11 describing two useful alternatives to standard multilevel models: fixed effects models and generalized estimating equations. These approaches are particularly useful with small samples and when the researcher is interested in modeling the correlation structure within higher-level units (e.g., schools). The third edition also includes a new section on mediation modeling in the multilevel context in Chapter 11. This thoroughly updated revision gives the reader state-of-the-art tools to launch their own investigations in multilevel modeling and gain insight into their research.
Chapman & Hall/CRC Statistics in the Social and Behavioral Sciences Series
Series Editors: Jeff Gill, Steven Heeringa, Wim J. van der Linden, and Tom Snijders

Recently Published Titles

Linear Regression Models: Applications in R
John P. Hoffman

Mixed-Mode Surveys: Design and Analysis
Jan van den Brakel, Bart Buelens, Madelon Cremers, Annemieke Luiten, Vivian Meertens, Barry Schouten and Rachel Vis-Visschers

Applied Regularization Methods for the Social Sciences
Holmes Finch

An Introduction to the Rasch Model with Examples in R
Rudolf Debelak, Carolin Strobl and Matthew D. Zeigenfuse

Regression Analysis in R: A Comprehensive View for the Social Sciences
Jocelyn H. Bolin

Intensive Longitudinal Analysis of Human Processes
Kathleen M. Gates, Sy-Miin Chow, and Peter C. M. Molenaar

Applied Regression Modeling: Bayesian and Frequentist Analysis of Categorical and Limited Response Variables with R and Stan
Jun Xu

The Psychometrics of Standard Setting: Connecting Policy and Test Scores
Mark Reckase

Crime Mapping and Spatial Data Analysis using R
Juanjo Medina and Reka Solymosi

Computational Aspects of Psychometric Methods: With R
Patricia Martinková and Adéla Hladká

Mixed-Mode Official Surveys: Design and Analysis
Barry Schouten, Jan van den Brakel, Bart Buelens, Deirdre Giesen, Annemieke Luiten, Vivian Meertens

Principles of Psychological Assessment With Applied Examples in R
Isaac T. Petersen

Multilevel Modeling Using R, Third Edition
W. Holmes Finch and Jocelyn E. Bolin

For more information about this series, please visit: https://www.routledge.com/Chapman--HallCRC-Statistics-in-the-Social-and-Behavioral-Sciences/book-series/CHSTSOBESCI
Multilevel Modeling Using R Third Edition W. Holmes Finch Jocelyn E. Bolin
Third edition published 2024
by CRC Press
2385 NW Executive Center Drive, Suite 320, Boca Raton FL 33431

and by CRC Press
4 Park Square, Milton Park, Abingdon, Oxon, OX14 4RN

First edition published 2014
Second edition published 2019

CRC Press is an imprint of Taylor & Francis Group, LLC

© 2024 W. Holmes Finch and Jocelyn E. Bolin

Reasonable efforts have been made to publish reliable data and information, but the author and publisher cannot assume responsibility for the validity of all materials or the consequences of their use. The authors and publishers have attempted to trace the copyright holders of all material reproduced in this publication and apologize to copyright holders if permission to publish in this form has not been obtained. If any copyright material has not been acknowledged please write and let us know so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced, transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying, microfilming, and recording, or in any information storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, access www.copyright.com or contact the Copyright Clearance Center, Inc. (CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. For works that are not available on CCC please contact mpkbookspermissions@tandf.co.uk

Trademark notice: Product or corporate names may be trademarks or registered trademarks and are used only for identification and explanation without intent to infringe.

Library of Congress Cataloguing-in-Publication Data
Names: Finch, W. Holmes (William Holmes), author.
Title: Multilevel modeling using R / W. Holmes Finch, Jocelyn E. Bolin, Ken Kelley.
Description: Third edition. | Boca Raton : Taylor and Francis, 2024. | Series: Chapman & Hall/CRC statistics in the social and behavioral sciences | Includes bibliographical references and index.
Identifiers: LCCN 2023045806 (print) | LCCN 2023045807 (ebook) | ISBN 9781032363967 (hardback) | ISBN 9781032363943 (paperback) | ISBN 9781003331711 (ebook)
Subjects: LCSH: Social sciences--Statistical methods. | Multivariate analysis. | R (Computer program language)
Classification: LCC HA31.35 .F56 2024 (print) | LCC HA31.35 (ebook) | DDC 005.5/5--dc23/eng/20231204
LC record available at https://lccn.loc.gov/2023045806
LC ebook record available at https://lccn.loc.gov/2023045807

ISBN: 978-1-032-36396-7 (hbk)
ISBN: 978-1-032-36394-3 (pbk)
ISBN: 978-1-003-33171-1 (ebk)

DOI: 10.1201/b23166

Typeset in Palatino by MPS Limited, Dehradun
Contents

Preface  ix
About the Authors  xi

1. Linear Models  1
    Simple Linear Regression  2
    Estimating Regression Models with Ordinary Least Squares  2
    Distributional Assumptions Underlying Regression  3
    Coefficient of Determination  4
    Inference for Regression Parameters  5
    Multiple Regression  7
    Example of Simple Linear Regression by Hand  9
    Regression in R  11
    Interaction Terms in Regression  14
    Categorical Independent Variables  15
    Checking Regression Assumptions with R  18
    Summary  21

2. An Introduction to Multilevel Data Structure  22
    Nested Data and Cluster Sampling Designs  22
        Intraclass Correlation  23
        Pitfalls of Ignoring Multilevel Data Structure  27
    Multilevel Linear Models  28
        Random Intercept  28
        Random Slopes  30
        Centering  32
        Basics of Parameter Estimation with MLMs  34
            Maximum Likelihood Estimation  34
            Restricted Maximum Likelihood Estimation  35
        Assumptions Underlying MLMs  35
        Overview of Level-2 MLMs  36
        Overview of Level-3 MLMs  37
        Overview of Longitudinal Designs and Their Relationship to MLMs  38
    Summary  39

3. Fitting Level-2 Models in R  41
    Simple (Intercept Only) Multilevel Models  41
        Interactions and Cross-Level Interactions Using R  48
        Random Coefficients Models Using R  51
        Centering Predictors  56
    Additional Options  57
        Parameter Estimation Method  57
        Estimation Controls  58
        Comparing Model Fit  58
        Lme4 and Hypothesis Testing  59
    Summary  63

4. Level-3 and Higher Models  65
    Defining Simple Level-3 Models Using the lme4 Package  65
    Defining Simple Models with More Than Three Levels in the lme4 Package  72
    Random Coefficients Models with Three or More Levels in the lme4 Package  74
    Summary  77

5. Longitudinal Data Analysis Using Multilevel Models  79
    The Multilevel Longitudinal Framework  79
    Person Period Data Structure  81
    Fitting Longitudinal Models Using the lme4 package  82
    Benefits of Using Multilevel Modeling for Longitudinal Analysis  87
    Summary  88

6. Graphing Data in Multilevel Contexts  89
    Plots for Linear Models  94
    Plotting Nested Data  97
    Using the Lattice Package  98
    Plotting Model Results Using the Effects Package  108
    Summary  119

7. Brief Introduction to Generalized Linear Models  120
    Logistic Regression Model for a Dichotomous Outcome Variable  121
    Logistic Regression Model for an Ordinal Outcome Variable  125
    Multinomial Logistic Regression  128
    Models for Count Data  131
        Poisson Regression  131
        Models for Overdispersed Count Data  134
    Summary  136

8. Multilevel Generalized Linear Models (MGLMs)  138
    MGLMs for a Dichotomous Outcome Variable  138
        Random Intercept Logistic Regression  139
        Random Coefficient Logistic Regression  142
        Inclusion of Additional Level-1 and Level-2 Effects in MGLM  145
    MGLM for an Ordinal Outcome Variable  148
        Random Intercept Logistic Regression  149
    MGLM for Count Data  152
        Random Intercept Poisson Regression  153
        Random Coefficient Poisson Regression  155
        Inclusion of Additional Level-2 Effects to the Multilevel Poisson Regression Model  157
    Summary  165

9. Bayesian Multilevel Modeling  167
    MCMCglmm for a Normally Distributed Response Variable  170
    Including Level-2 Predictors with MCMCglmm  177
    User-Defined Priors  183
    MCMCglmm for a Dichotomous-Dependent Variable  187
    MCMCglmm for a Count-Dependent Variable  190
    Summary  197

10. Multilevel Latent Variable Modeling  199
    Multilevel Factor Analysis  199
    Fitting a Multilevel CFA Model Using lavaan  202
    Estimating the Proportion of Variance Associated with Each Level of the Data  220
    Multilevel Structural Equation Modeling  223
    Fitting Multilevel SEM Using lavaan  225
    Multilevel Growth Curve Models  228
    Multilevel Item Response Theory Models  231
    Fitting a Multilevel IRT Model Using R  232
    Multilevel Latent Class Models  240
    Estimating MLCA in R  242
    Summary  250

11. Additional Modeling Frameworks for Multilevel Data  251
    Fixed Effects Models  252
    Generalized Estimating Equations  256
    Mediation Models with Multilevel Data  263
    Multilevel Lasso  272
    Fitting the Multilevel Lasso in R  274
    Multivariate Multilevel Models  277
    Multilevel Generalized Additive Models  279
    Fitting GAMM Using R  280
    Summary  286

12. Advanced Issues in Multilevel Modeling  288
    Robust Statistics in the Multilevel Context  288
    Identifying Potential Outliers in Single-Level Data  289
    Identifying Potential Outliers in Multilevel Data  291
    Identifying Potential Multilevel Outliers Using R  293
    Robust and Rank-Based Estimation for Multilevel Models  302
    Fitting Robust and Rank-Based Multilevel Models in R  305
    Predicting Level-2 Outcomes with Level-1 Variables  309
    Power Analysis for Multilevel Models  312
    Summary  317

References  318
Index  322
Preface The goal of this third edition of the book is to provide you, the reader, with a comprehensive resource for the conduct of multilevel modeling using the R software package. Multilevel modeling, sometimes referred to as hierarchical modeling, is a powerful tool that allows the researcher to account for data collected at multiple levels. For example, an educational researcher might gather test scores and measures of socioeconomic status (SES) for students who attend a number of different schools. The students would be considered level-1 sampling units, and the schools would be referred to as level-2 units. Ignoring the structure inherent in this type of data collection can, as we discuss in Chapter 2, lead to incorrect parameter and standard error estimates. In addition to modeling the data structure correctly, we will see in the following chapters that the use of multilevel models can also provide us with insights into the nature of relationships in our data that might otherwise not be detected. After reviewing standard linear models in Chapter 1, we will turn our attention to the basics of multilevel models in Chapter 2, before learning how to fit these models using the R software package in Chapters 3 and 4. Chapter 5 focuses on the use of multilevel modeling in the case of longitudinal data, and Chapter 6 demonstrates the very useful graphical options available in R, particularly those most appropriate for multilevel data. Chapters 7 and 8 describe models for categorical dependent variables, first for single-level data, and then in the multilevel context. In Chapter 9, we describe an alternative to standard maximum likelihood estimation of multilevel models in the form of the Bayesian framework. Chapter 10 moves the focus from models for observed variables and places it on dealing with multilevel structure in the context of models in which the variables of interest are latent or unobserved. 
In this context, we deal with multilevel models for factor analysis, structural equation models, item response theory, and latent class analysis. We conclude the book with two chapters dealing with advanced topics in multilevel modeling such as fixed effects models, generalized estimating equations, mediation for multilevel data, penalized estimators, and nonlinear relationships, as well as robust estimators, outlier detection, prediction of level-2 outcomes with level-1 variables, and power analysis for multilevel models.

The third edition of the book includes several new topics that were not present in the second edition. Specifically, we have included a new chapter (10) focused on fitting multilevel latent variable models in the R environment. With R, it is possible to fit a variety of latent variable models in the multilevel context, including factor analysis, structural models, item response theory, and latent class models. The third edition also includes new sections in
Chapter 11 describing two useful alternatives to standard multilevel models, fixed effects models and generalized estimating equations. These approaches are particularly useful with small samples and when the researcher is interested in modeling the correlation structure within higher-level units (e.g., schools). The third edition also includes a new section on mediation modeling in the multilevel context in Chapter 11. The datasets featured in this book are available at the website www.mlminr.com.

We hope that you find this book to be helpful as you work with multilevel data. Our goal is to provide you with a guidebook that will serve as the launching point for your own investigations in multilevel modeling. The R code and discussion of its interpretation contained in this text should provide you with the tools necessary to gain insights into your own research, in whatever field it might be. We appreciate your taking the time to read our work and hope that you find it as enjoyable and informative to read as it was for us to write.
About the Authors

W. Holmes Finch is a Professor in the Department of Educational Psychology at Ball State University, where he has been since 2003. He received his PhD from the University of South Carolina in 2002. Dr. Finch teaches courses in factor analysis, structural equation modeling, categorical data analysis, regression, multivariate statistics, and measurement to graduate students in psychology and education. His research interests are in the areas of multilevel models, latent variable modeling, methods of prediction and classification, and nonparametric multivariate statistics. Holmes is also an Accredited Professional Statistician (PStat®).

Jocelyn E. Bolin received her PhD in Educational Psychology from Indiana University Bloomington in 2009. Her dissertation consisted of a comparison of statistical classification analyses under situations of training data misclassification. She is now an Assistant Professor in the Department of Educational Psychology at Ball State University, where she has been since 2010. Dr. Bolin teaches courses on introductory and intermediate statistics, multiple regression analysis, and multilevel modeling for graduate students in social science disciplines. Her research interests include statistical methods for classification and clustering and the use of multilevel modeling in the social sciences. She is a member of the American Psychological Association, the American Educational Research Association, and the American Statistical Association. Jocelyn is also an Accredited Professional Statistician (PStat®).
1

Linear Models

Statistical models provide powerful tools to researchers in a wide array of disciplines. Such models allow for the examination of relationships among multiple variables, which in turn can lead to a better understanding of the world. For example, sociologists use linear regression to gain insights into how factors such as ethnicity, gender, and level of education are related to an individual’s income. Biologists can use the same type of model to understand the interplay between sunlight, rainfall, industrial runoff, and biodiversity in a rainforest. And using linear regression, educational researchers can develop powerful tools for understanding the role that different instructional strategies have on student achievement. In addition to providing a path by which various phenomena can be better understood, statistical models can be used as predictive tools. For example, econometricians might develop models to predict labor market participation given a set of economic inputs, whereas higher education administrators might use similar types of models to predict grade point average for prospective incoming freshmen in order to identify those who might need academic assistance during their first year of college. As can be seen from these few examples, statistical modeling is very important across a wide range of fields, providing researchers with tools for both explanation and prediction. Certainly, the most popular of such models over the last 100 years of statistical practice has been the general linear model (GLM). The GLM links a dependent or outcome variable to one or more independent variables, and can take the form of such popular tools as analysis of variance (ANOVA) and regression. Given its popularity and utility, and the fact that it serves as the foundation for many other models, including the multilevel models featured in this book, we will start with a brief review of the linear model, particularly focusing on regression.
This review will include a short technical discussion of linear regression models, followed by a description of how they can be estimated using the R language and environment (R Development Core Team, 2012). The technical aspects of this discussion are purposefully not highly detailed, as we focus on the model from a conceptual perspective. However, sufficient detail is presented so that the reader having only limited familiarity with the linear regression model will be provided with a basis for moving forward to multilevel models so that particular features of these more complex models that are shared with the linear model can be explicated. Readers particularly familiar with linear
regression and using R to conduct such analyses may elect to skip this chapter with no loss of understanding in future chapters.

Simple Linear Regression

As noted above, the GLM framework serves as the basis for the multilevel models that we describe in subsequent chapters. Thus, in order to provide the foundation for the rest of the book, we will focus in this chapter on the linear regression model, although its form and function can easily be translated to ANOVA as well. The simple linear regression model in population form is

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (1.1)$$

where $y_i$ is the dependent variable for individual $i$ in the dataset, and $x_i$ is the independent variable for subject $i$ ($i = 1, \ldots, N$). The terms $\beta_0$ and $\beta_1$ are the intercept and slope of the model, respectively. In a graphical sense, the intercept is the point where the line in equation (1.1) crosses the y-axis at $x = 0$. It is also the mean, specifically the conditional mean, of $y$ for individuals with a value of 0 on $x$, and it is this latter definition that will be most useful in actual practice. The slope $\beta_1$ expresses the relationship between $y$ and $x$. Positive slope values indicate that larger values of $x$ are associated with correspondingly larger values of $y$, while negative slopes mean that larger $x$ values are associated with smaller $y$ values. Holding everything else constant, larger values of $\beta_1$ (positive or negative) indicate a stronger linear relationship between $y$ and $x$. Finally, $\varepsilon_i$ represents the random error inherent in any statistical model, including regression. It expresses the fact that for any individual $i$, the model will not generally provide a perfect predicted value of $y_i$, denoted as $\hat{y}_i$ and obtained by applying the regression model as

$$\hat{y}_i = \beta_0 + \beta_1 x_i \quad (1.2)$$

Conceptually, this random error is representative of all factors that might influence the dependent variable other than $x$.
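A model of the form in equations (1.1) and (1.2) can be fit in R with the lm() function. The following is a minimal sketch using simulated data; the variable names, sample size, and generating values are illustrative assumptions, not data from the book.

```r
# Simulate data from a known linear model: y = 2 + 0.5*x + error.
set.seed(123)
n <- 100
x <- rnorm(n, mean = 50, sd = 10)     # independent variable
y <- 2 + 0.5 * x + rnorm(n, sd = 3)   # dependent variable with random error

# Fit the simple linear regression model of equation (1.1).
fit <- lm(y ~ x)

coef(fit)          # estimated intercept and slope
head(fitted(fit))  # predicted values, the y-hat of equation (1.2)
head(resid(fit))   # residuals, y minus y-hat
```

Because the data were generated with a slope of 0.5, the estimated slope should land close to that value, illustrating how lm() recovers the population parameters from a sample.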
Estimating Regression Models with Ordinary Least Squares

In virtually all real-world contexts, the population is unavailable to the researcher. Therefore, $\beta_0$ and $\beta_1$ must be estimated using sample data taken
from the population. There exist in the statistical literature several methods for obtaining estimated values of the regression model parameters ($b_0$ and $b_1$, respectively) given a set of $x$ and $y$ values. By far, the most popular and widely used of these methods is ordinary least squares (OLS). A vast number of other approaches are useful in special cases involving small samples or data that do not conform to the distributional assumptions undergirding OLS. The goal of OLS is to minimize the sum of the squared differences between the observed values of $y$ and the model-predicted values of $y$, across the sample. This difference, known as the residual, is written as

$$e_i = y_i - \hat{y}_i \quad (1.3)$$

Therefore, the method of OLS seeks to minimize

$$\sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \quad (1.4)$$

The actual mechanism for finding the linear equation that minimizes the sum of squared residuals involves the partial derivatives of the sum of squares function with respect to the model coefficients, $\beta_0$ and $\beta_1$. We will leave these mathematical details to excellent references such as Fox (2016). It should be noted that in the context of simple linear regression, the OLS criteria reduce to the following equations, which can be used to obtain $b_1$ and $b_0$ as

$$b_1 = r \frac{s_y}{s_x} \quad (1.5)$$

and

$$b_0 = \bar{y} - b_1 \bar{x} \quad (1.6)$$

where $r$ is the Pearson product-moment correlation coefficient between $x$ and $y$, $s_y$ is the sample standard deviation of $y$, $s_x$ is the sample standard deviation of $x$, $\bar{y}$ is the sample mean of $y$, and $\bar{x}$ is the sample mean of $x$.

Distributional Assumptions Underlying Regression

The linear regression model rests upon several assumptions about the distribution of the residuals in the broader population. Although the researcher can typically never collect data from the entire population, it is possible to assess empirically whether these assumptions are likely to hold
true based on the sample data. The first assumption that must hold true for linear models to function optimally is that the relationship between yi and xi is linear. If the relationship is not linear, then clearly an equation for a line will not provide adequate fit and the model is thus misspecified. The second assumption is that the variance in the residuals is constant regardless of the value of xi. This assumption is typically referred to as homoscedasticity and is a generalization of the homogeneity of error variance assumption in ANOVA. Homoscedasticity implies that the variance of yi is constant across values of xi. The distribution of the dependent variable around the regression line is literally the distribution of the residuals, thus making clear the connection of homoscedasticity of errors with the distribution of yi around the regression line. The third assumption is that the residuals are normally distributed in the population. Fourth, it is assumed that the independent variable x is measured without error and that it is unrelated to the model error term, ε. It should be noted that the assumption of x measured without error is not as strenuous as one might first assume. In fact, for most real-world problems, the model will work well even when the independent variable is not error-free (Fox, 2016). Fifth and finally, the residuals for any two individuals in the population are assumed to be independent of one another. This independence assumption implies that the unmeasured factors influencing y are not related from one individual to another. It is this assumption that is directly addressed with the use of multilevel models, as we will see in Chapter 2. In many research situations, individuals are sampled in clusters such that we cannot assume that individuals from the same such cluster will have uncorrelated residuals. 
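Several of these assumptions can be examined empirically from the sample residuals. The sketch below uses simulated data and shows common diagnostic conventions rather than the book's own examples (assumption checking in R is treated in more detail later in this chapter); the variable names are illustrative assumptions.

```r
# Simulate data satisfying the linear model assumptions.
set.seed(123)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

# Linearity and homoscedasticity: a plot of residuals against fitted
# values should show no curvature and a roughly constant spread.
plot(fitted(fit), resid(fit), xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality of residuals: quantile-quantile plot plus a formal test.
qqnorm(resid(fit))
qqline(resid(fit))
shapiro.test(resid(fit))
```

The independence assumption, by contrast, cannot be checked from a residual plot alone; it depends on how the data were collected, which is precisely the issue that multilevel models address.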
For example, if samples are obtained from multiple neighborhoods, individuals within the same neighborhood may tend to be more like one another than they are like individuals from other neighborhoods. A prototypical example of this is children within schools. Due to a variety of factors, children attending the same school will often have more in common with one another than they do with children from other schools. These “common” things might include neighborhood socioeconomic status, school administration policies, and school learning environment, to name just a few. Ignoring this clustering, or not even realizing it is a problem, can be detrimental to the results of statistical modeling. We explore this issue in great detail later in the book, but for now we simply want to mention that a failure to satisfy the assumption of independent errors is (a) a major problem but (b) often something that can be overcome with appropriate models, such as multilevel models, that explicitly consider the nesting of the data.

Coefficient of Determination

When the linear regression model has been estimated, researchers generally want to measure the relative magnitude of the relationship between the
variables. One useful tool for ascertaining the strength of the relationship between x and y is the coefficient of determination, which is the squared multiple correlation coefficient, denoted R2 in the sample. R2 reflects the proportion of the variation in the dependent variable that is explained by the independent variable. Mathematically, R2 is calculated as

$R^2 = \frac{SS_R}{SS_T} = \frac{\sum_{i=1}^{n}(\hat{y}_i - \bar{y})^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2} = 1 - \frac{SS_E}{SS_T}$ (1.7)

The terms in equation (1.7) are as defined previously. The value of this statistic always lies between 0 and 1, with larger values indicating a stronger linear relationship between x and y, implying that the independent variable is able to account for more variance in the dependent variable. R2 is a very commonly used measure of the overall fit of the regression model and, along with the parameter inference discussed below, serves as the primary mechanism by which the relationship between the two variables is quantified.

Inference for Regression Parameters

A second method for understanding the nature of the relationship between x and y involves making inferences about the relationship in the population given the sample regression equation. Because b0 and b1 are sample estimates of the population parameters β0 and β1, respectively, they are subject to sampling error, as is any sample estimate. This means that, although the estimates are unbiased given that the aforementioned assumptions hold, they are not precisely equal to the population parameter values. Furthermore, were we to draw multiple samples from the population and estimate the intercept and slope for each, the values of b0 and b1 would differ across samples, even though they would be estimating the same population parameter values for β0 and β1. The magnitude of this variation in parameter estimates across samples can be estimated from our single sample using a statistic known as the standard error.
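The equivalence of the two forms of R2 in equation (1.7) can be checked numerically. The following sketch uses simulated data of our own devising and compares the sum-of-squares decomposition with the R-squared value reported by lm():

```r
set.seed(42)
x <- rnorm(50)
y <- 1 + 2 * x + rnorm(50)      # illustrative simulated data
fit <- lm(y ~ x)
yhat <- fitted(fit)

ss_t <- sum((y - mean(y))^2)    # total sum of squares, SS_T
ss_r <- sum((yhat - mean(y))^2) # regression sum of squares, SS_R
ss_e <- sum((y - yhat)^2)       # error sum of squares, SS_E

r2_a <- ss_r / ss_t             # first form in equation (1.7)
r2_b <- 1 - ss_e / ss_t         # second form in equation (1.7)

all.equal(r2_a, r2_b)                   # the two forms agree
all.equal(r2_a, summary(fit)$r.squared) # and match lm()'s R-squared
```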
The standard error of the slope, denoted σb1 in the population, can be thought of as the standard deviation of slope values obtained from all possible samples of size n taken from the population. Similarly, the standard error of the intercept, σb0, is the standard deviation of the intercept values obtained from all such samples. Clearly, it is not possible to obtain census data from a population in an applied research context. Therefore, we will need to estimate the standard errors of both the slope (sb1) and intercept (sb0) using data from a single sample, much as we did with b0 and b1.
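This interpretation of the standard error as the standard deviation of estimates across repeated samples can be illustrated by simulation. In the sketch below (our own construction, with assumed true parameter values), many samples are drawn from the same population, the slope is estimated in each, and the empirical spread of those slopes is compared with the standard error reported for a single sample:

```r
# Simulate the sampling distribution of the slope b1
# (beta0, beta1, and n are assumed illustrative values)
set.seed(1)
beta0 <- 2; beta1 <- 0.5; n <- 100

slopes <- replicate(2000, {
  x <- rnorm(n)
  y <- beta0 + beta1 * x + rnorm(n)
  coef(lm(y ~ x))[2]            # estimated slope for this sample
})

sd(slopes)   # empirical standard deviation of b1 across samples

# Compare with the estimated standard error from one sample of the same size
x <- rnorm(n)
y <- beta0 + beta1 * x + rnorm(n)
summary(lm(y ~ x))$coefficients["x", "Std. Error"]
```

The two quantities should be close, which is precisely what sb1 is designed to estimate from a single sample.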
In order to obtain sb1, we must first calculate the variance of the residuals,

$s_e^2 = \frac{\sum_{i=1}^{N} e_i^2}{N - p - 1}$ (1.8)

where ei is the residual value for individual i, N is the sample size, and p is the number of independent variables (1 in the case of simple regression). Then

$s_{b_1} = s_e \sqrt{\frac{1}{(1 - R^2)\sum_{i=1}^{n}(x_i - \bar{x})^2}}$ (1.9)

The standard error of the intercept is calculated as

$s_{b_0} = s_{b_1} \sqrt{\frac{\sum_{i=1}^{n} x_i^2}{n}}$ (1.10)

Given that the sample intercept and slope are only estimates of the population parameters, researchers are quite often interested in testing hypotheses to infer whether the data represent a departure from what would be expected in what is commonly referred to as the null case; that is, whether the idea of the null value holding true in the population can be rejected. Most frequently (though not always), the inference of interest concerns testing whether the population parameter is 0. In particular, a non-zero slope in the population means that x is linearly related to y. Therefore, researchers typically are interested in using the sample to make inferences about whether the population slope is 0 or not. Inference can also be made regarding the intercept, and again the typical focus is on whether this value is 0 in the population. Inference about regression parameters can be made using confidence intervals and hypothesis tests. Much as with the confidence interval of the mean, the confidence interval of the regression coefficient yields a range of values within which we have some level of confidence (e.g., 95%) that the population parameter value resides. If our particular interest is in whether x is linearly related to y, then we would simply determine whether 0 is in the interval for β1. If so, then we would not be able to conclude that the population value differs from 0. The absence of a statistically significant result (i.e.
an interval not containing 0) does not imply that the null hypothesis is true, but rather means that there is not sufficient evidence available in the sample data to reject the null. Similarly, we can construct a confidence interval for the intercept, and if 0 is within the interval, we would conclude that the value of y for an individual with x = 0 could
plausibly be, but is not necessarily, 0. The confidence intervals for the slope and intercept take the following forms:

$b_1 \pm t_{cv} s_{b_1}$ (1.11)

and

$b_0 \pm t_{cv} s_{b_0}$ (1.12)

Here, the parameter estimates and their standard errors are as described previously, while $t_{cv}$ is the critical value of the t distribution for 1 − α/2 (e.g., the 0.975 quantile if α = 0.05) with n − p − 1 degrees of freedom. The value of α is equal to 1 minus the desired level of confidence. Thus, for a 95% confidence interval (0.95 level of confidence), α would be 0.05. In addition to confidence intervals, inference about the regression parameters can be made using hypothesis tests. In general, the forms of this test for the slope and intercept, respectively, are

$t_{b_1} = \frac{b_1 - \beta_1}{s_{b_1}}$ (1.13)

$t_{b_0} = \frac{b_0 - \beta_0}{s_{b_0}}$ (1.14)

The terms β1 and β0 are the parameter values under the null hypothesis. Again, most often the null hypothesis posits that there is no linear relationship between x and y (β1 = 0) and that the value of y = 0 when x = 0 (β0 = 0). For simple regression, each of these tests is conducted with n − 2 degrees of freedom.

Multiple Regression

The linear regression model can be easily extended to allow for multiple independent variables at once. In the case of two regressors, the model takes the form

$y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$ (1.15)

In many ways, this model is interpreted in the same manner as that for simple linear regression. The only major difference between simple and multiple regression interpretation is that each coefficient is interpreted in turn, holding constant the value of