Data Science with R: An Introduction to Statistical Computing and Graphics
Caroline Davis
Published by CCL Publishing, 2023.
While every precaution has been taken in the preparation of this book, the publisher assumes no responsibility for errors or omissions, or for damages resulting from the use of the information contained herein.
DATA SCIENCE WITH R: AN INTRODUCTION TO STATISTICAL COMPUTING AND GRAPHICS
First edition. July 5, 2023.
Copyright © 2023 Caroline Davis.
Written by Caroline Davis.
Table of Contents
Title Page
Copyright Page
Data Science with R: An Introduction to Statistical Computing and Graphics
History and Overview of R | What is R? | Basic Features | Design of the R System | Limitations of R
Installation R | Getting started with the R interface
R Nuts and Bolts | Entering Input | Evaluation | Objects | Numbers | Attributes | Creating Vectors | Mixing Objects | Explicit Coercion | Matrices | Lists | Factors | Missing Values | Data Frames | Names
Getting Data In and Out of R | Reading and Writing Data | Reading Data Files with read.table() | Reading in Larger Datasets with read.table | Calculating Memory Requirements for Objects
Using the readr Package
Using Textual and Binary Formats for Storing Data | Using dput() and dump() | Binary Formats
Interfaces to the Outside World | File Connections | Reading Lines of a Text File | Reading From a URL Connection
Subsetting R Objects | Subsetting a Vector | Subsetting a Matrix | Subsetting Lists | Subsetting Nested Elements of a List | Extracting Multiple Elements of a List | Partial Matching | Removing NA Values
Vectorized Operations | Vectorized Matrix Operations
Dates and Times | Dates | Times | Operations on Dates and Times
Managing Data Frames with the dplyr package | Data Frames | The dplyr Package | dplyr Grammar | Installing the dplyr package | select() | filter() | arrange() | rename() | mutate() | group_by() | %>%
Control Structures | if-else | for Loops | Nested for loops | while Loops | repeat Loops | next, break
Functions | Your First Function | Argument Matching | Lazy Evaluation | The ... Argument | Arguments Coming After the ... Argument in R
Scoping Rules of R | A Diversion on Binding Values to Symbol | Scoping Rules | Lexical Scoping: Why Does It Matter? | Lexical vs. Dynamic Scoping | Application: Optimization | kelihood
Coding Standards for R
Loop Functions | Looping on the Command Line | lapply() | sapply() | split() | Splitting a Data Frame | tapply | apply() | Col/Row Sums and Means | mapply() | Vectorizing a Function
Debugging | Debugging Tools in R | Using traceback() | Using debug()
Profiling R Code | Using system.time() | Timing Longer Expressions | The R Profiler | Using summaryRprof()
Simulation | Generating Random Numbers | Setting the random number seed | Simulating a Linear Model | Random Sampling
Sign up for Caroline Davis's Mailing List
Introduction

The R programming language is a powerful tool for data analysis and visualization, used by a diverse community of researchers, data scientists, and statisticians around the world. Whether you're working with small datasets or large-scale projects, R offers a flexible and customizable environment for manipulating, exploring, and visualizing data.

In this book, we'll take you through the basics of R and introduce you to the essential concepts and tools you need to get started with data analysis. You'll learn how to use R's syntax and data structures to manipulate and clean datasets, and how to apply statistical methods to analyze your data and draw meaningful insights. We'll also cover the fundamentals of data visualization, showing you how to create clear and engaging graphics that convey your findings to others.

With hands-on examples and exercises, you'll develop a practical understanding of how to use R to solve real-world problems. Whether you're a complete beginner or have some experience with data analysis and programming, this book will help you develop the skills you need to harness the power of R for your own data projects.
History and Overview of R

R is an open-source programming language and environment for statistical computing and graphics. It was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s as an open-source implementation of the S language. R is now maintained by the R Core Team, a group of volunteer developers from around the world.

The primary purpose of R is to provide a free and flexible tool for statistical analysis and data visualization. It includes a wide variety of statistical and graphical techniques, from simple data summary functions to complex machine learning algorithms. R also has an active community of users who contribute to its development, documentation, and package ecosystem.

R copes well with complex datasets, which makes it a popular choice for data analysis in fields such as finance, healthcare, and the social sciences. Because R generally holds its objects in memory, very large datasets can require extra care. R reads and writes delimited text formats such as CSV out of the box, and add-on packages extend this to Excel files and SQL databases. It can also interoperate with other programming languages such as Python and C++.
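To make the point about reading and writing data concrete, here is a minimal sketch of a CSV round trip using only base R. The data frame and its column names are invented for illustration. Excel files and SQL databases would need add-on packages, which are not shown here.

```r
# Small illustrative data frame (made-up values, not from the book)
scores <- data.frame(id = 1:3, score = c(90.5, 82, 77.25))

# Write to a temporary CSV file, leaving out the row-name column
path <- tempfile(fileext = ".csv")
write.csv(scores, path, row.names = FALSE)

# Read the file back into a new data frame and inspect its structure
scores_back <- read.csv(path)
str(scores_back)
```

Writing to a file created with tempfile() keeps the example from touching your working directory.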
R has a command-line interface, which may be intimidating to some users at first. However, there are many user-friendly integrated development environments (IDEs) and graphical user interfaces (GUIs) available, such as RStudio, which make it easier to work with R. Another strength of R is its package system, which provides access to a vast library of pre-written code for a wide variety of statistical techniques, data manipulation, and visualization. R packages are maintained by the R community and can be easily installed and updated.
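As a rough sketch of how installing and loading a package typically looks, the snippet below defines a small helper. The name use_package is hypothetical (it is not part of R), and "stats" is used only because it ships with R, so nothing is downloaded when the example runs.

```r
# Hypothetical helper (not part of R): attach a package, installing it first
# only when it is not already available.
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # fetches the package from a repository; needs network access
  }
  library(pkg, character.only = TRUE)
}

use_package("stats")   # already installed with R, so no download happens
# update.packages()    # would check installed packages for newer versions
```

In everyday use you would more often call install.packages() once and then library() at the top of each script; the helper simply combines the two steps.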
What is R?

R is a free and open-source programming language and environment for statistical computing and graphics. It was developed by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand, in the early 1990s as an open-source implementation of the S language.

R provides a wide variety of statistical and graphical techniques for data analysis, including linear and nonlinear modeling, classical statistical tests, time-series analysis, clustering, and more. It also has a rich package ecosystem with thousands of pre-written packages available for tasks such as data manipulation, visualization, and machine learning.

R is widely used in academia, research, and industry for data analysis, statistical modeling, and visualization. It is also popular in the data science community, where it is often used in conjunction with other tools like Python and SQL to build data pipelines and perform machine learning tasks.
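As an illustration of the linear modeling and classical tests mentioned above, the following sketch uses the mtcars dataset that ships with R. The choice of dataset and variables is ours, made only for demonstration.

```r
# mtcars is a built-in data frame of 32 cars: mpg = miles per gallon,
# wt = weight (1000 lbs), am = transmission (0 = automatic, 1 = manual)
fit <- lm(mpg ~ wt, data = mtcars)   # simple linear regression of mpg on weight
print(coef(fit))                      # intercept and slope

# Welch two-sample t-test: does mean mpg differ by transmission type?
tt <- t.test(mpg ~ am, data = mtcars)
print(tt$p.value)
```

Both functions come from the stats package, which is attached by default, so no extra installation is needed.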