First Attempt at R

In this post, I will go over what I learned during my first attempt at R.

Motivation

R is one of the two main tools that data scientists and statisticians use in their work. Given its prominence in the industry, I had come across the language countless times in books and videos, but only over the past few days have I finally had the chance to study it in detail. As usual, this post documents what I have studied so far and doubles as a reference guide for my future self. (As a side note, this post was originally written in R Markdown and later converted to .ipynb and .md files using Jupyter.) With that said, let’s get to some R.

RStudio vs Jupyter

Like many newcomers to R, I started with RStudio, studying basic grammar with the widely-known books R for Data Science and Hands-On Programming with R by Garrett Grolemund and Hadley Wickham, as well as Data Analysis with R by Hoon Park. While taking my first steps down the R journey, I began writing this post in RStudio's editor, combining R code blocks and commentary in an R Markdown file. Then, using a clever trick provided by this guide, I converted (or “knitted”) the R Markdown file to a .md file and test-uploaded it to my GitHub repository. This worked by setting `output` to `md_document` with a nested `variant: markdown_github` entry in the .rmd file’s YAML front matter, a format I was not familiar with.

Although RStudio worked fine, after some trial and error I decided to finish the rest of this post in Jupyter, partly because I found RStudio somewhat slow at downloading packages and partly because of my penchant for Jupyter’s simple, elegant interface. To give RStudio some credit, though, I plan to come back to it later, for instance when I need to knit .rmd files to HTML or Word, publish R files on the web using rpubs, or create applications using shinyapp.

Some Notes on R Grammar

Lists: Appending Lists

listFruit <- c('Apples', 'Apples', 'Bananas', 'Bananas', 'Pineapples', 'Pineapples', 'Oranges', 'Cucumbers')
print(unique(listFruit))

[1] "Apples"     "Bananas"    "Pineapples" "Oranges"    "Cucumbers"
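The snippet above removes duplicates with `unique()`; note that `c()` builds an atomic vector rather than a true `list()`. Appending itself can be sketched as follows. The `fruits` vector is a hypothetical example, not part of the original notebook.

```r
# Hypothetical vector for illustration; c() builds an atomic vector, not a list()
fruits <- c("Apples", "Bananas")

# Append by concatenating the existing vector with the new element
fruits <- c(fruits, "Oranges")

# append() can also insert at a position: here, right after element 1
fruits <- append(fruits, "Apples", after = 1)

print(fruits)           # "Apples" now appears twice
print(unique(fruits))   # duplicates removed, first-seen order kept
```

Both approaches return a new vector, so the result has to be assigned back to the variable.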

Lists: Reversing Logical Elements

logical_var <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
logical_var # prints: FALSE, TRUE, FALSE, TRUE, TRUE
!logical_var # prints: T, F, T, F, F

[1] FALSE  TRUE FALSE  TRUE  TRUE

[1]  TRUE FALSE  TRUE FALSE FALSE
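Negation becomes more useful when combined with counting or indexing. A small sketch, where the `values` vector is made up for illustration:

```r
logical_var <- c(FALSE, TRUE, FALSE, TRUE, TRUE)
values <- c(10, 20, 30, 40, 50)    # hypothetical data, one value per flag

n_false   <- sum(!logical_var)     # TRUE counts as 1, so this counts the FALSE entries
false_pos <- which(!logical_var)   # positions where the original flag was FALSE
kept      <- values[!logical_var]  # values whose original flag was FALSE

print(n_false)
print(false_pos)
print(kept)
```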

Data Frame: Create & Access

Create a Data Frame

id <- c('43', '44', '45', '46')
name <- c('Bush', 'Obama', 'Trump', 'Biden')
age <- c(75,59,75,78)
mStatus <- c(T, T, T, T)

df <- data.frame(id,name,age,mStatus)
df # displays dataframe chart
str(df) # displays type / values

id	name	age	mStatus
43	Bush	75	TRUE
44	Obama	59	TRUE
45	Trump	75	TRUE
46	Biden	78	TRUE

'data.frame':	4 obs. of  4 variables:
 $ id     : Factor w/ 4 levels "43","44","45",..: 1 2 3 4
 $ name   : Factor w/ 4 levels "Biden","Bush",..: 2 3 4 1
 $ age    : num  75 59 75 78
 $ mStatus: logi  TRUE TRUE TRUE TRUE
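The `Factor` columns in the output above reflect the behavior of R versions before 4.0.0, where `data.frame()` converted strings to factors by default. Since R 4.0.0 the default is `stringsAsFactors = FALSE`, so the same code would show `chr` columns. A minimal sketch that makes the choice explicit (the two-row data here is a shortened, illustrative version of the table above):

```r
id   <- c("43", "44")
name <- c("Bush", "Obama")

# Strings kept as character vectors (the default since R 4.0.0)
df_chr <- data.frame(id, name, stringsAsFactors = FALSE)

# Strings converted to factors (the default before R 4.0.0)
df_fct <- data.frame(id, name, stringsAsFactors = TRUE)

str(df_chr)
str(df_fct)
```

Setting the argument explicitly keeps the result the same regardless of which R version runs the code.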

Access Data in Rows, Columns

df[2,3] # 2nd row, 3rd column, value = 59
df[c(2,3), c(2,4)] # print (row, column) values = (2,2), (2,4), (3,2), (3,4)

	name	mStatus
2	Obama	TRUE
3	Trump	TRUE

df$name # access column data using $
df$name[3:4] # column 'name', third, fourth rows

[1] Bush  Obama Trump Biden
Levels: Biden Bush Obama Trump

[1] Trump Biden
Levels: Biden Bush Obama Trump
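Besides numeric positions and `$`, rows and columns can also be selected by name or by a logical condition. A short sketch, rebuilding the same data frame so the block runs on its own (the variable names `obama_age`, `ages`, and `older` are illustrative):

```r
df <- data.frame(
  id   = c("43", "44", "45", "46"),
  name = c("Bush", "Obama", "Trump", "Biden"),
  age  = c(75, 59, 75, 78),
  stringsAsFactors = FALSE
)

obama_age <- df[df$name == "Obama", "age"]   # row chosen by condition, column by name
ages      <- df[["age"]]                     # whole column as a plain vector
older     <- df[df$age > 70, c("name", "age")]  # several rows, two named columns

print(obama_age)
print(older)
```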

Structure Display

str(df) # displays the structure: number of observations and variables, plus each column's type

'data.frame':	4 obs. of  4 variables:
 $ id     : Factor w/ 4 levels "43","44","45",..: 1 2 3 4
 $ name   : Factor w/ 4 levels "Biden","Bush",..: 2 3 4 1
 $ age    : num  75 59 75 78
 $ mStatus: logi  TRUE TRUE TRUE TRUE

id <- c('39', '40', '41', '42')
name <- c('Carter', 'Reagan', 'Bush', 'Clinton')
age <- c(96,93,94,74)
mStatus <- c(T, T, T, T)

df1 <- data.frame(id,name,age,mStatus)
df1

id	name	age	mStatus
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
42	Clinton	74	TRUE

Combine Data Frames

bothdf <- rbind(df1, df)
bothdf

id	name	age	mStatus
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
42	Clinton	74	TRUE
43	Bush	75	TRUE
44	Obama	59	TRUE
45	Trump	75	TRUE
46	Biden	78	TRUE
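`rbind()` works here because both data frames have the same column names. As a rough illustration with made-up one-row frames, columns are matched by name rather than by position, and mismatched names raise an error:

```r
a <- data.frame(id = "39", age = 96, stringsAsFactors = FALSE)
b <- data.frame(age = 75, id = "43", stringsAsFactors = FALSE)  # same columns, different order

ab <- rbind(a, b)   # columns are lined up by name
print(ab)

# A frame with a different column name ("years" instead of "age") cannot be stacked
other <- data.frame(id = "44", years = 59, stringsAsFactors = FALSE)
res <- tryCatch(rbind(a, other), error = function(e) "error")
print(res)
```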

Retrieve: `head`, `tail`, `min`, `max`, `median`, `quantile`

head(bothdf,3)
tail(bothdf, 3)

min(bothdf$age) # min
max(bothdf$age) # max
median(bothdf$age) # median
quantile(bothdf$age) # quartiles (0%, 25%, 50%, 75%, 100%)

df3 <- bothdf

id	name	age	mStatus
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE

	id	name	age	mStatus
6	44	Obama	59	TRUE
7	45	Trump	75	TRUE
8	46	Biden	78	TRUE

76.5
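Only the median made it into the rendered output above. The following self-contained sketch recomputes all four statistics from the ages in the combined table; `summary()` is an addition that was not in the original notebook.

```r
# Ages copied from the combined data frame above
age <- c(96, 93, 94, 74, 75, 59, 75, 78)

age_min    <- min(age)       # 59
age_max    <- max(age)       # 96
age_median <- median(age)    # 76.5
age_q      <- quantile(age)  # named vector: 0%, 25%, 50%, 75%, 100%

print(age_q)
print(summary(age))          # min, quartiles, mean, and max in one call
```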

Retrieve: Sections of the Data Frame

subset(df3, age > 80) # only those with age above 80. 

id	name	age	mStatus
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
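The same filter can be written with bracket indexing, and `subset()` also accepts combined conditions and a `select` argument. A sketch using a reduced two-column copy of the data (the names `over80` and `seventies` are illustrative):

```r
df3 <- data.frame(
  name = c("Carter", "Reagan", "Bush", "Clinton", "Bush", "Obama", "Trump", "Biden"),
  age  = c(96, 93, 94, 74, 75, 59, 75, 78),
  stringsAsFactors = FALSE
)

# Equivalent to subset(df3, age > 80): logical row index, all columns
over80 <- df3[df3$age > 80, ]

# Two conditions joined with &, keeping only the chosen columns
seventies <- subset(df3, age >= 70 & age < 80, select = c(name, age))

print(over80)
print(seventies)
```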

Add: New Column

Nationality <- c("American1", "American2", "American3", "American4")

df3$new_column <- Nationality # adds a new column named "new_column"; the 4 values are recycled to fill all 8 rows
df3

id	name	age	mStatus	new_column
39	Carter	96	TRUE	American1
40	Reagan	93	TRUE	American2
41	Bush	94	TRUE	American3
42	Clinton	74	TRUE	American4
43	Bush	75	TRUE	American1
44	Obama	59	TRUE	American2
45	Trump	75	TRUE	American3
46	Biden	78	TRUE	American4
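The four values fill eight rows because R recycles a shorter vector when its length divides the number of rows evenly. A minimal sketch with a made-up data frame shows both the recycling and the error raised when the lengths do not fit:

```r
df <- data.frame(id = 1:8)

# 4 values, 8 rows: the vector is repeated twice
df$group <- c("a", "b", "c", "d")
print(df$group)

# 3 values do not divide 8 rows evenly, so the assignment fails
result <- tryCatch({
  df$bad <- c("x", "y", "z")
  "assigned"
}, error = function(e) "error")
print(result)

# Being explicit about the repetition avoids relying on recycling
df$explicit <- rep(c("a", "b", "c", "d"), length.out = nrow(df))
```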

Delete: Columns

new_df3 <- df3[ , -c(3,4)] # creates a copy of the df3 data frame, deletes columns "age", "mStatus"
new_df3

id	name	new_column
39	Carter	American1
40	Reagan	American2
41	Bush	American3
42	Clinton	American4
43	Bush	American1
44	Obama	American2
45	Trump	American3
46	Biden	American4

df3[ , c(5)] <- list(NULL) # deletes the fifth column, "new_column", from the original data frame, df3
head(df3)

id	name	age	mStatus
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
42	Clinton	74	TRUE
43	Bush	75	TRUE
44	Obama	59	TRUE
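Deleting by position breaks silently if the column order changes, so dropping by name can be a safer alternative. A sketch under that assumption, using a shortened two-row copy of the data:

```r
df3 <- data.frame(
  id      = c("39", "40"),
  name    = c("Carter", "Reagan"),
  age     = c(96, 93),
  mStatus = c(TRUE, TRUE),
  stringsAsFactors = FALSE
)

# Copy without the named columns; the original is left untouched
no_age <- df3[ , !(names(df3) %in% c("age", "mStatus"))]
print(names(no_age))

# Remove a single column in place by assigning NULL
df3$mStatus <- NULL
print(names(df3))
```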

Change: Column Names

colnames(df3)

[1] "id"      "name"    "age"     "mStatus"

colnames(df3) <- c("C1", "C2", "C3", "C4")
head(df3)

colnames(df3) <- c("ID", "Name", "Age", "Marital Status")
head(df3)

C1	C2	C3	C4
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
42	Clinton	74	TRUE
43	Bush	75	TRUE
44	Obama	59	TRUE

ID	Name	Age	Marital Status
39	Carter	96	TRUE
40	Reagan	93	TRUE
41	Bush	94	TRUE
42	Clinton	74	TRUE
43	Bush	75	TRUE
44	Obama	59	TRUE
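To rename just one column instead of all of them, the names vector can be indexed directly. Note also that a name containing a space, such as "Marital Status", needs backticks or `[[ ]]` when accessed. A small sketch with a shortened, illustrative data frame:

```r
df3 <- data.frame(id = c("39", "40"), age = c(96, 93), mStatus = c(TRUE, TRUE),
                  stringsAsFactors = FALSE)

# Rename one column by looking up its current name
names(df3)[names(df3) == "age"] <- "Age"

# Rename by position; the new name contains a space
colnames(df3)[3] <- "Marital Status"

married      <- df3$`Marital Status`     # backticks are required with $
married_also <- df3[["Marital Status"]]  # [[ ]] takes the name as a plain string

print(names(df3))
```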

The rest of this write-up, which covers visualization with R, has been moved to the next post to keep this one at a manageable length.

Junhyung Park