Data Scientist with R

ID: 2b4e9403-1106-4ec9-844b-5d144eb7890f

R Programming 101- Datacamp Beginner

R Programming 101- Datacamp Intermediate

Introduction to TidyVerse

ID: e32d5ba7-25c5-496b-97fe-23eefa52d5f2

Gapminder dataset

library(gapminder) : Provides the gapminder dataset
- Columns: country, continent, year, lifeExp (life expectancy), pop (population) and gdpPercap (GDP per capita).
library(datasets) : Provides many more built-in datasets

More Libraries

library(dplyr) : For filtering, sorting and summarizing data

Dplyr

ID: 912f038f-a063-4c4f-8c7f-5ab3f163ed25

According to the tidyverse site, dplyr is “a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges”.

R Programming 101- Dplyr
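A minimal sketch of a few of these verbs, assuming the dplyr package is installed. It uses the built-in mtcars data from the datasets package, chosen here only for illustration; the weight threshold and the result names (heavy_by_cyl, mean_mpg, n_cars) are likewise illustrative.

```r
library(dplyr)

# mtcars ships with R (datasets package); used here purely for illustration
heavy_by_cyl <- mtcars %>%
  filter(wt > 3) %>%                 # keep cars with weight above 3 (in 1000 lbs)
  group_by(cyl) %>%                  # one group per cylinder count
  summarize(mean_mpg = mean(mpg),    # average fuel economy per group
            n_cars = n()) %>%        # number of cars per group
  arrange(desc(mean_mpg))            # sort groups, highest mean mpg first

print(heavy_by_cyl)
```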

R Programming 101- Ggplot

Importing Data from Files

ID: 591fd07b-646b-4015-85cf-3d1b03f6facb

R Programming 102- Importing Data

Importing Data from Databases

ID: 9844cb4f-1165-4a42-bec7-c0c889f2119a

R Programming 102- Importing Databases

Cleaning Data

ID: 143f6193-7537-402d-a422-2af3dc1d6fa3

R Programming 102- Cleaning data

Working with Dates and Times

ID: 4ab04bbf-3a45-4402-89d5-3c678e3fc145

R Programming 102- Dates and Times

Functions in R

ID: 1e3fc3e7-af83-40fc-9e7c-a413e6e1cfc5

R Programming 102- Functions in R

Exploratory Data Analysis

ID: 8faf9012-011c-49bb-ae04-17aba1b5aa04

R Programming 103- Exploratory Data Analysis

Correlation and Regression

ID: d3d245cf-ae25-4809-8286-64e60a73a41a

R Programming 104- Correlation and Regression

Supervised Learning

ID: 984e254a-5aec-4a2c-a471-09ac98d41a2b

R Programming 105- Supervised Learning 1

R Programming 105- Supervised Learning 2 Regression

Unsupervised Learning

ID: 2729b65a-1d9c-4832-b745-3f24dbf31753

R Programming 105- Unsupervised Learning

Some useful functionalities

Creating generic names using paste

c("id", paste0("filename_", 1:50))

This command creates a vector of names by pasting a string prefix onto each element of a sequence. This is often useful when naming the columns of a dataframe.
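As a small sketch of how such generated names might be applied, the example below builds the names and assigns them to a dataframe. The dataframe df and its zero-filled contents are hypothetical, made up only for illustration.

```r
# Build the names: "id" followed by filename_1 ... filename_50
col_names <- c("id", paste0("filename_", 1:50))

# Hypothetical dataframe with 51 columns, filled with zeros for illustration
df <- as.data.frame(matrix(0, nrow = 2, ncol = length(col_names)))
colnames(df) <- col_names

head(colnames(df), 3)
```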

Remove the first column of dataframe

Often the value to be predicted is stored in the dataframe (here, as its first column) and needs to be removed before the remaining columns are used as features.

df[-1]
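A short illustration, using a hypothetical dataframe whose first column (target) is the value to be predicted:

```r
# Hypothetical data: the first column is the target to be predicted
df <- data.frame(target = c(1, 0, 1),
                 x1 = c(2.5, 3.1, 4.0),
                 x2 = c("a", "b", "c"))

features <- df[-1]   # drop the first column; the result is still a dataframe
names(features)
```

Negative indexing with a single index drops columns, so df[-1] keeps every column except the first.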

Remove NA values

na.omit(dataframe)
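A minimal sketch with made-up data. Note that na.omit drops every row containing at least one NA, in any column:

```r
# Hypothetical data with missing values in two different rows
df <- data.frame(a = c(1, NA, 3), b = c("x", "y", NA))

clean <- na.omit(df)   # only rows with no NA in any column survive
nrow(clean)
```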

Summary for categorical variables using table function

Shows the number of rows in each category of a categorical column.

table(df$col_name)
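For example, with a hypothetical column of colour labels:

```r
# Hypothetical categorical column
df <- data.frame(col_name = c("red", "blue", "red", "green", "red"))

counts <- table(df$col_name)   # one count per distinct value
counts[["red"]]                # look up the count for a single category
```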

Get a subset of dataframe: similar to filter from dplyr

subset(df_name, col_name == "sth")
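A small sketch on made-up data, with the equivalent bracket indexing shown for comparison (the value column is hypothetical):

```r
# Hypothetical dataframe
df_name <- data.frame(col_name = c("sth", "other", "sth"),
                      value = c(10, 20, 30))

# Keep only the rows where col_name equals "sth"
kept <- subset(df_name, col_name == "sth")

# The same selection written with base-R logical indexing
kept_idx <- df_name[df_name$col_name == "sth", ]
```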

Gather several columns into key-value pairs

gather(df_name, key = "col_key", value = "new_col_name", old_col1, old_col2)

gather helps when several columns hold values of the same kind, say predictions and actual values, and we want to plot them on the same graph. Rather than working with separate columns, we create a new column, named by col_key, that records which original column each value came from: old_col1 or old_col2. The column named by new_col_name then holds the corresponding values from those old columns. We can then group by the key column (or map it to a plot aesthetic) to compare both series in a single plot.
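A minimal sketch of this reshaping, assuming the tidyr package is installed. The dataframe wide and the names series and value are hypothetical, chosen only for illustration.

```r
library(tidyr)

# Hypothetical wide data: one column per series
wide <- data.frame(
  id        = 1:3,
  actual    = c(10, 12, 15),
  predicted = c(11, 12, 14)
)

# Collapse the two value columns into a key column ("series")
# and a value column ("value"); id is repeated for each series
long <- gather(wide, key = "series", value = "value", actual, predicted)

print(long)
```

In recent tidyr versions gather() is superseded by pivot_longer(); the equivalent call here would be pivot_longer(wide, c(actual, predicted), names_to = "series", values_to = "value").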

Row names to a column [Matrix to Dataframe]

rownames_to_column(as.data.frame(df_name), var = "new_col_name")
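A short sketch, assuming the tibble package (which provides rownames_to_column) is installed. The matrix m and its row and column names are made up for illustration.

```r
library(tibble)

# Hypothetical matrix with row names
m <- matrix(1:4, nrow = 2,
            dimnames = list(c("r1", "r2"), c("a", "b")))

# Convert to a dataframe, then move the row names into a regular first column
df <- rownames_to_column(as.data.frame(m), var = "new_col_name")

print(df)
```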