Data Scientist with R
R Programming 101- Datacamp Beginner
R Programming 101- Datacamp Intermediate
Introduction to Tidyverse
Gapminder dataset
- library(gapminder): provides the gapminder dataset
- Columns: country, continent, year, lifeExp (life expectancy), pop (population) and gdpPercap (GDP per capita).
- library(datasets): other built-in datasets, such as mtcars and iris
More Libraries
- library(dplyr): for filtering, sorting and summarizing data
Dplyr
From the tidyverse site, dplyr is defined as “a grammar of data manipulation, providing a consistent set of verbs that help you solve the most common data manipulation challenges”.
R Programming 101- Dplyr
R Programming 101- Ggplot
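A minimal sketch of the filtering, sorting and summarizing verbs mentioned above. It uses the built-in mtcars dataset instead of gapminder so that dplyr is the only package it needs; the names four_cyl and by_cyl are just illustrative.

```r
library(dplyr)

# mtcars ships with R in the 'datasets' package
four_cyl <- mtcars %>%
  filter(cyl == 4) %>%      # keep only 4-cylinder cars
  arrange(desc(mpg))        # sort by fuel economy, best first

by_cyl <- mtcars %>%
  group_by(cyl) %>%                          # one group per cylinder count
  summarize(n = n(), mean_mpg = mean(mpg))   # row count and average mpg per group

print(head(four_cyl, 3))
print(by_cyl)
```

Each verb takes a data frame and returns a data frame, which is what makes chaining them with %>% work.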
Importing Data from Files
R Programming 102- Importing Data
Importing Data from Databases
R Programming 102- Importing Databases
Cleaning Data
R Programming 102- Cleaning data
Working with Dates and Times
R Programming 102- Dates and Times
Functions in R
R Programming 102- Functions in R
Exploratory Data Analysis
R Programming 103- Exploratory Data Analysis
Correlation and Regression
R Programming 104- Correlation and Regression
Supervised Learning
R Programming 105- Supervised Learning 1
R Programming 105- Supervised Learning 2 Regression
Unsupervised Learning
R Programming 105- Unsupervised Learning
Some useful functions
Creating generic names using paste0
c("id", paste0("filename_", 1:50))
This command creates a character vector of names ("id", "filename_1", ..., "filename_50") by pasting a string onto each element of a sequence. This is often useful for naming the columns of a data frame.
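A small sketch of using such a vector as column names. The 2 x 4 data frame is made-up data, and only three filename_ columns are generated to keep it short.

```r
# Build "id", "filename_1", "filename_2", "filename_3"
col_names <- c("id", paste0("filename_", 1:3))

# Hypothetical data: 2 rows, 4 columns (default names V1..V4)
df <- as.data.frame(matrix(1:8, nrow = 2))
names(df) <- col_names

# paste() with sep = "_" gives the same names as paste0() here
same <- identical(col_names[-1], paste("filename", 1:3, sep = "_"))

print(names(df))
print(same)
```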
Remove the first column of a data frame
Often the value to be predicted is stored in the data frame alongside the predictors and needs to be removed; when it is the first column, negative indexing drops it.
df[-1]
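For example, the first column of the built-in mtcars data is mpg. Treating it as the target (an assumption for illustration), negative indexing separates it from the predictors:

```r
target   <- mtcars[[1]]   # first column as a plain vector (mpg)
features <- mtcars[-1]    # every column except the first, still a data frame

print(ncol(mtcars))       # 11
print(ncol(features))     # 10
```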
Remove rows containing NA values
na.omit(dataframe)
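na.omit drops every row that has an NA in any column. A quick check on a made-up data frame, with complete.cases shown as an alternative way to select the same rows:

```r
df <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))

clean      <- na.omit(df)              # keeps only row 1
also_clean <- df[complete.cases(df), ] # same rows, via a logical index

print(clean)
print(nrow(clean))  # 1
```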
Summary for categorical variables using the table function
Shows the number of observations (rows) in each category of a categorical variable.
table(df$col_name)
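Applied to the built-in iris data, table counts the flowers of each species; wrapping the result in prop.table turns the counts into proportions:

```r
counts <- table(iris$Species)   # one count per factor level
props  <- prop.table(counts)    # counts divided by the total

print(counts)  # setosa 50, versicolor 50, virginica 50
print(props)
```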
Get a subset of a data frame (similar to filter from dplyr)
subset(df_name, col_name == "sth")
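A sketch using mtcars: subset also accepts a select argument for choosing columns, and the bracket version below is the plain base-R equivalent.

```r
# Rows with 4 cylinders, keeping only two columns
four_cyl <- subset(mtcars, cyl == 4, select = c(mpg, hp))

# Same rows and columns using logical indexing
four_cyl_idx <- mtcars[mtcars$cyl == 4, c("mpg", "hp")]

print(nrow(four_cyl))  # 11
```

Unlike the bracket form, subset also drops rows where the condition evaluates to NA.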
Gather several columns into key-value pairs (wide to long)
library(tidyr)
gather(df_name, key = "col_key", value = "new_col_name", old_col1, old_col2)
gather (from tidyr) helps when several columns hold values of the same kind, for example predictions and actual values that you want to plot on the same graph. Instead of working with separate columns, gather stacks them: the new column named by col_key records which original column (old_col1 or old_col2) each row came from, and the column named by new_col_name holds the corresponding value. You can then group or colour by col_key, which makes plotting and summarizing much simpler. In newer versions of tidyr, gather is superseded by pivot_longer.
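A small sketch of the predictions-vs-actual case, using made-up numbers. It assumes tidyr is installed and also shows pivot_longer for comparison; the two results contain the same rows, only in a different order.

```r
library(tidyr)

# Hypothetical model output: one row per observation, wide format
results <- data.frame(
  obs       = 1:3,
  actual    = c(10, 12, 15),
  predicted = c(11, 12, 14)
)

# Stack 'actual' and 'predicted' into one 'value' column,
# with 'type' recording which original column each value came from
long <- gather(results, key = "type", value = "value", actual, predicted)

# Same reshaping with the newer verb (rows come out ordered by obs)
long2 <- pivot_longer(results, cols = c(actual, predicted),
                      names_to = "type", values_to = "value")

print(long)
```

With the long table, a single ggplot(long, aes(obs, value, colour = type)) + geom_line() call draws both series.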
Row names to a column [Matrix to Data Frame]
library(tibble)
rownames_to_column(as.data.frame(df_name), var = "new_col_name")
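A sketch with a small made-up matrix whose row names act as IDs. It assumes the tibble package (which provides rownames_to_column) is installed, and shows a base-R alternative for comparison.

```r
library(tibble)

# Hypothetical matrix with row names
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))

# Move the row names into a regular column called "id"
df <- rownames_to_column(as.data.frame(m), var = "id")

# Base-R alternative: build the column explicitly and drop the row names
df_base <- data.frame(id = rownames(m), m, row.names = NULL)

print(df)
```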
2020-09-20 00:00 +0545