DataCamp Intermediate R Course Notes

Logical Operators

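  • & is the element-wise AND operator and | is the element-wise OR operator; both return one result per element.
  • && and || work on single logical values and return a single result. Older R versions looked only at the first element of a longer vector; R 4.3.0 and later raise an error instead.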
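A minimal sketch of the difference between the two operator families. The vectors x and y and the variable a are made up for illustration.

```r
x <- c(TRUE, FALSE, TRUE)
y <- c(TRUE, TRUE, FALSE)

# Element-wise operators return one result per position
and_result <- x & y   # TRUE FALSE FALSE
or_result <- x | y    # TRUE TRUE TRUE

# && and || are meant for single values, e.g. inside if ()
a <- 5
in_range <- a > 0 && a < 10   # TRUE
```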

Conditionals

  • if (condition) { } else if (condition2) { } else { }
  • else must start on the same line as the closing brace of the preceding if block; at the top level, an else on a new line is a syntax error.
  • while (condition) { expression }
  • break exits the enclosing loop immediately.
  • for (item in item_list) { operate on item }
  • for (i in seq_along(item_list)) { operate on item_list[[i]] }
  • Use double brackets [[ ]] to extract a single element from a list; single brackets [ ] return a sub-list.
  • nchar(x): number of characters in a string.
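The control-flow notes above could look like this in practice. This is only a sketch; n, count, cities and lengths_seen are hypothetical names chosen for the example.

```r
# if / else if / else: each else sits on the line of the closing brace
n <- 7
if (n > 10) {
  size <- "large"
} else if (n > 5) {
  size <- "medium"
} else {
  size <- "small"
}
# size is "medium"

# while loop with break: stop once the counter reaches 3
count <- 0
while (TRUE) {
  count <- count + 1
  if (count >= 3) {
    break
  }
}

# Loop over a list by index; [[i]] extracts the element itself
cities <- list("Oslo", "Lima", "Cairo")
lengths_seen <- c()
for (i in seq_along(cities)) {
  lengths_seen <- c(lengths_seen, nchar(cities[[i]]))
}
# lengths_seen is 4 4 5
```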

Functions

  • help(fn_name) or ?fn_name to get documentation
  • Arguments are matched by position or by name.
  • fn_name <- function(args) { fn_body }
  • R passes arguments by value: changing an argument inside a function does not change the caller's variable.
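A short sketch of defining a function, calling it by position and by name, and checking that the caller's variable is untouched. The function add_to_all and its increment argument are hypothetical, not from the course.

```r
# Hypothetical function: add a value (default 1) to every element
add_to_all <- function(x, increment = 1) {
  x <- x + increment   # modifies only the local copy
  x
}

v <- c(1, 2, 3)
by_position <- add_to_all(v, 10)              # 11 12 13
by_name <- add_to_all(increment = 10, x = v)  # same result, order irrelevant
# v is still 1 2 3 because it was passed by value
```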

R Packages

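  • install.packages("pkg_name") installs a package from CRAN (the Comprehensive R Archive Network).
  • Load a package: library("pkg_name") and library(pkg_name) both work.
  • View attached packages and environments: search()
  • Seven packages are attached by default (stats, graphics, grDevices, utils, datasets, methods, base); search() also lists .GlobalEnv, the global workspace, which is not a package.
  • require() also works with or without quotes; it returns FALSE with a warning instead of throwing an error when the package is missing.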
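A small sketch of loading and inspecting packages. It uses stats because that package ships with R, so nothing needs installing; the name hypotheticalMissingPkg is invented and assumed not to exist on your machine.

```r
# Both forms load the same package
library(stats)
library("stats")

# require() returns TRUE or FALSE instead of stopping with an error
loaded <- require("stats")
missing_pkg <- suppressWarnings(
  require("hypotheticalMissingPkg", quietly = TRUE)
)

# search() lists the global workspace plus the attached packages
attached <- search()
has_global <- ".GlobalEnv" %in% attached
has_base <- "package:base" %in% attached
```

Because require() reports failure as a value, it can be convenient inside functions, while library() is the usual choice at the top of a script.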

lapply()

  • Applies a function to each element of a list or vector and always returns a list, similar to map() in Python.
  • The output list keeps the names of the original list.
  • If the result is a simple list of single values, unlist() flattens it into a vector.
  • You can pass a function you defined yourself just as you would a built-in one.
  • Extra arguments for the function go after it in the lapply() call.
  • An anonymous function, similar to a lambda in Python, can be written directly inside the call: lapply(x, function(el) { }).
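The points above, sketched with made-up data. The list prices and the helper scale_by are hypothetical names.

```r
prices <- list(apple = 1.5, bread = 3, milk = 2)

# Hypothetical helper that takes an extra argument
scale_by <- function(x, factor) {
  x * factor
}

# Extra arguments are passed after the function name
doubled <- lapply(prices, scale_by, factor = 2)  # list, names preserved
doubled_vec <- unlist(doubled)                   # named numeric vector

# Anonymous function defined inside the call
plus_one <- lapply(prices, function(x) x + 1)
```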

sapply(): simplified apply

  • lapply() always returns a list, whose contents can be heterogeneous; sapply() calls lapply() and then tries to simplify the result to a vector or matrix.
  • When the input is a character vector, sapply() uses its elements as names for the result; USE.NAMES = FALSE turns this off.
  • If each call returns a single value, e.g. with sum() or mean(), the result is a vector. If simplification is not possible, e.g. the function returns NULL or results of different lengths, sapply() returns a list just like lapply().
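A sketch of when sapply() simplifies and when it falls back to a list. The vectors cities and temps are invented for the example.

```r
cities <- c("Oslo", "Lima", "Cairo")
named <- sapply(cities, nchar)                       # names are the city strings
unnamed <- sapply(cities, nchar, USE.NAMES = FALSE)  # plain vector 4 4 5

temps <- list(mon = c(3, 7, 5), tue = c(6, 9, 12))
avg <- sapply(temps, mean)   # named numeric vector: mon = 5, tue = 9

# Results of differing lengths cannot be simplified, so a list comes back
above_6 <- sapply(temps, function(x) x[x > 6])
```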

vapply()

  • Like sapply(), but you must specify the output type.
  • Behaves like lapply() followed by a check and simplification to the format you specify.
  • FUN.VALUE specifies what each call must return, e.g. numeric(1) means a single number.
  • vapply() throws an error if the type or length of a result differs from FUN.VALUE.
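A sketch of FUN.VALUE in action, including the error case. The list temps is made up, and tryCatch is used here only so the failure can be shown without stopping the script.

```r
temps <- list(mon = c(3, 7, 5), tue = c(6, 9, 12))

# One number per element
avg <- vapply(temps, mean, FUN.VALUE = numeric(1))   # mon = 5, tue = 9

# range() returns two numbers, so FUN.VALUE must have length 2
rng <- vapply(temps, range, FUN.VALUE = numeric(2))  # 2 x 2 matrix

# A mismatch is an error rather than a silently different shape
mismatch <- tryCatch(
  vapply(temps, range, FUN.VALUE = numeric(1)),
  error = function(e) "error"
)
```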

Useful Functions

  • seq(1,10,by=3)
  • seq(8,2,by=-2)
  • rep(x, times = n) : repeat the whole vector n times
  • rep(x, each = n) : repeat each element n times
  • sort(x, decreasing = TRUE)
  • str(x) : structure of an object
  • is.*() : check the type of an object, e.g. is.list(x)
  • append(x, values) : add values to the end of x
  • as.*() : convert an object to another type, e.g. as.list(v) turns a vector into a list
  • rev(x) : reverses a list or vector
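A quick tour of the functions listed above, with small made-up inputs so the results are easy to check by hand.

```r
s1 <- seq(1, 10, by = 3)        # 1 4 7 10
s2 <- seq(8, 2, by = -2)        # 8 6 4 2
r1 <- rep(c(1, 2), times = 2)   # 1 2 1 2
r2 <- rep(c(1, 2), each = 2)    # 1 1 2 2
sorted <- sort(c(3, 1, 2), decreasing = TRUE)   # 3 2 1
reversed <- rev(c(1, 2, 3))                     # 3 2 1
joined <- append(c(1, 2), c(3, 4))              # 1 2 3 4

as_list <- as.list(c(1, 2))     # vector converted to a list of two elements
check <- is.list(as_list)       # TRUE
str(as_list)                    # prints the structure of the list
```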

Regex

  • grep, grepl, sub, gsub
  • grepl(pattern, x): returns a logical vector, TRUE where the pattern matches.
  • grep(pattern, x): returns the indices of the matching elements.
  • which(grepl(…)): returns the same indices as grep.
  • sub(pattern, replacement, x): replaces only the first match in each string, like re.sub() with count=1 in Python. gsub() replaces every match.
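A sketch of the four regex functions side by side. The emails vector is invented for illustration.

```r
emails <- c("ann@example.com", "bob at example.org", "cy@test.com")

has_at <- grepl("@", emails)            # TRUE FALSE TRUE
idx <- grep("@", emails)                # 1 3
same_idx <- which(grepl("@", emails))   # also 1 3

first_only <- sub("a", "A", "banana")   # "bAnana"
all_matches <- gsub("a", "A", "banana") # "bAnAnA"
```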

Times and Dates

    • Sys.Date() returns today's date.
    • Sys.time() (lowercase t) returns the current date and time.
    • as.Date("YYYY-MM-DD") parses the ISO format without a format string.
    • For other layouts, pass the format explicitly, e.g. as.Date(x, format = "%Y-%d-%m").
    • Numbers can be added to dates, and two dates can be subtracted from each other.
    • Adding 1 to a time object (POSIXct) advances it by 1 second; adding 1 to a Date advances it by 1 day.
    • unclass(my_date) shows that a date is stored as a plain number (days since 1970-01-01), which is what makes this arithmetic possible.
    • Other packages for this: “lubridate”, “zoo”, “xts”
    • Date Formats:
      • %Y: 4-digit year (1982)
      • %y: 2-digit year (82)
      • %m: 2-digit month (01)
      • %d: 2-digit day of the month (13)
      • %A: weekday (Wednesday)
      • %a: abbreviated weekday (Wed)
      • %B: month (January)
      • %b: abbreviated month (Jan)
    • format(dateobject, "%b %Y") would show the date as, e.g., Jan 1987. Other format strings work the same way.
    • Time Formats:
      • %H: hours as a decimal number (00-23)
      • %I: hours as a decimal number (01-12)
      • %M: minutes as a decimal number
      • %S: seconds as a decimal number
      • %T: shorthand notation for the typical format %H:%M:%S
      • %p: AM/PM indicator
  • Example

    # Definition of character strings representing times
    str1 <- "May 23, '96 hours:23 minutes:01 seconds:45"
    str2 <- "2012-3-12 14:23:08"
    
    # Convert the strings to POSIXct objects: time1, time2
    time1 <- as.POSIXct(str1, format = "%B %d, '%y hours:%H minutes:%M seconds:%S")
    time2 <- as.POSIXct(str2)
    
    # Convert times to formatted strings
    format(time1, "%M")
    format(time2, "%I:%M %p")
    
    # A small data frame example
    df <- data.frame(
      x = c(4, 5, 6),
      y = c("a", "b", "c")
    )
    class(df)  # "data.frame"
    # Replace column x with new values
    df["x"] <- c(1, 2, 3)
    df
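Returning to the date notes, here is a sketch of date parsing and arithmetic. The dates are arbitrary, and tz = "UTC" is an assumption added so the time example does not depend on the local time zone. Numeric format codes are used because month and weekday names depend on the locale.

```r
d <- as.Date("1982-01-13")                             # ISO format, no format string
parsed <- as.Date("1982-13-01", format = "%Y-%d-%m")   # day before month

next_day <- d + 1                           # 1982-01-14
gap <- as.Date("1982-02-01") - d            # time difference of 19 days
days_since_epoch <- unclass(as.Date("1970-01-10"))     # 9

shown <- format(d, "%d/%m/%Y")              # "13/01/1982"

# Times advance in seconds rather than days
t0 <- as.POSIXct("2012-03-12 14:23:08", tz = "UTC")
t1 <- t0 + 1
clock <- format(t1, "%H:%M:%S")             # "14:23:09"
```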