Databases

Importing Library

library(DBI)

Creating Connection

con <- dbConnect(RMySQL::MySQL(),
                dbname = "db_name",
                host = "localhost",
                port = port_num,
                user = "username",
                password = "my_password")

This connection object is specific to a MySQL database; for other backends, only the driver passed as the first argument to dbConnect() changes.
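A minimal sketch of the same DBI workflow against a different backend. It assumes the RSQLite package is installed; ":memory:" is an in-memory database, so no server or credentials are needed.

```r
library(DBI)

# Only the driver changes; the rest of the DBI API is identical.
con <- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(con, "mtcars", mtcars)  # create a table from a data frame
dbListTables(con)                    # list the tables in the database
dbReadTable(con, "mtcars")           # same call as with MySQL
dbDisconnect(con)                    # always release the connection
```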

Importing data

  • Read Whole Tables

    dbListTables(con)
    dbReadTable(con, "table_name")
    
  • Read selected data from tables

    dbGetQuery(con, "SELECT * FROM TABLE_NAME WHERE date > '2020-02-01'")
    res <- dbSendQuery(con, "query string")
    dbFetch(res)
    dbClearResult(res)
    

    dbGetQuery() sends the query and fetches the result in a single call. The same can be done manually with dbSendQuery() followed by dbFetch(), after which dbClearResult() frees the result set. The manual approach is useful when we want to import records in chunks rather than fetching everything at once.
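A sketch of chunked fetching with the n argument of dbFetch(). Here con is assumed to be an open DBI connection and orders a hypothetical table name.

```r
library(DBI)

res <- dbSendQuery(con, "SELECT * FROM orders")
while (!dbHasCompleted(res)) {
  chunk <- dbFetch(res, n = 500)  # at most 500 rows per call
  print(nrow(chunk))              # process each chunk as it arrives
}
dbClearResult(res)                # free the result set when done
```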

URLs and Other data formats

Reading data from URLs

library(readr)
read_csv("http://xyz.com/filename.csv")
read_tsv("http://xyz.com/filename.tsv")
read_csv("https://xyz.com/filename.csv")
read.csv("https://xyz.com/filename.csv")

Both the read.csv method from the utils package and read_csv from the readr package work fine with URLs. This holds for any file that can be parsed into a tabular format. However, the read_excel function from the readxl package doesn’t support URLs.

file_path <- file.path("~", "Downloads", "filename.xlsx")
file_url <- "https://xyz.com/filename.xlsx"
download.file(file_url, file_path)
load("file_name.RData")

When loading RData files, you can’t pass a URL to load(). Instead you need to download the file first and then load it from the local path with the load function.
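The same download-first pattern works around read_excel() not supporting URLs. A minimal sketch using a temporary file; the URL is a placeholder.

```r
library(readxl)

tmp <- tempfile(fileext = ".xlsx")
# mode = "wb" keeps the binary file intact on Windows
download.file("https://xyz.com/filename.xlsx", tmp, mode = "wb")
df <- read_excel(tmp)
unlink(tmp)  # clean up the temporary file
```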

HTTP Queries

library(httr)
response <- GET(url)
raw_content <- content(response, as = "raw")       # raw bytes
text_content <- content(response, as = "text")     # character string
parsed_content <- content(response)                # as = "parsed" is the default; JSON is parsed automatically
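A sketch of a GET request with basic error handling before parsing; the URL is a placeholder.

```r
library(httr)

response <- GET("https://xyz.com/api/data")
if (http_error(response)) {
  # status_code() returns the numeric HTTP status, e.g. 404
  stop("Request failed with status ", status_code(response))
}
parsed <- content(response)  # a JSON body becomes an R list
```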

JSON Files

library(jsonlite)
fromJSON(url)              # read JSON from a URL
fromJSON(json_string)      # parse a JSON string
toJSON(df)                 # convert a data frame to JSON
minify(json_string)
prettify(json_string)
toJSON(df, pretty = TRUE)  # else minified output is used by default
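A minimal round-trip sketch with jsonlite: a small data frame to JSON and back.

```r
library(jsonlite)

df <- data.frame(id = 1:2, name = c("a", "b"), stringsAsFactors = FALSE)
json <- toJSON(df)             # minified by default
prettify(json)                 # human-readable version of the same JSON
df2 <- fromJSON(json)          # parse back into a data frame
identical(df$name, df2$name)   # TRUE
```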

Haven Package

library(haven)
read_sas("ontime.sas7bdat")  # SAS
read_stata("file.dta")       # Stata; alias of read_dta()
read_dta("file.dta")
read_spss("file.sav")        # SPSS; dispatches to read_sav() or read_por()
read_por("file.por")
read_sav("file.sav")

Foreign Package

library(foreign)
read.dta("filename.dta", convert.dates = TRUE, convert.factors = FALSE, missing.type = TRUE, convert.underscore = FALSE)
read.spss(file, use.value.labels = FALSE, to.data.frame = FALSE)

Because the foreign functions use different argument names from one another (and from haven), the interface is less consistent. It is easier to consult the documentation than to cover every option here.