R Programming 102- Importing Data ::

Flat Files

read.csv()
read.table()
read.delim()

The utils package in R loads by default and contains all these read functions.

Arguments
```
read.csv("filname.csv", header = FALSE)
read.csv("filname.csv", header = FALSE, col_names= c("Column 1", "Column 2"), col_values = c("numeric", "factor"))
read.delim("filename.txt", header = TRUE)
read.table("filname.csv", header = FALSE, delimiter = "\t")
```
Read_table function is the main function operating behind the scenes even when we use csv or delim to read our files. They are wrappers to the table function with some default arguments for ease of use. There is also read_csv2 function that works with files separated by “;”.
- Header Parameter specifies if the first row of file is header or data.
- Delimiter Argument specifies how the column values are are separated.
- Column Names can be specified for ease of using columns.
- Data Types for columns can also be specified using the values argument.

Reading filename
```
read.csv(file.path("~", "Downloads", "filename.csv"))
```
To read filenames that are not in the main directory, we use file.path functionality. This also helps make paths platform independent.

Other Libraries

Readr Package
```
library(readr)
read_csv("filename.csv")
read_tsv("filename.tsv")
read_delim("filename.txt", col_names = FALSE, col_types ="ccdd", delim= "/", skip = num, n_max = num2)
```
Here tsv function is to work with files delimited by tabs. delim function in this library works similar to table function from utils package.
- Arguments
  - col_names : This argument specifies whether to use the first row as header or not. If you pass it a list, it will use the first row and give the column names from your list.
  - col_types: This is used to set the type of data in the respective column. It can be written shorthanded as e.g. “ccdd” in the above code block which means there are 4 columns out of which first 2 are characters and last 2 are doubles. You can also pass it a custom list of objects if you have one.
  - skip : No of rows to skip before returning the value.
  - n_max: No of rows to return.

Data.Table Package
```
library(data.read)
fread("csvfilename.csv")
fread("filename.csv", select = c(2,3))
fread("filename.csv", drop = c(2,3))
```
It infers column types and separators by default. It is extremely fast. Improved version of read.table function.
- Arguments
  - Select or Drop certain columns from the csv file based on number.

Excel Files

Readxl Package

library(readxl)
excel_sheets("filename.xlsx") # gives the name of sheets
read_excel("filename.xlsx", sheet = Number)
lapply(excel_sheets("filename.xlsx"),
       read_excel,
       path = "filename.xlsx")

The final lapply method here calls read_excel multiple times with path parameter that we specified. Since, each call returns a sheet, we get a list of sheets from the file.

Parameters
- col_names : FALSE or your vector/list of columns.
- col_types: NULL or any vector of data types.
- skip : No of rows to skip.
- n_max: No of rows to get.

Gdata Package

More traditional package that supports only xls files by default. Follows similar syntax to utils package.

library(gdata)
read.xls("filename.xls", sheet= sheetNum or sheetName)

XLConnect Package

Working with excel through R. Needs JDK.

library("XLConnect")
loadWorkBook("filename.xlsx") # the workbook_Obj we need.
getSheets(workbook_Obj)
readWorksheet(workbook_Obj, sheet= sheet_name,
              startRow = num,
              endRow = num,
              startCol = num,
              header = TRUE)

You can list the sheets available and read worksheet from the workbook object.

Other functionalities

createSheet(book_obj, "new_sheet_name")
writeWorksheet(book_obj, df, sheet="sheet_name")
renameSheet(book_obj, "old_name", "new_name")
removeSheet(book_obj, "sheet_name")
saveWorkbook(book_obj, file= "outputFileName.xlsx")

Backlinks

Data Scientist with R