Tuesday, 29 November 2016

data.table

The data.table package allows R to handle very large data sets, typically tens or hundreds of millions of rows, efficiently. This covers both importing the data and aggregating it.
 
To import a flat file with a very large number of rows, data.table provides the fread function.
 
library(data.table)
Data <- fread("data.csv", sep = ",", header = TRUE)
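
As a rough, self-contained sketch of the import step: the example below writes a small made-up CSV to a temporary file (the column names id, value and label are invented for illustration) and reads it back with fread. It assumes the select and nrows arguments of fread are what you want for limiting columns and rows; on a real multi-million-row file, the same arguments can help when only part of the data is needed.

```r
library(data.table)

# Write a small sample file so the example runs on its own.
# The file contents are hypothetical, not taken from any real data set.
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:5,
                  value = c(2.5, 3.1, 4.8, 1.2, 9.9),
                  label = letters[1:5]),
       tmp)

# Read the whole file; fread detects the separator and header by itself.
full <- fread(tmp)

# Read only two of the columns and only the first three rows.
part <- fread(tmp, select = c("id", "value"), nrows = 3)

print(full)
print(part)
```

Leaving out sep and header, as done here, lets fread guess them, which usually works for ordinary comma-separated files.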

  
To aggregate a data set by a grouping column:
  
Agg <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length)), by = "Species"]
 
When aggregating multiple columns at the same time:
 
AggMC <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length),
                                    Avg_Petal_Length = mean(Petal.Length)),
                             by = "Species"]
 
When aggregating all columns other than the grouping column:
 
AggAC <- as.data.table(iris)[, lapply(.SD, mean), by = "Species"]
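
The lapply(.SD, mean) form works on iris because every column other than Species is numeric. When only some columns should be aggregated, one option is the .SDcols argument, which restricts what .SD contains. A minimal sketch, with the choice of the two petal columns being purely illustrative:

```r
library(data.table)

iris_dt <- as.data.table(iris)

# Limit .SD to the two petal measurements, then average them per species.
AggSel <- iris_dt[, lapply(.SD, mean),
                  by = "Species",
                  .SDcols = c("Petal.Length", "Petal.Width")]

print(AggSel)
```

The result has one row per species and keeps the original column names, since lapply over .SD does not rename anything.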
 
   
When aggregating by multiple grouping columns:

AggMCMG <- as.data.table(CO2)[, list(Avg_Conc = mean(conc),
                                     Total_Uptake = sum(uptake)),
                              by = c("Plant", "Type")]
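
A variation on the same idea, offered as a sketch rather than a recommendation: grouping CO2 by Type and Treatment instead, adding a row count per group with .N, and using keyby in place of by so the result comes back sorted by the grouping columns. The output column names N, Avg_Conc and Total_Uptake are chosen here for illustration.

```r
library(data.table)

co2_dt <- as.data.table(CO2)

# keyby groups like by, but also sorts the result by the grouping columns.
# .N is the number of rows in each group.
AggKey <- co2_dt[, list(N = .N,
                        Avg_Conc = mean(conc),
                        Total_Uptake = sum(uptake)),
                 keyby = c("Type", "Treatment")]

print(AggKey)
```

Including .N alongside the other summaries is a cheap way to check that each group contains the number of rows you expect.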