Tuesday 29 November 2016

data.table

The data.table package lets R handle very large data sets (typically tens or hundreds of millions of rows) efficiently, both when importing the data and when aggregating it.
 
To import a flat file with a very large number of rows, data.table provides the fread() function.
 
library(data.table)
# fread() auto-detects the separator and header by default; here they are given explicitly
Data <- fread("data.csv", sep = ",", header = TRUE)
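The fread() call above needs an existing data.csv. As a self-contained sketch, the example below first writes a small made-up table to a temporary file with fwrite() (data.table's fast CSV writer, available from version 1.9.8) and then reads it back. The select argument, which loads only the named columns, is an optional extra that can save memory on large files; the file and column names here are placeholders.

```r
library(data.table)

# Write a small, made-up table to a temporary CSV file
tmp <- tempfile(fileext = ".csv")
fwrite(data.table(id = 1:1000, value = rnorm(1000)), tmp)

# Read it back; sep and header could also be left to auto-detection
Data <- fread(tmp, sep = ",", header = TRUE)

# Optionally read only the columns you need
DataId <- fread(tmp, select = "id")
str(Data)
```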

  
To aggregate a data set, for example the mean Sepal.Length per Species in the built-in iris data:
  
Agg <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length)), by = "Species"]
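A quick way to confirm what the grouped mean computes is to compare it with base R's tapply(). The sketch below also uses .( ), data.table's shorthand alias for list( ), and an unquoted by column; both forms give the same result as the line above.

```r
library(data.table)

DT <- as.data.table(iris)

# .() is an alias for list(); by can also name the column without quotes
Agg <- DT[, .(Avg_Sepal_Length = mean(Sepal.Length)), by = Species]
print(Agg)

# Base R equivalent: per-Species means, in factor-level order
base_means <- tapply(iris$Sepal.Length, iris$Species, mean)
all.equal(Agg$Avg_Sepal_Length, as.numeric(base_means))
```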
 
When aggregating multiple columns at the same time:
 
AggMC <- as.data.table(iris)[, list(Avg_Sepal_Length = mean(Sepal.Length), Avg_Petal_Length = mean(Petal.Length)), by = "Species"]
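Listing every column by hand gets long when there are many of them. An alternative sketch, assuming the same iris columns, passes the column names through .SDcols and applies mean() to each with lapply(); setnames() then restores the Avg_ names used above.

```r
library(data.table)

DT <- as.data.table(iris)
cols <- c("Sepal.Length", "Petal.Length")

# Apply mean() to each column listed in .SDcols, separately within each Species
AggMC2 <- DT[, lapply(.SD, mean), by = "Species", .SDcols = cols]

# lapply() keeps the original column names, so rename them (Sepal.Length -> Avg_Sepal_Length)
setnames(AggMC2, cols, paste0("Avg_", sub(".", "_", cols, fixed = TRUE)))
print(AggMC2)
```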
 
When aggregating all columns other than the grouping column (.SD holds every column not listed in by):
 
AggAC <- as.data.table(iris)[, lapply(.SD, mean), by = "Species"]
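lapply(.SD, mean) works on iris because every non-grouping column is numeric. If a table also has factor or character columns, mean() returns NA for them with a warning, so one option is to restrict .SDcols to the numeric columns. The sketch below does this for the built-in CO2 data, grouping by Type.

```r
library(data.table)

DT <- as.data.table(CO2)

# Names of the numeric columns only (conc and uptake); the factor columns are left out
num_cols <- names(DT)[vapply(DT, is.numeric, logical(1))]

# Average every numeric column within each Type
AggNum <- DT[, lapply(.SD, mean), by = "Type", .SDcols = num_cols]
print(AggNum)
```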
 
   
When aggregating by multiple grouping columns:

AggMCMG <- as.data.table(CO2)[, list(Avg_Conc = mean(conc), Total_Uptake = sum(uptake)), by = c("Plant", "Type")]
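by keeps the groups in the order they first appear. If a sorted result is wanted, keyby groups the same way but also orders the output by the grouping columns and sets them as the table's key, as this sketch of the same CO2 aggregation shows. Since each of the 12 plants was measured at the same seven conc values, Avg_Conc is the same for every group.

```r
library(data.table)

# keyby groups like by, but also sorts the result by the grouping columns and keys it
AggSorted <- as.data.table(CO2)[, list(Avg_Conc = mean(conc), Total_Uptake = sum(uptake)),
                                keyby = c("Plant", "Type")]
print(AggSorted)
```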





Tuesday 25 October 2016

Passing parameters to R script from command line


To pass parameters to an R script run from the command line, use commandArgs().

Example: 

Save the script below in a file called 'DateRange.r':

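# trailingOnly = TRUE keeps only the arguments given after the script name
Para <- commandArgs(trailingOnly = TRUE)
DATE <- as.Date(Para[1], format = "%Y%m%d")
N <- as.numeric(Para[2])
DateRge <- data.frame(Date = seq(from = DATE, length.out = N, by = 1), Value = rnorm(N))
# An assignment prints nothing, so print the result explicitly
print(DateRge, row.names = FALSE)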

Then run the command below, appending the parameters at the end.


For Windows

If R's bin folder (shown in the path below) is in your PATH environment variable, you can run 'Rscript' without writing out the full path.

"C:\Program Files\R\R-3.2.3\bin\Rscript.exe" DateRange.r [date in yyyymmdd format (DATE)] [length of sequence (N)] 

"C:\Program Files\R\R-3.2.3\bin\Rscript.exe" DateRange.r 20161005 5
returns something like the following (Value comes from rnorm(), so the numbers differ on every run):
       Date       Value
 2016-10-05  1.61637011
 2016-10-06 -0.08534756
 2016-10-07 -2.24108808
 2016-10-08  0.05773242
 2016-10-09  0.73725642 


For Linux
 
As on Windows, use the Rscript command:

Rscript DateRange.r yyyymmdd N 

Rscript DateRange.r 20161005 5 
        Date      Value
 2016-10-05 -0.7931385
 2016-10-06 -0.4229764
 2016-10-07 -0.3338677
 2016-10-08 -1.0844999
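Because commandArgs() only receives values when the script is started from a shell, it can help to keep the parsing logic in a function that takes the argument vector directly. The sketch below is a hypothetical refactoring of DateRange.r (make_date_range is not part of the original script) that also stops with a usage message when arguments are missing or malformed.

```r
# Hypothetical helper: same logic as DateRange.r, but taking the argument
# vector as input so it can be tried interactively without a shell
make_date_range <- function(args) {
  if (length(args) < 2) stop("usage: Rscript DateRange.r yyyymmdd N")
  DATE <- as.Date(args[1], format = "%Y%m%d")
  N <- as.integer(args[2])
  if (is.na(DATE) || is.na(N) || N < 1) stop("invalid date or sequence length")
  data.frame(Date = seq(from = DATE, length.out = N, by = 1), Value = rnorm(N))
}

# Inside the script this would become:
# print(make_date_range(commandArgs(trailingOnly = TRUE)), row.names = FALSE)

DateRge <- make_date_range(c("20161005", "5"))
print(DateRge, row.names = FALSE)
```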








Friday 20 May 2016

Send emails from R through Outlook


This assumes the Outlook desktop application is installed and your account is already set up in it. RDCOMClient talks to Outlook through COM, so it works on Windows only.

If you get an error like 'Error: Exception occurred.' after installing the package in R, you may need to restart Outlook.


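library(RDCOMClient)

# Create an Outlook application object and a new mail item (0 = olMailItem)
OutApp <- COMCreate("Outlook.Application")
outMail = OutApp$CreateItem(0)

# Set the recipient, subject and body text
outMail[["To"]] = "recipient's email address"
outMail[["subject"]] = "subject"
outMail[["body"]] = "body text"

# Send the message
outMail$Send()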



To send an email to multiple recipients, separate the addresses with semicolons (;):

OutApp <- COMCreate("Outlook.Application") 
outMail = OutApp$CreateItem(0)

outMail[["To"]] = "recipient's email address 1; recipient's email address 2"
outMail[["subject"]] = "subject" 
outMail[["body"]] = "body text" 

outMail$Send()
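When the addresses are already in a character vector, paste() with collapse can build the semicolon-separated string. The addresses below are placeholders, and the resulting string would be assigned to outMail[["To"]] in the same way as above.

```r
# Placeholder addresses; replace with real recipients
recipients <- c("first.person@example.com", "second.person@example.com")

# Join into one string, separated by "; "
to_field <- paste(recipients, collapse = "; ")
print(to_field)
# With outMail created as above: outMail[["To"]] = to_field
```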



To send emails with attachment(s):

OutApp <- COMCreate("Outlook.Application") 
outMail = OutApp$CreateItem(0)

outMail[["To"]] = "recipient's email address"
outMail[["subject"]] = "subject" 
outMail[["body"]] = "body text" 

outMail[["Attachments"]]$Add("full path to file")
# e.g. "C:/Users/Documents/someFile.txt"
# as with other file paths in R, use forward slashes (or escaped backslashes, "\\") rather than single backslashes

outMail$Send()
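A path pasted from Windows Explorer contains single backslashes, which cannot be typed as-is inside an R string. As a sketch, a small hypothetical helper (to_forward_slashes) can convert paths written with escaped backslashes; the file names are placeholders, and attaching several files would mean calling Attachments$Add once per file.

```r
# Hypothetical helper: replace every backslash with a forward slash
to_forward_slashes <- function(paths) gsub("\\", "/", paths, fixed = TRUE)

# Placeholder paths, written with escaped backslashes
attachments <- c("C:\\Users\\Documents\\someFile.txt",
                 "C:\\Users\\Documents\\otherFile.csv")
attachments <- to_forward_slashes(attachments)
print(attachments)

# With outMail created as above, each file could then be attached with:
# for (f in attachments) outMail[["Attachments"]]$Add(f)
```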



To embed a table within the body of the email:
library(pander)

# Keep pander from splitting wide tables into several narrower ones
panderOptions('table.split.table', Inf)

OutApp <- COMCreate("Outlook.Application")
outMail = OutApp$CreateItem(0)

outMail[["To"]] = "recipient's email address"
outMail[["subject"]] = "subject"
# pandoc.table.return() returns the table as a character string, so it can be pasted into the body text
outMail[["body"]] = paste("Hello!", "", "The below summarises xxx:",
                          pandoc.table.return(data.frame(V1 = 1:5, V2 = LETTERS[1:5])),
                          sep = "\n")

outMail$Send()
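If pander is not installed, base R can produce a similar plain-text table: capture.output() collects the lines that print() would show. This is a swapped-in alternative to pandoc.table.return(), not the pander output itself; since the body is plain text, the columns may not line up unless the reader's font is monospaced.

```r
df <- data.frame(V1 = 1:5, V2 = LETTERS[1:5])

# Capture the lines print() would write to the console (without row names)
table_lines <- capture.output(print(df, row.names = FALSE))

# Assemble the body text, one element per line
body_text <- paste(c("Hello!", "", "The below summarises xxx:", table_lines), collapse = "\n")
cat(body_text, "\n")
# With outMail created as above: outMail[["body"]] = body_text
```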