Monday, 15 December 2014

boxplot & bwplot


A box plot, drawn with boxplot(), is another way of representing the distribution of the data, much like 'densityplot' but more compact. Hence, it is useful for comparing data by groups or categories within the data.

It works well with large datasets, whereas scatter plots struggle when there are too many data points to plot.

The graphic parameters used for boxplot() are similar to those for plot().

Example:

boxplot(count~spray,InsectSprays)


























The above example shows outliers. To leave these outliers out of the graph, set 'outline' to FALSE. This only hides the points; the boxes and whiskers are still computed from all of the data.

boxplot(count~spray,InsectSprays,outline=FALSE)
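If the outliers themselves are of interest, one option is to ask boxplot() for its statistics without drawing anything. The following is a minimal sketch using the same InsectSprays data; the names bp and outliers are only illustrative.

```r
# Compute the box plot statistics without drawing the plot
bp <- boxplot(count ~ spray, data = InsectSprays, plot = FALSE)

# Values that would be drawn as outliers, and the spray group each belongs to
outliers <- data.frame(spray = bp$names[bp$group], count = bp$out)
outliers
```

The same list also holds the five-number summaries per group in bp$stats.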




























'bwplot' is the 'lattice' package's version of boxplot, and the graphic parameters used for bwplot() are similar to those for xyplot(). Load the package with library(lattice) before running the examples below.


Example:


bwplot(uptake~Type,CO2)




To remove outliers, set do.out=FALSE.

bwplot(uptake~Type,CO2,do.out=FALSE)



























With split panels,

bwplot(uptake~Treatment|Type,CO2)























To change colour of the median point

bwplot(uptake~Treatment|Type,CO2,col="red")






















To change colour of the median point and rectangle frame,

bwplot(uptake~Treatment|Type,CO2,col="red",
par.settings=list(box.rectangle=list(col="salmon",alpha=0.4)))























To change colour of the median point, rectangle frame and rectangle body,

bwplot(uptake~Treatment|Type,CO2,col="red",
par.settings=list(box.rectangle=list(col="salmon",fill="salmon",alpha=0.4)))























To change colour of the median point, rectangle frame, rectangle body and umbrella,


bwplot(uptake~Treatment|Type,CO2,col="red",
par.settings=list(box.rectangle=list(col="salmon",fill="salmon",alpha=0.4),box.umbrella=list(col="salmon",alpha=0.4)))





















To change colour of the median point, rectangle frame, rectangle body, umbrella and outlier,


bwplot(uptake~Treatment|Type,CO2,col="red",
par.settings=list(box.rectangle=list(col="salmon",fill="salmon",alpha=0.4),box.umbrella=list(col="salmon",alpha=0.4),plot.symbol=list(col="salmon",alpha=0.4)))
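To see which settings exist before overriding them through par.settings, the default theme can be inspected. This is a small sketch, assuming the 'lattice' package is installed; it uses standard.theme() so that no plot window needs to be opened, and the name theme is only illustrative.

```r
library(lattice)

# Default settings for a colour device
theme <- standard.theme("pdf")

# Components that control the parts of a bwplot:
# box.dot (median point), box.rectangle (box), box.umbrella (whiskers),
# plot.symbol (outlier points)
str(theme[c("box.dot", "box.rectangle", "box.umbrella", "plot.symbol")])
```

Any element listed there (col, fill, alpha, lty, lwd and so on) can be set in the par.settings list in the same way as in the examples above.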























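To overlay two bwplots for comparison (as.layer() and the '+' operator for lattice plots come from the 'latticeExtra' package, so load it first),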


bwplot(Sepal.Length~Species,iris,col="red",scale=list(y=list(lim=c(0,10))),key=list(space="top",column=2,text=list(label=c("Sepal Length","Petal Length"),cex=0.8,col="darkgrey"),
rectangles=list(col=c("salmon","dodgerblue"),rectangles=c("salmon","dodgerblue"),alpha=0.4,size=3)),
par.settings=list(box.rectangle=list(col="salmon",fill="salmon",alpha=0.4),box.umbrella=list(col="salmon",alpha=0.4),plot.symbol=list(col="salmon",alpha=0.4)))+
as.layer(bwplot(Petal.Length~Species,iris,col="blue",
par.settings=list(box.rectangle=list(col="dodgerblue",fill="dodgerblue",alpha=0.4),box.umbrella=list(col="dodgerblue",alpha=0.4),plot.symbol=list(col="dodgerblue",alpha=0.4)))
)








Blank Plots & Graphic Devices

To create blank plots, there are a few options:

from the 'graphics' package (part of base R):

plot.new()

frame()

plot(1,type="n",axes=FALSE,xlab="",ylab="")



blank plots without margins

par(mar=rep(0,4)) 
plot(1,type="n",axes=FALSE,xlab="",ylab="")
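A blank plot is mostly useful as a canvas to draw on. The sketch below assumes a 0 to 1 coordinate system chosen purely for illustration, and restores the margins afterwards so later plots are not affected.

```r
op <- par(mar = rep(0, 4))                   # remove margins, remember old settings
plot.new()                                   # start an empty plot
plot.window(xlim = c(0, 1), ylim = c(0, 1))  # define the coordinate system
rect(0.1, 0.1, 0.9, 0.9, border = "grey")    # draw a frame
text(0.5, 0.5, "Drawn on a blank canvas")    # add text in the centre
par(op)                                      # restore the previous margins
```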



using 'lattice' package:

xyplot(1~1,col="transparent",scales=list(draw=FALSE),xlab="",ylab="",par.settings=list(axis.line=list(col=0)))


To open a graphics device:

x11()       #for linux, but also works for other OS if installed  

windows()     #for windows  

quartz()      #for Mac 
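When a script has to run on more than one operating system, dev.new() can be used instead; it opens whichever device is the default for the current platform and session. A short sketch (the variable names are illustrative):

```r
n_before <- length(dev.list())                # number of devices currently open
dev.new()                                     # open the default device
opened <- length(dev.list()) == n_before + 1  # TRUE if a new device was opened
dev.cur()                                     # name and number of the active device
dev.off()                                     # close it again
```

In a non-interactive session the default device is usually a file device such as pdf rather than a window.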

Friday, 17 October 2014

Regular Expression

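Regular expressions can be used in R to substitute, extract or search for patterns in text.


To extract part of a text, we can use sub() with a backreference "\\n" as the replacement, where n is the number of the capture group (parenthesised sub-pattern) to keep, usually 1 or 2. The pattern has to match the whole string so that everything except the chosen group is discarded. The example extracts the date from the file name "ABC_2014_10_23.csv".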

sub("^[A-Za-z]+(_)([0-9]{4}(_)[0-9]{2}(_)[0-9]{2})(.)[A-Za-z]+","\\2","ABC_2014_10_23.csv")

"2014_10_23"

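An alternative to the backreference approach is to describe only the part that is wanted and pull it out with regexpr() and regmatches(). This is a minimal sketch on the same file name; fname and date_part are illustrative names.

```r
fname <- "ABC_2014_10_23.csv"

# Locate the first yyyy_mm_dd pattern and extract the matched text
date_part <- regmatches(fname, regexpr("[0-9]{4}_[0-9]{2}_[0-9]{2}", fname))
date_part

# The extracted text can then be converted to a Date
as.Date(date_part, format = "%Y_%m_%d")
```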

To replace certain words with another word, we can do the following.

gsub("\\s*\\b(meters|m|meter)\\b","m",
"DAS FE TDAS 5meter ADSA RE 12 meters SA 23 Meters DASD 3 m",
ignore.case=TRUE)

"DAS FE TDAS 5meter ADSA RE 12m SA 23m DASD 3m"


To find fields matching the pattern, we can use grep().

The below looks for letters followed by numbers:

A<-c("SDA09DA","ADCZFD","081382","ASDF8673")
A[grep("[A-Z]+[0-9]+",A)]

"SDA09DA"  "ASDF8673"


The below looks for numbers between letters:

A<-c("SDA09DA","ADCZFD","081382","ASDF8673")
A[grep("[A-Z]+[0-9]+[A-Z]",A)]

"SDA09DA"

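Two related forms are often handy: grep() with value=TRUE returns the matching elements directly, and grepl() returns a logical vector that can be used for filtering. A short sketch on the same vector (matched and is_match are illustrative names):

```r
A <- c("SDA09DA", "ADCZFD", "081382", "ASDF8673")

# Return the matching elements instead of their positions
matched <- grep("[A-Z]+[0-9]+", A, value = TRUE)
matched

# Return TRUE/FALSE for every element
is_match <- grepl("[A-Z]+[0-9]+", A)
is_match
```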

The below looks for a more complex pattern. This example looks for a typical address format.

A<-c("100 ASD ST DASER","CSADAS SS ASD ADA",
"321 DSA XCSACXZ SADAS 213","321/32 ASA ASDDD RD WEASF",
"DSA 231 DDG BGBVCVB","SUITE 1/43 SAFDSA AVE ASDA",
"UNIT 21/2 ADSD AV SDADFF")

A[grep("^[A-Z]*\\s*[0-9]+\\s*(/)*\\s*[0-9]*\\s*[A-Z]+\\s*[A-Z]*\\s*(ST|RD|AVE|AV)\\s*[A-Z]*$",A)]

"100 ASD ST DASER"           
"321/32 ASA ASDDD RD WEASF" 
"SUITE 1/43 SAFDSA AVE ASDA" 
"UNIT 21/2 ADSD AV SDADFF"  



For more on regular expression syntax, go to either of the following websites:


http://msdn.microsoft.com/en-us/library/az24scfc(v=vs.110).aspx

http://www.rexegg.com/regex-quickstart.html











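Functions for Arithmetic

The following are some of the useful arithmetic functions in R apart from +, -, *, /, ^ and sqrt().


To control when R switches to scientific notation for large numbers, you can set the 'scipen' parameter within options(). With the default scipen=0, R prints a number in whichever of fixed or scientific notation is narrower; a positive scipen penalises scientific notation by that many characters. For numbers with many significant digits, the default switch happens at about 10^12 in the R console and about 10^7 on the axes of the graphic device.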

The below is larger than 10^13, and R reverts to using scientific notation.

31423190532586

3.142319e+13


By setting the option to scipen=15, the same number is retained in its natural form.

options(scipen=15)
31423190532586

31423190532586
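When only a single value needs to be shown in full, format() can be used without changing the global option. A small sketch (x, fixed and pretty are illustrative names):

```r
x <- 31423190532586

# Fixed notation for this value only; the result is a character string
fixed <- format(x, scientific = FALSE)
fixed

# The same, with a thousands separator
pretty <- format(x, big.mark = ",", scientific = FALSE)
pretty
```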


For plotting:

plot(1:10,seq(0,100000000,length=10))
 


















options(scipen=8)
plot(1:10,seq(0,100000000,length=10))




















To round the figure to significant figures, you can use the below function.

signif(12645.654,digits=2)

13000


signif(0.00034526,digits=2)

0.00035



For rounding figures, you can do the following.

round(12645.654)  #rounds to integer

12646


round(12645.654,digits=2) #rounds to second decimal place

12645.65


round(0.00034526,digits=5)

0.00035
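Two details of round() are worth knowing: a negative 'digits' rounds to tens, hundreds and so on, and values exactly halfway between two integers are rounded to the even one. A short sketch:

```r
# Negative digits round to the left of the decimal point
round(12645.654, digits = -2)   # 12600

# Halves are rounded to the nearest even integer
round(0.5)   # 0
round(1.5)   # 2
round(2.5)   # 2
```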



To find the ceiling (the smallest integer not less than the given number) of a figure, you can do the following.

ceiling(12645.654)  

12646


ceiling(0.00034526)  

1


To find the floor (the largest integer not greater than the given number) of a figure, you can do the following.

floor(12645.654)

12645


floor(0.00034526)  

0



To find the absolute value, you can do the following.

abs(-243)

243


To find the remainder of a division, you can use '%%'. This can lose accuracy with very large numbers, because doubles represent integers exactly only up to 2^53.

15%%2

1


To find quotient of a division (integer part of division output), you can use '%/%'.

15%/%2

7
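The two operators belong together: the quotient times the divisor plus the remainder gives back the original number. With a negative number the quotient is rounded down, so the remainder takes the sign of the divisor. A short sketch (x and y are illustrative names):

```r
x <- -15
y <- 2

x %/% y   # -8 (rounded down, not towards zero)
x %% y    # 1  (same sign as the divisor)

# The identity linking the two operators
(x %/% y) * y + x %% y == x   # TRUE
```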




Thursday, 16 October 2014

arranging layout of lattice graphs using gridExtra package

gridExtra allows arranging multiple graphs on the device, even when those graphs are drawn from different data sources. This package only works with grid-based graphics such as lattice and ggplot2 graphs.


library(lattice)
library(grid)
library(gridExtra)

dat<-aggregate(peri~perm,rock,sum)

dat1<-aggregate(rock[,c("peri","shape")],by=list(rock$perm),sum)
colnames(dat1)[1]<-"perm"


p1<-barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",
main="data: rock aggregated 1",horizontal=FALSE,col="coral2",border="coral2")

p2<-xyplot(shape~factor(perm),dat1,xlab="perm",ylab="shape",col="seagreen2",
type="l",lty=1,lwd=2,main="data: rock aggregated 2")

p3<-xyplot(shape~peri,rock,xlab="peri",ylab="shape",col="maroon2",type="p",
cex=1.5,main="data: rock",pch=16)



grid.arrange(p1,p2,p3,ncol=1)
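For comparison, a simple stacked layout can also be produced with 'lattice' alone, using the 'split' and 'more' arguments of print(). This is only a sketch of that alternative, not a replacement for the gridExtra approach; p_a and p_b are illustrative plots built from the rock data.

```r
library(lattice)

p_a <- xyplot(shape ~ peri, rock, main = "shape vs peri")
p_b <- xyplot(shape ~ perm, rock, main = "shape vs perm")

# split = c(column, row, number of columns, number of rows)
print(p_a, split = c(1, 1, 1, 2), more = TRUE)   # top cell, more plots to come
print(p_b, split = c(1, 2, 1, 2), more = FALSE)  # bottom cell, page complete
```

gridExtra remains the more flexible option when the cells need different widths or when tables and text are mixed in.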





























grid.arrange(p2,arrangeGrob(p1,p3,widths=c(3/5,2/5),ncol=2),ncol=1)























To add the header or title,


grid.arrange(p2,arrangeGrob(p1,p3,widths=c(3/5,2/5),ncol=2),ncol=1,
main=textGrob("Rock",gp=gpar(cex=1.5,col="red")))


From gridExtra version 2.0.0 onwards, use "top" instead of "main" and "bottom" instead of "sub" for titles and subtitles respectively.


  



  
  
























To include a table in the display:


TBL<-summary(dat)

grid.arrange(arrangeGrob(
tableGrob(TBL, gp = gpar(cex = 0.8), show.rownames = FALSE, 
padding.h = unit(5, "mm")), 
p2, ncol = 2, widths = c(2/5, 3/5)), 
arrangeGrob(p1, p3, widths = c(3/5, 2/5), ncol = 2), ncol = 1,
main = textGrob("Rock", gp = gpar(cex = 1.5, col = "red")))



From gridExtra version 2.0.0 onwards, tableGrob does not accept gpar; the formatting is set through a theme instead, hence the below code will need to be used.
  

grid.arrange(arrangeGrob(
tableGrob(TBL, theme = ttheme_default(core = list(fg_params = list(cex=0.8, hjust = 0.5)),
colhead = list(fg_params = list(cex=0.8))), rows = NULL), 
p2, ncol = 2, widths = c(2/5, 3/5)), 
arrangeGrob(p1, p3, widths = c(3/5, 2/5), ncol = 2), ncol = 1,
top = textGrob("Rock", gp = gpar(cex = 1.5, col = "red")))






























gridExtra versions >= 2.0.0 allow more formatting options for tableGrob, for example changing the background colours:
  

grid.arrange(arrangeGrob(
tableGrob(TBL, theme = ttheme_default(core = list(fg_params = list(cex=0.8, hjust = 0.5),
bg_params = list(fill = c("tan", "wheat"))),
colhead = list(fg_params = list(cex = 0.8, col = "white"), bg_params = list(fill = "brown"))), rows = NULL), 
p2, ncol = 2, widths = c(2/5, 3/5)), 
arrangeGrob(p1, p3, widths = c(3/5, 2/5), ncol = 2), ncol = 1, 
top = textGrob("Rock", gp = gpar(cex = 1.5, col = "red")))
 
 







 














 
 
 
 

 

overlaying graphs from different sources (lattice)

To superimpose lattice graphs from different sources, we need the "latticeExtra" package.

require("lattice")
require("latticeExtra")


#data source 1

dat<-aggregate(peri~perm,rock,sum)


#data source 2

dat1<-aggregate(rock[,c("peri","shape")],by=list(rock$perm),sum)
colnames(dat1)[1]<-"perm"
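Because the two graphs will share the x-axis, it is worth confirming that both data sources contain the same 'perm' values in the same order. The sketch below rebuilds the two data frames and checks this; naming the grouping variable inside by=list() is a small variation that avoids the separate colnames() step.

```r
# data source 1
dat <- aggregate(peri ~ perm, rock, sum)

# data source 2, naming the grouping column directly
dat1 <- aggregate(rock[, c("peri", "shape")], by = list(perm = rock$perm), sum)

# Both sources should line up on perm, and agree on the summed peri
all.equal(dat$perm, dat1$perm)
all.equal(dat$peri, dat1$peri)
```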


Draw barchart using data source 1

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",
main="data: rock",horizontal=FALSE,col="tan",border="tan")






















This graph shows tick marks on the right hand side, but we want a secondary y-axis with a different scale there. par.settings=list(axis.components=list(right=list(tck=0))) removes the tick marks from the designated side.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",
par.settings=list(axis.components=list(right=list(tck=0))))























Now, to add the second graph, we use the as.layer() function. The argument "y.same=FALSE" allows the y-axes to have different scales, and "under" controls whether the second graph is drawn on top of (FALSE) or underneath (TRUE) the first graph.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",
par.settings=list(axis.components=list(right=list(tck=0))))+
as.layer(xyplot(shape~factor(perm),dat1,col="brown",type="l",lty=1,lwd=2),
under=FALSE,y.same=FALSE,x.same=TRUE)























This graph now displays the new y-axis scale, but there are no tick marks. To add tick marks on the right hand side, we do the following. tck in the second graph is set to -1 to make the tick marks appear outside the plotting region.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",
par.settings=list(axis.components=list(right=list(tck=0))))+
as.layer(xyplot(shape~factor(perm),dat1,col="brown",type="l",lty=1,lwd=2,
par.settings=list(axis.components=list(right=list(tck=-1)))),
under=FALSE,y.same=FALSE,x.same=TRUE)























This graph now displays the new y-axis scale with tick marks, but there is no axis label. To add a y-axis label on the right hand side, we use "ylab.right" in the first graph.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",ylab.right="shape",
par.settings=list(axis.components=list(right=list(tck=0))))+
as.layer(xyplot(shape~factor(perm),dat1,col="brown",type="l",lty=1,lwd=2,
par.settings=list(axis.components=list(right=list(tck=-1)))),
under=FALSE,y.same=FALSE,x.same=TRUE)























This graph still shows the tick labels inside the plotting region. To place these labels outside, we can set pad1 to -4 in the second graph. The numeric value determines the position of the labels relative to the axis. To move the axis label further away from the axis, so that the tick labels and the axis label do not overlap, we can set "axis.key.padding" in the first graph.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",ylab.right="shape",
par.settings=list(axis.components=list(right=list(tck=0)),
layout.widths=list(axis.key.padding=4)))+
as.layer(xyplot(shape~factor(perm),dat1,col="brown",type="l",lty=1,lwd=2,
par.settings=list(axis.components=list(right=list(tck=-1,pad1=-4)))),
under=FALSE,y.same=FALSE,x.same=TRUE)
























To allow a bigger margin on the right hand side, leaving some space beyond the axis label, we can set "right.padding" in the first graph.

barchart(peri~factor(perm),dat,xlab="perm",ylab="peri",main="data: rock",
horizontal=FALSE,col="tan",border="tan",ylab.right="shape",
par.settings=list(axis.components=list(right=list(tck=0)),
layout.widths=list(axis.key.padding=4,right.padding=3)))+
as.layer(xyplot(shape~factor(perm),dat1,col="brown",type="l",lty=1,lwd=2,
par.settings=list(axis.components=list(right=list(tck=-1,pad1=-4)))),
under=FALSE,y.same=FALSE,x.same=TRUE)




























Wednesday, 15 October 2014

Plot - Colours and Plotting Characters

When plotting scatterplot graphs, 'pch' determines the shape of the points. There are 26 standard symbols (pch 0 to 25), but depending on your R session locale settings, you can have more options. You can use sessionInfo() or Sys.getlocale() to check your locale settings, and change them with Sys.setlocale().

For LC_CTYPE="English_Australia", there are 247 characters you can choose from with the pch parameter. The below will plot these options in the graphic device. Note that pch 26 to 31 inclusive are unused and left blank.

plot(rep(c(1:20),times=13),rep(seq(2,26,by=2),each=20),
pch=1:260,yaxt="n",xaxt="n",xlab="",ylab="",
main="pch values and corresponding displays",cex.main=1,frame=FALSE)
text(rep(c(1:20),times=13),rep(seq(2.7,26.7,by=2),each=20),
label=c(1:260),cex=0.8,col="darkgrey")
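The standard symbols 0 to 25 do not depend on the locale, so a smaller chart of just those is a portable reference. In this sketch the fill colour "salmon" is an arbitrary choice; 'bg' only affects symbols 21 to 25, which have a separate border and fill.

```r
pch_std <- 0:25   # the standard plotting symbols

plot(pch_std, rep(1, 26), pch = pch_std, bg = "salmon", cex = 1.5,
     ylim = c(0.5, 1.5), xaxt = "n", yaxt = "n", xlab = "", ylab = "",
     main = "Standard pch symbols 0 to 25", cex.main = 1, frame.plot = FALSE)

# Label each symbol with its pch value
text(pch_std, rep(1.2, 26), labels = pch_std, cex = 0.8, col = "darkgrey")
```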



































R has 657 predefined colour names, provided by the 'grDevices' package that ships with base R. These can be listed with colours(). The below chart shows the names and colours available.

par(mai=c(0,0,0.5,0))
plot(rep(seq(0,9,by=1),each=66),rep(seq(1,66,by=1),times=10),pch=15,
col=colours(),cex=1.5,yaxt="n",xaxt="n",xlab="",ylab="",
main="colour options",cex.main=1,frame=FALSE,xlim=c(0,10.5))
text(rep(seq(0.5,9.5,by=1),each=66),rep(seq(1,66,by=1),times=10),
label=colours(),cex=0.7,col="darkgrey")





























Converting named-colour to RGB

col2rgb("brown")

      [,1]
red    165
green   42
blue    42
  

Converting RGB to hexadecimal colour (hex colour)  
  
rgb(165, 42, 42, maxColorValue = 255)  
   
[1] "#A52A2A"  
  


Converting named-colour to hex colour
  
colName <- "purple"  
  
rgb(col2rgb(colName)[1], col2rgb(colName)[2], col2rgb(colName)[3],  maxColorValue = 255)  
  
[1] "#A020F0"  
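Since rgb() also accepts a matrix with one row per colour and three columns, several named colours can be converted in one call by transposing the output of col2rgb(). A short sketch (cols and hex are illustrative names):

```r
cols <- c("purple", "brown")

# col2rgb() returns one column per colour; rgb() expects one row per colour
hex <- rgb(t(col2rgb(cols)), maxColorValue = 255)
names(hex) <- cols
hex
```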
  
  

A colour gradient is often useful for beautifying graphs and visualising values of continuous variables. The following shows a grey scale colour gradient with 100 steps.

col_gradient<-colorRampPalette(c("grey","white"))
COL1<-col_gradient(100)

plot(1:100,1:100,pch=18,xlab="",ylab="",
main="Colour gradient - grey scale",cex.main=1,frame=TRUE,col=COL1)


























The following is for single colour gradient.


col_gradient<-colorRampPalette(c("red","grey"))
COL1<-col_gradient(100)

plot(1:100,1:100,pch=18,xlab="",ylab="",
main="Colour gradient - single colour",cex.main=1,frame=TRUE,col=COL1)


























R can have gradient between 2 colours as shown below.

col_gradient<-colorRampPalette(c("red","blue"))
COL2<-col_gradient(100)

plot(1:100,1:100,pch=18,xlab="",ylab="",
main="Colour gradient - two colours",cex.main=1,frame=TRUE,col=COL2)



























You can also have gradient between multiple colours. This example has 3 colours.

col_gradient<-colorRampPalette(c("red","green","blue"))
COL3<-col_gradient(100)

plot(1:100,1:100,pch=18,xlab="",ylab="",
main="Colour gradient - multiple colours",cex.main=1,frame=TRUE,col=COL3)
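To visualise a continuous variable with such a gradient, the values first need to be mapped to positions in the palette. One simple way, sketched below with the rock data, is to bin the variable with cut(); the choice of 10 bins and of 'perm' as the coloured variable is only for illustration.

```r
# A 10-step palette from red to blue
pal <- colorRampPalette(c("red", "blue"))(10)

# Assign each perm value to one of 10 equal-width bins (1 = lowest)
bins <- cut(rock$perm, breaks = 10, labels = FALSE)

# Colour each point by the bin its perm value falls into
plot(rock$peri, rock$shape, pch = 16, col = pal[bins],
     xlab = "peri", ylab = "shape",
     main = "Points coloured by perm", cex.main = 1)
```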


























The colours can be converted to rgb code by using col2rgb() function as shown below.

col2rgb("red")

      [,1]
red    255
green    0
blue     0


col2rgb("blue")

      [,1]
red      0
green    0
blue   255


More than one colour can be converted at the same time as shown below which produces output as a matrix.

col2rgb(c("red","blue"))

      [,1] [,2]
red    255    0
green    0    0
blue     0  255


Mixing colours is not entirely straightforward, but it is not complex either. However, the result of a colour mix in R is not always what one would expect in practice. For example, mixing blue and red should produce purple, but the rgb values of the respective colours show that R's predefined purple has a hint of green in the mix.

rbp<-col2rgb(c("red","blue","purple"))
colnames(rbp)<-c("red","blue","purple")
rbp

      red blue purple
red   255    0    160
green   0    0     32
blue    0  255    240


If you follow the simplistic approach and take the mean of the rgb values of the source colours (red and blue), you get the following colour, which is compared here with the predefined purple.

Colrgb<-col2rgb(c("red","blue"))
p<-apply(Colrgb,1,mean)

plot(1:2,rep(1,2),col=c("purple",rgb(p[1],p[2],p[3],max=255)),pch=16,cex=10,
yaxt="n",xaxt="n",xlab="",ylab="",frame=TRUE,xlim=c(0,3),ylim=c(0,2))
text(1:2,rep(1.5,2),label=c("purple","rgb mean"),cex=1,col="darkgrey")

























Alternatively, we can use colorRampPalette() as seen above and pick the mid-point between 2 colours.

col_gradient<-colorRampPalette(c("red","blue"))
COL2<-col_gradient(100)

plot(1:3,rep(1,3),col=c("purple",rgb(p[1],p[2],p[3],max=255),COL2[50]),pch=16,
cex=10,yaxt="n",xaxt="n",xlab="",ylab="",frame=TRUE,xlim=c(0,4),ylim=c(0,2))
text(1:3,rep(1.5,3),label=c("purple","rgb mean","palette"),cex=1,col="darkgrey")



















There is a minor difference between these colours, which can be seen from the rgb values shown below.

      purple average palette
red      160   127.5     128
green     32     0.0       0
blue     240   127.5     126


The practical application is when you have overlapping regions in the shaded areas of the graph. 

The below is when you do not assign a colour to the overlapping region, but use a lighter shading density to display the overlap. In other graph functions, this would be equivalent to the alpha parameter.

plot(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),type="l",
col=rgb(255,0,0,max=255),xlim=c(-1,3),yaxt="n",xaxt="n",xlab="",ylab="",frame=FALSE,
main="Shaded Region between Red and Blue Circles without colouring",cex.main=1)

polygon(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),density=60,angle=90,
col=rgb(255,0,0,max=255),fillOddEven=FALSE,border=NA)

lines(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),type="l",
col=rgb(0,0,255,max=255),yaxt="n",xaxt="n",xlab="",ylab="")

polygon(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),density=60,angle=0,
col=rgb(0,0,255,max=255),fillOddEven=FALSE,border=NA)
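Base graphics can also use semi-transparent colours directly, provided the graphics device supports them. The sketch below draws the same two circles with 40% opacity, so the overlap is blended by the device; the opacity value and the names red_40, blue_40 and theta are illustrative.

```r
# Two ways to build a semi-transparent colour
red_40  <- rgb(1, 0, 0, alpha = 0.4)            # from rgb components on a 0-1 scale
blue_40 <- adjustcolor("blue", alpha.f = 0.4)   # from a named colour

theta <- seq(-pi, pi, length.out = 100)

plot(sin(theta), cos(theta), type = "n", xlim = c(-1, 3),
     yaxt = "n", xaxt = "n", xlab = "", ylab = "", frame.plot = FALSE,
     main = "Overlap using semi-transparent colours", cex.main = 1)
polygon(sin(theta), cos(theta), col = red_40, border = NA)
polygon(sin(theta) + 1.5, cos(theta), col = blue_40, border = NA)
```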




















The below compares the use of purple, average rgb and palette for the overlapping region. You can decide which one looks more natural and suits the purpose.

A<-data.frame(theta=seq(-pi,pi,length=100),
cbind(sin(seq(-pi,pi,length=100)),rev(sin(seq(-pi,pi,length=100))+1.5)))
A$ind<-A$X1-A$X2
A<-A[which(A$ind>0),]
A$Y<-cos(A$theta)

B<-data.frame(theta=seq(-pi,pi,length=100),
cbind(rev(sin(seq(-pi,pi,length=100))),sin(seq(-pi,pi,length=100))+1.5))
B$ind<-B$X1-B$X2
B<-B[which(B$ind>0),]
B$Y<-cos(B$theta)


par(mfrow=c(3,1))

plot(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),type="l",col="red",
xlim=c(-1,3),yaxt="n",xaxt="n",xlab="",ylab="",frame=FALSE,
main=paste("Shaded Region between Red and Blue Circles",
"with purple colour",sep="\n"),cex.main=1)
polygon(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),col="red",
fillOddEven=FALSE,border=NA)
lines(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),type="l",col="blue")
polygon(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),col="blue",
fillOddEven=FALSE,border=NA)
polygon(c(A$X1,B$X2),c(A$Y,B$Y),col="purple",border="purple")

plot(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),type="l",col="red",
xlim=c(-1,3),yaxt="n",xaxt="n",xlab="",ylab="",frame=FALSE,
main=paste("Shaded Region between Red and Blue Circles",
"with average rgb of red and blue",sep="\n"),cex.main=1)
polygon(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),col="red",
fillOddEven=FALSE,border=NA)
lines(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),type="l",col="blue")
polygon(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),col="blue",
fillOddEven=FALSE,border=NA)
polygon(c(A$X1,B$X2),c(A$Y,B$Y),col=rgb(p[1],p[2],p[3],max=255),
border=rgb(p[1],p[2],p[3],max=255))

plot(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),type="l",col="red",
xlim=c(-1,3),yaxt="n",xaxt="n",xlab="",ylab="",frame=FALSE,
main=paste("Shaded Region between Red and Blue Circles",
"with colour gradient function",sep="\n"),cex.main=1)
polygon(sin(seq(-pi,pi,length=100)),cos(seq(-pi,pi,length=100)),col="red",
fillOddEven=FALSE,border=NA)
lines(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),type="l",col="blue")
polygon(sin(seq(-pi,pi,length=100))+1.5,cos(seq(-pi,pi,length=100)),col="blue",
fillOddEven=FALSE,border=NA)
polygon(c(A$X1,B$X2),c(A$Y,B$Y),col=COL2[50],border=COL2[50])










Monday, 13 October 2014

Publishing HTML or PDF from R - 'knitr' and 'rmarkdown'

This shows how html and pdf documents can be produced from the R console, outside of RStudio. While RStudio offers a simpler and more convenient interface for producing these documents, using the R console makes it possible to schedule their generation through batch processing, which comes in handy for daily reports etc.



With 'knitr' package, you can publish html documents out of rmd files.

library(knitr)


knit2html("C:/your/file/location/Input.Rmd","C:/your/file/location/Output.html")



With 'rmarkdown' package, you can convert rmd to pdf.

At the time of writing this blog, on Windows 8 or higher the pandoc installation program does not register the path to its exe file automatically, hence you need to set the path manually. One option is to use the Sys.setenv() function within R to point to where the pandoc application is.

library(rmarkdown)


Sys.setenv(PATH=paste(Sys.getenv("PATH"),"C:\\your\\file\\location\\Pandoc",sep=";"))

render("C:/your/file/location/Input.Rmd","pdf_document",
output_file="C:/your/file/location/Output.pdf",
output_dir=getwd())
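For the scheduled daily report mentioned above, the call can be placed in a small script and run with Rscript from a task scheduler. The following is only a sketch: the script name render_report.R, the dated output name and the guard around the call are assumptions for illustration, and the input path is the same placeholder as above.

```r
# render_report.R (hypothetical file name); run with: Rscript render_report.R
input    <- "C:/your/file/location/Input.Rmd"   # placeholder path, as above
out_name <- paste0("Report_", format(Sys.Date(), "%Y%m%d"), ".pdf")

# Render only if the package and the input file are available
if (requireNamespace("rmarkdown", quietly = TRUE) && file.exists(input)) {
  rmarkdown::render(input, "pdf_document", output_file = out_name)
} else {
  message("Nothing rendered: rmarkdown or the input file is missing")
}
```

Putting the date in the file name keeps each day's report instead of overwriting the previous one.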





Date & Time

Date and Time formats and conversions in R


#as.Date
#yyyy/mm/dd
Date1<-c("2013/07/15","2013/08/13","2013/07/23","2014/01/31","2013/02/13","2014/07/13")
as.Date(Date1,format="%Y/%m/%d",tz="Australia/Sydney")

[1] "2013-07-15" "2013-08-13" "2013-07-23" "2014-01-31" "2013-02-13"
[6] "2014-07-13"


#yy/mm/dd
Date2<-c("13/07/15","13/08/13","13/07/23","14/01/31","13/02/13","14/07/13")
as.Date(Date2,format="%y/%m/%d",tz="Australia/Sydney")

[1] "2013-07-15" "2013-08-13" "2013-07-23" "2014-01-31" "2013-02-13"
[6] "2014-07-13"


#yy/m/dd
Date3<-c("15/7/13","13/8/13","23/7/13","14/1/31","13/2/13","14/7/13")
as.Date(Date3,format="%y/%m/%d",tz="Australia/Sydney")

[1] "2015-07-13" "2013-08-13" "2023-07-13" "2014-01-31" "2013-02-13"
[6] "2014-07-13"


#dd mmm yyyy
Date4<-c("15 jul 2013","13 aug 2013","23 jul 2013","31 jan 2014","13 Feb 2013",
"13 Jul 2014")
as.Date(Date4,format="%d %b %Y",tz="Australia/Sydney")

[1] "2013-07-15" "2013-08-13" "2013-07-23" "2014-01-31" "2013-02-13"
[6] "2014-07-13"


#dd month yyyy
Date5<-c("15 july 2013","13 August 2013","23 july 2013","31 january 2014",
"13 February 2013","13 July 2014")
as.Date(Date5,format="%d %B %Y",tz="Australia/Sydney")

[1] "2013-07-15" "2013-08-13" "2013-07-23" "2014-01-31" "2013-02-13"
[6] "2014-07-13"
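Once text has been converted with as.Date(), the result behaves like a number of days, so dates can be checked, subtracted and used to build sequences. A short sketch (the variable names are illustrative):

```r
# A date that does not exist is returned as NA rather than an error
bad <- as.Date("2013/02/30", format = "%Y/%m/%d")
is.na(bad)

# Difference between two dates, in days
d1 <- as.Date("2013/07/15", format = "%Y/%m/%d")
d2 <- as.Date("2014/07/13", format = "%Y/%m/%d")
days_between <- as.numeric(d2 - d1)
days_between

# A monthly sequence starting from a date
monthly <- seq(d1, by = "month", length.out = 3)
monthly
```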


#weekday  abbreviated
DateProper<-as.Date(Date5,format="%d %B %Y",tz="Australia/Sydney")
format(DateProper,"%a")

[1] "Mon" "Tue" "Tue" "Fri" "Wed" "Sun"


#weekday full
format(DateProper,"%A")

[1] "Monday"    "Tuesday"   "Tuesday"   "Friday"    "Wednesday" "Sunday"


#weekdays alternative
weekdays(DateProper)

[1] "Monday"    "Tuesday"   "Tuesday"   "Friday"    "Wednesday" "Sunday"


#ISO C99 date format
format(DateProper,"%D")

[1] "07/15/13" "08/13/13" "07/23/13" "01/31/14" "02/13/13" "07/13/14"


#ISO 8601 date format
format(DateProper,"%F")

[1] "2013-07-15" "2013-08-13" "2013-07-23" "2014-01-31" "2013-02-13"
[6] "2014-07-13"


#day of month
format(DateProper,"%e")

[1] "15" "13" "23" "31" "13" "13"


#day of year
format(DateProper,"%j")

[1] "196" "225" "204" "031" "044" "194"


#week of year where week 1 is first full week starting Sunday
format(DateProper,"%U")

[1] "28" "32" "29" "04" "06" "28"


#week of year where week 1 is first week containing 4 or more days of the new year
format(DateProper,"%V")

[1] "29" "33" "30" "05" "07" "28"


#week of year where week 1 is first full week starting Monday
format(DateProper,"%W")

[1] "28" "32" "29" "04" "06" "27"


#strptime
#24 hour format
TimeStamp1<-c("2013/07/15 13:10:23","2013/08/13 13:10:23",
"2013/07/23 19:10:25","2014/01/31 03:10:23","2013/02/13 13:50:53","2014/07/13 21:13:43")

strptime(TimeStamp1,format="%Y/%m/%d %H:%M:%S",tz="Australia/Sydney")


[1] "2013-07-15 13:10:23 EST" "2013-08-13 13:10:23 EST"
[3] "2013-07-23 19:10:25 EST" "2014-01-31 03:10:23 EST"
[5] "2013-02-13 13:50:53 EST" "2014-07-13 21:13:43 EST"



#AM/PM format
TimeStamp2<-c("2013/07/15 1:10:23 PM","2013/08/13 1:10:23 PM",
"2013/07/23 7:10:25 PM","2014/01/31 3:10:23 AM","2013/02/13 1:50:53 PM",
"2014/07/13 9:13:43 PM")

TSProper<-strptime(TimeStamp2,
format="%Y/%m/%d %I:%M:%S %p",tz="Australia/Sydney")

TSProper

[1] "2013-07-15 13:10:23 EST" "2013-08-13 13:10:23 EST"
[3] "2013-07-23 19:10:25 EST" "2014-01-31 03:10:23 EST"
[5] "2013-02-13 13:50:53 EST" "2014-07-13 21:13:43 EST"
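strptime() returns a POSIXlt object, which stores the date and time as separate components. For arithmetic or for storing in a data frame column, it is often converted to POSIXct, which stores a single number of seconds. A minimal sketch using one of the time stamps above (ts_lt, ts_ct and later are illustrative names):

```r
ts_lt <- strptime("2013/07/15 13:10:23", format = "%Y/%m/%d %H:%M:%S",
                  tz = "Australia/Sydney")
ts_lt$hour                    # components are directly accessible: 13

ts_ct <- as.POSIXct(ts_lt)    # convert to seconds since 1970
later <- ts_ct + 3600         # add one hour (3600 seconds)
format(later, "%H:%M:%S")

difftime(later, ts_ct, units = "mins")
```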

 
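#fraction of seconds
Use %OSn instead of %S.

%OSn handles fractions of seconds, i.e. decimal places. For output, 'n' is a digit between 0 and 6 inclusive, indicating the number of decimal places to display; for input, use %OS without a digit. Note that the fraction is truncated rather than rounded, which explains the outputs below.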
  
strptime("2018-10-31 15:33:33.677", "%Y-%m-%d %H:%M:%OS")
   
[1] "2018-10-31 15:33:33 AEDT"


#2 dec points
format(strptime("2018-10-31 15:33:33.677", "%Y-%m-%d %H:%M:%OS"), "%H:%M:%OS2")
  
[1] "15:33:33.67"
   

#3 dec points
format(strptime("2018-10-31 15:33:33.677","%Y-%m-%d %H:%M:%OS"),"%H:%M:%OS3")

   
[1] "15:33:33.676"



#hour, minute & second with AM/PM (12-hour clock)
format(TSProper,"%r")

[1] "01:10:23 PM" "01:10:23 PM" "07:10:25 PM" "03:10:23 AM" "01:50:53 PM"
[6] "09:13:43 PM"


#locale-specific time (here a 12-hour clock with AM/PM, without leading zeros)
format(TSProper,"%X")

[1] "1:10:23 PM" "1:10:23 PM" "7:10:25 PM" "3:10:23 AM" "1:50:53 PM"
[6] "9:13:43 PM"


#hour & minute
format(TSProper,"%R")

[1] "13:10" "13:10" "19:10" "03:10" "13:50" "21:13"


#hour, minute & second
format(TSProper,"%T")

[1] "13:10:23" "13:10:23" "19:10:25" "03:10:23" "13:50:53" "21:13:43"


#full
format(TSProper,"%c")

[1] "Mon Jul 15 13:10:23 2013" "Tue Aug 13 13:10:23 2013"
[3] "Tue Jul 23 19:10:25 2013" "Fri Jan 31 03:10:23 2014"
[5] "Wed Feb 13 13:50:53 2013" "Sun Jul 13 21:13:43 2014"



#system time
Sys.time()
  
[1] "2015-04-20 13:26:46 AEST"


#system date
Sys.Date()
  
[1] "2015-04-20"


#time zones
The below function lists all time zone names known to R.
OlsonNames()  

Example: 

OlsonNames()[1:10]   
  
 [1] "Africa/Abidjan"     "Africa/Accra"       "Africa/Addis_Ababa"
 [4] "Africa/Algiers"     "Africa/Asmara"      "Africa/Asmera"     
 [7] "Africa/Bamako"      "Africa/Bangui"      "Africa/Banjul"     
[10] "Africa/Bissau"   



#To convert times to another time zone, use as.POSIXlt()

LocalTime <- Sys.time()  
  
#this is Australian Eastern time (i.e. Sydney or Canberra)  
  
LocalTime   
  
[1] "2019-12-18 12:30:38 AEDT"   
  
  
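#If we want to check what time it is in New York  
#(the position of a zone in OlsonNames() depends on the installed time zone database,  
#so passing the name directly is more robust than using the index)  
  
OlsonNames()[169]  
  
[1] "America/New_York"
  
  
NewYork <- as.POSIXlt(LocalTime, tz = "America/New_York")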
  
NewYork  
  
[1] "2019-12-17 20:30:38 EST"  
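If the aim is only to display a time in another zone, format() with a 'tz' argument does this without creating a new object. The sketch below fixes the local time to the value shown above, so the result is reproducible; t0 and ny_time are illustrative names.

```r
# The Sydney time from the example above, as a fixed value
t0 <- as.POSIXct("2019-12-18 12:30:38", tz = "Australia/Sydney")

# Display the same instant in New York time
ny_time <- format(t0, "%Y-%m-%d %H:%M:%S", tz = "America/New_York")
ny_time

# usetz = TRUE appends the time zone abbreviation
format(t0, tz = "America/New_York", usetz = TRUE)
```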
  
  
#To create a date time object use as.POSIXct()  
  
NewDateTime <- as.POSIXct("2018-01-01 05:00:00", format = "%Y-%m-%d %H:%M:%S", tz = "Europe/London")  
  
NewDateTime   
  
[1] "2018-01-01 05:00:00 GMT"