Friday, 6 February 2015

Working on Excel file from R

There are few packages available in R to let you work with (import/export) Excel files from R. This blog page will focus on 'xlsx' package.


'xlsx' package provides a command to import excel file, but this did not work well in my case. If the excel document is in Excel 97-2003 (.xls) format, the following works quite well.

Note: if ODBC installed in Windows is 32-bit, you need to use 32-bit R to do this.

library(RODBC) 
library(xlsx) 

excel.connect<-odbcConnectExcel("file_name.xls") 
Data<-sqlFetch(excel.connect,"Sheet_Name") 
odbcClose(excel.connect)


For exporting data into excel document, the following works well.

For a simple export:
library(xlsx)
wb<-createWorkbook() 
sheet<-createSheet(wb,sheetName="Sheet_Name") 
addDataFrame(dataframe,sheet,row.names=FALSE)  
saveWorkbook(wb,"file_name")


To create multiple worksheets:
wb<-createWorkbook() 
sheet<-createSheet(wb,sheetName="Sheet_Name_1") 
addDataFrame(dataframe_1,sheet,row.names=FALSE)  
saveWorkbook(wb,"file_name")

sheet<-createSheet(wb,sheetName="Sheet_Name_2") 
addDataFrame(dataframe_2,sheet,row.names=FALSE)  
saveWorkbook(wb,"file_name")


For exporting data with formatting:

wb<-createWorkbook()
 
sheet<-createSheet(wb,sheetName="Sheet_Name") 

for(i in c(required row numbers){
 
ROW<-createRow(sheet,rowIndex=i)

#populate selected columns (selected respective to different formatting)

for(j in c(required column numbers)){ 
CELL<-createCell(ROW,colIndex=j)[[1,1]]  

#data format written as per the excel specification (go to cell format and custom format for other examples)

cs<-CellStyle(wb,dataFormat=DataFormat("#,##0;[Red]-#,##0"))+
Font(wb,isBold=FALSE)+
Alignment(h="ALIGN_RIGHT",wrapText=TRUE) 
setCellValue(CELL,dataframe[i,j],showNA=FALSE)  
setCellStyle(CELL,cs)
}

#populate remaining columns (selected respective to different formatting)

for(j in c(required column numbers)){ 
CELL<-createCell(ROW,colIndex=j)[[1,1]]   
cs<-CellStyle(wb,dataFormat=DataFormat("$#,##0.00;[Red]-$#,##0.00"))+
Font(wb,isBold=FALSE)+
Alignment(h="ALIGN_RIGHT",wrapText=TRUE) 
setCellValue(CELL,dataframe[i,j],showNA=FALSE)  
setCellStyle(CELL,cs)
}














Number formatting

For presentation of numeric tables in reports, one often has to use '000 separators and use equal spacing to make the results look neater and pretty.

formatC() in the base package is useful for this purpose.


Example 1:

Given the below input
data.frame(V1=c((-1:1)/0,c(1,100)*pi))
          V1
1       -Inf
2        NaN
3        Inf
4   3.141593
5 314.159265


converted to below
 data.frame(V1=formatC(c((-1:1)/0,c(1,100)*pi), width = 8, digits = 1))
        V1
1     -Inf
2      NaN
3      Inf
4        3
5    3e+02


Alternatively for 2 digits displayed
data.frame(V1=formatC(c((-1:1)/0,c(1,100)*pi), width = 8, digits = 2))
        V1
1     -Inf
2      NaN
3      Inf
4      3.1
5  3.1e+02


Example 2:

Given the below numbers

12344448.2 
3244149.9210001 
321.21 
78186271.32155

To only show integers
data.frame(V1=formatC(c(12344448.2,3244149.9210001,3217747.214,78186271.32155),format="d"))
        V1
1 12344448 
2  3244149 
3  321 
4 78186271

To show numbers with scientific format

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="e"))
          V1 
1 1.2344e+07 
2 3.2441e+06 
3 3.2121e+02 
4 7.8186e+07


To show numbers with scientific format when the number is big

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="g")) 
         V1
1 1.234e+07
2 3.244e+06
3     321.2
4 7.819e+07


To show numbers in its actual form (this is limited by the options setting in R)

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="f")) 
             V1
1 12344448.2000
2  3244149.9210
3      321.2100
4 78186271.3215


To show remainder of digits in decimal points, specify 'digits' parameter (but format="f" will change the original input to comply with the new setting - so this should be used when you want to round off figures to smaller number of digits)

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="f",digits=15)) 
                        V1
1 12344448.199999999254942
2  3244149.921000100206584
3      321.209999999999980
4 78186271.321549996733665


Use format="fg" in combination with 'digits' specification to display the actual input

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="fg",digits=15)) 
                V1
1       12344448.2
2  3244149.9210001
3           321.21
4   78186271.32155



To insert '000 separator

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="fg",digits=15,big.mark=",")) 
                 V1
1      12,344,448.2
2 3,244,149.9210001
3            321.21
4  78,186,271.32155


To make the decimal point more obvious

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="fg",digits=15,big.mark=",",decimal.mark="\\")) 
                  V1
1      12,344,448\\2
2 3,244,149\\9210001
3            321\\21
4  78,186,271\\32155


To insert separators in the decimal points

data.frame(V1=formatC(c(12344448.2,3244149.9210001,321.21,78186271.32155),format="fg",digits=15,big.mark=",",decimal.mark="\\",small.mark=" ")) 
                   V1
1       12,344,448\\2
2 3,244,149\\92100 01
3             321\\21
4   78,186,271\\32155