Monday, 20 April 2015

Memory consumption and processing times

To measure the time taken to evaluate an expression, wrap it in system.time(), e.g.

system.time(sum(seq(1,10000000,by=1)))     
   user  system elapsed
   0.44    0.08    0.53  


To display the result of the expression as well, wrap it in print() inside system.time()

system.time(print(sum(seq(1,10000000,by=1))))  
[1] 5e+13
   user  system elapsed
   0.41    0.02    0.42  
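
system.time() returns an object whose components can be picked out by name, which helps when comparing two approaches. The following is a minimal sketch, assuming an explicit loop and a vectorised sum are the two candidates; the variable names are illustrative only, and timings will differ from machine to machine.

```r
# Compare an explicit loop with the vectorised sum() for 1..n
n <- 1e6

t_loop <- system.time({
  total_loop <- 0
  for (i in seq_len(n)) total_loop <- total_loop + i
})

# Assignments made inside system.time() remain available afterwards
t_vec <- system.time(total_vec <- sum(as.numeric(seq_len(n))))

# Keep only the elapsed (wall-clock) seconds of each run
elapsed <- c(loop = t_loop[["elapsed"]], vectorised = t_vec[["elapsed"]])
print(elapsed)
```

Both approaches give the same total; only the time taken differs.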
   


To check the size of an object, use object_size() from the 'pryr' package

library(pryr)   
object_size(iris)   
7.03 kB
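
If 'pryr' is not installed, base R offers object.size() in the 'utils' package. This is a swapped-in alternative rather than the function used above, and the two can report slightly different numbers. A minimal sketch (the object x is made up for illustration):

```r
# One million doubles take roughly 8 bytes each
x <- numeric(1e6)

size_x <- object.size(x)
print(size_x, units = "Mb")   # human-readable size

# Compare several objects at once; sapply() returns the sizes in bytes
sizes <- sapply(list(iris = iris, x = x), object.size)
print(sort(sizes, decreasing = TRUE))
```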

To check how much memory has been used by the session,

mem_used()   
21 MB

To check how much the memory in use changes when a particular expression is evaluated, use mem_change()

mem_used()   
21 MB
mem_change(1+100)  
10.6 kB   
mem_used()   
21.1 MB


Garbage collection runs automatically in R and releases the memory of objects that are no longer referenced (for example, after rm()). You can still call gc() manually, which also prints a summary of the memory in use.

mem_used()   
21.1 MB 
dat<-iris[rep(1:nrow(iris),1000),]
mem_used()  
37.3 MB 
rm(dat)   
mem_used() 
24 MB 
gc()   
         used (Mb) gc trigger (Mb) max used  (Mb)
Ncells 343524 18.4     741108 39.6   707448  37.8
Vcells 591474  4.6    2028694 15.5 21067777 160.8 
mem_used()   
24 MB
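
The same before/after comparison can be sketched with base R only, using the matrix that gc() returns. The helper used_mb() below is a hypothetical name, not a function from any package, and the exact numbers depend on the session.

```r
# Sum the "(Mb)" column for memory currently in use (Ncells + Vcells)
used_mb <- function() sum(gc()[, 2])

before <- used_mb()
dat <- iris[rep(1:nrow(iris), 1000), ]   # 150,000-row copy of iris
during <- used_mb()
rm(dat)
after <- used_mb()                       # gc() inside used_mb() frees dat

print(c(before = before, during = during, after = after))
```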


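On Windows, memory.size() and memory.limit() can be used to check the memory in use and to change the memory allocation limit. (From R 4.2.0 onwards both functions are non-functional stubs, so the examples below apply to older versions of R.)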
 
library(utils)  

 
To check the amount of memory currently in use (in Mb)
 
memory.size(max=FALSE)  

 
To check the maximum amount of memory obtained from the OS (in Mb)
 
memory.size(max=TRUE)  

e.g.
#memory immediately after opening a new R session  
memory.size(max=TRUE)    
[1]  16.56 

memory.size(max=FALSE)   
 [1]  12.58 

dat<-matrix(1:2700, 30, 90)  #create a matrix 30 x 90  

memory.size(max=TRUE)  #maximum memory obtained remains same  
[1]  16.56  

memory.size(max=FALSE)  #memory currently in use increased after the matrix was created  
 [1]  12.61  
 

To change the memory allocation limit to a specific amount (in Mb)
 
memory.limit(size=8000)  
   

To check the current memory allocation limit
 
memory.limit(size=NA)  

e.g.

#at the opening of a new R session (32 bit)
memory.limit(size=NA)  
[1]  3583   
memory.limit(size=3600)  #increased size to 3600  
[1]  3600  
memory.limit(size=NA)   
[1]  3600   
memory.limit(size=3800)  #increased size to 3800  
[1]  3800   
memory.limit(size=NA)   
[1]  3800   

N.B. The memory allocation limit cannot be lowered within a session; you can only increase it.




R updates and packages

The 'installr' package (for Windows) makes it simple to upgrade R to a new version and to install related software. Load the package first:

library(installr) 


To install the latest version of R:

install.R()


To install other programs commonly used with R (only a few examples are listed below):

install.ImageMagick()  
install.Rtools() 
install.MikTeX()  
install.pandoc()   


To update installed R to the latest version:

updateR()


To install packages, you can choose the repository with 'repos=' and the library folder with 'lib='. (.libPaths() can be used at the beginning of the session to set a default path to the library folder, so that 'lib=' can be omitted.)

install.packages("dplyr", repos="http://cran.csiro.au/", lib="C:/Users/Documents/MyLibrary")
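
A minimal sketch of setting a default library folder with .libPaths() follows. The folder under tempdir() is a stand-in chosen for illustration; in practice it would be a permanent folder such as the one in the call above.

```r
# Hypothetical library folder; it must exist before it can be added
my_lib <- file.path(tempdir(), "MyLibrary")
dir.create(my_lib, showWarnings = FALSE)

# Put the new folder first; install.packages() uses .libPaths()[1] by default
.libPaths(c(my_lib, .libPaths()))
print(.libPaths())

# install.packages("dplyr", repos="http://cran.csiro.au/")  # would now install into my_lib
```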

   
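If you are installing a package from a local file, provide the path and name of the file and set 'repos=NULL'. Use type="win.binary" for a Windows binary (*.zip) and type="source" for a source package (*.tar.gz).

install.packages("[path to the directory]/[name of file (e.g. ada_2.0-5.zip)]", repos=NULL, type="win.binary")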
 
 
The list of all CRAN mirror URLs can be found at http://cran.r-project.org/mirrors.html


To list all the packages installed at a specified library path

installed.packages(lib.loc="path")
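
installed.packages() returns a character matrix with one row per package, which is easier to read once reduced to a few columns. A minimal sketch (the object names ip and pkgs are illustrative):

```r
# One row per installed package, searched across all library paths
ip <- installed.packages()

# Keep the most useful columns as a data frame
pkgs <- as.data.frame(ip[, c("Package", "LibPath", "Version")],
                      stringsAsFactors = FALSE)
print(head(pkgs[order(pkgs$Package), ]))

# Packages that ship with R, such as 'stats', are listed too
print("stats" %in% pkgs$Package)
```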

  
To check whether newer versions of any installed packages are available in the repository,
  

old.packages()

    
To list packages that are available in the repository but not yet installed,
  
new.packages()
 

To update existing packages to the latest version  


update.packages()

 


In workplaces using Windows where a firewall prevents R from downloading packages, you may need to download the zip files from CRAN manually and install them locally using the Packages menu in the R console.

   




To check which packages are currently attached (loaded) in the session:

(.packages())


To list all packages installed in the library paths:

(.packages(all.available=TRUE))

alternatively,

library()
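
To test for a single package from within a script, requireNamespace() can be used. The sketch below is a minimal example; is_installed() is a hypothetical helper and the second package name is deliberately made up.

```r
# TRUE if the package can be loaded, FALSE otherwise (nothing is attached)
is_installed <- function(pkg) requireNamespace(pkg, quietly = TRUE)

print(is_installed("stats"))                # ships with R
print(is_installed("notARealPackage123"))   # made-up name

# Attached packages, for comparison with installed ones
print(.packages())
```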

 


To check for all packages in the repository:
available.packages()


To remove packages:

remove.packages("name_of_package","library_path")


  
  
To download the package in its compressed form (e.g. *.tar.gz):

download.packages("name_of_package",destdir="destination_path")


To download multiple packages at once,

download.packages(c("name_of_package1","name_of_package2","name_of_package3"), destdir="destination_path")

   
  
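The 'tools' package, which ships with R, provides a function to check the dependencies of installed packages. Older versions of R used package.dependencies() for this; it is now defunct and has been replaced by package_dependencies():

library(tools)   
inst_pack<-installed.packages()   
package_dependencies(rownames(inst_pack), db=inst_pack)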
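
As a hedged sketch, package_dependencies() from 'tools' can also answer the question for a single package, in either direction, with the installed-package matrix as the database. The package name "stats" is only an illustrative choice, and no output is shown because it depends on the R version and on what is installed.

```r
library(tools)

# Use the installed packages (not CRAN) as the dependency database
inst_pack <- installed.packages()

# Direct dependencies (Depends, Imports, LinkingTo) of one installed package
deps <- package_dependencies("stats", db = inst_pack)
print(deps)

# Installed packages that depend on "stats" (reverse dependencies)
rev_deps <- package_dependencies("stats", db = inst_pack, reverse = TRUE)
print(head(rev_deps$stats))
```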
     


Friday, 10 April 2015

Intersection & Union


Let's define two vectors
Group1<-c("A","R","t","Y","p","Q") 
Group2<-c("Z","R","t","O","p","X")

> Group1  
[1] "A" "R" "t" "Y" "p" "Q"


> Group2   
[1] "Z" "R" "t" "O" "p" "X"


Intersection: there are 2 ways
> intersect(Group1,Group2)
[1] "R" "t" "p"

> Group1[which(Group1%in%Group2)]
[1] "R" "t" "p"
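
The two ways agree here because neither vector contains repeated values. With duplicates they differ: intersect() returns each common element once, while indexing with %in% keeps every matching position. A small sketch with made-up vectors a and b:

```r
a <- c("R", "R", "t", "A")   # "R" appears twice
b <- c("R", "t")

set_way   <- intersect(a, b)   # duplicates dropped
index_way <- a[a %in% b]       # duplicates kept

print(set_way)     # [1] "R" "t"
print(index_way)   # [1] "R" "R" "t"
```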


Union:
> union(Group1,Group2)   
[1] "A" "R" "t" "Y" "p" "Q" "Z" "O" "X"


Elements exclusive to Group 1: there are 2 ways
> setdiff(Group1,Group2)  
[1] "A" "Y" "Q"   

> Group1[which(!Group1%in%Group2)]   
[1] "A" "Y" "Q"


Elements exclusive to Group 2: there are 2 ways
> setdiff(Group2,Group1)   
[1] "Z" "O" "X"

> Group2[which(!Group2%in%Group1)]  
[1] "Z" "O" "X"


Mutually exclusive elements (those found in only one of the two groups):
> union(setdiff(Group1,Group2),setdiff(Group2,Group1))   
[1] "A" "Y" "Q" "Z" "O" "X"
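
The same result can be wrapped in a small function, or obtained as the union minus the intersection. A minimal sketch; sym_diff() is a hypothetical helper name, not a base R function.

```r
Group1 <- c("A", "R", "t", "Y", "p", "Q")
Group2 <- c("Z", "R", "t", "O", "p", "X")

# Elements found in exactly one of the two vectors
sym_diff <- function(a, b) union(setdiff(a, b), setdiff(b, a))

res1 <- sym_diff(Group1, Group2)
# Alternative: everything in either group, minus what they share
res2 <- setdiff(union(Group1, Group2), intersect(Group1, Group2))

print(res1)   # [1] "A" "Y" "Q" "Z" "O" "X"
print(res2)   # [1] "A" "Y" "Q" "Z" "O" "X"
```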