Wednesday, 6 September 2017

Reading files into R

Below are some of the options available for importing files into R.
 

For CSV files, use read.csv() as below.
 
  
read.csv("test.csv")
 
  V1         V2
1  F -0.5786439
2  E  0.2472908
3  U  0.2748309
4  R  1.1791559
5  K -0.1258598
6  X -0.8898289
7  L  0.4627274
8  C -0.7088007
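To make the examples reproducible, here is a minimal sketch that recreates test.csv from the raw lines printed by readLines() further down in this post and reads it back with read.csv(). Writing to a tempfile() path instead of the working directory is an assumption made so that no existing file is overwritten.

```r
# Recreate test.csv from the raw lines shown by readLines("test.csv") in this post.
csv_lines <- c(
  '"V1","V2"',
  '"F",-0.578643919152124',
  '"E",0.247290774914572',
  '"U",0.274830888945797',
  '"R",1.179155856395',
  '"K",-0.125859842900427',
  '"X",-0.889828858494609',
  '"L",0.462727351834403',
  '"C",-0.708800746374982'
)

# Assumption: use a temporary path rather than the working directory.
path <- tempfile(fileext = ".csv")
writeLines(csv_lines, path)

df <- read.csv(path)
print(df)
str(df)  # V1 is character (a factor before R 4.0.0), V2 is numeric
```
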
 
To read only the first 3 rows, use nrows (the header line is not counted),

read.csv("test.csv", nrows = 3)
 
   V1         V2
1  F -0.5786439
2  E  0.2472908
3  U  0.2748309
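A small self-contained check, using made-up rounded values rather than the post's data, that nrows limits data rows only, so the header line is still used for the column names:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27', '"R",1.18'), path)

# nrows counts data rows; the header line is read in addition to them.
first3 <- read.csv(path, nrows = 3)
print(first3)
```
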

 
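To skip the first 2 lines and read the next 3 rows, add skip. Note that skip counts lines in the file, including the header, and read.csv() then uses the next line as the header, so the "E" row becomes the column names below (check.names adds the X prefix to the number to make it a valid name).

read.csv("test.csv", nrows = 3, skip = 2)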
 
  E X0.247290774914572
1 U          0.2748309
2 R          1.1791559
3 K         -0.1258598
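If the goal is to skip data rows while keeping the real column names, one option is to read the header separately and then pass header = FALSE with col.names. This is a sketch with made-up values, not the post's data; skip = 3 covers the header line plus the first 2 data rows.

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27',
             '"R",1.18', '"K",-0.13'), path)

# Take the column names from the header line.
hdr <- names(read.csv(path, nrows = 1))

# Skip the header plus 2 data rows (3 lines), read the next 3 rows as data,
# and reattach the original column names.
rows <- read.csv(path, header = FALSE, skip = 3, nrows = 3, col.names = hdr)
print(rows)
```
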
 
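If the file is not a CSV, use read.table(). It splits on white space by default, so the separator/delimiter has to be given with sep. Unlike read.csv(), read.table() defaults to header = FALSE, so after skipping 2 lines every remaining line is read as data and the columns get the default names V1 and V2.

read.table("test.csv", sep = ",", nrows = 3, skip = 2)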
 
  V1        V2
1  E 0.2472908
2  U 0.2748309
3  R 1.1791559
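As an illustration of a non-CSV file, here is a sketch that writes a tab-separated file (hypothetical data, temporary path) and reads it back with read.table(), stating both the header and the separator explicitly:

```r
# Hypothetical data written as a tab-separated file.
path <- tempfile(fileext = ".tsv")
df <- data.frame(V1 = c("F", "E", "U"), V2 = c(-0.58, 0.25, 0.27))
write.table(df, path, sep = "\t", row.names = FALSE, quote = FALSE)

# read.table() assumes header = FALSE and white-space separation,
# so both are overridden here.
tsv <- read.table(path, header = TRUE, sep = "\t")
print(tsv)
```

For tab-separated files, read.delim() is the counterpart of read.csv() with these defaults already set.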
 
For large files, use the fread() function from the data.table package, which is usually much faster at importing.

library(data.table)

fread("test.csv", sep = ",")
 
   V1         V2
1:  F -0.5786439
2:  E  0.2472908
3:  U  0.2748309
4:  R  1.1791559
5:  K -0.1258598
6:  X -0.8898289
7:  L  0.4627274
8:  C -0.7088007
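A minimal sketch showing what fread() returns, with made-up values in a temporary file. It assumes the data.table package may not be installed, so the call is guarded with requireNamespace():

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27'), path)

if (requireNamespace("data.table", quietly = TRUE)) {
  dt <- data.table::fread(path)  # the separator is detected automatically
  print(class(dt))               # "data.table" "data.frame"
  # A data.table is also a data.frame and can be converted back if needed.
  df <- as.data.frame(dt)
} else {
  message("data.table is not installed; try install.packages(\"data.table\")")
}
```
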

 
To read the first 3 rows,

fread("test.csv", sep = ",", nrows = 3)
 
   V1         V2
1:  F -0.5786439
2:  E  0.2472908
3:  U  0.2748309
 

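To skip the first 2 lines and read the next 3 rows, use skip. Note that fread() counts the header as a line when skipping. Unlike read.csv() above, it does not turn the "E" line into column names: with the default header = "auto", a first line containing a numeric field is treated as data, so the default names V1 and V2 are used.

fread("test.csv", sep = ",", nrows = 3, skip = 2)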

   V1        V2
1:  E 0.2472908
2:  U 0.2748309
3:  R 1.1791559
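Relying on header detection can be fragile if the first line read after skipping happens to contain only text. A sketch (made-up values, temporary file, data.table assumed optional) that states header = FALSE and the column names explicitly:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27', '"R",1.18'), path)

if (requireNamespace("data.table", quietly = TRUE)) {
  # skip = 2 skips the header line and the first data row;
  # header = FALSE keeps the next line as data, col.names names the columns.
  dt <- data.table::fread(path, skip = 2, nrows = 3, header = FALSE,
                          col.names = c("V1", "V2"))
  print(dt)
}
```
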
 
readLines() is useful for checking the contents and delimiter of a file before importing it, as it returns each line as a plain character string without splitting it into fields or converting types.
 
readLines("test.csv")
 
[1] "\"V1\",\"V2\""            "\"F\",-0.578643919152124"
[3] "\"E\",0.247290774914572"  "\"U\",0.274830888945797"
[5] "\"R\",1.179155856395"     "\"K\",-0.125859842900427"
[7] "\"X\",-0.889828858494609" "\"L\",0.462727351834403"
[9] "\"C\",-0.708800746374982"
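Besides eyeballing the output of readLines(), utils::count.fields() (not used in the original post) can confirm a delimiter guess by counting the fields on each line. A sketch with made-up values in a temporary file:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25'), path)

# Inspect the raw lines first.
raw <- readLines(path)
print(raw)

# A constant field count greater than 1 suggests the separator is right.
n_comma <- count.fields(path, sep = ",")
n_tab   <- count.fields(path, sep = "\t")
print(n_comma)  # 2 2 2
print(n_tab)    # 1 1 1
```
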
 
To read only the first 4 lines,
 
readLines("test.csv", n = 4)
 
[1] "\"V1\",\"V2\""            "\"F\",-0.578643919152124"
[3] "\"E\",0.247290774914572"  "\"U\",0.274830888945797" 
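Once the first few lines look right, they can be parsed in memory through the text argument of read.csv(). A sketch with made-up values in a temporary file:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27', '"R",1.18'), path)

# Peek at the first 4 lines (header + 3 data rows) without parsing them.
head_lines <- readLines(path, n = 4)

# Parse the same lines as CSV without reading the file again.
preview <- read.csv(text = head_lines)
print(preview)
```
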
 

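scan() is similar to readLines() but splits the input into separate items instead of whole lines, so the output does not group elements by row. Because sep is not set here, scan() separates items by white space and quoted strings rather than by commas, which is why the commas stay attached to the numbers below. Also, what = "list" is just a character string, so every item is read as character.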
 

scan("test.csv", what = "list", nlines = 4)

Read 8 items
[1] "V1"                  ",\"V2\""             "F"                
[4] ",-0.578643919152124" "E"                   ",0.247290774914572"
[7] "U"                   ",0.274830888945797" 
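Setting sep = "," makes each comma-separated field one item and strips the surrounding double quotes. A sketch with made-up values in a temporary file:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27'), path)

# what = character() reads every field as a string, exactly as written.
items <- scan(path, what = character(), sep = ",", nlines = 4, quiet = TRUE)
print(items)
```
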
 
To skip the first 2 lines and read the next 4 lines (note that the header counts as line 1 when skipping),
 
scan("test.csv", what = "list", nlines = 4, skip = 2)
 
Read 8 items
[1] "E"                   ",0.247290774914572"  "U"                
[4] ",0.274830888945797"  "R"                   ",1.179155856395"  
[7] "K"                   ",-0.125859842900427"
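When what is a list, scan() reads records instead: each list element gives the type of one field, and the result groups the values by column. A sketch with made-up values, skipping the header line so the second field can be read as numeric:

```r
# Hypothetical example data written to a temporary file.
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25', '"U",0.27'), path)

# "" means character and 0 means numeric for the two fields of each record.
rec <- scan(path, what = list(V1 = "", V2 = 0), sep = ",", skip = 1,
            quiet = TRUE)
str(rec)
```
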

 
If the file is compressed, e.g. with gzip, read it through a gzfile() connection.

To read the file in,

read.csv(gzfile("test.csv.gz"))  # read.csv() opens and closes the connection itself
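A round-trip sketch with hypothetical data and a temporary path: write.csv() and read.csv() both accept an unopened gzfile() connection, open it in the mode they need, and close it afterwards.

```r
# Hypothetical data written to a temporary gzip-compressed CSV file.
path <- tempfile(fileext = ".csv.gz")
df <- data.frame(V1 = c("F", "E", "U"), V2 = c(-0.58, 0.25, 0.27))

# The unopened connection is opened and closed by write.csv() / read.csv().
write.csv(df, gzfile(path), row.names = FALSE)
back <- read.csv(gzfile(path))
print(back)
```
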

 
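To write to a gzip-compressed file, open the connection in write mode, write to it, and close it when done so that all of the compressed data is flushed to the file,

a <- gzfile("test.csv.gz", "w")

cat("New1,1111\nNew2,22222\n", file = a)

close(a)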
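A sketch (temporary path assumed) that writes the same two lines, checks that the result really is gzip data by looking for the gzip magic bytes 1f 8b at the start of the file, and reads it back:

```r
path <- tempfile(fileext = ".csv.gz")

con <- gzfile(path, "w")
writeLines(c("New1,1111", "New2,22222"), con)
close(con)  # flushes and finalises the gzip stream

# Every gzip file starts with the bytes 1f 8b.
magic <- readBin(path, what = "raw", n = 2)
print(magic)

# Reading back through gzfile() decompresses transparently.
back <- read.csv(gzfile(path), header = FALSE, col.names = c("name", "value"))
print(back)
```
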







