My R Codes Archive: Reading files into R

Below are some of the options available for importing files into R.

For CSV files, use read.csv() as below.

read.csv("test.csv")

V1 V2
1 F -0.5786439
2 E 0.2472908
3 U 0.2748309
4 R 1.1791559
5 K -0.1258598
6 X -0.8898289
7 L 0.4627274
8 C -0.7088007

To read only the first 3 rows, set nrows,

read.csv("test.csv", nrows = 3)

V1 V2
1 F -0.5786439
2 E 0.2472908
3 U 0.2748309

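To skip the first 2 lines and read the next 3 rows, add skip. Note that skip counts lines of the file, header included, and read.csv() then takes the first remaining line as the header, so the "E" row becomes the column names.

read.csv("test.csv", nrows = 3, skip = 2)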

E X0.247290774914572
1 U 0.2748309
2 R 1.1791559
3 K -0.1258598
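
The output above loses the original column names because the first line left after skipping is taken as the header. A minimal sketch of one way to keep them, assuming it is acceptable to read the file twice: read the header on its own, then reapply it with col.names. The temporary file and its values are made up for illustration and are not the post's test.csv.

```r
# Build a small example file (hypothetical data, not the post's test.csv)
path <- tempfile(fileext = ".csv")
df <- data.frame(V1 = c("F", "E", "U", "R", "K"),
                 V2 = c(-0.58, 0.25, 0.27, 1.18, -0.13))
write.csv(df, path, row.names = FALSE)

# Step 1: read the header plus one row, just to recover the column names
col_names <- names(read.csv(path, nrows = 1))

# Step 2: skip the header and the first data row (2 lines in total),
# treat the next line as data, and reapply the saved names
subset_df <- read.csv(path, skip = 2, nrows = 3,
                      header = FALSE, col.names = col_names)
print(subset_df)
```

Setting header = FALSE is what stops the "E" row from being consumed as column names.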

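If the file is not comma-separated, use read.table() and specify the separator/delimiter with sep. Unlike read.csv(), read.table() defaults to header = FALSE, so after skipping 2 lines the "E" row is read as data and the columns get the default names V1 and V2.

read.table("test.csv", sep = ",", nrows = 3, skip = 2)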

V1 V2
1 E 0.2472908
2 U 0.2748309
3 R 1.1791559
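
As a sketch of the non-CSV case, the example below reads a tab-delimited file with read.table(). The file, its column names (id, score) and its values are invented for illustration.

```r
# Write a small tab-delimited file (hypothetical contents)
path <- tempfile(fileext = ".txt")
writeLines(c("id\tscore", "a\t1.5", "b\t2.5", "c\t3.5"), path)

# header defaults to FALSE in read.table(), so request it explicitly;
# sep = "\t" tells it that fields are separated by tabs
tab_df <- read.table(path, sep = "\t", header = TRUE,
                     stringsAsFactors = FALSE)
print(tab_df)
```

For other delimiters, only the sep value should need to change, e.g. sep = ";" or sep = "|".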

For large files, use the fread() function from the data.table package, which imports data considerably faster.

data.table::fread("test.csv", sep = ",")

V1 V2
1: F -0.5786439
2: E 0.2472908
3: U 0.2748309
4: R 1.1791559
5: K -0.1258598
6: X -0.8898289
7: L 0.4627274
8: C -0.7088007

To read only the first 3 rows (the calls below assume the package has been loaded with library(data.table)),

fread("test.csv", sep = ",", nrows = 3)

V1 V2
1: F -0.5786439
2: E 0.2472908
3: U 0.2748309

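To skip the first 2 lines and read the next 3 rows, do as below. Note that fread() counts the header as a line when skipping; with the header gone, the remaining lines are all read as data and the columns get the default names V1 and V2.

fread("test.csv", sep = ",", nrows = 3, skip = 2)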

V1 V2
1: E 0.2472908
2: U 0.2748309
3: R 1.1791559
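
The same skip-and-rename pattern can be spelled out explicitly with fread(). This is a sketch only: it assumes the data.table package is installed (the code checks and skips otherwise), and the temporary file is made-up data rather than the post's test.csv.

```r
# Build a small example file (hypothetical data)
path <- tempfile(fileext = ".csv")
write.csv(data.frame(V1 = c("F", "E", "U", "R", "K"),
                     V2 = c(-0.58, 0.25, 0.27, 1.18, -0.13)),
          path, row.names = FALSE)

if (requireNamespace("data.table", quietly = TRUE)) {
  # skip = 2 drops the header and the first data row;
  # header = FALSE and col.names make the naming explicit
  dt <- data.table::fread(path, sep = ",", skip = 2, nrows = 3,
                          header = FALSE, col.names = c("V1", "V2"))
  print(dt)
} else {
  message("data.table is not installed; skipping the fread() example")
}
```

Stating header and col.names explicitly avoids relying on fread()'s automatic header detection.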

readLines() is useful for checking the contents and delimiter of a file before importing it: it returns each line as an unparsed character string, so it does not need to know the delimiter in advance.

readLines("test.csv")

[1] "\"V1\",\"V2\"" "\"F\",-0.578643919152124"
[3] "\"E\",0.247290774914572" "\"U\",0.274830888945797"
[5] "\"R\",1.179155856395" "\"K\",-0.125859842900427"
[7] "\"X\",-0.889828858494609" "\"L\",0.462727351834403"
[9] "\"C\",-0.708800746374982"

To read only the first 4 lines,

readLines("test.csv", n = 4)

[1] "\"V1\",\"V2\"" "\"F\",-0.578643919152124"
[3] "\"E\",0.247290774914572" "\"U\",0.274830888945797"
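
One way to turn this inspection into a quick delimiter check is to count candidate delimiters in the first few lines. The sketch below is a rough heuristic, not a robust detector; the file contents and the list of candidates are assumptions made for illustration.

```r
# Write a small semicolon-delimited file (hypothetical contents)
path <- tempfile(fileext = ".txt")
writeLines(c("id;score", "a;1.5", "b;2.5"), path)

# Peek at the first few raw lines without parsing them
first_lines <- readLines(path, n = 3)

# Count how often each candidate delimiter appears on every line
candidates <- c(comma = ",", semicolon = ";", tab = "\t")
counts <- sapply(candidates, function(d) {
  lengths(regmatches(first_lines, gregexpr(d, first_lines, fixed = TRUE)))
})
print(counts)

# Pick the candidate that occurs most often overall
delim <- candidates[[which.max(colSums(counts))]]
print(delim)
```

The chosen delim could then be passed as sep to read.table().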

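scan() is similar to readLines() but treats each field as a separate item, hence the output does not group elements by rows. Note that what = "list" only tells scan() to read character items, and because sep is not set, items are split on whitespace rather than commas, which is why the commas stay attached below.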

scan("test.csv", what = "list", nlines = 4)

Read 8 items
[1] "V1" ",\"V2\"" "F"
[4] ",-0.578643919152124" "E" ",0.247290774914572"
[7] "U" ",0.274830888945797"

To skip the first 2 lines and read the next 4 lines (note that the header counts as line 1 when skipping),

scan("test.csv", what = "list", nlines = 4, skip = 2)

Read 8 items
[1] "E" ",0.247290774914572" "U"
[4] ",0.274830888945797" "R" ",1.179155856395"
[7] "K" ",-0.125859842900427"
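
In the output above the commas stay attached to the values because no separator was given. A minimal sketch of scan() with sep = "," follows, using a made-up three-line file; it also shows that passing a list to what reads the data column by column.

```r
# Write a small quoted CSV file (hypothetical contents)
path <- tempfile(fileext = ".csv")
writeLines(c('"V1","V2"', '"F",-0.58', '"E",0.25'), path)

# what = character() reads every field as a string;
# sep = "," splits on commas and the surrounding quotes are removed
items <- scan(path, what = character(), sep = ",", quiet = TRUE)
print(items)

# Supplying a list to `what` reads the file column by column instead;
# skip = 1 drops the header line so V2 can be parsed as numeric
cols <- scan(path, what = list(V1 = character(), V2 = numeric()),
             sep = ",", skip = 1, quiet = TRUE)
print(cols)
```

quiet = TRUE only suppresses the "Read n items" message.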

If the file is compressed, e.g. with gzip, use gzfile().

To read the file in,

read.csv(gzfile("test.csv.gz", "r"))

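To write into the gz file (opening it in "w" mode overwrites any existing contents), close the connection afterwards so that the data is flushed to disk,

a <- gzfile("test.csv.gz", "w")

cat("New1, 1111 \n New2, 22222\n", file = a)

close(a)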
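
For whole data frames, a round trip through a gzip file can be sketched as below. The data frame and its column names (id, value) are invented for illustration, and a temporary file is used so that test.csv.gz is left untouched.

```r
# Hypothetical data written to a temporary .csv.gz file
gz_path <- tempfile(fileext = ".csv.gz")
df <- data.frame(id = c("New1", "New2"), value = c(1111, 22222))

# Given an unopened connection, write.csv() opens it for writing
# and closes it again when it has finished
write.csv(df, gzfile(gz_path), row.names = FALSE)

# read.csv() likewise opens the gzfile connection, reads, and closes it
round_trip <- read.csv(gzfile(gz_path))
print(round_trip)
```

Passing an unopened connection this way avoids having to call close() by hand.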

Wednesday, 6 September 2017