
Danfo.js : read .tsv files with readCSV() / read_csv()?

In a Node.js environment with Danfo.js, reading .csv files is very easy with readCSV() (formerly read_csv()), as shown in the official example:

const dfd = require("danfojs-node")

dfd.readCSV("file:///home/Desktop/user_names.csv")
  .then(df => {
    df.head().print()
  })
  .catch(err => {
    console.log(err)
  })

However, I can’t find a way to read .tsv files.

Is there a way to read tab-delimited files with Danfo.js?

In the source I found the following comment:

 * @param {config} (Optional). A CSV Config object that contains configurations
 *     for reading and decoding from CSV file(s).

But I’m new to JavaScript, coming from R/Python, and didn’t know where to go from there.

Answer

Here is how to read a TSV with readCSV (formerly read_csv): pass a tab character as the delimiter in the second argument.

dfd.readCSV("file.csv", {delimiter: "\t"})

Danfo.js documentation says:

Parameters: configs: object, optional Supported params are: … csvConfigs: other supported Tensorflow csvConfig parameters. See https://js.tensorflow.org/api/latest/#data.csv

Then that page says:

csvConfig object optional: … delimiter (string) The string used to parse each line of the input file.

This means that any parameter you can include in csvConfig in tf.data.csv() can also be included in configs in readCSV(). For example, if this works:

tf.data.csv(x, {y: z})

then this will also work:

dfd.readCSV(x, {y: z})
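
Putting this together, a small wrapper can make the intent explicit. This is only a sketch: readTSV is a hypothetical helper (it is not part of Danfo.js), and it assumes that the readCSV in your installed version accepts a delimiter option in its second argument, as the documentation quoted above suggests. The Danfo.js module is passed in as an argument so that the helper itself has no hard dependency.

```javascript
// Hypothetical helper, NOT part of Danfo.js.
// Assumption: dfd.readCSV(source, configs) honours a `delimiter` key in configs.
function readTSV(dfd, source, extraConfigs = {}) {
  // Spread the caller's options first so the tab delimiter always wins.
  return dfd.readCSV(source, { ...extraConfigs, delimiter: "\t" });
}

// Usage (assumes danfojs-node is installed and "file.tsv" exists):
// const dfd = require("danfojs-node");
// readTSV(dfd, "file.tsv")
//   .then(df => df.head().print())
//   .catch(err => console.log(err));
```

Note that the options are written as a plain object literal. The Python-style configs={...} form is parsed by JavaScript as an assignment to an undeclared variable, which throws in strict mode and in ES modules.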

PS: Has anyone else noticed that Danfo.js readCSV is insanely slow? It takes me 9 seconds to dfd.readCSV a 23MB tsv. dfd.read_json brings this down to a still unusably slow 7 seconds. Compare this to 0.015 seconds to read a 22MB Apache Arrow file of the same data using apache-arrow js.
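
If you want to check load times like these on your own data, a rough way is to wrap each loader in a timer. The sketch below uses only Node's built-in perf_hooks module; timeAsync is a hypothetical helper name, and the file name in the usage comment is a placeholder.

```javascript
const { performance } = require("perf_hooks");

// Hypothetical helper: runs a promise-returning loader once and
// resolves to its result plus the elapsed wall-clock time in seconds.
async function timeAsync(loader) {
  const start = performance.now();
  const result = await loader();
  const seconds = (performance.now() - start) / 1000;
  return { result, seconds };
}

// Usage (assumes danfojs-node is installed and "data.tsv" exists):
// const dfd = require("danfojs-node");
// timeAsync(() => dfd.readCSV("data.tsv", { delimiter: "\t" }))
//   .then(({ seconds }) => console.log(`readCSV took ${seconds.toFixed(3)} s`));
```

A single run like this includes one-off costs such as module initialisation, so repeating the measurement a few times gives a fairer comparison.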

User contributions licensed under: CC BY-SA