In a Node.js environment with Danfo.js, reading .csv files is easy with readCSV(), formerly read_csv(), as shown in the official example:
const dfd = require("danfojs-node")

dfd.readCSV("file:///home/Desktop/user_names.csv")
  .then(df => {
    df.head().print()
  }).catch(err => {
    console.log(err)
  })
However, I can’t find a way to read .tsv files.
Is there a way to read tab-delimited files with Danfo.js?
In the source I find the following comment:
 * @param {config} (Optional). A CSV Config object that contains configurations
 * for reading and decoding from CSV file(s).
But I’m new to JavaScript, coming from R/Python, and didn’t know what to do from there.
Answer
Here is how to use readCSV (formerly read_csv) to read a tsv:
dfd.readCSV("file.csv", { delimiter: '\t' })
Danfo.js documentation says:
Parameters:
configs: object, optional
Supported params are: …
csvConfigs: other supported Tensorflow csvConfig parameters. See https://js.tensorflow.org/api/latest/#data.csv
Then that page says:
csvConfig object optional: …
delimiter (string) The string used to parse each line of the input file.
This means that any parameter you can include in csvConfig in tf.data.csv() can also be included in configs in readCSV(). For example, if this works:

tf.data.csv(x, { y: z })

then this will also work:

dfd.readCSV(x, { y: z })
PS: has anyone else noticed that Danfo.js readCSV is insanely slow? It takes me 9 seconds to dfd.readCSV a 23 MB tsv. dfd.read_json brings this down to a still unusably slow 7 seconds. Compare this with 0.015 seconds to read a 22 MB Apache Arrow file of the same data using apache-arrow js.