
Node.js – Reading a CSV file not working for files with more than 500 lines

I am currently struggling to run my Node.js server.

What I want to do:

  • Upload a CSV file from a mobile device to my local server and save it on the file system
  • Read each line of the CSV file and save each row to my MongoDB database

Uploading and saving the file works flawlessly. Reading the CSV file and saving each row to the database only works for files with a small number of lines. I don’t know the exact line count at which it stops working; it seems to differ every time I read a file. Sometimes (if the file has more than about 1000 lines) the CSV reader I use doesn’t even start processing the file. Other times it reads only 100-200 lines and then stops.

Here is the code I use to upload the file:

var fs = require('fs');
var sys = require("sys");
var url = require('url');
var http = require('http');

http.createServer(function(request, response) {
    sys.puts("Got new file to upload!");

    var urlString = url.parse(request.url).pathname;

    var pathParts = urlString.split("/");

    var deviceID = pathParts[1];
    var fileName = pathParts[2];

    sys.puts("DeviceID: " + deviceID);
    sys.puts("Filename: " + fileName);

    sys.puts("Start saving file");
    var tempFile = fs.createWriteStream(fileName);
    request.pipe(tempFile);
    sys.puts("File saved");

    // Starting a new child process which reads the file 
    // and inserts each row to the database
    var task = require('child_process').fork('databaseInsert.js');
    task.on('message', function(childResponse) {
        sys.puts('Finished child process!');
    });
    task.send({
        start : true,
        deviceID : deviceID,
        fileName : fileName
    });
    sys.puts("After task");

    response.writeHead(200, {
        "Content-Type" : "text/plain"
    });
    response.end('MESSAGE');
}).listen(8080);

This all works fine. Now the code of the child process (databaseInsert.js):

var sys = require("sys");
var yaCSV = require('ya-csv');
var Db = require('mongodb').Db;
var dbServer = require('mongodb').Server;

process.on('message', function(info) {
    sys.puts("Doing work in child process");

    var fileName = info.fileName;
    var deviceID = info.deviceID;

    sys.puts("Starting db insert!");
    var dbClient = new Db('test', new dbServer("127.0.0.1", 27017, {}), {
        w : 1
    });

    dbClient.open(function(err, client) {
        if (err) {
            sys.puts(err);
        }
        dbClient.createCollection(deviceID, function(err, collection) {
            if (err) {
                sys.puts("Error creating collection: " + err);
            } else {
                sys.puts("Created collection: " + deviceID);

                var csvReader = yaCSV.createCsvFileReader(fileName, {
                    columnsFromHeader : true,
                    'separator' : ';'
                });
                csvReader.setColumnNames([ 'LineCounter', 'Time',  'Activity',
                        'Latitude', 'Longitude' ]);

                var lines = 0;
                csvReader.addListener('data', function(data) {
                    lines++;
                    sys.puts("Line: " + data.LineCounter);
                    var docRecord = {
                        fileName : fileName,
                        lineCounter : data.LineCounter,
                        time : data.Time,
                        activity : data.Activity,
                        latitude : data.Latitude,
                        longitude : data.Longitude
                    };
                    collection.insert(docRecord, {
                        safe : true
                    }, function(err, res) {
                        if (err) {
                            sys.puts(err);
                        }
                    });
                });
            }
        });
    });
    process.send('finished');
});

At first I didn’t use a child process, but I saw the same behaviour then as I do now, which is why I tried this approach.

Hopefully someone who has some experience with Node.js can help me.


Answer

I think your issue is that you are trying to read tempFile while it is still being written. Right now you pipe the request into the file stream (which proceeds asynchronously) and start the reader process immediately. The reader then reads the file in parallel with the write operations; if it is faster than the writer (it usually will be), it reads the first couple of records, hits the current end of the file, and stops.

To remedy this, start the reader process only after writing has completely finished, i.e., move the part from sys.puts("File saved"); onward into a callback of tempFile.end(...) (see http://nodejs.org/api/stream.html#stream_writable_end_chunk_encoding_callback).
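As a rough sketch of that change (untested, and assuming Node 0.10 or later, where the writable stream emits a 'finish' event once pipe() has ended it and flushed all data to disk, which is equivalent to the end() callback linked above), the handler could look like this, with console.log standing in for the deprecated sys.puts:

var fs = require('fs');
var url = require('url');
var http = require('http');
var fork = require('child_process').fork;

http.createServer(function(request, response) {
    var pathParts = url.parse(request.url).pathname.split("/");
    var deviceID = pathParts[1];
    var fileName = pathParts[2];

    var tempFile = fs.createWriteStream(fileName);
    request.pipe(tempFile);

    // 'finish' fires only after pipe() has ended the stream and all
    // data is flushed, so the child never sees a half-written file.
    tempFile.on('finish', function() {
        console.log("File saved");

        var task = fork('databaseInsert.js');
        task.on('message', function(childResponse) {
            console.log('Finished child process!');
        });
        task.send({
            start : true,
            deviceID : deviceID,
            fileName : fileName
        });

        response.writeHead(200, {
            "Content-Type" : "text/plain"
        });
        response.end('MESSAGE');
    });
}).listen(8080);

Deferring the fork until 'finish' means the child process never opens a partially written file; as a side effect, the client only receives the 200 response once the upload is actually on disk.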

Reading a file while it is still being written to, akin to the Unix tail command, is fairly hard as far as I know (google for details on how difficult it is to implement a proper tail).
