
Is there a way to optimize the upload of huge JSON files to MongoDB?

I am working on a project where I need a database to better manage my data, so I decided to use MongoDB for performance reasons. After setting everything up, I wrote a script that uses a pipeline to download a zip file, unpack it, grab the XLS file inside and convert its contents to JSON (so I can handle it in my code/database). Everything works very well, except that while every JSON row is being inserted into MongoDB the download speed drops drastically, and it looks like the queries are blocking the event loop. I had the idea of batching the inserts with Promise.all() to optimize my queries, but the problem with that is that the array would grow so much that it ends in a memory exception (and checking the array size on every data callback may also add further overhead to the execution). That's my code:

const updateCatalog = async () => {
    server.log.info('Updating catalog...');
    const progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);

    const parser = Papa.parse(NODE_STREAM_INPUT, {
        header: false,
        fastMode: true
    });

    const database = client.db('bonnie');
    const collection = database.collection('catalog');

    progress(request(ZIP_URI)).on('progress', (state: RequestProgressState) => {
        progressBar.update(Math.floor(state.percent * 100));
    }).on('response', () => progressBar.start(100, 0))
        .pipe(unzip.Parse())
        .on('entry', (entry: unzip.Entry) => {
            entry.pipe(parser).on('data', async (data: Array<string>) => {
                await collection.insertOne({
                    embedUrl: data[0],
                    thumbnailUrl: data[1],
                    thumbnailUrls: data[2].split(';'),
                    title: data[3],
                    tags: data[4].split(';'),
                    categories: data[5].split(';'),
                    authors: data[6].split(';'),
                    duration: parseInt(data[7]) ?? 0,
                    views: parseInt(data[8]) ?? 0,
                    likes: parseInt(data[9]) ?? 0,
                    dislikes: parseInt(data[10]),
                    thumbnailHdUrl: data[11],
                    thumbnailHdUrls: data[12].split(';')
                });
            }).on('end', () => {
                progressBar.stop();
                server.log.info('Catalog updated');
            }).on('error', (err: any) => { client.close(); });
        });
};

The same memory-exception issue will obviously happen with MongoDB bulk operations too.


Answer

The file seems to be too big to fit into memory, which is what causes the memory exceptions. You need to save the file to disk first. Then you could try a command-line tool such as mongoimport from MongoDB: you will need to convert the XLS file to a CSV file, or to JSON Lines, and then import it.
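For example, assuming the spreadsheet has been converted to a CSV file called catalog.csv (the file name and connection URI below are placeholders; the field names are taken from your insertOne() call), the import could look roughly like this:

mongoimport --uri "mongodb://localhost:27017" --db bonnie --collection catalog \
    --type csv --file catalog.csv \
    --fields "embedUrl,thumbnailUrl,thumbnailUrls,title,tags,categories,authors,duration,views,likes,dislikes,thumbnailHdUrl,thumbnailHdUrls"

Note that mongoimport stores the semicolon-separated columns as plain strings; if you need them split into arrays as in your current script, you would have to post-process them, or use the Node.js route below.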

If you want to stick with Node.js, you then need to:

  1. Load the file in chunks (to prevent the memory exception)
  2. Use collection.insertMany() to insert each loaded chunk (see the sketch below)
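
Here is a minimal sketch of that load-in-chunks idea; it is not your exact pipeline. It assumes the XLS data has already been converted to a CSV file on disk (catalog.csv is a placeholder name), that client is an already-connected MongoClient, and an arbitrary batch size of 1000:

import * as fs from 'fs';
import * as Papa from 'papaparse';
import { MongoClient } from 'mongodb';

const BATCH_SIZE = 1000; // arbitrary; tune against your memory budget

const importCsv = async (client: MongoClient, csvPath: string) => {
    const collection = client.db('bonnie').collection('catalog');
    const parser = Papa.parse(Papa.NODE_STREAM_INPUT, { header: false, fastMode: true });
    const stream = fs.createReadStream(csvPath).pipe(parser);

    let batch: Record<string, unknown>[] = [];

    // Write the current batch with a single insertMany() call.
    const flush = async () => {
        if (batch.length === 0) return;
        const docs = batch;
        batch = [];
        await collection.insertMany(docs, { ordered: false });
    };

    stream.on('data', (data: Array<string>) => {
        batch.push({
            embedUrl: data[0],
            thumbnailUrl: data[1],
            // ...map the remaining columns exactly as in your current script...
        });
        if (batch.length >= BATCH_SIZE) {
            // Pause the parser while the batch is written, then resume reading.
            stream.pause();
            flush()
                .then(() => stream.resume())
                .catch((err) => stream.destroy(err));
        }
    });

    // Resolve once the stream ends and the final partial batch is flushed.
    await new Promise<void>((resolve, reject) => {
        stream.on('end', () => { flush().then(resolve, reject); });
        stream.on('error', reject);
    });
};

Pausing the parser between batches keeps at most one batch of documents in memory, and grouping rows into a single insertMany() call avoids the per-row round trip to MongoDB that was competing with the download.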