I am working on a project where I need a database to manage my data better, so I decided to use MongoDB for performance reasons.
After setting everything up, I wrote a script that uses a pipeline to download a zip file, unpack it, grab the XLS file inside, and convert the content to JSON (so I can manage it in my code/database). Everything works very well, except that while uploading each JSON part into MongoDB, the download speed drops sharply, and it seems like the query is blocking the event loop.
I had the idea of using Promise.all() to optimize my queries, but the problem is that the array will grow a lot, ending in a memory exception (and checking the array size on every data callback may add further overhead).
Here's my code:
```typescript
const updateCatalog = async () => {
  server.log.info('Updating catalog...');

  const progressBar = new cliProgress.SingleBar({}, cliProgress.Presets.shades_classic);
  const parser = Papa.parse(NODE_STREAM_INPUT, { header: false, fastMode: true });
  const database = client.db('bonnie');
  const collection = database.collection('catalog');

  progress(request(ZIP_URI))
    .on('progress', (state: RequestProgressState) => {
      progressBar.update(Math.floor(state.percent * 100));
    })
    .on('response', () => progressBar.start(100, 0))
    .pipe(unzip.Parse())
    .on('entry', (entry: unzip.Entry) => {
      entry.pipe(parser)
        .on('data', async (data: Array<string>) => {
          // one insertOne() per parsed CSV row
          await collection.insertOne({
            embedUrl: data[0],
            thumbnailUrl: data[1],
            thumbnailUrls: data[2].split(';'),
            title: data[3],
            tags: data[4].split(';'),
            categories: data[5].split(';'),
            authors: data[6].split(';'),
            // parseInt() yields NaN (never null) on bad input, so `?? 0`
            // never falls back; `|| 0` does
            duration: parseInt(data[7]) || 0,
            views: parseInt(data[8]) || 0,
            likes: parseInt(data[9]) || 0,
            dislikes: parseInt(data[10]) || 0,
            thumbnailHdUrl: data[11],
            thumbnailHdUrls: data[12].split(';')
          });
        })
        .on('end', () => {
          progressBar.stop();
          server.log.info('Catalog updated');
        })
        .on('error', (err: any) => {
          client.close();
        });
    });
};
```
The same memory exception will obviously happen with MongoDB bulk operations too.
Answer
The file seems too big to fit into memory, which causes the memory exceptions. You need to save the file on disk first. Then you could try a command-line tool like mongoimport from MongoDB: convert the XLS file to CSV or JSON Lines, and then import it.
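As a rough sketch, assuming the XLS has been exported to a file named catalog.csv (hypothetical name) with a header row, and reusing the database and collection names from your code:

```sh
mongoimport --db bonnie --collection catalog --type csv --headerline --file catalog.csv
```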
If you want to stick with Node.js, you then need to:
- Load the file in chunks (to prevent the memory exception)
- Use collection.insertMany() to insert each chunk (see the sketch below)
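Here is a minimal sketch of that approach, assuming the file has already been saved to disk as catalog.csv and converted to CSV. The names BATCH_SIZE, toDocument, and the connection URI are illustrative, not from your code; the idea is to pause the parser stream while each insertMany() is in flight, so rows never pile up in memory:

```typescript
import * as fs from 'fs';
import Papa from 'papaparse';
import { MongoClient } from 'mongodb';

const BATCH_SIZE = 1000; // tune to your document size and memory budget

// Map one parsed CSV row to a MongoDB document (same mapping as in the question)
const toDocument = (data: Array<string>) => ({
  embedUrl: data[0],
  thumbnailUrl: data[1],
  // ...remaining columns as in your insertOne() call...
});

const importCatalog = async () => {
  const client = new MongoClient('mongodb://localhost:27017'); // illustrative URI
  await client.connect();
  const collection = client.db('bonnie').collection('catalog');

  const parser = Papa.parse(Papa.NODE_STREAM_INPUT, { header: false });
  let batch: ReturnType<typeof toDocument>[] = [];

  parser.on('data', (data: Array<string>) => {
    batch.push(toDocument(data));
    if (batch.length >= BATCH_SIZE) {
      const docs = batch;
      batch = [];
      parser.pause(); // stop emitting rows while the insert runs
      collection.insertMany(docs)
        .then(() => parser.resume())
        .catch((err) => parser.destroy(err));
    }
  });

  parser.on('end', async () => {
    if (batch.length > 0) {
      await collection.insertMany(batch); // flush the final partial batch
    }
    await client.close();
  });

  fs.createReadStream('catalog.csv').pipe(parser);
};
```

insertMany() amortizes the network round-trip across BATCH_SIZE documents, and pausing the stream gives you natural backpressure, so memory use stays bounded by the batch size regardless of how large the file is.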