
Parse Large JSON File in Node.js and Handle Each Object Independently

I need to read a large JSON file (around 630MB) in Node.js and insert each object into MongoDB. I've read the answers here: Parse large JSON file in Nodejs. However, the answers there process the file line by line rather than handling each object independently.

Solution 1:

There is a nice module named 'stream-json' that does exactly what you want.

It can parse JSON files far exceeding available memory.

and

StreamArray handles a frequent use case: a huge array of relatively small objects similar to Django-produced database dumps. It streams array components individually, taking care of assembling them automatically.

Here is a very basic example:

const StreamArray = require('stream-json/streamers/StreamArray');
const path = require('path');
const fs = require('fs');

const jsonStream = StreamArray.withParser();

//You'll get json objects here
//Key is an array-index here
jsonStream.on('data', ({key, value}) => {
    console.log(key, value);
});

jsonStream.on('end', () => {
    console.log('All done');
});

const filename = path.join(__dirname, 'sample.json');
fs.createReadStream(filename).pipe(jsonStream.input);

If you'd like to do something more complex, e.g. process one object after another sequentially (keeping the order) and apply some async operations to each of them, then you could use a custom Writable stream like this:

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const path = require('path');
const fs = require('fs');

const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
const jsonStream = StreamArray.withParser();

const processingStream = new Writable({
    write({key, value}, encoding, callback) {
        //Save to mongo or do any other async actions
        setTimeout(() => {
            console.log(value);
            //Next record will be read only when the current one is fully processed
            callback();
        }, 1000);
    },
    //Don't skip this, as we need to operate with objects, not buffers
    objectMode: true
});

//Pipe the streams as follows
fileStream.pipe(jsonStream.input);
jsonStream.pipe(processingStream);

//So we're waiting for the 'finish' event when everything is done.
processingStream.on('finish', () => console.log('All done'));
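Since the original goal is to insert each object into MongoDB, the setTimeout placeholder in write() can be swapped for a real insert. Below is a minimal sketch assuming the official 'mongodb' Node.js driver (3.x or later), a local instance at mongodb://localhost:27017, and placeholder database/collection names mydb and mycollection (adjust these for your setup):

const StreamArray = require('stream-json/streamers/StreamArray');
const {Writable} = require('stream');
const {MongoClient} = require('mongodb');
const path = require('path');
const fs = require('fs');

(async () => {
    //Placeholder connection string, database and collection names
    const client = await MongoClient.connect('mongodb://localhost:27017');
    const collection = client.db('mydb').collection('mycollection');

    const fileStream = fs.createReadStream(path.join(__dirname, 'sample.json'));
    const jsonStream = StreamArray.withParser();

    const processingStream = new Writable({
        write({key, value}, encoding, callback) {
            //Insert the current object; the next one is read only after the insert settles
            collection.insertOne(value)
                .then(() => callback())
                .catch(callback);
        },
        //Operate with objects, not buffers
        objectMode: true
    });

    fileStream.pipe(jsonStream.input);
    jsonStream.pipe(processingStream);

    processingStream.on('finish', async () => {
        await client.close();
        console.log('All done');
    });
})();

Because callback() fires only after insertOne resolves, only one document is in flight at a time; for higher throughput you could collect values into batches and use insertMany, at the cost of a bit more bookkeeping.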

Please note: the examples above were tested with 'stream-json@1.1.3'. For some earlier versions (presumably prior to 1.0.0) you might have to:

const StreamArray = require('stream-json/utils/StreamArray');

and then

const jsonStream = StreamArray.make();
