Skip to content
Advertisement

asynchronously iterate over massive array in JavaScript without triggering stack size exceeded

My environment is NodeJS, although this could be a web related problem as well. I have a large set of data from a database which I am attempting to enumerate over. However, for the sake of argument lets say that I have an array of 20,000 strings:

var y = 'strstrstrstrstrstrstrstrstrstr';
var x = [];
for(var i = 0; i < 20000; i++)
  x.push(y);

and I want to enumerate this list asynchronously, lets say using the async library, and lets say because I’m super cautious that I even limit my enumeration to 5 iterations at once:

var allDone = function() { console.log('done!') };
require('async').eachLimit(x, 5, function(item, cb){
  ...
  someAsyncCall(.., cb);
}, allDone);

The expectation is that 5 items of x would be iterated concurrently above and that eventually all 20,000 items would be iterated over and the console would print ‘done!’. What actually happens is:

Uncaught exception: [RangeError: Maximum call stack size exceeded]

And at this point I assumed that this must be some sort of bug with the async library, so I wrote my own version of eachLimit which follows:

function eachLimit(data, limit, iterator, cb) {
    var consumed = 0;
    var consume;
    var finished = false;
    consume = function() {
        if(!finished && consumed >= data.length) {
            finished = true;
            cb();
        }else if(!finished) {
            return iterator(data[consumed++], consume);
        }
    };
    var concurrent = limit > data.length ? data.length : limit;
    for(var i = 0; i < concurrent; i++)
        consume();
}

and interestingly enough, this solved my problem. But then when I moved my experiment from nodeJS over to Chrome, even with my solution above I still receive a stack size exceeded.

Clearly, my method does not increase the stack as large as the eachLimit method contained within async. However, I still consider my approach to be bad because maybe not for 20k items, but for some sized array I can still exceed the stack size using my method. I feel like I need to design some sort of solution to this problem using tail recursion, but I’m not sure if v8 will even optimize for this case, or if it’s possible given the problem.

Advertisement

Answer

I feel like I need to design some sort of solution to this problem using tail recursion, but I’m not sure if v8 will even optimize for this case, or if it’s possible given the problem.

The continuation-passing-style you are using is already tail recursive (or close to anyway). The problem is that most JS engines really tend to do stackoverflows in these sorts of situations.

There are two main ways to work around this issue:

1) Force the code to be async using setTimeout.

What is happening with your code is that you are calling the return callbacks before the original function returns. In some async libraries this will end up resulting in stackoverflow. One simple workaround is to force the callback to run only in the next iteration of the event handling loop, by wrapping it inside a setTimeout. Translate

//Turns out this was actually "someSyncCall"...
someAsyncCall(.., cb);

into

someAsyncCall(..., function(){
    setTimeout(cb, 0)
});

The main advantage here is that this is very simple to do. The disadvantage is that this add some latency to your loop because setTimeout is implemented so that there will always be some nonzero delay to the callback (even if you set it to zero). On the server you can use nextTick (or somethign like that, forgot the precise name) to do something similar as well.

That said, its already a bit weird to have a large loop of sequential async operations. If your operations are all actually async then its going to take years to complete due to the network latency.

2) Use trampolining to handle the sync code.

The only way to 100% avoid a stackoverflow is to use bona-fide while loops. With promises this would be a bit easier to write the pseudocode for:

//vastly incomplete pseudocode
function loopStartingFrom(array, i){
    for(;i<array.length; i++){
        var x = run_next_item(i);
        if(is_promise(x)){
            return x.then(function(){
                loopStartingFrom(array, i+1)
            });
        }
    }
}

Basically, you run your loop in an actual loop, with some way to detect if one of your iterations is returning immediately or deferring to an async computation. When things return immediately you keep the loop running and when you finally get a real async result you stop the loop and resume it when the async iteration result completes.

The downside of using trampolining is that its a bit more complicated. That said, there are some async libraries out there that guarantee that stackoverflow does not occur (by using one of the two tricks I mentioned under the hood).

User contributions licensed under: CC BY-SA
3 People found this is helpful
Advertisement