fix: handle multiple `BasicCrawler.stop()` calls correctly (#3324)
Conversation
```ts
    .catch((err) => {
        this.log.error('An error occurred when stopping the crawler:', err);
    });
if (!this.stoppingPromise) {
```
This seems much more complex than what we have in the Python version (stop() sets a boolean flag and is_finished_function picks it up and terminates the AutoscaledPool). Makes me think... what is the advantage of the approach that we use here?
I believe both these solutions were developed around the same time, independently. I agree the Python implementation is considerably simpler, let's go with that (I'll edit this PR).
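The flag-based approach from the Python version, as described above, can be sketched in TypeScript roughly like this (a minimal illustration, not crawlee's actual source; the `isFinishedFunction` name follows the `AutoscaledPool` convention, while `MiniCrawler` and `stopWasCalled` are assumed names):

```typescript
// Illustrative sketch: stop() only sets a boolean flag; the pool's
// isFinishedFunction observes it and lets the pool wind down on its own.
class MiniCrawler {
    private stopWasCalled = false;

    // Would be passed to AutoscaledPool as its isFinishedFunction.
    async isFinishedFunction(): Promise<boolean> {
        if (this.stopWasCalled) {
            console.log('Crawler.stop() was called, waiting for ongoing requests to finish.');
            return true;
        }
        // A real crawler would also check whether the request queue is empty.
        return false;
    }

    // Synchronous and trivially idempotent: a second call just sets
    // the same flag again, so no extra abort/cleanup logic is needed.
    stop(): void {
        this.stopWasCalled = true;
    }
}
```

Because `stop()` does no asynchronous work itself, repeated calls cannot race each other, which is the simplification the reviewer is pointing at.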
Thank you both for your reviews; my answer to all the comments is that I was trying to conform to the eslint rules :) E.g. without
I'll go with the flag solution from Python as mentioned here anyway, so all of this should be irrelevant now.
The most recent commits clone the Python implementation from apify/crawlee-python#807, including the logged messages.
```ts
this.log.info(
    'The crawler has finished all the remaining ongoing requests and will shut down now.',
);
return true;
```
Shouldn't we also set shouldLogShuttingDown = false here?
Once isFinishedFunction returns true, it's not called again (and neither is the isTaskReadyFunction, since AutoscaledPool.run() will resolve).
(see `crawlee/packages/core/src/autoscaling/autoscaled_pool.ts`, lines 677 to 678 in `2037ea9`)
Better safe than sorry? I wouldn't bet on the internals of AutoscaledPool here
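The defensive version the reviewer suggests can be sketched like this (a standalone illustration, not crawlee's actual code; the flag names mirror the discussion above):

```typescript
let stopWasCalled = true;         // pretend stop() was already called
let shouldLogShuttingDown = true; // one-shot logging flag

// Sketch of an isFinishedFunction that resets the flag defensively, so the
// shutdown message is logged at most once even if AutoscaledPool's internals
// ever change and call this again after it has already returned true.
async function isFinishedFunction(): Promise<boolean> {
    if (stopWasCalled) {
        if (shouldLogShuttingDown) {
            console.log(
                'The crawler has finished all the remaining ongoing requests and will shut down now.',
            );
            shouldLogShuttingDown = false; // better safe than sorry
        }
        return true;
    }
    return false;
}
```

The reset costs nothing and removes the dependency on `AutoscaledPool.run()` never calling `isFinishedFunction` again after it resolves.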
`BasicCrawler.stop()` calls asynchronous functions without awaiting them, which can cause unexpected race conditions. This PR ensures that multiple `.stop()` calls only result in one `AutoscaledPool.abort()` call and that the `.stop()`-induced promises are resolved before the main `BasicCrawler.run()` call resolves.

Closes #3257
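The guarantee described in this PR (multiple `.stop()` calls producing only one `AutoscaledPool.abort()` call, awaited before `run()` resolves) can be illustrated with a shared-promise guard (hypothetical names such as `MiniCrawler` and `abortPool`, not the actual crawlee source):

```typescript
class MiniCrawler {
    private stoppingPromise?: Promise<void>;
    public abortCalls = 0;

    // Stand-in for AutoscaledPool.abort().
    private async abortPool(): Promise<void> {
        this.abortCalls += 1;
    }

    // Every caller gets the same promise, so abortPool() runs once,
    // and run() can await stoppingPromise before it resolves.
    stop(): Promise<void> {
        this.stoppingPromise ??= this.abortPool();
        return this.stoppingPromise;
    }
}
```

Returning the shared promise also lets callers (and `run()` itself) await the shutdown, which is the race the PR description calls out.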