We've successfully solved #185, that's awesome! 🎉 Redirects seem to be processed correctly now. However, I still can't get more than a single POST request through.
Running my spider through Scrapy's crawl command gives me these stats:
- downloader/request_count: 1919
- downloader/request_method_count/GET: 1723
- downloader/request_method_count/POST: 196
- downloader/response_count: 1919
- downloader/response_status_count/200: 1130
- downloader/response_status_count/302: 789
- dupefilter/filtered: 223
- item_scraped_count: 741
I tried twice without cache and got the same numbers. Running the very same code through the Apify integration gives me:
- downloader/request_count: 1721
- downloader/request_method_count/GET: 1720
- downloader/request_method_count/POST: 1
- downloader/response_count: 1721
- downloader/response_status_count/200: 934
- downloader/response_status_count/302: 787
- item_scraped_count: 546
I don't understand why the number of GET requests differs by three (1723 vs. 1720), but let's say the difference in POSTs is a bigger concern for now. Looking at the log with the debug level turned on, I noticed that this one thing keeps repeating:
[apify] DEBUG [nfTk1APL]: rq.add_request.result={
'wasAlreadyPresent': True,
'wasAlreadyHandled': False,
'requestId': 'uPDbxqGboeKtU6J',
'uniqueKey': 'https://api.example.com/api/graphql/widget'}...
The DEBUG [...] part changes, but uniqueKey stays the same and wasAlreadyPresent is True, which is suspicious. Is it possible that Apify's request queue dedupes requests based only on the URL? All the POSTs have the same URL, just a different payload. That should be a very common situation, both by definition of what POST is and in practice, with all the GraphQL APIs around.
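To illustrate what I'd expect, here is a minimal sketch of a dedup key that takes the method and payload into account. The `compute_unique_key` helper is hypothetical, it is not part of Scrapy or the Apify SDK, and I'm only assuming the integration could pass such a value as the request's uniqueKey:

```python
import hashlib


def compute_unique_key(method: str, url: str, body: bytes = b"") -> str:
    """Hypothetical helper: build a dedup key from method, URL, and payload."""
    method = method.upper()
    if method == "GET" and not body:
        # Plain GETs can keep using the URL alone as the key.
        return url
    # Hash the payload so that requests to one endpoint with different
    # bodies end up with different keys.
    payload_hash = hashlib.sha256(body).hexdigest()[:8]
    return f"{method}({payload_hash}):{url}"


url = "https://api.example.com/api/graphql/widget"
key_a = compute_unique_key("POST", url, b'{"query": "A"}')
key_b = compute_unique_key("POST", url, b'{"query": "B"}')
print(key_a != key_b)  # True
```

With a key like this, two GraphQL POSTs to the same endpoint would only be treated as duplicates if their payloads were identical too.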