Run Multiple Spiders From Script In Scrapy In Loop
Solution 1:
To run multiple spiders simultaneously in the same process, you can use this:
import scrapy
from scrapy.crawler import CrawlerProcess
class MySpider1(scrapy.Spider):
    # Your first spider definition
    ...

class MySpider2(scrapy.Spider):
    # Your second spider definition
    ...
process = CrawlerProcess()
process.crawl(MySpider1)
process.crawl(MySpider2)
process.start() # the script will block here until all crawling jobs are finished
The answers to this question may help you too. For more information, see the "Running multiple spiders in the same process" section of the Scrapy documentation.
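Since the question asks about running spiders in a loop, it is worth noting that the same documentation page also shows how to run spiders sequentially rather than in parallel. A minimal sketch of that pattern, chaining deferreds through a CrawlerRunner and reusing the MySpider1 and MySpider2 classes from above:

import scrapy
from twisted.internet import defer, reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging

configure_logging()
runner = CrawlerRunner()

@defer.inlineCallbacks
def crawl():
    # each yield waits for the previous spider to finish before starting the next
    yield runner.crawl(MySpider1)
    yield runner.crawl(MySpider2)
    reactor.stop()

crawl()
reactor.run()  # the script blocks here until the last crawl finishes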
Solution 2:
I was able to implement similar functionality by removing the loop from the script and instead running a scheduler every 3 minutes.
The looping behaviour was achieved by keeping a record of how many spiders are currently running and checking whether more need to be started. That way, at most 5 spiders (the number can be changed) run concurrently.
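The answer does not include code, but here is a minimal sketch of that idea, assuming each spider is launched as a separate scrapy crawl process from inside the project directory, and that the hypothetical pending list stands in for the database query of waiting spiders:

import subprocess
import time

MAX_RUNNING = 5      # concurrency cap (can be changed)
POLL_SECONDS = 180   # the 3-minute scheduler interval

pending = ["spider1", "spider2", "spider3"]  # hypothetical queue; in practice, query the database
running = []  # Popen handles for spiders currently in flight

while pending or running:
    # drop processes that have already exited
    running = [p for p in running if p.poll() is None]
    # start more spiders until the concurrency cap is reached
    while pending and len(running) < MAX_RUNNING:
        running.append(subprocess.Popen(["scrapy", "crawl", pending.pop(0)]))
    if running:
        time.sleep(POLL_SECONDS)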
Solution 3:
You need ScrapyD for this purpose.
You can run as many spiders as you want at the same time, and you can constantly check whether a spider is running or not using the listjobs API.
You can set max_proc=5 in the config file, which will run a maximum of 5 spiders at a single time.
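A minimal sketch of that workflow, assuming a Scrapyd instance on localhost:6800 with a deployed project named myproject (both hypothetical) and the requests library installed; max_proc itself goes under the [scrapyd] section of scrapyd.conf:

import requests

SCRAPYD = "http://localhost:6800"

# queue a spider; Scrapyd enforces the max_proc limit from scrapyd.conf
requests.post(SCRAPYD + "/schedule.json",
              data={"project": "myproject", "spider": "spider1"})

# poll the listjobs API to see what is pending, running, or finished
jobs = requests.get(SCRAPYD + "/listjobs.json",
                    params={"project": "myproject"}).json()
print(len(jobs["running"]), "spider(s) currently running")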
Anyway, talking about your code: it should work if you do this:
process = CrawlerProcess(get_project_settings())
for i in range(10):  # this range is just for demo; instead of this,
                     # find the spiders that are waiting to run from the database
    process.crawl(spider1)  # spider name changes based on the spider to run
    process.crawl(spider2)
    print('-------------this is the-----{}--iteration'.format(i))
process.start()
You need to place process.start() outside of the loop, because it starts the Twisted reactor, which cannot be restarted once it stops; calling it inside the loop would raise ReactorNotRestartable on the second iteration.