Scraping Content Using Pyppeteer In Association With Asyncio

January 31, 2024

I've written a script in Python using pyppeteer and asyncio to scrape the links to different posts from its landing page and eventually get the title of each post.

Solution 1:

The problem is in the following lines:

tasks = [await browse_all_links(link, page) for link in linkstorage]
results = await asyncio.gather(*tasks)

The intention is for tasks to be a list of awaitable objects, such as coroutine objects or futures. The list is then passed to gather, so that the awaitables run concurrently until they have all completed. However, the list comprehension contains an await, which means that it:

executes each browse_all_links call to completion one after another rather than concurrently;
places the return values of browse_all_links invocations into the list.
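The two effects listed above can be reproduced without a browser. This is a minimal sketch, assuming a hypothetical browse_all_links that only sleeps for 0.2 s instead of loading a page and, like the original, returns nothing; the link strings are placeholders.

```python
import asyncio
import time


async def browse_all_links(link):
    # Hypothetical stand-in for the real scraper: wait instead of loading a page.
    await asyncio.sleep(0.2)
    # No return statement, so each call evaluates to None.


async def main():
    links = ["post-1", "post-2", "post-3"]
    start = time.perf_counter()
    # The await inside the comprehension finishes each call before starting the next.
    tasks = [await browse_all_links(link) for link in links]
    return tasks, time.perf_counter() - start


tasks, elapsed = asyncio.run(main())
print(tasks)  # [None, None, None]
print(f"{elapsed:.1f} s")  # roughly 0.6 s: three 0.2 s waits, one after another
```

The total time is the sum of the individual waits, and the list holds return values rather than awaitables.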

Since browse_all_links doesn't return a value, you end up passing a list of None objects to asyncio.gather, which raises a TypeError because None is not an awaitable.
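The error itself can be triggered in isolation. In this sketch, browse_all_links is again a hypothetical stand-in that returns nothing, so the buggy comprehension yields plain None values and gather rejects them before scheduling anything.

```python
import asyncio


async def browse_all_links(link):
    # Hypothetical stand-in that, like the original, returns nothing.
    await asyncio.sleep(0)


async def main():
    links = ["post-1", "post-2"]
    tasks = [await browse_all_links(link) for link in links]  # the buggy line
    try:
        await asyncio.gather(*tasks)
    except TypeError as exc:
        return tasks, str(exc)
    return tasks, None


tasks, error = asyncio.run(main())
print(tasks)  # [None, None]
print(error)  # An asyncio.Future, a coroutine or an awaitable is required
```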

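To resolve the issue, just drop the await from the list comprehension, so that it builds a list of coroutine objects (tasks = [browse_all_links(link, page) for link in linkstorage]) that asyncio.gather can run concurrently.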
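Here is a minimal sketch of the corrected pattern. It is not the original script. It assumes, beyond the answer above, that browse_all_links receives the browser instead of a shared page and opens one page per link, on the assumption that concurrent goto() calls on a single page would interfere with one another. FakeBrowser and FakePage are hypothetical stand-ins that mimic the pyppeteer methods used (newPage, goto, title, close) so the sketch runs offline, and the example.com links are placeholders.

```python
import asyncio
import time


async def browse_all_links(link, browser):
    # Assumption: one page per link, so concurrent navigations don't collide.
    page = await browser.newPage()
    try:
        await page.goto(link)
        return await page.title()
    finally:
        await page.close()


async def scrape_titles(links, browser):
    # No await inside the comprehension: tasks holds un-started coroutine objects.
    tasks = [browse_all_links(link, browser) for link in links]
    # gather runs them concurrently and returns the titles in the order of links.
    return await asyncio.gather(*tasks)


class FakePage:
    """Hypothetical stand-in for a pyppeteer Page, so the sketch runs offline."""

    async def goto(self, url):
        self.url = url
        await asyncio.sleep(0.2)  # pretend the page takes 0.2 s to load

    async def title(self):
        return f"Title of {self.url}"

    async def close(self):
        pass


class FakeBrowser:
    """Hypothetical stand-in for a pyppeteer Browser."""

    async def newPage(self):
        return FakePage()


async def main():
    links = [
        "https://example.com/post-1",
        "https://example.com/post-2",
        "https://example.com/post-3",
    ]
    start = time.perf_counter()
    titles = await scrape_titles(links, FakeBrowser())
    return titles, time.perf_counter() - start


titles, elapsed = asyncio.run(main())
print(titles)
print(f"{elapsed:.2f} s")  # roughly 0.2 s rather than 0.6 s, because the loads overlap
```

With pyppeteer installed, passing the browser returned by `await launch()` (from `from pyppeteer import launch`) in place of FakeBrowser() should exercise the same code path against real pages, though this sketch has not been checked against a live site. Remember to `await browser.close()` afterwards.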

Python Guru