
How To Iterate Pages To Scrape Web News

I've been trying to figure out how to iterate over pages to scrape multiple news articles. This is the page I want to scrape (along with the pages that follow it): https://www.startribune.com/searc

Solution 1:

As mentioned in the comments, make sure the params are complete:

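import requests

def scrape(url):
    # Browser-like User-Agent header sent with every request
    user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}
    params = {
        # Pass the raw query; requests URL-encodes it. A pre-encoded
        # 'China%20COVID-19' would be encoded again into 'China%2520COVID-19'.
        'q': 'China COVID-19',
        'refresh': 'true',
    }
    # Request result pages 1 through 9, changing only the page parameter
    for page_no in range(1, 10):
        params['page'] = page_no
        response = requests.get(url=url,
                                headers=user_agent,
                                params=params)
        print(response.request.url)
        # https://www.startribune.com/search/?q=China+COVID-19&refresh=true&page=1

scrape('https://www.startribune.com/search/')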
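
One detail worth checking when building the query string is that parameter values should be passed unencoded, since the encoding step escapes them for you. The following is a minimal offline sketch of that behaviour using only the standard library's `urllib.parse.urlencode`, which handles a parameter dict much like `requests` does. The helper name `build_page_urls` and its `first_page`/`last_page` arguments are hypothetical and not part of the original answer.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, query, first_page=1, last_page=3):
    """Return one search URL per page number (hypothetical helper)."""
    urls = []
    for page_no in range(first_page, last_page + 1):
        # Same parameter names as in the answer above; only 'page' changes
        params = {'q': query, 'refresh': 'true', 'page': page_no}
        urls.append(base_url + '?' + urlencode(params))
    return urls


base = 'https://www.startribune.com/search/'

# Raw query text: the space is encoded exactly once
raw = build_page_urls(base, 'China COVID-19')

# Pre-encoded query text: the '%' itself gets escaped, giving '%2520'
pre_encoded = build_page_urls(base, 'China%20COVID-19')

print(raw[0])
print(pre_encoded[0])
```

Comparing the two printed URLs shows why the query is best written as plain text: in the second one the server would receive the literal string `China%20COVID-19` rather than a search for two words.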
