How To Iterate Pages To Scrape Web News

December 26, 2023

I've been trying to figure out how to iterate over pages to scrape multiple news articles. This is the page I want to scrape (along with the pages that follow it): https://www.startribune.com/searc

Solution 1:

As mentioned in the comments, make sure the params are complete, including the 'page' parameter that selects each results page:

import requests

def scrape(url):
    user_agent = {'user-agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; Touch; rv:11.0) like Gecko'}
    params = {
        # Pass the raw query string: requests URL-encodes params itself, so a
        # pre-encoded 'China%20COVID-19' would be sent as 'China%2520COVID-19'.
        'q': 'China COVID-19',
        'refresh': 'true',
    }
    for page_no in range(1, 10):  # pages 1 through 9
        params['page'] = page_no
        response = requests.get(url=url,
                                headers=user_agent,
                                params=params)
        # Show the final URL that was actually requested
        print(response.request.url)
        # e.g. https://www.startribune.com/search/?q=China+COVID-19&refresh=true&page=1

scrape('https://www.startribune.com/search/')
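
To check the page URLs without sending any requests, the query string can be built by hand. The sketch below uses only the standard library; build_page_url and BASE_URL are hypothetical names introduced for illustration, and the parameter names are the ones from the answer above. It also shows what happens when the query is already percent-encoded before being encoded again: the '%' itself is escaped, so '%20' turns into '%2520'.

```python
from urllib.parse import quote, urlencode

# Search endpoint taken from the answer above (hypothetical constant name)
BASE_URL = "https://www.startribune.com/search/"


def build_page_url(query, page_no):
    """Return the search URL for one results page, encoding the query once."""
    params = {"q": query, "refresh": "true", "page": page_no}
    # quote_via=quote encodes spaces as %20; the default (quote_plus) uses '+'
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


# Raw query: the space is encoded exactly once
raw = build_page_url("China COVID-19", 1)
# Pre-encoded query: the '%' is encoded again, giving '%2520'
pre_encoded = build_page_url("China%20COVID-19", 1)

print(raw)
print(pre_encoded)
```

Comparing the two printed URLs makes the double encoding easy to spot before running the scraper against the site.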
