Class Crawler Written In Python Throws Attribute Error

August 17, 2022

After writing some code in Python, I've gotten stuck. I'm a newbie at writing code that follows an OOP design in Python. The XPaths I've used in my code are flawless. I'

Solution 1:

I'm not sure I understand what you're trying to do in page_crawler.get_link, but I think you should have a separate method for collecting "pagination" links.
I renamed Info_grabber.plinks to Info_grabber.links so that page_crawler.crawler can access them, and managed to extract info from several pages; however, the code is far from ideal.
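The rename matters because an attribute stored on `self` is visible to every method of the instance, whichever class in the hierarchy defines that method. A parent method that reads `self.links` cannot see data the child stored under a different name such as `self.plinks`. A minimal sketch with hypothetical classes `Base` and `Child` (not part of the original code):

```python
class Base:
    def __init__(self):
        # Attribute created by the parent class
        self.links = []

    def count_links(self):
        # Defined on the parent, but reads whatever the instance holds in self.links
        return len(self.links)


class Child(Base):
    def __init__(self, start_link):
        super().__init__()              # run Base.__init__ so self.links exists
        self.links.append(start_link)   # same attribute name the parent reads


child = Child("https://example.com/start")
print(child.count_links())  # 1
```

Had `Child` stored the URL as `self.plinks`, `count_links` would still return 0.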

import requests
from lxml import html


class page_crawler(object):

    main_link = "https://www.yellowpages.com/search?search_terms=pizza&geo_location_terms=San%20Francisco%2C%20CA"
    base_link = "https://www.yellowpages.com"

    def __init__(self):
        self.links = []
        self.pages = []

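    def crawler(self):
        # Iterate over a snapshot: get_link appends to self.links, and looping
        # over the live list would also fetch every business page it adds.
        for link in list(self.links):
            self.get_link(link)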

    def get_link(self, link):
        print("Running page "+ link)
        page = requests.get(link)
        tree = html.fromstring(page.text)
        item_links = tree.xpath('//h2[@class="n"]/a[@class="business-name"][not(@itemprop="name")]/@href')
        for item_link in item_links:
            full_link = self.base_link + item_link
            if full_link not in self.links:
                self.links.append(full_link)

    def get_pages(self, link):
        page = requests.get(link)
        tree = html.fromstring(page.text)
        links = tree.xpath('//div[@class="pagination"]//li/a/@href')
        for url in links:
            full_url = self.base_link + url
            if full_url not in self.pages:
                self.pages.append(full_url)


class Info_grabber(page_crawler):

    def __init__(self, plinks):
        page_crawler.__init__(self)
        self.links += [plinks]

    def passing_links(self):
        for nlink in self.links:
            print(nlink)
            self.crawling_deep(nlink)

    def crawling_deep(self, uurl):
        page = requests.get(uurl)
        tree = html.fromstring(page.text)
        name = tree.findtext('.//div[@class="sales-info"]/h1')
        phone = tree.findtext('.//p[@class="phone"]')
        try:
            email = tree.xpath('//div[@class="business-card-footer"]/a[@class="email-business"]/@href')[0]
        except IndexError:
            email=""
        print(name, phone, email)


if __name__ == '__main__':
    url = page_crawler.main_link
    crawl = Info_grabber(url)
    crawl.crawler()
    crawl.passing_links()
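As a stdlib-only sketch of the separate pagination method suggested above, the function below swaps lxml for `html.parser` and resolves links with `urllib.parse.urljoin` instead of string concatenation. The `div class="pagination"` markup is assumed from the XPath in `get_pages`, not checked against the live site, and unlike that XPath it accepts any `<a>` inside the div rather than only `li/a`. The names `PaginationLinkParser` and `pagination_links` are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationLinkParser(HTMLParser):
    """Collect hrefs of <a> tags nested inside <div class="pagination">."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.depth = 0      # > 0 while inside the pagination div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            if self.depth:
                self.depth += 1     # nested div inside the pagination block
            elif "pagination" in (attrs.get("class") or "").split():
                self.depth = 1
        elif tag == "a" and self.depth and attrs.get("href"):
            url = urljoin(self.base_url, attrs["href"])
            if url not in self.links:   # keep first occurrence, preserve order
                self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1


def pagination_links(page_html, base_url):
    """Return absolute, de-duplicated pagination URLs found in page_html."""
    parser = PaginationLinkParser(base_url)
    parser.feed(page_html)
    return parser.links
```

Feeding it `requests.get(link).text` with `base_link` as the base would give a list that could populate `self.pages`.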

You'll notice that I added a pages attribute and a get_pages method to page_crawler; I'll leave the rest of the implementation to you.
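One possible way to finish the pagination part is to treat search-result pages as a queue, fetch each page once, and gather item links from all of them. This is a sketch, not the answerer's design: `fetch`, `find_pages`, and `find_items` are hypothetical callables standing in for `requests.get(...).text` and the two XPath queries, injected so the loop can run offline. `max_pages` is an assumed safety limit.

```python
from collections import deque


def crawl_search_pages(start_url, fetch, find_pages, find_items, max_pages=50):
    """Breadth-first walk over paginated search results.

    fetch(url) -> page text; find_pages(text) and find_items(text) -> lists of URLs.
    Returns item URLs in discovery order, each listed once.
    """
    queue = deque([start_url])
    seen_pages = {start_url}    # pages already queued, so none is fetched twice
    items = []
    visited = 0
    while queue and visited < max_pages:
        url = queue.popleft()
        visited += 1
        page_text = fetch(url)
        for item in find_items(page_text):
            if item not in items:
                items.append(item)
        for next_page in find_pages(page_text):
            if next_page not in seen_pages:
                seen_pages.add(next_page)
                queue.append(next_page)
    return items
```

The returned list could then be assigned to `self.links` before calling `passing_links`.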
You might need to add more methods to page_crawler later on, since they could be useful if you develop more child classes. Finally, consider looking into composition, as it is also a strong OOP feature.
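With composition, a grabber *has* a fetcher instead of *being* a crawler subclass. A minimal sketch; `RequestsFetcher`, `FakeFetcher`, `BusinessInfoGrabber`, and the injected `parse` callable are hypothetical names, not from the original code. `requests` is imported lazily so the offline stand-in works without it.

```python
class RequestsFetcher:
    """Real fetcher: delegates to requests (imported only when used)."""

    def get(self, url):
        import requests
        return requests.get(url).text


class FakeFetcher:
    """Offline stand-in for testing: returns canned text keyed by URL."""

    def __init__(self, pages):
        self.pages = pages

    def get(self, url):
        return self.pages[url]


class BusinessInfoGrabber:
    """Composes a fetcher and a parser instead of inheriting from a crawler."""

    def __init__(self, fetcher, parse):
        self.fetcher = fetcher  # any object with a .get(url) -> str method
        self.parse = parse      # any callable turning page text into a result

    def grab_all(self, urls):
        return [self.parse(self.fetcher.get(url)) for url in urls]
```

Swapping `FakeFetcher` for `RequestsFetcher` changes where pages come from without touching `BusinessInfoGrabber`, which is harder to arrange when the fetching logic lives in a parent class.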

Solution 2:

Your crawl is an instance of the page_crawler class, not the Info_grabber class, and only Info_grabber defines the passing_links method. I think what you want to do is make crawl an instance of Info_grabber instead.
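The AttributeError can be reproduced with stripped-down versions of the two classes (method bodies simplified for illustration): a method defined only on the subclass is not available on a parent-class instance, while a subclass instance has both.

```python
class page_crawler(object):
    def crawler(self):
        return "crawling"


class Info_grabber(page_crawler):
    def passing_links(self):
        return "passing links"


crawl = page_crawler()
try:
    crawl.passing_links()   # defined only on the subclass
except AttributeError as exc:
    print(exc)              # 'page_crawler' object has no attribute 'passing_links'

crawl = Info_grabber()
print(crawl.crawler())        # inherited from page_crawler
print(crawl.passing_links())  # defined on Info_grabber
```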

Then, inside passing_links and before calling self.crawling_deep, I believe you need something like:

import re
import requests
if nlink:
    page = requests.get(nlink).text
    match = re.search(r'\d{10}', page)  # first run of 10 digits, if any
    tel = match.group() if match else ""
    print(tel)
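`\d{10}` only matches ten consecutive digits, so a number rendered with punctuation, such as `(415) 555-0123`, would be missed. A more tolerant sketch, assuming North American ten-digit numbers (the display format is an assumption, not checked against the site). `PHONE_RE` and `first_phone` are hypothetical names, and `re.search` runs the pattern once instead of calling `findall` twice.

```python
import re

# Optional "(", 3 digits, optional ")", separators, 3 digits, separators, 4 digits
PHONE_RE = re.compile(r"\(?\d{3}\)?[\s.-]*\d{3}[\s.-]*\d{4}")


def first_phone(text):
    """Return the first phone-like match as bare digits, or "" if none is found."""
    match = PHONE_RE.search(text)
    return re.sub(r"\D", "", match.group()) if match else ""


print(first_phone("Call (415) 555-0123 today"))  # 4155550123
```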

Python Guru
