Scrapy Xpath - Can't Get Text Within Span
Solution 1:
Your example works fine, so I guess your XPath expressions failed on another page or on a different part of the HTML.
The problem is the use of indexes (span[3]) in the headquarters_list XPath expression. With indexes you depend heavily on:
1. The total number of span elements
2. The exact order of the span elements
In general, the use of indexes tends to make XPath expressions more fragile and more likely to fail, so I would avoid indexes whenever possible. In your example you are actually extracting the locality part of the address. That span element can just as easily be referenced by its class name, which makes your expression much more robust:
//li[@class="vcard hq"]/p/span[@class='locality']/text()
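To make the fragility concrete, here is a minimal sketch comparing the two expressions on two variants of the address block. The variants and the helper name texts are made up for illustration, and the sketch uses the standard library's xml.etree.ElementTree instead of Scrapy's Selector so that it runs without extra packages. ElementTree only supports a subset of XPath (no text() step), so the element text is read from the .text attribute instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical variant A: same span layout as the question (locality is the 3rd span).
PAGE_A = """
<ul>
  <li class="vcard hq"><p>
    <span class="street-address">Kornhamnstorg 49</span>
    <span class="street-address"></span>
    <span class="locality">Stockholm,</span>
    <span class="postal-code">S-11127</span>
  </p></li>
</ul>
"""

# Hypothetical variant B: the empty second street-address span is missing,
# so the locality moves to position 2 and span[3] is now the postal code.
PAGE_B = """
<ul>
  <li class="vcard hq"><p>
    <span class="street-address">Kornhamnstorg 49</span>
    <span class="locality">Stockholm,</span>
    <span class="postal-code">S-11127</span>
  </p></li>
</ul>
"""

BY_INDEX = ".//li[@class='vcard hq']/p/span[3]"
BY_CLASS = ".//li[@class='vcard hq']/p/span[@class='locality']"


def texts(markup, path):
    """Return the text of every element matched by path (illustrative helper)."""
    root = ET.fromstring(markup)
    return [element.text for element in root.findall(path)]


index_a = texts(PAGE_A, BY_INDEX)  # locality, as intended
index_b = texts(PAGE_B, BY_INDEX)  # silently returns the postal code instead
class_a = texts(PAGE_A, BY_CLASS)  # locality
class_b = texts(PAGE_B, BY_CLASS)  # still the locality

print(index_a, index_b, class_a, class_b)
```

Note that the index-based expression does not fail loudly on variant B. It simply returns a different field, which is arguably worse than an empty result. With Scrapy the same two expressions, with /text() appended, should behave the same way through sel.xpath(...).extract().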
Solution 2:
Here is my test code, based on your problem description:
# -*- coding: utf-8 -*-
from scrapy.selector import Selector
html_text = """
<li class="type"><h4>Type</h4><p>
Privately Held
</p></li>
<li class="vcard hq"><h4>Headquarters</h4><p class="adr" itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
<span class="street-address" itemprop="streetAddress">Kornhamnstorg 49</span>
<span class="street-address" itemprop="streetAddress"></span>
<span class="locality" itemprop="addressLocality">Stockholm,</span>
<abbr class="region" title="Stockholm" itemprop="addressRegion">Stockholm</abbr>
<span class="postal-code" itemprop="postalCode">S-11127</span>
<span class="country-name" itemprop="addressCountry">Sweden</span>
</p></li>
<li class="company-size"><h4>Company Size</h4><p>
11-50 employees
</p>
"""
sel = Selector(text=html_text)
companytype_list = sel.xpath(
'''.//li[@class="type"]/p/text()''').extract()
headquarters_list = sel.xpath(
'''.//li[@class="vcard hq"]/p/span[3]/text()''').extract()
companysize_list = sel.xpath(
'''.//li[@class="company-size"]/p/text()''').extract()
It doesn't raise any exception, so chances are that other web pages with a different structure are causing the errors.
It's good practice not to use indexes directly in XPath rules. dron22's answer gives an awesome explanation.