
Crawling Dynamic Content With Scrapy

I am trying to get the latest reviews from the Google Play Store. I'm following this question for getting the latest reviews. The method specified in that question's answer works fine with ...

Solution 1:

It seems you aren't changing the id in the form data, so every request asks for the same app's reviews.
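Each request needs its own package id, taken from the query string of the app link. As a standalone sketch using only the standard library (the `package_id` helper name and the `com.example.app` id are made up for illustration), the id can be read like this:

```python
from urllib.parse import parse_qs, urlsplit


def package_id(href):
    """Return the value of the `id` query parameter of an app link."""
    query = urlsplit(href).query          # e.g. "id=com.example.app"
    return parse_qs(query)["id"][0]


# Hypothetical link in the same shape as the hrefs scraped from the page.
href = "/store/apps/details?id=com.example.app"
print(package_id(href))

# str.strip() treats its argument as a set of characters, not as a prefix,
# so it can also eat characters from the end of the id.
print(href.strip("/store/apps/details?id="))
```

Parsing the query string avoids depending on which characters happen to appear at the start or end of the package name.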

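import json
import re
from time import sleep
from urllib.parse import parse_qs, urlsplit

from scrapy import FormRequest, Selector

def parseApp(self, response):
    apps = set(response.xpath('//a[@class="card-click-target"]/@href').extract())
    url = "https://play.google.com/store/getreviews"
    for app in apps:
        # str.strip() removes a set of characters, not a prefix, so read the
        # package name from the query string instead.
        _id = parse_qs(urlsplit(app).query)["id"][0]
        form_data = {"id": _id, "reviewType": "0", "reviewSortOrder": "0", "pageNum": "0"}
        sleep(5)  # blocks the reactor; the DOWNLOAD_DELAY setting is the usual alternative
        yield FormRequest(url=url, formdata=form_data, callback=self.parse_data)

def parse_data(self, response):
    # The JSON payload starts at the first "[[" in the body.
    response_data = re.findall(r"\[\[.*", response.text)
    if response_data:
        try:
            text = json.loads(response_data[0] + "]")
            sell = Selector(text=text[0][2])
        except (ValueError, IndexError, TypeError):
            return
        # do whatever you want to extract using sell.xpath('YOUR_XPATH_HERE')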
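The parsing step can be tried outside a spider. The sketch below assumes the body has the shape the code above implies: some leading text, then a line starting with `[[` whose third element is the review HTML, with the closing `]` on a later line. `SAMPLE_BODY` is a made-up stand-in, not a captured response, and `extract_review_html` is a hypothetical helper name.

```python
import json
import re
from typing import Optional

# Made-up stand-in for a response body (assumed shape, not real data).
SAMPLE_BODY = ")]}'\n\n" + '[["ecr",1,"<div class=\\"single-review\\">hi</div>",1]\n]'


def extract_review_html(body: str) -> Optional[str]:
    """Return the HTML fragment embedded in the JSON payload, or None."""
    match = re.search(r"\[\[.*", body)    # first line that starts the array
    if match is None:
        return None
    try:
        # The match stops at the end of the line, so re-add the outer "]".
        data = json.loads(match.group(0) + "]")
        return data[0][2]
    except (ValueError, IndexError, TypeError):
        return None


print(extract_review_html(SAMPLE_BODY))
```

Returning `None` on any parse failure keeps the callback from raising on an unexpected or empty body, and catching specific exceptions avoids hiding unrelated bugs the way a bare `except` would.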

After cleaning the data, a sample review looks something like this:

<div class="single-review">
    <a href="/store/people/details?id=106726831005267540508"><img class="author-image" alt="Lorence Gerona avatar image" src="https://lh3.googleusercontent.com/uFp_tsTJboUY7kue5XAsGA=w48-c-h48"></a>
    <div class="review-header" data-expand-target="" data-reviewid="gp:AOqpTOHnsExa_P6JFRJD6HF5h71fpY91tNaEODjtfiTu-zPFki9ZnYsNp1HEcGFpGEfu9xqwJL_j-03Tx0e9lw">
        <div class="review-info">
            <span class="author-name"><a href="/store/people/details?id=106726831005267540508">Lorence Gerona</a></span>
            <span class="review-date">3 June 2015</span>
            <a class="reviews-permalink" href="/store/apps/details?id=com.supercell.boombeach&amp;reviewId=Z3A6QU9xcFRPSG5zRXhhX1A2SkZSSkQ2SEY1aDcxZnBZOTF0TmFFT0RqdGZpVHUtelBGa2k5Wm5Zc05wMUhFY0dGcEdFZnU5eHF3Skxfai0wM1R4MGU5bHc" title="Link to this review"></a>
            <div class="review-source" style="display:none"></div>
            <div class="review-info-star-rating">
                <div class="tiny-star star-rating-non-editable-container" aria-label="Rated 5 stars out of five stars">
                    <div class="current-rating" style="width: 100%;"></div>
                </div>
            </div>
        </div>
        <div class="rate-review-wrapper">
            <div class="play-button icon-button small rate-review" title="Spam" data-rating="SPAM"><div class="icon spam-flag"></div></div>
            <div class="play-button icon-button small rate-review" title="Helpful" data-rating="HELPFUL"><div class="icon thumbs-up"></div></div>
            <div class="play-button icon-button small rate-review" title="Unhelpful" data-rating="UNHELPFUL"><div class="icon thumbs-down"></div></div>
        </div>
    </div>
    <div class="review-body">
        <span class="review-title">Team BOOM BEACH</span>
        Amazing game I can defeat hammerman
        <div class="review-link" style="display:none"><a class="id-no-nav play-button tiny" href="#" target="_blank">Full Review</a></div>
    </div>
</div>
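To show which fields this markup carries, here is a minimal sketch that pulls them out with the standard library's `html.parser` rather than Scrapy's `Selector`, so it runs without Scrapy installed. `ReviewParser`, `parse_review`, and the output keys are hypothetical names, and `SAMPLE` is a shortened copy of the review above.

```python
from html.parser import HTMLParser

# Shortened copy of the sample review above.
SAMPLE = (
    '<div class="single-review"><div class="review-header"><div class="review-info">'
    '<span class="author-name"><a href="/store/people/details?id=106726831005267540508">'
    'Lorence Gerona</a></span>'
    '<span class="review-date">3 June 2015</span>'
    '<div class="tiny-star star-rating-non-editable-container" '
    'aria-label="Rated 5 stars out of five stars"></div>'
    '</div></div>'
    '<div class="review-body"><span class="review-title">Team BOOM BEACH</span>\n'
    'Amazing game I can defeat hammerman\n'
    '<div class="review-link" style="display:none"><a href="#">Full Review</a></div>'
    '</div></div>'
)


class ReviewParser(HTMLParser):
    # CSS class of the element -> key under which its text is stored.
    TEXT_FIELDS = {"author-name": "author", "review-date": "date", "review-title": "title"}

    def __init__(self):
        super().__init__()
        self.review = {}
        self._pending = None    # field waiting for its first text node
        self._in_body = False   # True between review-body and review-link

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        for cls in classes:
            if cls in self.TEXT_FIELDS:
                self._pending = self.TEXT_FIELDS[cls]
        if "star-rating-non-editable-container" in classes:
            self.review["rating_label"] = attrs.get("aria-label")
        if "review-body" in classes:
            self._in_body = True
        if "review-link" in classes:
            self._in_body = False

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        if self._pending:
            self.review[self._pending] = text
            self._pending = None
        elif self._in_body:
            self.review["body"] = (self.review.get("body", "") + " " + text).strip()


def parse_review(html):
    parser = ReviewParser()
    parser.feed(html)
    parser.close()
    return parser.review


print(parse_review(SAMPLE))
```

Inside the spider, the same fields would more naturally come from XPath queries on the selector, for example `sell.xpath('//span[@class="review-date"]/text()')` for the date.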
