Skip to content Skip to sidebar Skip to footer

Unable To Get Actual Markup From A Page With BeautifulSoup

I am trying to scrape this URL with combination of BeautifulSoup and Selinium http://starwood.ugc.bazaarvoice.com/3523si-en_us/115/reviews.djs?format=embeddedhtml&page=2&sc

Solution 1:

First of all, you are giving us the wrong link, instead of the actual page you are trying to scrape, you give us a link to the participating in the page load js file which would be a unnecessary challenge to parse.

Secondly, you don't need BeautifulSoup in this case, selenium itself is good at locating elements and extracting the text or attributes. No need for an extra step here.

Here's a working example using the actual page with reviews you want to get:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # or webdriver.Firefox()
driver.get('http://www.starwoodhotels.com/sheraton/property/reviews/index.html?propertyID=115&language=en_US')

# wait for the reviews to load
WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "span.BVRRReviewText")))

# get reviews
for review_div in driver.find_elements_by_css_selector("span.BVRRReviewText"):
    print(review_div.text)
    print("---")

driver.close()

Prints:

This is not a low budget hotel . Yet the hotel offers no amenities. Nothing and no WiFi. In fact, you block the wifi that comes with my celluar plan. I am a part of 2 groups that are loyal to the Sheraton, Alabama A&M and the 9th Episcopal District AMEChurch but the Sheraton is not loyal to us.
---
We are a company that had (5) guest rooms at the hotel. Despite having a credit card on file for room and tax charges, my guest was charged the entire amount to her personal credit card. It has taken me (5) PHONE CALLS and my own time and energy to get this bill reversed. I guess leaving a message with information and a phone number numerous times is IGNORED at this hotel. You can guarantee that we will not return with our business. YOu may thank Kimerlin or Kimberly in your accounting office for her lack of personal service and follow through for the lost business in the future.
---
...


Post a Comment for "Unable To Get Actual Markup From A Page With BeautifulSoup"