
Can't Get All Span Tag Inside Div Element Beautifulsoup

I am scraping this site and I need to get the salary value from it, as shown in the image. I have tried the following: import requests from bs4 import BeautifulSoup result = requ

Solution 1:

Since Beautiful Soup is just a parser that works with the content you provide it with, it has nothing to do with page retrieval or rendering.
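To illustrate the point, here is a minimal sketch with made-up HTML (the markup and the class name `salary-box` are assumptions for illustration, not the real page). In the raw version the span only exists once a browser runs the script, so Beautiful Soup, which parses the text it is given, never sees it. In the rendered version, which is roughly what a browser would hand back, the span is there.

```python
from bs4 import BeautifulSoup

# Hypothetical raw response: the <span> is only created when a browser runs the script.
RAW_HTML = """
<html><body>
  <div class="salary-box"></div>
  <script>
    document.querySelector(".salary-box").innerHTML = "<span>Confidential</span>";
  </script>
</body></html>
"""

# Hypothetical DOM after the script has run (what a real browser would expose).
RENDERED_HTML = '<div class="salary-box"><span>Confidential</span></div>'


def count_spans(html):
    """Return how many <span> tags the parser sees inside div.salary-box."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select("div.salary-box span"))


if __name__ == "__main__":
    print("raw:", count_spans(RAW_HTML))
    print("rendered:", count_spans(RENDERED_HTML))
```

The parser finds no span in the raw text because the contents of a script tag are treated as plain text, not as markup.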

The solution that worked in my case was to use Selenium to get the JavaScript-rendered page.

The working code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object.
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")

# page_source holds the DOM after the browser has run the page's JavaScript.
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
# The fourth matching div held the salary spans when this answer was written.
for span in salaries_div[3].select("span"):
    print(span)

Solution 2:

If the content on your page is generated by JavaScript, try Selenium. I think it has all the functionality you need. Your code will then look like this:

### Let's import Selenium!
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.common.by import By

### First, we tell Selenium not to show a graphical window by running Firefox in headless mode.
### We do so by creating an instance of FirefoxOptions and passing it the '-headless' argument
### (recent Selenium releases no longer accept the 'headless' attribute).
opt = FirefoxOptions()
opt.add_argument("-headless")
### Now we create the actual Firefox instance and pass it our FirefoxOptions as the keyword argument 'options'
ffx = Firefox(options=opt)
### We visit your website with ffx.get()
ffx.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
### Let's now search for your spans with ffx.find_elements()
### (the find_elements_by_* helpers were removed in Selenium 4)
elems = ffx.find_elements(By.CSS_SELECTOR, "div.css-rcl8e5:nth-child(5)>span")
### And print the elements
for elem in elems:
    print(elem.get_attribute('outerHTML'))

This (at least in my case) outputs:

<span class="css-wn0avc">Salary<!-- -->:</span>
<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>

To access the second element, use elems[-1]; elems[-1].get_attribute('outerHTML') returns its HTML source.
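If you want the plain text rather than the markup, one option is to hand the outerHTML strings back to Beautiful Soup. The sketch below hard-codes the two strings from the output above so it runs without a browser. The helper name `visible_text` is made up for illustration.

```python
from bs4 import BeautifulSoup

# The outerHTML strings from the output above, hard-coded so no browser is needed.
outer_html = [
    '<span class="css-wn0avc">Salary<!-- -->:</span>',
    '<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>',
]


def visible_text(fragment):
    """Drop tags and comments from an HTML fragment, keeping only its text."""
    return BeautifulSoup(fragment, "html.parser").get_text(strip=True)


label = visible_text(outer_html[0])   # first span: the label
value = visible_text(outer_html[-1])  # last span: the salary value

if __name__ == "__main__":
    print(label, value)
```

With a live Selenium session, elems[-1].text should give you the visible text of the element directly, so this extra parsing step is only needed if you already have the HTML as a string.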

But do not forget to install Selenium with

pip install selenium

You also need Firefox and geckodriver installed.
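A quick way to check that requirement is to look for both executables on your PATH. This is only a rough sketch: the executable names `firefox` and `geckodriver` are assumptions that hold on many Linux setups, but Firefox may be installed somewhere that is not on PATH (for example on macOS or Windows), and newer Selenium releases can download a matching driver on their own.

```python
import shutil


def missing_tools(names=("firefox", "geckodriver")):
    """Return the executables from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("Firefox and geckodriver are both on PATH.")
```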
