Can't Get All Span Tag Inside Div Element Beautifulsoup

September 26, 2023

I am scraping this site and I need to get the salary value from it, as shown in the image. I have tried to do the following: import requests from bs4 import BeautifulSoup result = requ

Solution 1:

Since Beautiful Soup is just a parser that works with the content you provide it with, it has nothing to do with page retrieval or rendering.
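To make that concrete, here is a minimal sketch of why the spans go missing. The markup is hypothetical (only the class name is taken from this thread), and "html.parser" is used so that nothing beyond bs4 is required. The first string stands for what an HTTP client downloads; the second stands for the DOM after a browser has run the script.

```python
from bs4 import BeautifulSoup

# Hypothetical static HTML, as an HTTP client would receive it before any
# JavaScript has run. Only the class name comes from the thread.
RAW_HTML = """
<div class="css-rcl8e5"></div>
<script>
  document.querySelector(".css-rcl8e5").innerHTML =
    "<span>Salary:</span><span>Confidential</span>";
</script>
"""

# Hypothetical DOM after a browser has executed the script above.
RENDERED_HTML = (
    '<div class="css-rcl8e5">'
    "<span>Salary:</span><span>Confidential</span>"
    "</div>"
)


def count_spans(html: str) -> int:
    """Count <span> tags inside div.css-rcl8e5 in the given markup."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.select("div.css-rcl8e5 span"))


print(count_spans(RAW_HTML))       # the script body is plain text to the parser
print(count_spans(RENDERED_HTML))  # the spans exist only in the rendered DOM
```

Beautiful Soup never executes the script, so the div stays empty in the first case. Whatever fetches the page has to render it first.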

The solution that I found in my case is to use Selenium to get the JavaScript-rendered page.

The working code:

from bs4 import BeautifulSoup
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

# Selenium 4 takes the driver path through a Service object
driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
driver.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")

# page_source is the page as the browser currently holds it, after JavaScript has run
page = driver.page_source
driver.quit()

soup = BeautifulSoup(page, "lxml")
salaries_div = soup.find_all("div", {"class": "css-rcl8e5"})
# The salary spans sit in the fourth matching div
for span in salaries_div[3].select("span"):
    print(span)

Solution 2:

If the content on your page is generated by JavaScript, try Selenium. I think it has all the functionality you need. Your code will then look like this:

### Let's import Selenium!
from selenium.webdriver import Firefox, FirefoxOptions
from selenium.webdriver.common.by import By
### First, we need to tell Selenium not to show a graphical window, so we will use Firefox in headless mode.
### We do so by creating an instance of FirefoxOptions and adding the '-headless' argument
### (the older 'headless' attribute was deprecated and later removed in Selenium 4)
opt = FirefoxOptions()
opt.add_argument("-headless")
### Now, we create the actual Firefox instance and we pass it our FirefoxOptions as keyword argument 'options'
ffx = Firefox(options=opt)
### We visit your website with ffx.get()
ffx.get("https://wuzzuf.net/jobs/p/xGYIYbJlYhsC-Senior-Python-Developer-Cairo-Egypt?o=1&l=sp&t=sj&a=python|search-v3|hpb")
### Let's now search for your spans with ffx.find_elements()
### (it replaces find_elements_by_css_selector(), which was removed in Selenium 4)
elems = ffx.find_elements(By.CSS_SELECTOR, "div.css-rcl8e5:nth-child(5)>span")
### And print the elements
for elem in elems:
    print(elem.get_attribute('outerHTML'))
### Close the browser when we are done
ffx.quit()

This (at least in my case) outputs:

<span class="css-wn0avc">Salary<!-- -->:</span><span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>

To access the second element, use elems[-1]; elems[-1].get_attribute('outerHTML') returns its HTML source.

But do not forget to install Selenium with

pip install selenium

And you should have Firefox with geckodriver installed.
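If only the salary text is wanted rather than the raw HTML, the two spans can be reduced to plain strings. The following is a minimal sketch that assumes the output looks like the one shown in this answer. It parses those strings with Beautiful Soup so that it runs without a browser; the helper name span_text is made up for illustration.

```python
from bs4 import BeautifulSoup

# The two outerHTML strings from the output in this answer, one per matched span.
OUTER_HTML = [
    '<span class="css-wn0avc">Salary<!-- -->:</span>',
    '<span class="css-47jx3m"><span class="css-4xky9y">Confidential</span></span>',
]


def span_text(outer_html: str) -> str:
    """Return the text of one HTML fragment, skipping HTML comments."""
    soup = BeautifulSoup(outer_html, "html.parser")
    return soup.get_text(strip=True)


# First span is the label, second span is the value.
label, value = (span_text(html) for html in OUTER_HTML)
print(label, value)
```

With a live Selenium session, the same value should be available directly as elems[-1].text, without a second parse.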

Python Guru
