Python Crawler Ieee Paper Keywords

February 16, 2024

I am trying to use a crawler to get IEEE paper keywords, but I now get an error. How can I fix my crawler? My code is here: import requests import json from bs4 import BeautifulSoup ieee

Solution 1:

Here's another answer. I don't know what you do with 's' in your code after the load (the replace step in my code).

The code below doesn't throw an error, but again, how are you using 's'?

import json
import re

import requests
from bs4 import BeautifulSoup

ieee_content = requests.get("http://ieeexplore.ieee.org/document/8465981", timeout=180)
soup = BeautifulSoup(ieee_content.text, 'xml')
tag = soup.find_all('script')

# tag[9] is the script element expected to hold the metadata; i is each of its children
for i in tag[9]:
    metadata_format = re.compile(r'global.document.metadata=.*', re.MULTILINE)
    metadata = re.findall(metadata_format, i)
    # only continue when the pattern actually matched something
    if len(metadata) != 0:
        # convert the list
        convert_to_json = json.dumps(metadata)
        x = json.loads(convert_to_json)
        s = x[0].replace("'", '"').replace(";", '')
        # I don't know what you plan to do with 's'
        print(s)

Solution 2:

Apparently, in line 65 some of the data provided in i did not suit the regex pattern you're trying to use. re.findall then returns an empty list, so your [0] fails because there is no element at that index.
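To make the failure concrete, here is a minimal sketch of how re.findall behaves on text that does and does not contain the assignment. The two sample strings and the helper name first_match are hypothetical stand-ins, not taken from the real page.

```python
import re

# Hypothetical stand-ins for the text of two <script> tags
matching = 'global.document.metadata={"title": "Example"};'
non_matching = 'var unrelated = 1;'

pattern = re.compile(r'global\.document\.metadata=(.*);')


def first_match(text):
    """Return the first captured group, or None when the pattern is absent."""
    found = pattern.findall(text)  # always a list, possibly empty
    return found[0] if found else None


print(first_match(matching))      # the captured object text
print(first_match(non_matching))  # None, instead of an IndexError from [0]
```

Indexing the empty list directly, as in pattern.findall(non_matching)[0], raises IndexError, which is why the guard is needed.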

Answer:

# findall returns a list; it is empty when the pattern does not match
x = re.findall('global.document.metadata=(.*;)', i)
if x:
    s = x[0].replace("'", '"').replace(";", '')
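Putting the pieces together, the sketch below shows one possible way to go from page text to keywords without touching the network. It rests on several assumptions that the answers above do not confirm: that the captured object is already valid JSON (so json.loads is applied directly, without the quote replacement), and that it has a "keywords" list whose entries carry "type" and "kwd" fields. The function name extract_keywords and the sample HTML are hypothetical.

```python
import json
import re

# Assumed layout: the page embeds its metadata as a one-line JavaScript assignment
METADATA_RE = re.compile(r'global\.document\.metadata\s*=\s*(\{.*\});')


def extract_keywords(html):
    """Return {keyword type: [keywords]} or {} when nothing usable is found."""
    match = METADATA_RE.search(html)
    if match is None:
        return {}  # pattern absent: no [0] lookup, so no IndexError
    try:
        metadata = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}  # the captured text was not valid JSON
    keywords = {}
    # "keywords", "type" and "kwd" are assumed field names
    for group in metadata.get("keywords", []):
        keywords[group.get("type", "unknown")] = group.get("kwd", [])
    return keywords


# Hypothetical sample standing in for the downloaded page text
sample_html = '''<html><script>
global.document.metadata={"title": "Demo", "keywords": [{"type": "Author Keywords", "kwd": ["crawler", "metadata"]}]};
</script></html>'''

print(extract_keywords(sample_html))
```

Searching the whole page text, rather than indexing a fixed script tag such as tag[9], is a design choice here: it keeps the sketch working if the number or order of script tags changes.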
