
Python Crawler IEEE Paper Keywords

I am trying to use a crawler to get IEEE paper keywords, but I now get an error. How can I fix my crawler? My code is here: import requests import json from bs4 import BeautifulSoup ieee

Solution 1:

Here's another answer. I don't know what you do with 's' in your code after the load (the replace step in my code).

The code below doesn't throw an error, but again, how are you using 's'?

import re
import requests
import json
from bs4 import BeautifulSoup

ieee_content = requests.get("http://ieeexplore.ieee.org/document/8465981", timeout=180)
soup = BeautifulSoup(ieee_content.text, 'xml')
tag = soup.find_all('script')

# i is a list
for i in tag[9]:
   metadata_format = re.compile(r'global.document.metadata=.*', re.MULTILINE)
   metadata = re.findall(metadata_format, i)
   if len(metadata) != 0:
      # convert the list
      convert_to_json = json.dumps(metadata)
      x = json.loads(convert_to_json)
      s = x[0].replace("'", '"').replace(";", '')
      ###########################################
      # I don't know what you plan to do with 's'
      ###########################################
      print(s)
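
If the goal for 's' is to end up with the keywords, one option is to capture only the JSON object after global.document.metadata= and hand it to json.loads directly, rather than replacing quotes and semicolons (which can corrupt values that contain them). The following is a minimal sketch, not the asker's code: SAMPLE_SCRIPT is made-up text standing in for one script tag, and the "keywords" / "kwd" field names are assumptions about the page's metadata that should be checked against the real response.

```python
import json
import re

# Hypothetical stand-in for the text of one <script> tag. The real page's
# layout and field names ("keywords", "kwd") are assumptions.
SAMPLE_SCRIPT = '''
var body = document.body;
global.document.metadata={"title": "Example; a 'quoted' title", "keywords": [{"type": "Author Keywords", "kwd": ["crawler", "metadata"]}]};
'''


def extract_metadata(script_text):
    """Return the metadata dict embedded in script_text, or None if absent."""
    # Capture from the first "{" up to the last "};" on the same line.
    match = re.search(r'global\.document\.metadata=(\{.*\});', script_text)
    if match is None:
        return None
    return json.loads(match.group(1))


def collect_keywords(metadata):
    """Flatten every 'kwd' list found under the assumed 'keywords' key."""
    keywords = []
    for group in metadata.get("keywords", []):
        keywords.extend(group.get("kwd", []))
    return keywords


metadata = extract_metadata(SAMPLE_SCRIPT)
keywords = collect_keywords(metadata) if metadata else []
print(keywords)
```

Because the JSON text is passed through unchanged, the semicolon and single quotes inside the sample title survive parsing.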

Solution 2:

Apparently, in line 65 some of the data provided in i did not suit the regex pattern you're trying to use. Therefore your [0] will not work, because the list returned by re.findall is empty.

Answer:

# check that the pattern matched before indexing into the result
x = re.findall('global.document.metadata=(.*;)', i)
if x:
    s = json.loads(x[0].replace("'", '"').replace(";", ''))
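
To see why the check matters, here is a small sketch that runs the same pattern over several strings and skips the ones that do not match. The scripts list is hypothetical sample data, not the real page content.

```python
import re

# Hypothetical script contents; only the second one carries the metadata.
scripts = [
    "var a = 1;",
    "global.document.metadata={'title': 'Example'};",
    "",
]

pattern = re.compile(r"global\.document\.metadata=(.*;)")

found = []
for text in scripts:
    matches = pattern.findall(text)
    if not matches:
        # Empty list: matches[0] would raise IndexError here.
        continue
    found.append(matches[0])

print(found)
```

Looping over every script this way also avoids depending on a fixed position such as tag[9], which may change if the page layout changes.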
