Following Links In Python Assignment Using Beautifulsoup
Solution 1:
[Edit: Cut+pasted this line from comments] Hi! I had to work on a similar exercise, and because I had some doubts I found your question. Here is my code; I think it works, and I hope it is helpful for you.
import urllib
from bs4 import BeautifulSoup
url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
count = 8
position = 18
tags_lst = []
for x in xrange(count-1):
    tags = soup('a')
    my_tags = tags[position-1]
    needed_tag = my_tags.get('href', None)
    tags_lst.append(needed_tag)
    url = str(needed_tag)
    html = urllib.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
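Note that this is Python 2 code (urllib.urlopen and xrange). If you are on Python 3, as in Solutions 2 and 3 below, the same loop only needs urllib.request.urlopen and range instead; a minimal sketch of that variant, under that assumption:

import urllib.request
from bs4 import BeautifulSoup

url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'
count = 8
position = 18
for _ in range(count - 1):
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    # take the href of the anchor tag at the required position (1-based)
    url = soup('a')[position - 1].get('href', None)
    print(url)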
Solution 2:
I've put the solution below; it is tested and working well as of today.
Importing the required modules
import urllib.request, urllib.parse, urllib.error
from bs4 import BeautifulSoup
import re
Accessing the website
url = "http://py4e-data.dr-chuck.net/known_by_Vairi.html"html = urllib.request.urlopen(url).read()
soup = BeautifulSoup(html, 'html.parser')
all_num_list = list()
link_position = 18
Process_repeat = 7
Retrieve all of the anchor tags
tags = soup('a')
while Process_repeat - 1 >= 0:
    print("Process round", Process_repeat)
    target = tags[link_position - 1]
    print("target:", target)
    url = target.get('href', 2)
    print("Current url", url)
    html = urllib.request.urlopen(url).read()
    soup = BeautifulSoup(html, 'html.parser')
    tags = soup('a')
    Process_repeat = Process_repeat - 1
Solution 3:
Try this. You can skip entering the URL; the sample link from your question is hardcoded in the urlopen call. Good luck!
import urllib.request
from bs4 import BeautifulSoup
import ssl
# Ignore SSL certificate errors
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE
url = input('Enter url ')
cn = input('Enter count: ')
cnint = int(cn)
pos = input('Enter position: ')
posint = int(pos)
html = urllib.request.urlopen('http://py4e-data.dr-chuck.net/known_by_Fikret.html', context=ctx).read()  # or pass the url entered above
soup = BeautifulSoup(html, 'html.parser')
tags_lst = list()
for x in range(0, cnint):
    tags = soup('a')
    my_tags = tags[posint-1]
    needed_tag = my_tags.get('href', None)
    url = str(needed_tag)
    html = urllib.request.urlopen(url, context=ctx).read()
    soup = BeautifulSoup(html, 'html.parser')
    print(my_tags.get('href', None))
Solution 4:
Your BeautifulSoup import was wrong; I don't think the code you show works with it. Also, your lower loop was confusing. You can get the list of urls you want by slicing the completely retrieved list (a short slicing illustration follows the code below).
I've hardcoded your url in my code because it was easier than typing it in each run.
Try this:
import urllib
from bs4 import BeautifulSoup
#url = raw_input('Enter - ')
url = 'http://python-data.dr-chuck.net/known_by_Fikret.html'
html = urllib.urlopen(url).read()
soup = BeautifulSoup(html)
# print soup
count = int(raw_input('Enter count: '))+1
position = int(raw_input('Enter position: '))
tags = soup('a')
# next line gets count tags starting from position
my_tags = tags[position: position+count]
tags_lst = []
for tag in my_tags:
needed_tag = tag.get('href', None)
tags_lst.append(needed_tag)
print tags_lst
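To see what that slice does on its own, here is a tiny illustration with a made-up list of placeholder names (not the assignment data):

names = ['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo']
position = 1
count = 3
# take count items starting at index position
print(names[position: position + count])   # ['Bravo', 'Charlie', 'Delta']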
Solution 5:
Almost all solutions to this assignment repeat the url-loading code in two places. Instead, I defined a function that retrieves any given url, prints it, and returns its links.
Initially, the function is called with the Fikret.html url as input. Subsequent calls use the refreshed url found at the required position.
The important line of code is this one: url = allerretour(url)[position-1]
This gets the new url that feeds the loop another round.
import urllib
from bs4 import BeautifulSoup
url = 'http://py4e-data.dr-chuck.net/known_by_Fikret.html'  # raw_input('Enter URL : ')
position = 3  # int(raw_input('Enter position : '))
count = 4  # int(raw_input('Enter count : '))

def allerretour(url):
    print('Retrieving: ' + url)
    soup = BeautifulSoup(urllib.urlopen(url).read())
    link = list()
    for tag in soup('a'):
        link.append(tag.get('href', None))
    return(link)

for x in range(1, count + 2):
    url = allerretour(url)[position-1]
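For example (a small sketch reusing the function above, not part of the original answer), a single call shows how the next url is picked from any page:

links = allerretour(url)      # prints 'Retrieving: ...' and returns all hrefs on that page
print(links[position - 1])    # the url the next round would follow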