
Get The Contents(full Of Text) From The Paragraph Beautiful Soup

I want to extract the full text of the paragraphs from news web pages. I have a set of URLs, and only the paragraph content should be extracted from each. When I use t

Solution 1:

This happens because your code contains the line print p.read(), which prints the whole HTML page.

To get the article text, find it by id and then all paragraphs inside the article.
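As a rough sketch of those two steps without a CSS selector, the snippet below uses find() to locate the container and find_all() to collect its paragraphs. The inline markup is a made-up stand-in for a fetched page (an assumption, not the real site's HTML); only the article tag with id "story" is taken from the answer.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a fetched news page.
html = """
<article id="story">
  <p class="story-content">First paragraph.</p>
  <p class="story-content">Second paragraph.</p>
</article>
<p>Footer text outside the article.</p>
"""

soup = BeautifulSoup(html, "html.parser")

# Step 1: locate the article container by its id.
article = soup.find("article", id="story")

# Step 2: collect every paragraph inside that container only.
paragraphs = article.find_all("p") if article is not None else []
article_text = "\n".join(para.get_text() for para in paragraphs)
print(article_text)
```

The None check matters when looping over many URLs, since a page without the expected container would otherwise raise an AttributeError.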

Example using CSS Selector:

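from bs4 import BeautifulSoup

soup = BeautifulSoup(p, 'html.parser')  # p is the page's response object (or HTML string) from the question
print(''.join(para.text for para in soup.select('article#story p.story-content')))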

Prints:

ANKARA, Turkey —  The Obama administration on Monday began the work of trying to determine
...

FYI, the selector article#story p.story-content matches every p tag with the story-content class inside the article tag whose id is story.
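To see what the selector does and does not match, here is a small self-contained sketch. SAMPLE_HTML and extract_story_text are hypothetical names introduced for illustration, and the markup is invented; a real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup; only the id and class names come from the answer.
SAMPLE_HTML = """
<html><body>
  <p class="story-content">Teaser outside the article.</p>
  <article id="story">
    <h1>Headline</h1>
    <p class="story-content">First paragraph.</p>
    <p class="byline">By a reporter</p>
    <p class="story-content">Second paragraph.</p>
  </article>
</body></html>
"""


def extract_story_text(html):
    """Return the text of every p.story-content inside article#story."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = soup.select("article#story p.story-content")
    return "\n".join(para.get_text(strip=True) for para in paragraphs)


print(extract_story_text(SAMPLE_HTML))
```

The teaser is skipped because it sits outside the article, and the byline is skipped because it lacks the story-content class, so only the two story paragraphs are returned.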
