Topics created by rsayeed

@rayseed
Do you want to scrape text off of any old website? Or a particular one?

If it's for random sites, then even with Beautiful Soup it will be a little tough. Well, anything would be tough for that matter :)

Usually site developers stick to some consistent conventions throughout their own code, but those conventions change from site to site, and that's where you would need flexibility in how you use Beautiful Soup.

But often headings are in heading tags, text content is in p tags, and so on. So a generic scraper is possible, but it may not get everything, or more likely you'll get more than you want.

Example:

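    # coding: utf-8
    import requests
    from bs4 import BeautifulSoup

    url = 'http://www.cheese.com/'
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(requests.get(url).text, 'html.parser')
    # Keep only the tags that are direct children of <body>.
    for i in soup.find_all(lambda tag: tag.parent.name == 'body'):
        print(i.text.strip())  # gives a lot of junk...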
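If the junk is a problem, one narrower option is to keep only the heading and p tags mentioned earlier. This is just a rough sketch, not a definitive scraper: the `extract_text` helper and the `CONTENT_TAGS` list are names I made up for illustration, and the assumption that a site puts its readable text in those tags won't hold everywhere.

```python
from bs4 import BeautifulSoup

# Assumption: readable content lives in heading and paragraph tags.
# Adjust this list for the site you are actually scraping.
CONTENT_TAGS = ['h1', 'h2', 'h3', 'h4', 'h5', 'h6', 'p']


def extract_text(html, tags=CONTENT_TAGS):
    """Return the text of each matching tag, in document order."""
    soup = BeautifulSoup(html, 'html.parser')
    # Remove script and style elements so their contents never show up.
    for junk in soup(['script', 'style']):
        junk.decompose()
    lines = []
    for tag in soup.find_all(tags):
        text = tag.get_text(' ', strip=True)
        if text:  # skip empty or whitespace-only tags
            lines.append(text)
    return lines
```

To try it on a live page, pass it the downloaded HTML, e.g. `extract_text(requests.get(url).text)`, and print each line. You will still miss text that sits in div or span tags, so expect to tweak the tag list per site.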

Capture Specific Webpage text using regex/searchHTML and save as new textile Editorial • • rsayeed
