Screen Scraping

reefboy1

What is screen scraping? From what I know, it's like getting info from some database (I think). Also, how can I screen scrape? Thanks for your answers.

ccc

Screen Scraping is the art and science of:

1) getting all the text from a computer display (terminal, webpage, etc.) and then
2) selecting out only those data fields of interest for storage or further processing.

It used to be about getting data from terminal displays, but these days it is mostly about scraping data off of web pages. The Pythonista tools that I prefer for web scraping are requests (for getting all the HTML of a webpage) and Beautiful Soup 4 (for selecting out only those data fields of interest). bs4 is complicated, but it is supercool once you get the hang of it.

Here are two recent examples of web scraping. They follow the model:

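import bs4, requests

def get_beautiful_soup(url):
    # Fetch the page, then parse its HTML. Naming the parser explicitly
    # avoids bs4's "no parser was explicitly specified" warning.
    return bs4.BeautifulSoup(requests.get(url).text, 'html.parser')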

soup = get_beautiful_soup('http://omz-forums.appspot.com/pythonista')
print(soup.prettify())
# See: http://www.crummy.com/software/BeautifulSoup/bs4/doc for all the things you can do with the soup.

As you can see by looking at the output, the harder part is selecting out only those data fields of interest. ;-)
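To give a feel for that selecting step, here is a minimal sketch using bs4's find_all() and find(). The HTML fragment, the class names "topic" and "ad", and the links are all made up for illustration; they are not the real markup of the forum page, so you would need to look at the prettify() output to find the right tags for your own page.

```python
import bs4

# Hypothetical page fragment, invented for this example (not the real forum HTML).
html = """
<html><body>
  <h1>Forum</h1>
  <ul>
    <li class="topic"><a href="/t/1">Screen Scraping</a></li>
    <li class="topic"><a href="/t/2">Drawing with ui</a></li>
    <li class="ad"><a href="/ads">Buy stuff</a></li>
  </ul>
</body></html>
"""

soup = bs4.BeautifulSoup(html, 'html.parser')

# Keep only the <li> elements whose class is "topic", then pull the
# link text and the href attribute out of the <a> inside each one.
topics = []
for li in soup.find_all('li', class_='topic'):
    a = li.find('a')
    topics.append((a.get_text(strip=True), a['href']))

for title, href in topics:
    print(title, href)
```

The "ad" item is skipped because find_all() only returns the elements that match the class filter.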

If bs4 is too complicated for your purposes, you can do html = requests.get(url).text and then try using str.find() and str.partition(), or Python's regular expressions module, re, as a poor man's soup. Happy scraping.
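A rough sketch of that poor man's approach, using only the standard library. The html string here is a made-up stand-in for what requests.get(url).text would return, and the regular expression assumes very simple, regular markup; real pages are usually messier, which is why bs4 tends to be more robust.

```python
import re

# Made-up stand-in for requests.get(url).text
html = ('<html><head><title>Pythonista Forum</title></head><body>'
        '<a href="/t/1">Screen Scraping</a> <a href="/t/2">Other topic</a>'
        '</body></html>')

# str.find() returns the index of the first match, or -1 if there is none.
start = html.find('<title>')

# str.partition() splits around the first occurrence of a separator,
# so two calls are enough to cut out the text between the title tags.
_, _, rest = html.partition('<title>')
title, _, _ = rest.partition('</title>')
print(title)

# re.findall() with two groups returns a list of (href, link text) tuples.
links = re.findall(r'<a href="([^"]*)">([^<]*)</a>', html)
for href, text in links:
    print(href, text)
```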

reefboy1

Cool! Thanks for the response
