I am trying to write a script that caches websites so I can easily scrape data without having to use proxies (I scrape enough that my traffic would otherwise get blocked). My first step is using urllib2 to get the page source:
import urllib2
cc = urllib2.urlopen(args.site)
source = cc.read()
Once I have the page source, I need to look through it to find any CSS files that are linked relatively to the site. The three things I would be looking for are <link, href=, and stylesheet. The <link tag is where the CSS file is linked, rel="stylesheet" shows that the file is CSS, and directly after the hyperlink reference (href=) is the name of the CSS file, in quotes, that I need.
<link rel="stylesheet" type="text/css" href="main.css" />
How could I get this information?
Edit: I need to pick this CSS file out of all the HTML code that the webpage has.
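
To make the goal concrete, here is a minimal sketch of the kind of extraction described above. It assumes the standard-library HTML parser is acceptable, rather than searching the raw text for <link and href=. The names StylesheetFinder and find_stylesheets are hypothetical, made up for illustration, and the imports are written to work on both Python 2 (where urllib2 lives) and Python 3. On Python 3 the page source would need to be decoded to a string before it is fed to the parser.

```python
try:
    from html.parser import HTMLParser   # Python 3
    from urllib.parse import urljoin
except ImportError:
    from HTMLParser import HTMLParser    # Python 2
    from urlparse import urljoin


class StylesheetFinder(HTMLParser):
    """Hypothetical helper: collects href values of <link rel="stylesheet"> tags."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # The parser lowercases tag and attribute names for us.
        if tag != "link":
            return
        attrs = dict(attrs)
        # rel may hold several space-separated tokens, e.g. "alternate stylesheet".
        rel = (attrs.get("rel") or "").lower().split()
        if "stylesheet" in rel and attrs.get("href"):
            self.hrefs.append(attrs["href"])


def find_stylesheets(source, base_url):
    """Return absolute URLs for every stylesheet linked in the HTML source."""
    finder = StylesheetFinder()
    finder.feed(source)
    finder.close()
    # urljoin resolves relative names such as "main.css" against the page URL.
    return [urljoin(base_url, href) for href in finder.hrefs]


sample = '<html><head><link rel="stylesheet" type="text/css" href="main.css" /></head></html>'
print(find_stylesheets(sample, "http://example.com/page/index.html"))
# prints ['http://example.com/page/main.css']
```

In the script above, this would presumably be called as find_stylesheets(source, args.site), so that each relative CSS name is turned into a full URL that can be fetched and cached in turn.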