Finding CSS files inside HTML source

low

I am trying to make a script that caches websites so I can easily scrape data without having to use proxies (I scrape enough that my traffic would otherwise get blocked). My first step is using urllib2 to get the page source:

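import urllib2  # Python 2 only; Python 3 uses urllib.request instead
cc = urllib2.urlopen(args.site)  # args.site: the URL passed in on the command line
source = cc.read()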

Once I have the page source, I need to search it for any CSS files that are linked relative to the site. The three markers I would be looking for are <link, rel="stylesheet", and href=. The rel attribute describes the relationship between the page and the linked file, its value stylesheet shows that the file is CSS, and the value in quotes directly after the hypertext reference (href) is the name of the CSS file that I need.
An example:

<link rel="stylesheet" type="text/css" href="main.css" />

How could I get this information?
Edit: I need to pick this CSS file out of all the HTML code that the webpage has.

kristof_be

I would personally use Requests and BeautifulSoup (both modules are included by default with Pythonista).
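
A minimal sketch of the BeautifulSoup side, assuming Python 3 and the bs4 package (in Python 2, urljoin lives in the urlparse module instead). The function name find_stylesheets and the example.com base URL are made up for illustration; the sample markup is the link tag from the question.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def find_stylesheets(html, base_url):
    """Return absolute URLs for every <link rel="stylesheet"> in html."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for tag in soup.find_all("link"):
        # html.parser gives rel as a list of values; tolerate a plain string too.
        rel = tag.get("rel") or []
        if isinstance(rel, str):
            rel = rel.split()
        href = tag.get("href")
        if href and "stylesheet" in (value.lower() for value in rel):
            # Resolve relative hrefs such as "main.css" against the page URL.
            urls.append(urljoin(base_url, href))
    return urls


sample = (
    '<html><head>'
    '<link rel="stylesheet" type="text/css" href="main.css" />'
    '<link rel="icon" href="favicon.ico" />'
    '</head></html>'
)
print(find_stylesheets(sample, "http://example.com/blog/"))
```

Resolving each href against the page URL gives you something you can download directly when building the cache.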

Furthermore, I would set up the HTTP request to use a modified User-Agent value, in order to hide that you're running this from a script, and pose as a regular web browser (note that this is not foolproof, but it helps).
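
Something along these lines should work for the User-Agent part, assuming Python 3 and the requests package. The User-Agent string below is only an example of a browser-like value, and make_session and fetch_source are hypothetical helper names.

```python
import requests

# Example browser-like value (an assumption); copy one from your own browser if you prefer.
BROWSER_UA = (
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36"
)


def make_session(user_agent=BROWSER_UA):
    """Create a session that sends the given User-Agent on every request."""
    session = requests.Session()
    session.headers.update({"User-Agent": user_agent})
    return session


def fetch_source(session, url, timeout=10):
    """Download a page and return its HTML, raising on HTTP error codes."""
    response = session.get(url, timeout=timeout)
    response.raise_for_status()
    return response.text


session = make_session()
# source = fetch_source(session, "http://example.com/")  # performs a real request
```

Using one session for all requests also reuses the connection, which is handy when you fetch the page and then its CSS files.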

If you only need the links to the CSS, you can instruct BeautifulSoup to parse only the link tags by using the SoupStrainer class, which speeds up the process (see the "Parsing only part of a document" section of the BeautifulSoup documentation).
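
A rough illustration of SoupStrainer, again assuming Python 3 with bs4 and the built-in html.parser. The helper name parse_link_tags and the sample markup are invented for the example.

```python
from bs4 import BeautifulSoup, SoupStrainer


def parse_link_tags(html):
    """Parse html but keep only <link> tags in the resulting tree."""
    only_links = SoupStrainer("link")
    return BeautifulSoup(html, "html.parser", parse_only=only_links)


sample = (
    '<html><head><title>Demo</title>'
    '<link rel="stylesheet" type="text/css" href="main.css" />'
    '</head><body><p>Lots of markup we do not need.</p></body></html>'
)
soup = parse_link_tags(sample)
hrefs = [tag["href"] for tag in soup.find_all("link")]
print(hrefs)
```

Everything that is not a link tag is skipped while parsing, so the tree stays small even for large pages. You would still need to check rel for stylesheet, since link tags are also used for icons and feeds.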

Hope this helps.