Appex Safari content
Is there a way to get the full content from a Safari page? Currently `appex.get_input()` only returns the URL.
The easiest way would be to download the page, using
```python
import appex
import urllib2  # Python 2 only

# Download the raw HTML of the page shared from Safari
response = urllib2.urlopen(appex.get_url())
html = response.read()
```
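`urllib2` only exists in Python 2 (Pythonista's 2.7 interpreter). A minimal Python 3 sketch of the same download with `urllib.request` might look like this. `fetch_html` is just an illustrative name, and the browser-like `User-Agent` header and the UTF-8 fallback are my own assumptions, not something `appex` requires:

```python
import urllib.request


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (Python 3 stand-in for urllib2)."""
    # Assumption: some sites serve different markup to unknown clients,
    # so send a browser-like User-Agent.
    req = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(req, timeout=timeout) as response:
        # Use the charset from the Content-Type header, falling back to UTF-8.
        charset = response.headers.get_content_charset() or 'utf-8'
        return response.read().decode(charset, errors='replace')
```

In the share extension you would call `fetch_html(appex.get_url())`. This only returns the HTML the server sends; content added later by JavaScript or loaded inside iframes won't be in it.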
My favorite method for page scraping is
Couldn't you just use bs4 to parse the top page, then load the individual iframes?
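Something like this should work for collecting the iframe URLs from the top page. bs4 is what's suggested here; this sketch swaps in the standard library's `HTMLParser` so it runs without extra packages, and `IframeCollector` / `find_iframe_urls` are hypothetical names:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeCollector(HTMLParser):
    """Collects the src attribute of every <iframe> tag (stdlib stand-in for bs4)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == 'iframe':
            src = dict(attrs).get('src')
            if src:
                # Resolve relative srcs like "/embed/1" against the page URL.
                self.urls.append(urljoin(self.base_url, src))


def find_iframe_urls(html, base_url):
    """Return absolute URLs of all iframes in the given HTML, in document order."""
    parser = IframeCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.urls
```

With bs4 the equivalent would be roughly `[urljoin(base_url, f['src']) for f in BeautifulSoup(html, 'html.parser').find_all('iframe', src=True)]`. Each returned URL can then be downloaded the same way as the top page.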
Alternatively, use a webview with a custom delegate to catch iframes, then load those individually. Here is an example of some useful JS logging functions, and of how to get the source from a loaded page: You can modify the delegate to also look for URLs and handle the
There's no way to do this in Pythonista. First of all, the way I understand it, content blockers don't remove chunks of the HTML; they just hide them. So letting content blockers run and then downloading the HTML won't give you anything different from downloading the HTML at the start. With my solution, you could parse it with `bs4` to find your iframe URLs, as @JonB mentioned.
`requests` will be easier than `urllib2`. But for pages that are heavily generated by JavaScript, you will want to use something that lets JS run.
Hyashi, you seemed to think this was possible in a safariVC? If so, there may be an objc solution... If you can find references for how to do what you want in Objective-C, it may be translatable to Pythonista, as long as it does not require new permissions.
If you run your own delegate, you can load whatever you want, since this happens outside of the browser. You would return False from `webview_should_start_load`, then open the iframe in its own webview instance. Although I am not sure whether the delegate gets access to the cookies/headers/etc. which might be needed to open the same content...
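A rough sketch of that delegate, assuming Pythonista's `ui.WebView` delegate callbacks `webview_should_start_load` and `webview_did_finish_load` and its `eval_js` method. `IframeCatcher` and `open_in_webview` are hypothetical names, and treating every non-main URL as an iframe is a simplification: redirects of the main page would be blocked too.

```python
class IframeCatcher(object):
    """Hypothetical ui.WebView delegate: allows only the main page to load and
    queues every other requested URL (e.g. iframes) instead of loading it."""

    def __init__(self, main_url):
        self.main_url = main_url
        self.blocked_urls = []  # URLs refused in webview_should_start_load
        self.html = None        # rendered source captured after loading

    def webview_should_start_load(self, webview, url, nav_type):
        # Let the top page (and blank frames) through.
        if url in (self.main_url, 'about:blank'):
            return True
        # Anything else is remembered once and refused here.
        if url not in self.blocked_urls:
            self.blocked_urls.append(url)
        return False

    def webview_did_finish_load(self, webview):
        # Read the DOM after JS has run; this may fire more than once,
        # in which case the last call wins.
        self.html = webview.eval_js('document.documentElement.outerHTML')


def open_in_webview(url, delegate=None):
    """Load a URL in its own WebView (Pythonista only)."""
    import ui  # imported here so the delegate class also works outside Pythonista
    wv = ui.WebView()
    wv.delegate = delegate
    wv.load_url(url)
    wv.present()
    return wv
```

Usage would be something like `catcher = IframeCatcher(url)` and `open_in_webview(url, catcher)`, then later `open_in_webview(u)` for each `u` in `catcher.blocked_urls`. Cookies and headers may not carry over to the separate webviews, so some iframes may not load the same content.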
What site are you trying to scrape?