Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Auto fill form and simulate enter
-
Is there a way to automatically fill in a web form and simulate the Enter key for sites like http://webpagetopdf.com/ ?
-
I think mechanize can do it. I have to do some research, though.
-
I've tried mechanize but it does not find any forms on this site. Perhaps there are several kinds of forms?
Anyway, thanks for spending time to help me -
@cvp ...maybe try this? I don't know if it suits your needs. The tricky part is with resources that are linked. So if all the linking is correct/corrected, then it should work.
edit: One other problem with this is I highly doubt it will work for websites whose content is written by JavaScript. There's a way around that - load the page in a ui.WebView and evaluate JavaScript to grab the HTML.
Just feed it a URL and it will download a PDF (it takes a few moments):
# -*- coding: utf-8 -*-
import requests
import re

def convert_url(url):
    # attempt to get resources from base url
    html = re.sub('src="/(?!/)', 'src="' + url, requests.get(url).text)
    input_files = {'input_files[]': ('html.html', html)}
    fields = {'from': 'html', 'to': 'pdf'}
    r = requests.post('http://c.docverter.com/convert', data=fields, files=input_files)
    with open('converted.pdf', 'wb') as fd:
        for chunk in r.iter_content(chunk_size=1024):
            fd.write(chunk)

convert_url('http://www.google.com')
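A side note on that first re.sub: it rewrites root-relative src attributes to absolute ones so the converter can fetch images and stylesheets, while the (?!/) lookahead leaves protocol-relative URLs (src="//...") alone. A minimal, standalone illustration (invented HTML, no network needed; note I use a base URL with a trailing slash here, because the regex consumes the leading slash of the path - written for Python 3, unlike the script above):

```python
import re

base = 'http://example.com/'  # trailing slash, since the regex eats the path's leading '/'
html = '<img src="/logo.png"> <script src="//cdn.example.com/a.js"></script>'

# root-relative src="/..." becomes absolute; protocol-relative src="//..."
# is skipped thanks to the (?!/) negative lookahead
fixed = re.sub('src="/(?!/)', 'src="' + base, html)
print(fixed)
```

The protocol-relative script tag comes through untouched, while the image now points at the base site.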
-
Thanks for your help and for your marvelous sample.
I've searched and tested a lot of online conversion sites and Python packages, and even the Workflow Make PDF action, but without success on some URLs like 'http://www.lacuisinedebernard.com/2016/03/galettes-de-son-davoine-aux-carottes.html#', where my wife downloads recipes.
I've tried your site and the PDF does not show the recipe pictures.
And the only site I've found that correctly converts the web page is http://webpagetopdf.com.
So my problem is not yet resolved, but one more time, thanks a lot. -
Okay!
I've tried this out and added a few things. It seems the images are all wrapped in divs in the HTML. Taking those out makes it work. Also... there's a lot of other stuff on the page that I assume is unnecessary for cooking.
There are other and better ways to scrape a website. The script below is really poorly done, and it assumes consistency between web pages for where the recipe content is. I checked a few others on that site and they seem to have the same elements.
I also didn't add any error handling or checking, so if it can't find the elements this will throw an error. Like I said... poorly done.
Anyway...give it a try:
# -*- coding: utf-8 -*-
import requests
import re

def convert_url(url):
    # attempt to get resources from base url
    html = re.sub('src="/(?!/)', 'src="' + url, requests.get(url).text)
    # keep only the recipe content, from the title to the 'post-livres' div
    html = re.findall('<h3 class=\'post-title entry-title\'.*<div id=\'post-livres\'>', html, re.DOTALL)[0]
    # strip the div wrappers around the images
    html = re.sub('<div.*?>|</div>', '', html)
    # scale images to the page width
    html = re.sub('<img ', '<img width=100% ', html)
    input_files = {'input_files[]': ('html.html', html)}
    fields = {'from': 'html', 'to': 'pdf'}
    r = requests.post('http://c.docverter.com/convert', data=fields, files=input_files)
    with open('converted.pdf', 'wb') as fd:
        for chunk in r.iter_content(chunk_size=1024):
            fd.write(chunk)
    print 'done'

convert_url('http://www.lacuisinedebernard.com/2016/03/galettes-de-son-davoine-aux-carottes.html')
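The three extra substitutions can be checked in isolation without touching the network. A toy example (the markup is invented, but mimics the structure described above: a title, images wrapped in divs, and a 'post-livres' div marking the end of the recipe; written for Python 3):

```python
import re

# invented snippet standing in for the blog page
html = ("<p>menu</p>"
        "<h3 class='post-title entry-title'>Galettes</h3>"
        '<div class="sep"><img src="step1.jpg"></div>'
        "<div id='post-livres'>"
        "<p>comments</p>")

# 1) keep only the recipe slice between the title and the end marker
recipe = re.findall("<h3 class='post-title entry-title'.*<div id='post-livres'>", html, re.DOTALL)[0]
# 2) drop the div wrappers that confuse the converter
recipe = re.sub('<div.*?>|</div>', '', recipe)
# 3) widen the images to the page width
recipe = re.sub('<img ', '<img width=100% ', recipe)
print(recipe)
```

The menu before the title and the comments after the marker both disappear, and the image survives without its div wrapper.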
-
Also....
I bet this would work well, for everything....:
UIPrintPageRenderer. It's possible to do this using objc_util in Pythonista. So it would simply be loading a web page in a ui.WebView and then having a function that turns it into a PDF. I'm sorry I don't have a lot of time right now to try that one... especially because converting Objective-C takes me a boatload of time.
But... converting an entire site to PDF isn't necessarily great. You get a lot of extra things - for example comment sections, huge images, etc. I think it's better to scrape out the content you want, maybe reformat it a little and then do the conversion.
-
Whaaaaaaaaa
You're too smart!
I think I'll stop (trying to) program myself. I would never have found such a solution myself.
Thanks a lot for your help -
@cvp If you want to use webpagetopdf.com, you don't really have to parse the page, emulate clicks etc. You can bypass all that by doing essentially the same as the JavaScript on that page (which isn't much, it basically just generates a random session/conversion ID, makes one GET request to start the conversion, and a couple more to check its status, and to download the result when the conversion has finished). I've made a little script to automate that process without scraping the page:
# coding: utf-8
import requests
import urllib
import random
import time

def random_string():
    alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(random.choice(alphabet) for i in range(16))

def convert_to_pdf(page_url, verbose=True):
    sid = random_string()
    cid = random_string()
    conv_url = 'http://webpagetopdf.com/convert/%s/%s/?url=%s' % (sid, cid, urllib.quote(page_url, ''))
    if verbose:
        print 'Requesting conversion...'
    r = requests.get(conv_url)
    pid = r.json()['pid']
    if verbose:
        print 'pid:', pid
    filename = 'result.pdf'
    while True:
        print 'Checking conversion status...'
        r = requests.get('http://webpagetopdf.com/status/%s/%s/%s' % (sid, cid, pid))
        status_info = r.json()
        if 'file' in status_info:
            filename = status_info['file']
        if verbose:
            print status_info
        if status_info['status'] == 'processing':
            time.sleep(1)
        elif status_info['status'] == 'success':
            break
        else:
            return None
    urllib.urlretrieve('http://webpagetopdf.com/download/%s/%s' % (sid, cid), filename)
    return filename

if __name__ == '__main__':
    print 'Running demo...'
    page_url = 'http://pythonista-app.com'
    filename = convert_to_pdf(page_url)
    if filename:
        import console, os
        console.quicklook(os.path.abspath(filename))
    else:
        print 'Conversion failed'
(It looks more complicated than it is because I'm printing lots of status info to the console.)
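For what it's worth, the "random session/conversion ID" the site's JavaScript generates is just 16 lowercase alphanumeric characters, as in the random_string helper above. A standalone Python 3 equivalent (the script above is Python 2):

```python
import random
import string

def random_string(length=16):
    # lowercase letters plus digits, matching what the page's JS produces
    alphabet = string.ascii_lowercase + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))

# one ID for the session, one for the conversion
sid = random_string()
cid = random_string()
print(sid, cid)
```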
-
And here is a scraping solution. It uses JS to fill in the URL, then clicks the button. I included a JS debugging delegate, though it is not needed.
#!python2
# coding: utf-8
import ui, requests, json, time, console, urllib

debug = False

# create debugging delegate code. not necessary, but helpful for debugging
debugjs = '''
// debug_utils.js
// 1) custom console object
console = new Object();
console.log = function(log) {
    // create then remove an iframe, to communicate with webview delegate
    var iframe = document.createElement("IFRAME");
    iframe.setAttribute("src", "ios-log:" + log);
    document.documentElement.appendChild(iframe);
    iframe.parentNode.removeChild(iframe);
    iframe = null;
};
// TODO: give each log level an identifier in the log
console.debug = console.log;
console.info = console.log;
console.warn = console.log;
console.error = console.log;
window.onerror = (function(error, url, line, col, errorobj) {
    console.log("error: "+error+"%0Aurl:"+url+" line:"+line+"col:"+col+"stack:"+errorobj);
})
console.log("logging activated");
'''

class debugDelegate (object):
    def webview_should_start_load(self, webview, url, nav_type):
        if url.startswith('ios-log'):
            print urllib.unquote(url)
        return True

# create webview, and turn on debugging delegate
w = ui.WebView()
if debug:
    w.delegate = debugDelegate()
    w.eval_js(debugjs)

# load page
print 'loading page'
w.load_url('http://webpagetopdf.com')

# wait for readyState to start loading,
for i in range(10):
    if w.eval_js('document.readyState') != 'complete':
        break
    time.sleep(1)
# ...then wait for it to complete
while w.eval_js('document.readyState') != 'complete':
    time.sleep(1)

# fill in form, and click button
w.eval_js('url=document.getElementById("url");')
w.eval_js('url.value="www.google.com";')
w.eval_js('btn=document.getElementById("start-button");')
print 'clicking button, and waiting for response'
w.eval_js('btn.click()')

# wait until download link is populated, then grab link. TODO: Timeout
while json.loads(w.eval_js('document.getElementsByClassName("download-link").length<1')):
    time.sleep(0.5)
link = w.eval_js('document.getElementsByClassName("download-link")[0].href')

# download
print link
r = requests.get(link)
with open('webpage.pdf', 'wb') as f:
    f.write(r.content)
print 'download complete'
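That "TODO: Timeout" is easy to address with a small polling helper. A hedged sketch (generic Python 3, not Pythonista-specific: `condition` stands in for the eval_js download-link check, and the counter below is only a stand-in so the demo terminates):

```python
import time

def wait_until(condition, timeout=30, interval=0.5):
    # poll condition() until it returns True, or give up after `timeout` seconds
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# demo: a counter standing in for "the download link exists" check
state = {'calls': 0}
def link_ready():
    state['calls'] += 1
    return state['calls'] >= 3

ok = wait_until(link_ready, timeout=5, interval=0.01)
print(ok)
```

In the script above, the while loop over the download-link check would become `wait_until(...)`, and a False return would let you bail out with an error message instead of spinning forever.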
-
Cook, JonB and OMZ, you're really THE champions!
I've tried your 3 solutions, and they are all OK.
Cook's one has to be generalized, which is too complex for my poor knowledge.
OMZ's one uses features I don't understand.
JonB's one, using a webview (I'll "present" it), would allow the user (my wife) to see the web page where the conversion occurs and follow its progress.
Of course, I'm not able to say which is the best solution, but I suppose that is not needed!
Sorry for my English, and thanks again to all of you, because you obviously spend too much time helping guys like me.
See you for my next questionsssssssss 😊 I really love this app, but I need too much help with my particular requests.