Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Auto fill form and simulate enter
-
Is there a way to automatically fill in a web form and simulate the Enter key for sites like http://webpagetopdf.com/ ?
-
I think mechanize can do it. I have to do some research, though.
-
I've tried mechanize but it does not find any forms on this site. Perhaps there are several kinds of forms?
Anyway, thanks for spending time to help me -
@cvp ...maybe try this? I don't know if it suits your needs. The tricky part is with resources that are linked. So if all the linking is correct/corrected, then it should work.
edit: One other problem with this is I highly doubt it will work for websites whose content is written by JavaScript. There's a way around that - load the page in a ui.WebView and evaluate JavaScript to grab the HTML.
Just feed it a URL and it will download a PDF (it takes a few moments):
# -*- coding: utf-8 -*-
import requests
import re

def convert_url(url):
    # attempt to get resources from base url
    html = re.sub('src="/(?!/)', 'src="' + url, requests.get(url).text)
    input_files = {'input_files[]': ('html.html', html)}
    fields = {'from': 'html', 'to': 'pdf'}
    r = requests.post('http://c.docverter.com/convert', data=fields, files=input_files)
    with open('converted.pdf', 'wb') as fd:
        for chunk in r.iter_content(chunk_size=1024):
            fd.write(chunk)

convert_url('http://www.google.com')
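A side note on that first re.sub: it rewrites root-relative src attributes to absolute ones so the converter can fetch images and stylesheets, while the (?!/) lookahead leaves protocol-relative URLs (src="//...") alone. A minimal, standalone illustration (invented HTML, no network needed; note I use a base URL with a trailing slash here, because the regex consumes the leading slash of the path - written for Python 3, unlike the script above):

```python
import re

base = 'http://example.com/'  # trailing slash, since the regex eats the path's leading '/'
html = '<img src="/logo.png"> <script src="//cdn.example.com/a.js"></script>'

# root-relative src="/..." becomes absolute; protocol-relative src="//..."
# is skipped thanks to the (?!/) negative lookahead
fixed = re.sub('src="/(?!/)', 'src="' + base, html)
print(fixed)
```

The protocol-relative script tag comes through untouched, while the image now points at the base site.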
-
Thanks for your help and for your marvelous sample.
I've searched and tested a lot of online conversion sites and Python packages, and even the Workflow Make PDF action, but without success on some URLs like 'http://www.lacuisinedebernard.com/2016/03/galettes-de-son-davoine-aux-carottes.html#', where my wife downloads recipes.
I've tried your site and the PDF does not show the recipe pictures.
And the only site I've found that correctly converts the web page is http://webpagetopdf.com.
So my problem is not yet resolved, but one more time, thanks a lot. -
Okay!
I've tried this out and added a few things. It seems the images are all wrapped in divs in the HTML. Taking those out makes it work. Also... there's a lot of other stuff on the page that I assume is unnecessary for cooking.
There are other and better ways to scrape a website. The script below is really poorly done, and it assumes consistency between web pages for where the recipe content is. I checked a few others on that site and they seem to have the same elements.
I also didn't add any error handling or checking, so if it can't find the elements this will throw an error. Like I said... poorly done.
Anyway...give it a try:
# -*- coding: utf-8 -*-
import requests
import re

def convert_url(url):
    # attempt to get resources from base url
    html = re.sub('src="/(?!/)', 'src="' + url, requests.get(url).text)
    # keep only the recipe content, from the title to the 'post-livres' div
    html = re.findall('<h3 class=\'post-title entry-title\'.*<div id=\'post-livres\'>', html, re.DOTALL)[0]
    # strip the div wrappers around the images
    html = re.sub('<div.*?>|</div>', '', html)
    # scale images to the page width
    html = re.sub('<img ', '<img width=100% ', html)
    input_files = {'input_files[]': ('html.html', html)}
    fields = {'from': 'html', 'to': 'pdf'}
    r = requests.post('http://c.docverter.com/convert', data=fields, files=input_files)
    with open('converted.pdf', 'wb') as fd:
        for chunk in r.iter_content(chunk_size=1024):
            fd.write(chunk)
    print 'done'

convert_url('http://www.lacuisinedebernard.com/2016/03/galettes-de-son-davoine-aux-carottes.html')
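The three extra substitutions can be checked in isolation without touching the network. A toy example (the markup is invented, but mimics the structure described above: a title, images wrapped in divs, and a 'post-livres' div marking the end of the recipe; written for Python 3):

```python
import re

# invented snippet standing in for the blog page
html = ("<p>menu</p>"
        "<h3 class='post-title entry-title'>Galettes</h3>"
        '<div class="sep"><img src="step1.jpg"></div>'
        "<div id='post-livres'>"
        "<p>comments</p>")

# 1) keep only the recipe slice between the title and the end marker
recipe = re.findall("<h3 class='post-title entry-title'.*<div id='post-livres'>", html, re.DOTALL)[0]
# 2) drop the div wrappers that confuse the converter
recipe = re.sub('<div.*?>|</div>', '', recipe)
# 3) widen the images to the page width
recipe = re.sub('<img ', '<img width=100% ', recipe)
print(recipe)
```

The menu before the title and the comments after the marker both disappear, and the image survives without its div wrapper.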
-
Also....
I bet this would work well, for everything....:
UIPrintPageRenderer. It's possible to do this using objc_util in Pythonista. So it would simply be loading a web page in a ui.WebView and then having a function that turns it into a PDF. I'm sorry I don't have a lot of time right now to try that one... especially because converting Objective-C takes me a boatload of time.
But... converting an entire site to PDF isn't necessarily great. You get a lot of extra things - for example comment sections, huge images, etc. I think it's better to scrape out the content you want, maybe reformat it a little and then do the conversion.
-
Whaaaaaaaaa
You're too smart!
I think I'll stop (trying to) program myself. I would never have found such a solution myself.
Thanks a lot for your help -
@cvp If you want to use webpagetopdf.com, you don't really have to parse the page, emulate clicks etc. You can bypass all that by doing essentially the same as the JavaScript on that page (which isn't much, it basically just generates a random session/conversion ID, makes one GET request to start the conversion, and a couple more to check its status, and to download the result when the conversion has finished). I've made a little script to automate that process without scraping the page:
# coding: utf-8
import requests
import urllib
import random
import time

def random_string():
    alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789'
    return ''.join(random.choice(alphabet) for i in range(16))

def convert_to_pdf(page_url, verbose=True):
    sid = random_string()
    cid = random_string()
    conv_url = 'http://webpagetopdf.com/convert/%s/%s/?url=%s' % (sid, cid, urllib.quote(page_url, ''))
    if verbose:
        print 'Requesting conversion...'
    r = requests.get(conv_url)
    pid = r.json()['pid']
    if verbose:
        print 'pid:', pid
    filename = 'result.pdf'
    while True:
        print 'Checking conversion status...'
        r = requests.get('http://webpagetopdf.com/status/%s/%s/%s' % (sid, cid, pid))
        status_info = r.json()
        if 'file' in status_info:
            filename = status_info['file']
        if verbose:
            print status_info
        if status_info['status'] == 'processing':
            time.sleep(1)
        elif status_info['status'] == 'success':
            break
        else:
            return None
    urllib.urlretrieve('http://webpagetopdf.com/download/%s/%s' % (sid, cid), filename)
    return filename

if __name__ == '__main__':
    print 'Running demo...'
    page_url = 'http://pythonista-app.com'
    filename = convert_to_pdf(page_url)
    if filename:
        import console, os
        console.quicklook(os.path.abspath(filename))
    else:
        print 'Conversion failed'
(It looks more complicated than it is because I'm printing lots of status info to the console.)
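For what it's worth, the "random session/conversion ID" the site's JavaScript generates is just 16 lowercase alphanumeric characters, as in the random_string helper above. A standalone Python 3 equivalent (the script above is Python 2):

```python
import random
import string

def random_string(length=16):
    # lowercase letters plus digits, matching what the page's JS produces
    alphabet = string.ascii_lowercase + string.digits
    return ''.join(random.choice(alphabet) for _ in range(length))

# one ID for the session, one for the conversion
sid = random_string()
cid = random_string()
print(sid, cid)
```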
-
And here is a scraping solution. It uses JS to fill in the URL, then clicks the button. I included a JS debugging delegate, though it is not needed.
#!python2
# coding: utf-8
import ui, requests, json, time, console, urllib

debug = False

# create debugging delegate code. not necessary, but helpful for debugging
debugjs = '''
// debug_utils.js
// 1) custom console object
console = new Object();
console.log = function(log) {
    // create then remove an iframe, to communicate with webview delegate
    var iframe = document.createElement("IFRAME");
    iframe.setAttribute("src", "ios-log:" + log);
    document.documentElement.appendChild(iframe);
    iframe.parentNode.removeChild(iframe);
    iframe = null;
};
// TODO: give each log level an identifier in the log
console.debug = console.log;
console.info = console.log;
console.warn = console.log;
console.error = console.log;
window.onerror = (function(error, url, line, col, errorobj) {
    console.log("error: "+error+"%0Aurl:"+url+" line:"+line+"col:"+col+"stack:"+errorobj);
})
console.log("logging activated");
'''

class debugDelegate (object):
    def webview_should_start_load(self, webview, url, nav_type):
        if url.startswith('ios-log'):
            print urllib.unquote(url)
        return True

# create webview, and turn on debugging delegate
w = ui.WebView()
if debug:
    w.delegate = debugDelegate()
    w.eval_js(debugjs)

# load page
print 'loading page'
w.load_url('http://webpagetopdf.com')

# wait for readyState to start loading,
for i in range(10):
    if w.eval_js('document.readyState') != 'complete':
        break
    time.sleep(1)
# ...then wait for it to complete
while w.eval_js('document.readyState') != 'complete':
    time.sleep(1)

# fill in form, and click button
w.eval_js('url=document.getElementById("url");')
w.eval_js('url.value="www.google.com";')
w.eval_js('btn=document.getElementById("start-button");')
print 'clicking button, and waiting for response'
w.eval_js('btn.click()')

# wait until download link is populated, then grab link. TODO: Timeout
while json.loads(w.eval_js('document.getElementsByClassName("download-link").length<1')):
    time.sleep(0.5)
link = w.eval_js('document.getElementsByClassName("download-link")[0].href')

# download
print link
r = requests.get(link)
with open('webpage.pdf', 'wb') as f:
    f.write(r.content)
print 'download complete'
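That "TODO: Timeout" is easy to address with a small polling helper. A hedged sketch (generic Python 3, not Pythonista-specific: `condition` stands in for the eval_js download-link check, and the counter below is only a stand-in so the demo terminates):

```python
import time

def wait_until(condition, timeout=30, interval=0.5):
    # poll condition() until it returns True, or give up after `timeout` seconds
    deadline = time.time() + timeout
    while time.time() < deadline:
        if condition():
            return True
        time.sleep(interval)
    return False

# demo: a counter standing in for "the download link exists" check
state = {'calls': 0}
def link_ready():
    state['calls'] += 1
    return state['calls'] >= 3

ok = wait_until(link_ready, timeout=5, interval=0.01)
print(ok)
```

In the script above, the while loop over the download-link check would become `wait_until(...)`, and a False return would let you bail out with an error message instead of spinning forever.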
-
Cook, JonB and OMZ, you're really THE champions!
I've tried your 3 solutions, and they are all OK.
Cook's one has to be generalized, which is too complex for my poor knowledge.
OMZ's one uses features I don't understand.
JonB's one, using a webview (I'll "present" it), would allow the user (my wife) to see the web page where the conversion occurs and follow its progress.
Of course, I'm not able to say which is the best solution, but I suppose that is not needed!
Sorry for my English, and thanks again to all of you, because you obviously spend too much time helping guys like me.
See you for my next questionsssssssss 😊 I really love this app, but I need too much help with my particular requests.