Other ways to convert Markdown/HTML to PDF in iOS

peterh86

I've been experimenting with workflows to convert Markdown/HTML to PDF, using a CSS stylesheet and with control over page breaks. Here's what I found:

Using a cloud service - Docverter

The free services usually have limited features, though Docverter seemed promising. I wrote a workflow to use it, based on Caleb McDaniel's Pythonista script here http://wcm1.web.rice.edu/pandoc-on-ios.html. Docverter supports @page, page breaks and page-break-after:avoid (to keep headings on the same page as the following paragraph), and links in a displayed PDF are live.

Sadly it always lost the last few sentences I sent it, and my Python/internet skills are not good enough to find where the problem is. I found other problems:

Caleb's script sent the CSS and Markdown as two files, but I found this was not reliable. I converted to HTML with the CSS included, so I only needed to send one file. Even so, it occasionally failed.
There are only a couple of fonts available, presumably for copyright reasons, though you can use @font-face to send other font files.
I couldn't see how to hyphenate, so either the right margin is very ragged, or you justify it and get terrible gaps in the text.
It doesn't handle images wider than the print area well (it does not scale them down properly).
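
A minimal sketch of the single-file approach described above, written for present-day Python 3 rather than the Python 2 in Pythonista at the time. The stylesheet is placed in a style element so only one HTML document has to be sent. The names build_single_html and SAMPLE_CSS are illustrative assumptions, and the sample rules only use the features mentioned in this post (@page, page breaks, page-break-after:avoid); whether a given converter honours them still has to be checked.

```python
# Sketch only: embed the stylesheet in the HTML so a single file is sent.
# build_single_html and SAMPLE_CSS are illustrative names, not part of any
# conversion service's API.

SAMPLE_CSS = """
@page { size: A4; margin: 2cm; }
h1 { page-break-before: always; }
h1, h2, h3 { page-break-after: avoid; }
"""


def build_single_html(body_html, css=SAMPLE_CSS):
    """Return a complete HTML document with the CSS inside a <style> element."""
    return (
        "<!DOCTYPE html>\n"
        "<html>\n<head>\n"
        '<meta charset="utf-8"/>\n'
        "<style>\n{css}\n</style>\n"
        "</head>\n<body>\n{body}\n</body>\n</html>\n"
    ).format(css=css.strip(), body=body_html)


if __name__ == "__main__":
    print(build_single_html("<h1>Chapter 1</h1>\n<p>Some text.</p>"))
```

The resulting string can be saved or uploaded as one file, which avoids having to keep a separate CSS file in step with the HTML.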

Using an iOS HTML converter app

There are many apps to convert HTML to PDF. The support seems to be for CSS2.1, so you can't use @page but you can use page breaks. Unfortunately all apps seem to have a bug with page-break-before:always and most have a bug with page-break-before:always (insert two page breaks), so most aren't suitable. I used:

PDF-Converter (Readdle) as described here http://editorial-app.appspot.com/workflow/5660638047109120/vHXf1o0Is1o (in the webbrowser.open, change rhttp to pdfhttp). The top and bottom margins are a bit small for a printed page. And since iOS 7, this only gives me US paper size, which I don't want. I think it is arrogant of a developer not to allow for A4 paper as well.
PDF This Page (Julian Yap) This is what I use now. The top and bottom margins are better, and you can select US or A4 pages. My workflow for this is here: http://editorial-app.appspot.com/workflow/5789374323097600/pWAw-_K2jfY

Neither of these apps lets you add a header or footer or set the top and bottom margins, and you can't hyphenate. Page-break-after:avoid doesn't work and links in a displayed PDF are not live. Images do work well.
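
The hand-off to PDF-Converter mentioned above works by changing the URL scheme before the URL is opened. A rough Python 3 sketch of that one step follows. The "pdf" prefix comes from the note above; the function name with_app_prefix is made up for illustration, and treating https the same way as http is an assumption.

```python
# Sketch only: hand a page to another app by prefixing the URL scheme,
# e.g. http://example.com -> pdfhttp://example.com.
# with_app_prefix is an illustrative name; https handling is an assumption.


def with_app_prefix(url, prefix="pdf"):
    """Prepend prefix to the scheme of an http(s) URL."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("expected an http or https URL: %r" % url)
    return prefix + url


if __name__ == "__main__":
    target = with_app_prefix("http://example.com/page.html")
    print(target)
    # In Pythonista or Editorial the result would then be opened with
    # webbrowser.open(target), as in the linked workflow.
```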

So, I'm still looking for the perfect app for this.

hvmhvm

I installed xhtml2pdf (together with reportlab-2.7, pyPdf-1.13, html5lib) in Editorial.

dlehman

@hvmhvm: Can you expand a bit on how you implemented this?

hvmhvm

Here is the python script I used to install everything (I have a directory 'scripts' that contains all python stuff):

<pre>
import urllib
import tarfile
from zipfile import ZipFile
import shutil
import console
import os
import editor
from os.path import expanduser

os.chdir(expanduser('~/Documents/'))

url = 'http://www.reportlab.com/ftp/reportlab-2.7.tar.gz'
fname='reportlab-2.7'
sname='src/reportlab'
dname='scripts/reportlab'
print 'Downloading '+dname+'...'
urllib.urlretrieve(url, fname+'.tar.gz')

print 'Extracting...'
t = tarfile.open(fname+'.tar.gz')
t.extractall()

if os.path.isdir(dname):
    shutil.rmtree(dname)
shutil.move(fname+'/'+sname, dname)

print 'Cleaning up...'
shutil.rmtree(fname)
os.remove(fname+'.tar.gz')

url='http://pybrary.net/pyPdf/pyPdf-1.13.tar.gz'
fname='pyPdf-1.13'
sname='pyPdf'
dname='scripts/pyPdf'
print 'Downloading '+dname+'...'
urllib.urlretrieve(url, fname+'.tar.gz')

print 'Extracting...'
t = tarfile.open(fname+'.tar.gz')
t.extractall()

if os.path.isdir(dname):
    shutil.rmtree(dname)
shutil.move(fname+'/'+sname, dname)

print 'Cleaning up...'
shutil.rmtree(fname)
os.remove(fname+'.tar.gz')

url = 'https://github.com/html5lib/html5lib-python/archive/master.zip'
fname='html5lib-python-master'
sname='html5lib'
dname='scripts/html5lib'
print 'Downloading '+dname+'...'
urllib.urlretrieve(url, fname+'.zip')

print 'Extracting...'
with ZipFile(fname+'.zip', 'r') as z:
    z.extractall()

if os.path.isdir(dname):
    shutil.rmtree(dname)
shutil.move(fname+'/'+sname, dname)

print 'Cleaning up...'
shutil.rmtree(fname)
os.remove(fname+'.zip')

url='https://github.com/chrisglass/xhtml2pdf/archive/master.zip'
fname='xhtml2pdf-master'
sname='xhtml2pdf'
dname='scripts/xhtml2pdf'
print 'Downloading '+dname+'...'
urllib.urlretrieve(url, fname+'.zip')

print 'Extracting...'
with ZipFile(fname+'.zip', 'r') as z:
    z.extractall()

if os.path.isdir(dname):
    shutil.rmtree(dname)
shutil.move(fname+'/'+sname, dname)

print 'Cleaning up...'
shutil.rmtree(fname)
os.remove(fname+'.zip')

url='http://www.reportlab.com/ftp/pfbfer-20070710.zip'
fname='scripts/xhtml2pdf/fonts/pfbfer-20070710'
sname='pfbfer-20070710'
dname='scripts/xhtml2pdf/fonts'
if os.path.isdir(dname):
    shutil.rmtree(dname)
os.mkdir(dname)
print 'Downloading '+sname+'...'
urllib.urlretrieve(url, fname+'.zip')

print 'Extracting...'
dr=os.getcwd()
os.chdir(dname)
with ZipFile(sname+'.zip', 'r') as z:
    z.extractall()
os.chdir(dr)

print 'Cleaning up...'
os.remove(fname+'.zip')

editor.reload_files()
print 'Done'

</pre>
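
The script above repeats the same download, extract, move and clean-up steps for each package. As a rough sketch only, those steps could be folded into one helper. This version is written for Python 3 (urllib.request rather than the Python 2 urllib used above), and the function names install_from_archive and install_from_url are made up for illustration.

```python
# Sketch only (Python 3): the download / extract / move / clean-up steps
# folded into helpers. Function and parameter names are illustrative.
import os
import shutil
import tarfile
import tempfile
import zipfile
from urllib.request import urlretrieve


def install_from_archive(archive, inner_dir, dest):
    """Unpack a local .zip or tar archive and move inner_dir to dest."""
    work = tempfile.mkdtemp()
    try:
        if zipfile.is_zipfile(archive):
            with zipfile.ZipFile(archive) as z:
                z.extractall(work)
        else:
            with tarfile.open(archive) as t:
                t.extractall(work)
        if os.path.isdir(dest):
            shutil.rmtree(dest)  # replace any earlier install
        shutil.move(os.path.join(work, inner_dir), dest)
    finally:
        shutil.rmtree(work, ignore_errors=True)  # always clean up


def install_from_url(url, inner_dir, dest):
    """Download an archive to a temporary file, then install it as above."""
    fd, archive = tempfile.mkstemp()
    os.close(fd)
    try:
        urlretrieve(url, archive)
        install_from_archive(archive, inner_dir, dest)
    finally:
        os.remove(archive)


# Example call, using the first package from the script above:
# install_from_url('http://www.reportlab.com/ftp/reportlab-2.7.tar.gz',
#                  'reportlab-2.7/src/reportlab',
#                  os.path.expanduser('~/Documents/scripts/reportlab'))
```

Keeping the download separate from the unpacking means the unpacking step can be tried on a local archive without a network connection.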

Then the workflow contains a 'Document Text' step, followed by a 'Convert Markdown to HTML' step, and then this 'Run Python Script' step:
<pre>
#coding: utf-8
import sys
from os.path import expanduser
if not(expanduser('~/Documents/scripts') in sys.path):
    sys.path.append(expanduser('~/Documents/scripts'))
import workflow
import os.path
import editor
import xhtml2pdf.pisa as pisa
import StringIO
from urllib import unquote

pisa.showLogging()

def link_callback(uri,rel):
    if not(uri.startswith('/')):
        return dir+'/'+unquote(uri)
    return unquote(uri)

action_in = workflow.get_input()
pre='<html>\n<head>\n<meta charset="utf-8"/>\n<style>\n p {font-size:12pt}\n</style>\n</head>\n\n<body>\n'
post='\n</body>\n</html>'
inp=StringIO.StringIO(pre+action_in.encode('ascii', 'xmlcharrefreplace')+post)
p = editor.get_path()
dir = os.path.split(p)[0]
f = os.path.split(p)[1]
fn= os.path.splitext(f)[0]
fl = file(dir+'/'+fn+".pdf", "w+b")
print('processing '+p)
pdf = pisa.CreatePDF(inp,fl,dir,link_callback=link_callback)
fl.close()
if pdf.err!=0:
    print(pdf.err)
else:
    print('done!')
</pre>
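
For anyone who wants to try the string handling in that script without xhtml2pdf installed, here is a small Python 3 sketch of just two of its steps: wrapping the HTML fragment while replacing non-ASCII characters with character references, and deriving the PDF file name from the document path. The function names wrap_html and pdf_path_for are made up for illustration, and the PDF conversion itself is left out.

```python
# Sketch only (Python 3): two string-handling steps from the workflow script,
# without the xhtml2pdf call. Function names are illustrative.
import os.path

PRE = ('<html>\n<head>\n<meta charset="utf-8"/>\n'
       '<style>\n p {font-size:12pt}\n</style>\n</head>\n\n<body>\n')
POST = '\n</body>\n</html>'


def wrap_html(body):
    """Wrap an HTML fragment; non-ASCII text becomes character references."""
    ascii_body = body.encode('ascii', 'xmlcharrefreplace').decode('ascii')
    return PRE + ascii_body + POST


def pdf_path_for(source_path):
    """Return the source path with its extension replaced by .pdf."""
    return os.path.splitext(source_path)[0] + '.pdf'


if __name__ == "__main__":
    print(wrap_html(u'<p>caf\u00e9</p>'))
    print(pdf_path_for('/docs/notes.md'))
```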

peterh86

Thanks for posting that.

omz

@hvmhvm Thanks for sharing! Btw, Pythonista already includes html5lib, so it shouldn't be necessary to install that.
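
One way to act on this is to check whether a module can already be imported before downloading it. The following is only a sketch in Python 3, and the helper name is_available is made up for illustration.

```python
# Sketch only (Python 3): skip installing a package that is already importable.
import importlib.util


def is_available(module_name):
    """Return True if a top-level module can be found on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    for name in ("html5lib", "reportlab", "xhtml2pdf"):
        status = "already installed" if is_available(name) else "needs installing"
        print(name, "->", status)
```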

peterh86

To round this thread off, here's a workflow to convert markdown to PDFs via LaTeX and the iOS app Texpad:
http://editorial-app.appspot.com/workflow/5245394703351808/W5YhtQWbp0g

It works really well, is fast, and creates beautiful PDFs. You can type LaTeX commands in the Markdown, so you can do real tables, equations and all the rest.

phillipsmn

I made a workflow public that lets you convert your markdown document to pdf and saves it in your dropbox directory. You can download it from here in Editorial.

jackadision

If you are a normal user or unable to carry out a manual method, use a converter tool, because this type of tool is very easy to operate and gives good output. You can download one for free from http://www.softwaredownloadcentre.com/software/html-to-pdf.php

plmtr

@hvmhvm: Whoa, quite an eye-opener. I had no idea you could install scripts and run in such a manner.

Anyway I managed to actually get these all installed and a conversion success message...Done!

Although the resulting pdf contains only the title of my md file, not the content. Any ideas?

Also, your workflow was failing on me initially, protesting that xhtml2pdf was looking for PyPDF2 when in fact your dependencies install script installs the older pyPdf 1.13. I got it working after changing that portion to:

url = 'https://github.com/mstamy2/PyPDF2/archive/master.zip'  
fname='PyPDF2-master'  
sname='PyPDF2'  
dname='scripts/PyPDF2'  
print 'Downloading '+dname+'...'  
urllib.urlretrieve(url, fname+'.zip')  

print 'Extracting...'  
with ZipFile(fname+'.zip', 'r') as z:  
    z.extractall()

if os.path.isdir(dname):  
    shutil.rmtree(dname)  
shutil.move(fname+'/'+sname, dname)  

print 'Cleaning up...'  
shutil.rmtree(fname)  
os.remove(fname+'.zip')

Cheers.

Jason

peterh86

Now the best way is to use Ulysses for iPad. It adjusts the width of the whitespace between words and between letters so the PDF looks great; CSS-based PDF converters usually don't do this. And it handles images easily.

Later versions of Ulysses will have an X-callback scheme, so maybe we can send Markdown from Editorial to be converted to PDF.

I don't like the available styles much, so I wrote a workflow so I can edit them on the iPad:
http://www.editorial-workflows.com/workflow/5894304400670720/UxmTPJfJ_W0

xxxxx

Re: Adjusting "whitespace"

That's nothing special. You can enable kerning in Editorial as well. Add font-feature-settings: "kern" to your CSS. Other OpenType features like proportional figures ("pnum") also work.

I really hope we will be able to generate PDF with the inbuilt iOS PDF engine in the next update.

iamjebautista

I personally prefer using a cloud service like pdfmyurl.com. Their APIs are extremely easy to use and their pricing is better than their competitors'.

GrabzIt

As the owner of GrabzIt, I would like to recommend that you check us out: we provide a highly flexible HTML to PDF API. The linked example has a Python demo for your convenience.