I would like to convert some documents from the Perseus Library into EPUB format. The documents have been made available in TEI XML format, e.g.:
As I understand it, the conversion can be achieved by applying an XSL Transformation. There is a site set up for this, which is fine for a small number of documents, but not especially efficient for more than that:
So I wondered whether I could do something with Pythonista. However, every approach I consider seems to be thwarted in some way. For example, the libxml library has XSLT functions, but it is not included with Pythonista and is not pure Python.
I wonder if someone could suggest something that would work.
Thank you for the suggestion. It's under consideration. Other ideas are also welcome.
@flipflap, do you have an existing transformation definition?
If not, you could look at SAX as a programmatic way to transform the elements however you need them.
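A minimal sketch of the SAX idea, using only the standard library's xml.sax (which should be available in Pythonista). It assumes the standard TEI namespace and a hypothetical, deliberately tiny mapping from TEI elements to XHTML tags; real Perseus texts use many more elements, so treat TAG_MAP as a starting point rather than a complete conversion.

```python
import io
from xml.sax import make_parser
from xml.sax.handler import ContentHandler, feature_namespaces
from xml.sax.saxutils import escape

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Hypothetical mapping from TEI element names to XHTML tags; extend for your texts.
TAG_MAP = {"head": "h2", "p": "p", "lg": "div", "l": "p", "hi": "em"}


class TeiToXhtml(ContentHandler):
    """Streams through a TEI document and emits simplified XHTML."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.stack = []        # XHTML tag (or None) for each open TEI element
        self.header_depth = 0  # > 0 while inside <teiHeader>, whose content we skip

    def startElementNS(self, name, qname, attrs):
        ns, local = name
        if ns == TEI_NS and local == "teiHeader":
            self.header_depth += 1
        tag = TAG_MAP.get(local) if ns == TEI_NS and not self.header_depth else None
        self.stack.append(tag)
        if tag:
            self.out.append(f"<{tag}>")

    def endElementNS(self, name, qname):
        ns, local = name
        tag = self.stack.pop()
        if tag:
            self.out.append(f"</{tag}>")
        if ns == TEI_NS and local == "teiHeader":
            self.header_depth -= 1

    def characters(self, content):
        if not self.header_depth:
            self.out.append(escape(content))  # re-escape &, <, > for XHTML output


def tei_to_xhtml(data: bytes) -> str:
    handler = TeiToXhtml()
    parser = make_parser()
    parser.setFeature(feature_namespaces, True)  # report (namespace, localname) pairs
    parser.setContentHandler(handler)
    parser.parse(io.BytesIO(data))
    return "".join(handler.out)
```

Because SAX is streaming, unmapped elements simply disappear while their text is kept, and memory use stays flat even for long works.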
Another idea is to use Pythonista as a scraper/driver, automating the conversion using the web site you linked. You can throttle the speed etc., but of course feasibility depends on the amount and size of materials you have.
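The throttling part of the scraper/driver idea could look roughly like this. It's a sketch: `convert` is a hypothetical stand-in for whatever actually talks to the web site, and the 5-second default delay is an arbitrary assumption, not a limit the site publishes.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List


def convert_all(sources: Iterable[Path], convert: Callable[[Path], bytes],
                out_dir: Path, delay: float = 5.0) -> List[Path]:
    """Convert each source file with `convert`, pausing `delay` seconds between calls.

    Files whose output already exists are skipped, so an interrupted run can resume.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    called = False
    for src in sources:
        dest = out_dir / (src.stem + ".epub")
        if not dest.exists():
            if called:
                time.sleep(delay)  # throttle: wait between conversions, not before the first
            dest.write_bytes(convert(src))
            called = True
        written.append(dest)
    return written
```

Skipping already-converted files matters on iOS, where a long batch can easily be interrupted when the app is backgrounded.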
As a follow-up, it appears that Safari supports XSLT 1.0 but not 2.0. Saxon-JS 2 appears to provide a full implementation of XSLT 3.0/XPath 3.1 in the browser, although your XSL files must be precompiled into SEF files using an included XSL compiler. There also seems to be an older version, Saxon-CE, which does not require precompiling the XSL into SEF files. If you can tolerate precompiling the transforms, Saxon-JS is supposedly reasonably performant.
I also noticed that Pythonista has a web server library, but I wasn’t able to get a server response when running the included test() function (and I’m not sure it can run concurrently with the other things I want to do, given the iOS limitations on fork/background processing, etc.).
You don't need a server; you can load files via file:// URLs. Nor do you need user interaction (use load_url).
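A small sketch of the file:// approach. The file name saxon/index.html is hypothetical (an HTML page you would write yourself that loads Saxon and your stylesheet); ui.WebView and its load_url method are Pythonista's, and the fallback branch just prints the URL when run outside Pythonista.

```python
from pathlib import Path


def file_url(path: str) -> str:
    """Return an absolute, percent-encoded file:// URL for a local file."""
    return Path(path).resolve().as_uri()


if __name__ == "__main__":
    url = file_url("saxon/index.html")  # hypothetical page that loads Saxon + your stylesheet
    try:
        import ui  # Pythonista-only module
    except ImportError:
        print(url)
    else:
        web = ui.WebView()
        web.load_url(url)  # no server needed
        web.present()
```

Using as_uri() rather than string concatenation takes care of spaces and other characters that need escaping in the URL.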
You do need to "precompile" the XSLT files into SEF files.
(You could also use online services and automate them using requests, which would be much easier but requires Wi-Fi/data.)
I tried using transformToFragment, but it returned null. I think the problem may be that Safari doesn't support XSLT 2.0. Perhaps XSLT 1.0 files would be enough to perform the transform. I haven't yet been able to find such files by Googling, but I'll try again later.
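Before handing a stylesheet to the browser, it may help to check which XSLT version it declares. A rough sketch with the standard library's ElementTree; it only looks at the root element's version attribute, so it won't notice higher-version code pulled in through xsl:include or xsl:import.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"


def declared_xslt_version(stylesheet: bytes) -> str:
    """Return the version attribute of an xsl:stylesheet / xsl:transform root."""
    root = ET.fromstring(stylesheet)
    if root.tag not in (f"{{{XSL_NS}}}stylesheet", f"{{{XSL_NS}}}transform"):
        raise ValueError(f"not an XSLT stylesheet root: {root.tag}")
    return root.get("version", "")


def likely_needs_xslt2(stylesheet: bytes) -> bool:
    """True if the stylesheet declares a version above 1.0."""
    return float(declared_xslt_version(stylesheet) or "1.0") > 1.0
```

A 1.0 processor switches to forwards-compatible mode rather than refusing a higher version outright, so a "2.0" stylesheet may load and then fail (or return null) as soon as it hits a 2.0-only function.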
Apologies for the off topic tangent, but I wanted to offer some encouragement.
I spent the past two weeks trying to find a simple way to handle complete image metadata read/write in Pure Python so I could run it on my iPad. ExifTool unfortunately is anchored to Perl (at least for someone with my level of understanding). Two days ago, I was searching on Pip when Exif v1.0 appeared. Literally released before my very eyes. You can’t always get what you want, but if you try, sometimes, you get what you need.
I think a lot more people are entering this particular programming niche right now, myself included. My old desktop gaming rig is laughably out of date, my laptops are even older, and my old Wacom tablet was just not sensitive enough, let alone wireless. So I got an iPad mini 5th gen with a nice big 256GB of local storage. Cue six weeks' worth of updates, patches, failed syncs, transfers, duplications, uninstalls, reinstalls, and assorted BS to get my music in my music app, my photos in my photos app, and Adobe Creative Cloud loaded up with my files.
Naturally, I became more serious about learning to code during this time. I started teaching myself Python less than two weeks ago, looking for a way to centralize my ~15 years of artwork and photography into something I can bring to market.
If I learned anything from my father being a programmer my entire life, it’s that code has to be approached as a puzzle to solve. Keep studying the pieces and testing how they fit, and you’ll get there in time. I'm two weeks into learning Python, and while I haven’t written more than a dozen lines of Python, I’ve been able to pore over documentation and other people’s code to learn what’s possible, identify and install packages for my project, build up my codebase, and register for API keys as needed.
It’s an easy-ish language to understand, but the limitations of running on iOS are an unfortunate stumbling block. Keep trying, but also remember there’s more help on the way in the form of new Pythonistas.
@sociallydistant thank you!
My fallback option is to parse the tei xml in Python and transform it that way.
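If it comes to the fallback, a tree-based version with ElementTree might look like this. It's a sketch under assumptions: the standard TEI namespace, a hypothetical minimal TAG_MAP, and the simple rule that unmapped elements are unwrapped (their text kept, their tag dropped). The result is a bare XHTML page you would still need to package into an EPUB.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical mapping; Perseus TEI uses many more elements than these.
TAG_MAP = {"div": "div", "head": "h2", "p": "p", "lg": "div", "l": "p", "hi": "em"}


def _append_text(parent, text):
    """Append text after parent's last child, or to parent.text if it has none."""
    if not text:
        return
    if len(parent):
        parent[-1].tail = (parent[-1].tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def _copy(src, dst):
    """Translate src's text and children into dst."""
    _append_text(dst, src.text)
    for child in src:
        local = child.tag[len(TEI):] if child.tag.startswith(TEI) else None
        tag = TAG_MAP.get(local)
        if tag:
            _copy(child, ET.SubElement(dst, tag))
        else:
            _copy(child, dst)  # unmapped element: keep its content, drop the wrapper
        _append_text(dst, child.tail)


def tei_tree_to_xhtml(tei_bytes: bytes, title: str) -> str:
    root = ET.fromstring(tei_bytes)
    body = root.find(f"{TEI}text/{TEI}body")
    if body is None:
        raise ValueError("no <text><body> found")
    html = ET.Element("html", xmlns=XHTML_NS)
    ET.SubElement(ET.SubElement(html, "head"), "title").text = title
    _copy(body, ET.SubElement(html, "body"))
    return ET.tostring(html, encoding="unicode")
```

Compared with SAX, this keeps the whole document in memory but makes it easier to look around (e.g. at a parent's attributes) while transforming.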
You should look at the WKWebView from @mikael which will make that a lot easier.
This is a somewhat delayed reply and I would like to apologise for that.
Thank you all for your very helpful suggestions. I have solved the problem, and in more than one way.
Using WKWebView, as per the suggestion above, I was able to run Saxon-CE and pass out the results via a callback handler.
I also managed to work out the correct URL construction to use a 'requests' POST to the OxGarage site.
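For anyone wanting to do something similar: the post doesn't give the working OxGarage URL, so the URL below is a placeholder, and the form field name fileToConvert is an assumption. This sketch uses the standard library's urllib to build a multipart/form-data file upload; with requests in Pythonista, the equivalent would be requests.post(url, files={"fileToConvert": (filename, xml_bytes, "text/xml")}).

```python
import urllib.request
import uuid


def build_conversion_request(url: str, xml_bytes: bytes, filename: str,
                             field: str = "fileToConvert") -> urllib.request.Request:
    """Build a multipart/form-data POST uploading one file (field name is an assumption)."""
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'.encode("utf-8"),
        b"Content-Type: text/xml\r\n\r\n",
        xml_bytes,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def convert(url: str, xml_bytes: bytes, filename: str, timeout: float = 120) -> bytes:
    """Send the upload and return the converted document's bytes."""
    req = build_conversion_request(url, xml_bytes, filename)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return resp.read()
```

Separating request construction from sending makes it easy to inspect what will be posted before pointing it at a real service.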
Finally, I used WKWebView from Python to generate book covers automatically and take JPEG screenshots of them for use in the EPUB books. Please see the linked images of an example EPUB book I produced using Pythonista.
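For the final packaging step, an EPUB is a ZIP file with a few fixed rules: a mimetype entry first, stored uncompressed, and a META-INF/container.xml pointing at the package (OPF) file. A minimal sketch using zipfile; the OEBPS/ folder name is a common convention rather than a requirement, and the OPF (metadata, manifest, spine) and navigation document are left to the caller.

```python
import zipfile
from typing import BinaryIO, Dict, Union

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def write_epub(target: Union[str, BinaryIO], opf: str,
               files: Dict[str, Union[str, bytes]]) -> None:
    """Write an EPUB container; `files` maps names under OEBPS/ to their contents."""
    with zipfile.ZipFile(target, "w") as zf:
        # The mimetype entry must come first and must not be compressed.
        zf.writestr(zipfile.ZipInfo("mimetype"), "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML,
                    compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/content.opf", opf, compress_type=zipfile.ZIP_DEFLATED)
        for name, data in files.items():  # XHTML chapters, cover JPEG, CSS, nav, ...
            zf.writestr("OEBPS/" + name, data, compress_type=zipfile.ZIP_DEFLATED)
```

A cover screenshot would just be another entry, e.g. files={"cover.jpg": jpeg_bytes, ...}, declared in the OPF manifest.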
Once again, thank you for your assistance.
@flipflap, thanks for getting back. Very interesting use of Pythonista, and excellent results.