XML/XSLT

flipflap

Good morning.

I would like to convert some documents from the Perseus Library into EPUB format. The documents have been made available in TEI XML format, e.g.:

https://github.com/PerseusDL/canonical-latinLit

As I understand it, the conversion can be achieved by applying an XSL Transformation. There is a site set up for this, which is fine for a small number of documents, but not especially efficient for more than that:

https://oxgarage.tei-c.org/

So I wondered whether I could do something with Pythonista. However, every approach I consider seems to be thwarted in some way. For example, the libxml library has XSLT functions, but it is not included with Pythonista and is not pure Python.

I wonder if someone could suggest something that would work.

Thank you.

ccc

Pyto (https://github.com/ColdGrub1384/Pyto) includes lxml (https://lxml.de).

flipflap

Thank you for the suggestion. It's under consideration. Other ideas also welcome.

mikael

@flipflap, do you have an existing transformation definition?

If not, you could look at SAX as a programmatic way of transforming the elements into the form you need.
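A minimal sketch of the SAX idea using Python's built-in `xml.sax`, assuming a small, made-up table that maps a few TEI elements to HTML tags (a real TEI conversion needs far more rules). It streams through the file and emits HTML as elements open and close, keeping only text inside `<body>`:

```python
import xml.sax
from xml.sax.saxutils import escape

# Hypothetical mapping from a few TEI element names to HTML tags;
# a real conversion would need many more rules.
TEI_TO_HTML = {"head": "h2", "p": "p", "l": "p", "lg": "div", "hi": "em"}


class TeiToHtml(xml.sax.ContentHandler):
    """Streams TEI elements and collects matching HTML into self.out."""

    def __init__(self):
        super().__init__()
        self.out = []
        self.in_body = False  # only convert text inside <body>

    def startElement(self, name, attrs):
        if name == "body":
            self.in_body = True
        elif self.in_body and name in TEI_TO_HTML:
            self.out.append("<%s>" % TEI_TO_HTML[name])

    def endElement(self, name):
        if name == "body":
            self.in_body = False
        elif self.in_body and name in TEI_TO_HTML:
            self.out.append("</%s>" % TEI_TO_HTML[name])

    def characters(self, content):
        # Text may arrive in several chunks; escape each one for HTML
        if self.in_body:
            self.out.append(escape(content))


def tei_to_html(xml_bytes):
    handler = TeiToHtml()
    xml.sax.parseString(xml_bytes, handler)
    return "".join(handler.out)
```

Streaming like this keeps memory use low on a phone, at the cost of tracking context (here just `in_body`) by hand.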

Another idea is to use Pythonista as a scraper/driver, automating the conversion using the web site you linked. You can throttle the speed etc., but of course feasibility depends on the amount and size of materials you have.
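A sketch of the throttling part, assuming a hypothetical `convert_one(path)` callable that wraps whatever does the actual conversion (a web request, a WebView run, etc.). The `sleep` parameter exists only so the loop can be tested without waiting:

```python
import time


def convert_all(paths, convert_one, delay_seconds=5.0, sleep=time.sleep):
    """Call convert_one(path) for each file, pausing between calls.

    convert_one is a hypothetical callable supplied by the caller; failures
    are collected rather than stopping the whole batch.
    """
    results, failures = {}, {}
    for index, path in enumerate(paths):
        if index > 0:
            sleep(delay_seconds)  # throttle so the remote service is not hammered
        try:
            results[path] = convert_one(path)
        except Exception as exc:  # keep going and report at the end
            failures[path] = exc
    return results, failures
```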

JonB

It may be possible to use a web view (ui.WebView or WKWebView) to do XSL transforms, either standalone (by specifying a stylesheet) or via JavaScript, and then use JavaScript to pass the output back to Python. I have not tried it, but I believe Safari on iOS supports XSLT.

JonB

As a follow-up, it appears that Safari supports XSLT 1.0 but not 2.0. Saxon-JS 2 appears to provide a full implementation of XSLT 3.0/XPath 3.1 in the browser, although your XSL files must be precompiled into SEF files using an included XSL compiler. There also seems to be an older version, Saxon-CE, which does not require precompiling the XSL into SEF files. If you can tolerate precompiling the transforms, Saxon-JS is supposedly reasonably performant.

flipflap

Thank you for the replies. I had previously looked into using JavaScript to carry out the transformation. However, the example I saw relied on the necessary files being served by a server, which I don't have access to at the moment. Further, I have read that in-browser JavaScript requires user interaction to load local files, and that's something I'd like to avoid.

I also noticed that Pythonista has a web server library, but I wasn't able to get a server response when running the included test() function (and I'm not sure it can run concurrently with the other things I want to do, given the iOS limitations on fork/background processing etc.).

JonB

You don't need a server; you can load files via file:// URLs. Nor do you need user interaction (use load_url).
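For example, a local path can be turned into a `file://` URL with `pathlib` (`transform.html` below is a hypothetical file name); the commented lines show how it would be handed to `ui.WebView.load_url` in Pythonista:

```python
from pathlib import Path


def local_file_url(path, base_dir=None):
    """Return an absolute file:// URL for path.

    Relative paths are resolved against base_dir (default: the current
    working directory); spaces and other special characters are percent-encoded.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    return (base / path).resolve().as_uri()


# In Pythonista (not runnable elsewhere):
#   import ui
#   wv = ui.WebView()
#   wv.load_url(local_file_url("transform.html"))
#   wv.present()
```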

You do need to "precompile" the XSLT files into SEF files.
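As far as I know, SaxonJS 2 ships its compiler as the Node.js command-line tool `xslt3` (npm package `xslt3`), so this step needs a machine with Node.js. A sketch that builds and optionally runs the command from Python; the file names are hypothetical, and the flags are worth checking against the SaxonJS documentation for your version:

```python
import shutil
import subprocess


def sef_compile_command(xsl_path, sef_path):
    """Argument list for exporting an XSLT stylesheet to SEF with the xslt3 CLI."""
    # -nogo: compile and export only, without running a transformation
    return ["xslt3", "-xsl:" + xsl_path, "-export:" + sef_path, "-nogo"]


def compile_to_sef(xsl_path, sef_path):
    cmd = sef_compile_command(xsl_path, sef_path)
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("xslt3 not found; install it with 'npm install -g xslt3'")
    subprocess.run(cmd, check=True)
```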

(You could also use online services and automate them with requests, which would be much easier but requires Wi-Fi/data.)
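A sketch of the upload part with only the standard library, so it runs anywhere; in Pythonista `requests.post(url, files={...})` does the same with less code. The URL and the form-field name `fileToConvert` are placeholders, not OxGarage's actual interface, which would have to be worked out separately:

```python
import uuid
import urllib.request


def build_upload_request(url, field_name, filename, data, content_type="application/xml"):
    """Build (but do not send) a multipart/form-data POST carrying one file."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    request = urllib.request.Request(url, data=head + data + tail, method="POST")
    request.add_header("Content-Type", f"multipart/form-data; boundary={boundary}")
    return request


# Sending it (needs network access; URL and field name are placeholders):
#   req = build_upload_request("https://example.org/convert", "fileToConvert",
#                              "text.xml", open("text.xml", "rb").read())
#   with urllib.request.urlopen(req) as resp:
#       converted = resp.read()
```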

flipflap

I have to confess that I'm not very knowledgeable about JavaScript etc., although I have dabbled with it occasionally. I was considering using XSLTProcessor.transformToFragment(). Is that the right approach? Can I use the load method you suggest to get a node reference to pass to it?
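For reference, the browser API expects the stylesheet and the source as XML document nodes, which `XMLHttpRequest`'s `responseXML` provides. A sketch in which Python writes such a page; the file names are hypothetical, whether a given web view permits `XMLHttpRequest` on `file://` URLs is an assumption to test, and browsers only implement XSLT 1.0:

```python
from pathlib import Path

PAGE_TEMPLATE = """<!DOCTYPE html>
<html><head><meta charset="utf-8"></head>
<body><div id="out"></div>
<script>
function loadXml(url) {
  // Synchronous request for brevity; responseXML is a parsed XML Document
  var req = new XMLHttpRequest();
  req.open("GET", url, false);
  req.send(null);
  return req.responseXML;
}
var processor = new XSLTProcessor();
processor.importStylesheet(loadXml("__XSL__"));
var fragment = processor.transformToFragment(loadXml("__XML__"), document);
if (fragment === null) {
  document.getElementById("out").textContent = "transform failed";
} else {
  document.getElementById("out").appendChild(fragment);
}
</script></body></html>
"""


def write_xslt_page(folder, xml_name, xsl_name, page_name="transform.html"):
    """Write a page that applies xsl_name to xml_name in the browser.

    Names are inserted verbatim into the script and resolved relative to the
    page, so keep them simple and put all three files in the same folder.
    """
    html = PAGE_TEMPLATE.replace("__XSL__", xsl_name).replace("__XML__", xml_name)
    page = Path(folder) / page_name
    page.write_text(html, encoding="utf-8")
    return page


# In Pythonista: wv.load_url(page.resolve().as_uri()), then later
#   wv.eval_js('document.getElementById("out").innerHTML')
```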

I had considered the requests idea before, but the OxGarage site has a JavaScript convert button. According to some comments on Stack Exchange, that can be awkward to deal with (it can be handled with Selenium, but there are problems getting that working on iOS because it needs a browser driver).

Cheers!

flipflap

This is currently stalled. I don't have access to a desktop computer at the moment, and I'm not sure how to compile XSL to SEF on the phone. The Saxon-JS documentation states that a JavaScript XSL compiler is included but, if I'm reading it correctly, that doesn't help me because it cannot generate the SEF or SEF.JSON file. I had wondered whether the result of compilation could be passed on immediately without writing out a file, but I suspect it can't (I'm not completely sure).

I tried using transformToFragment, but it returned null. I think the problem is (possibly) that it doesn't support XSLT 2.0. Perhaps XSLT 1.0 files would suffice to perform the transform. I haven't yet been able to Google up such files, but I'll try again later.
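Browser `XSLTProcessor` implementations only handle XSLT 1.0, so a quick heuristic before trying that route is to read the `version` attribute on the stylesheet's root element (imported stylesheets may declare their own). A standard-library sketch:

```python
import xml.etree.ElementTree as ET

XSL_NS = "{http://www.w3.org/1999/XSL/Transform}"


def xslt_version(stylesheet_text):
    """Return the declared XSLT version string of a stylesheet, e.g. '1.0' or '2.0'."""
    root = ET.fromstring(stylesheet_text)
    if root.tag not in (XSL_NS + "stylesheet", XSL_NS + "transform"):
        raise ValueError("not an XSLT stylesheet root: %s" % root.tag)
    return root.get("version")


def browser_can_run(stylesheet_text):
    # Browser XSLTProcessor implementations support XSLT 1.0 only
    return xslt_version(stylesheet_text) == "1.0"
```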

sociallydistant

Apologies for the off topic tangent, but I wanted to offer some encouragement.

I spent the past two weeks trying to find a simple way to handle complete image metadata read/write in pure Python so I could run it on my iPad. ExifTool unfortunately is anchored to Perl (at least for someone with my level of understanding). Two days ago, I was searching PyPI when exif v1.0 appeared. Literally released before my very eyes. You can't always get what you want, but if you try, sometimes, you get what you need.

I think that there are a lot more people entering this particular programming niche right now, myself included. My old desktop gaming rig is laughably out of date, my laptops are even older, and my old Wacom tablet was just not sensitive enough, let alone wireless. So I got an iPad mini 5th gen with a nice big 256GB of local storage. Cue six weeks' worth of updates, patches, failed syncs, transfers, duplications, uninstalls, reinstalls, and assorted BS to get my music in my music app, my photos in my photos app, and Adobe Creative Cloud loaded up with my files.

Naturally, I became more serious about learning to code during this time. I just started teaching myself Python less than two weeks ago looking for a way to centralize my ~15 years of artwork and photography into something I can bring to market.

If I learned anything from my father being a programmer for my entire life, it's that code has to be approached as a puzzle to solve. Keep studying the pieces and testing how they fit, and you'll get there in time. I'm two weeks into learning Python, and while I haven't written more than a dozen lines of it, I've been able to pore over documentation and other people's code to learn what's possible, identify and install packages for my project, build up my codebase, and register for API keys as needed.

It’s an easy-ish language to understand, but the limitations of running on iOS are an unfortunate stumbling block. Keep trying, but also remember there’s more help on the way in the form of new Pythonistas.

flipflap

@sociallydistant thank you!

Ok, then. I have made some progress. It seems that using Saxon-CE allows the transformation to be applied successfully to a test file. I have an HTML page with JavaScript in it that replaces the page's own contents with the XSLT result. I ran this outside of Pythonista. So the next step would be to control this process from Pythonista, read off the results there and then construct the EPUB. I'm not yet sure how to proceed. I need something within the Python environment that will execute JavaScript and give me access to what it produces. Using Bottle, perhaps?

My fallback option is to parse the tei xml in Python and transform it that way.
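For that fallback, a minimal sketch with `xml.etree.ElementTree`, assuming the TEI P5 namespace and converting only the title, `<head>` and `<p>` elements; Perseus texts also use verse lines, notes, milestones and so on, which this ignores:

```python
import xml.etree.ElementTree as ET
from html import escape

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def tei_to_xhtml(tei_text):
    """Very small TEI -> XHTML conversion: title, <head> and <p> only."""
    root = ET.fromstring(tei_text)
    title = root.findtext(".//tei:titleStmt/tei:title", default="Untitled", namespaces=TEI)
    parts = []
    body = root.find(".//tei:text/tei:body", TEI)
    if body is not None:
        for el in body.iter():
            # itertext() flattens nested markup such as <hi> into plain text
            text = escape("".join(el.itertext()).strip())
            if el.tag == "{%s}head" % TEI["tei"]:
                parts.append("<h2>%s</h2>" % text)
            elif el.tag == "{%s}p" % TEI["tei"]:
                parts.append("<p>%s</p>" % text)
    return (
        '<html xmlns="http://www.w3.org/1999/xhtml"><head><title>%s</title></head>'
        "<body>%s</body></html>" % (escape(title), "".join(parts))
    )
```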

Thanks again!

JonB

@flipflap said:
You can execute JavaScript in Pythonista using a web view. For instance, you would load a local HTML file (using abspath to get the full path to the file) that references scripts in the same folder which load the appropriate functions. Then you can execute JavaScript using eval_js on the web view. The result could then be passed back to Python, perhaps as a data URL.

You should look at the WKWebView from @mikael which will make that a lot easier.
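With the plain `ui.WebView`, one possible way to get a large result back: the page navigates to a made-up `xsltresult:` URL carrying the output, and a delegate catches it in `webview_should_start_load` and cancels the load. For short results, the string returned by `eval_js` may be all you need. The scheme, class name and JavaScript snippet are hypothetical; only the decoding runs outside Pythonista:

```python
import base64
from urllib.parse import unquote

RESULT_SCHEME = "xsltresult:"  # hypothetical scheme used only to signal results

# JavaScript the page could run once the transform output is in a variable
# named html (assumption): UTF-8 -> base64 -> URL-encoded
SEND_RESULT_JS = (
    "window.location.href = 'xsltresult:' + "
    "encodeURIComponent(btoa(unescape(encodeURIComponent(html))));"
)


def decode_result_url(url):
    """Turn an 'xsltresult:<url-encoded base64>' URL back into the original text."""
    payload = unquote(url[len(RESULT_SCHEME):])
    return base64.b64decode(payload).decode("utf-8")


class ResultCatcher:
    """ui.WebView delegate: store the transform output instead of navigating."""

    def __init__(self):
        self.result = None

    def webview_should_start_load(self, webview, url, nav_type):
        if url.startswith(RESULT_SCHEME):
            self.result = decode_result_url(url)
            return False  # cancel the fake navigation
        return True


# In Pythonista: wv = ui.WebView(); wv.delegate = ResultCatcher(); wv.load_url(...)
```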

sulcud

@flipflap, I know this is not WebView based but I hope it also helps you to execute the JavaScript you need

import objc_util


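class JavaScriptVM(object):
	"""Runs JavaScript through JavaScriptCore's JSContext (no web view needed)."""

	def __init__(self):
		# Make sure the JavaScriptCore framework is loaded
		objc_util.load_framework("JavaScriptCore")
		self._JSContext = objc_util.ObjCClass("JSContext")
		self._JSVirtualMachine = objc_util.ObjCClass("JSVirtualMachine")
		self.context = None
		self.javascript_vm = None

		self._prepare_vm()

	def _prepare_vm(self):
		self.javascript_vm = self._JSVirtualMachine.new()
		# alloc() + initWithVirtualMachine: initialises the context exactly once
		# (calling an init method on an object from new() would initialise it twice)
		self.context = self._JSContext.alloc().initWithVirtualMachine_(self.javascript_vm)

	def evaluate_script(self, script_code: str):
		# Returns a JSValue; the selector evaluateScript: is spelled evaluateScript_ in objc_util
		if self.context is None:
			return None
		return self.context.evaluateScript_(script_code)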


code = """function test(a, b){
	return a + b;
}
test(1+2, 3)+3"""
vm = JavaScriptVM()
r = vm.evaluate_script(code)
print(r)

flipflap

This is a somewhat delayed reply and I would like to apologise for that.

Thank you all for your very helpful suggestions. I have solved the problem - and in more than one way.

Using WKWebView, as per the suggestion above, I was able to run SaxonCE and pass out the results via a callback handler.

I also managed to work out the correct URL construction to use a 'requests' POST to the oxgarage site.

Finally, I used the Python WKWebView module to generate book covers automatically and take JPEG screenshots of them for use in the EPUB books. Please see the linked images of an example EPUB book I produced using Pythonista.
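For anyone reproducing the EPUB step: an EPUB is a ZIP archive with a fixed layout (an uncompressed `mimetype` entry first, then `META-INF/container.xml` pointing at the `.opf` package file, which lists the content), so `zipfile` is enough. A minimal EPUB 3 sketch with one chapter and a JPEG cover; the internal file names are arbitrary choices, and the output has not been checked with a validator such as EPUBCheck:

```python
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="bookid">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>{language}</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="cover" href="cover.jpg" media-type="image/jpeg" properties="cover-image"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>"""

NAV = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body><nav epub:type="toc"><ol><li><a href="chapter1.xhtml">{title}</a></li></ol></nav></body>
</html>"""


def build_epub(path, title, chapter_xhtml, cover_jpeg, identifier, language="la"):
    """Write a one-chapter EPUB 3 file; chapter_xhtml must be a complete XHTML document."""
    modified = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    title, identifier = escape(title), escape(identifier)
    deflate = zipfile.ZIP_DEFLATED
    with zipfile.ZipFile(path, "w") as z:
        # The mimetype entry must come first and be stored uncompressed
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER, compress_type=deflate)
        opf = OPF.format(identifier=identifier, title=title, language=language, modified=modified)
        z.writestr("OEBPS/content.opf", opf, compress_type=deflate)
        z.writestr("OEBPS/nav.xhtml", NAV.format(title=title), compress_type=deflate)
        z.writestr("OEBPS/chapter1.xhtml", chapter_xhtml, compress_type=deflate)
        z.writestr("OEBPS/cover.jpg", cover_jpeg, compress_type=zipfile.ZIP_STORED)
```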

Once again, thank you for your assistance.

Images:

https://ibb.co/68mnWtM
https://ibb.co/nbRPhfj

mikael

@flipflap, thanks for getting back. Very interesting use of Pythonista, and excellent results.

JonB

It might be useful to others if you post your code someplace (the HTML/JavaScript and whatever web view handler code). A few folks have asked about XSLT over the years, and it sounds like you got it working.