Recognize text from picture
-
I realized that result will also have a .boundingBox() attribute, which would make some of this a little simpler. That is a CGRect, consisting of .origin (in turn consisting of .x and .y) and .size (containing .width and .height). In that case you could use ui.Path.rect; a rough sketch follows below.
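Something along these lines, perhaps; a minimal sketch, assuming the req and ui_image from the working snippet further down (Vision's boundingBox is normalized with a bottom-left origin, so it has to be scaled to the image size and flipped vertically):

iw, ih = ui_image.size.w, ui_image.size.h
with ui.ImageContext(iw, ih) as ctx:
    ui_image.draw()
    ui.set_color('red')
    for result in req.results():
        bb = result.boundingBox()  # CGRect in normalized (0..1) coordinates
        x = bb.origin.x * iw
        w = bb.size.width * iw
        h = bb.size.height * ih
        y = (1 - bb.origin.y - bb.size.height) * ih  # flip the y axis for ui drawing
        ui.Path.rect(x, y, w, h).stroke()
    ctx.get_image().show()
-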
Okay, my previous reply was full of errors... here is a working version, which adds red boxes around each result, along with the text
language_preference = ['fi', 'en', 'se']

import io
import photos, ui, dialogs
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

ACCURATE = 0
FAST = 1

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')
ui_image = None
if selection == 1:
    pil_image = photos.capture_image()
    if pil_image is not None:
        ui_image = pil2ui(pil_image)
elif selection == 2:
    ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
    print('Recognizing...\n')
    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.recognitionLevel = ACCURATE
    req.setRecognitionLanguages_(language_preference)
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
    success = handler.performRequests_error_([req], None)
    if success:
        for result in req.results():
            print(result.text())
    else:
        print('Problem recognizing anything')

    # draw a red box and the recognized text over each result
    with ui.ImageContext(*tuple(ui_image.size)) as ctx:
        ui_image.draw()
        for result in req.results():
            cgpts = [result.bottomLeft(), result.topLeft(), result.topRight(),
                     result.bottomRight(), result.bottomLeft()]
            # Vision coordinates are normalized with a bottom-left origin,
            # ui drawing uses a top-left origin, hence the (1 - y) flip
            vertices = [(p.x * ui_image.size.w, (1 - p.y) * ui_image.size.h) for p in cgpts]
            pth = ui.Path()
            pth.move_to(*vertices[0])
            for p in vertices[1:]:
                pth.line_to(*p)
            ui.set_color('red')
            pth.stroke()
            x, y = vertices[0]
            w, h = (vertices[2][0] - x), (vertices[2][1] - y)
            ui.draw_string(str(result.text()), rect=(x, y, w, h), font=('<system>', 12), color='red')
        marked_img = ctx.get_image()
    marked_img.show()
-
Here is another solution... I use rectangle detection and a perspective correction to crop the puzzle. This gives much better detection, though not perfect. The recognition is pretty good, though it has trouble with 1's on their own... they turn into T's, of all things. Some additional work in the clean function might fix common problems.
I’m using images from https://github.com/prajwalkr/SnapSudoku/tree/master/train
I suspect doing some CIFiltering first will probably improve things (a rough sketch of that idea follows after the listing below).
from objc_util import *
from ctypes import c_int
import ui
import numpy as np

UIImage = ObjCClass('UIImage')

VNImagePointForNormalizedPoint = c.VNImagePointForNormalizedPoint
VNImagePointForNormalizedPoint.argtypes = [CGPoint, c_int, c_int]
VNImagePointForNormalizedPoint.restype = CGPoint

ui_image = ui.Image.named('image2.jpg')
ui_image.show()

CIImage = ObjCClass('CIImage')
ci_image = CIImage.imageWithCGImage_(ui_image.objc_instance.CGImage())
CIPerspectiveCorrection = ObjCClass('CIPerspectiveCorrection')
f = CIPerspectiveCorrection.perspectiveCorrectionFilter()
f.inputImage = ci_image
o = f.outputImage()

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNDetectRectanglesRequest = ObjCClass('VNDetectRectanglesRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

# first pass: find the puzzle outline
req = VNDetectRectanglesRequest.alloc().init().autorelease()
req.maximumObservations = 2
req.minimumSize = 0.5
req.minimumAspectRatio = 0.7
req.quadratureTolerance = 30
handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
success = handler.performRequests_error_([req], None)

try:
    result = req.results()[0]
    # convert the normalized rectangle corners to image coordinates
    nm = lambda p: VNImagePointForNormalizedPoint(p, int(ui_image.size.w), int(ui_image.size.h))
    f.topLeft = nm(result.topLeft())
    f.topRight = nm(result.topRight())
    f.bottomLeft = nm(result.bottomLeft())
    f.bottomRight = nm(result.bottomRight())
    o = f.outputImage()
    with ui.ImageContext(o.extent().size.width, o.extent().size.height) as ctx:
        UIImage.imageWithCIImage_(o).drawAtPoint_(CGPoint(0, 0))
        ui_image2 = ctx.get_image()
    ui_image2.show()
except:
    print('bounding rect not found... results won\'t work')
    ui_image2 = ui_image

# second pass: recognize text in the cropped puzzle
handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image2.to_png(), None).autorelease()
req0 = VNRecognizeTextRequest.alloc().init().autorelease()
req0.recognitionLevel = 0  # accurate
req0.usesLanguageCorrection = True
req0.customWords = [str(a) for a in range(10)]
# req0.maximumObservations = 81
# req0.minimumSize = .1
success = handler.performRequests_error_([req0], None)

with ui.ImageContext(*tuple(ui_image2.size)) as ctx:
    ui_image2.draw()
    for result in req0.results():
        cgpts = [result.bottomLeft(), result.topLeft(), result.topRight(),
                 result.bottomRight(), result.bottomLeft()]
        vertices = [(p.x * ui_image2.size.w, (1 - p.y) * ui_image2.size.h) for p in cgpts]
        pth = ui.Path()
        pth.move_to(*vertices[0])
        for p in vertices[1:]:
            pth.line_to(*p)
        ui.set_color('red')
        pth.stroke()
        x, y = vertices[0]
        w, h = (vertices[2][0] - x), (vertices[2][1] - y)
        ui.draw_string(str(result.text()), rect=(x, y, w, h), font=('<system>', 12), color='red')
    marked_img = ctx.get_image()
marked_img.show()

def bbcenter(bb):
    # map the normalized bounding-box center to a (col, row) grid index in 0..8
    return ((9 * (bb.origin.x + bb.size.width / 2) - 0.5),
            (9 * (bb.origin.y + bb.size.height / 2) - 0.5))

def clean(results):
    cleaned = []
    for r in results:
        col, row = bbcenter(r.boundingBox())
        approx_num_ch = (r.boundingBox().size.width * 9)
        txt = str(r.text()).replace(' ', '')
        if approx_num_ch <= 1:
            if len(txt) == 1:
                cleaned.append(((round(col), round(row)), txt))
            else:
                cleaned.append(((round(col), round(row)), '-1'))
        else:
            # more than one character: spread them over adjacent cells
            col -= (len(txt) - 1) / 2
            col = round(col)
            row = round(row)
            for ch in txt:
                if ch in [str(a) for a in range(10)]:
                    cleaned.append(((col, row), ch))
                else:
                    cleaned.append(((col, row), '-1'))
                col += 1
    return cleaned

puzzle = np.zeros([9, 9])
for cell, val in clean(req0.results()):
    puzzle[cell] = int(val)
print(np.flipud(puzzle.T))
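Regarding the CIFiltering idea mentioned above, a rough sketch of what a pre-processing step could look like; the filter choice (CIColorControls) and the contrast value are only illustrative, I have not tuned them:

from objc_util import ObjCClass, ns

CIFilter = ObjCClass('CIFilter')
contrast_filter = CIFilter.filterWithName_('CIColorControls')
contrast_filter.setValue_forKey_(ci_image, 'inputImage')    # ci_image from the listing above
contrast_filter.setValue_forKey_(ns(1.3), 'inputContrast')  # boost contrast a bit
enhanced = contrast_filter.outputImage()
# 'enhanced' could then be rendered out and handed to VNImageRequestHandler instead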
-
@JonB, thanks, very nice. I have also noticed, and wondered about, how difficult the number 1 is to recognize... Not very exotic, is it? But in my experiments it looked like the simple heuristic of "if the result is something other than 1-9, assume it is a 1" would work pretty well for Sudoku.
-
@JonB, can you open up this one a little bit?
approx_num_ch=(r.boundingBox().size.width*9)
-
The *9 is because, if the initial rectangle detection and crop works, each square is approx 1/9 of the width. So the approximate number of squares a rectangle covers tells us how many characters it should have... I was getting many cases where 1 got read as Te, or some other two-character value, even though the width was less than one box... so I wanted special handling for narrow boxes, since those are probably a 1, while wide boxes could have multiple characters because the bounding box legitimately spans adjacent cells.
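Put as a tiny helper (the example widths are made up, just to show the arithmetic):

def approx_num_chars(observation):
    # normalized width of the box, times 9 cells across the cropped puzzle
    return observation.boundingBox().size.width * 9

# e.g. a box of normalized width 0.09 gives 0.09 * 9 = 0.81 -> treat as a single character
#      a box of normalized width 0.31 gives 0.31 * 9 = 2.79 -> treat as ~3 characters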
-
@JonB, there’s something in the math here I do not quite get. I would expect something like:
num_char = r.bbox/(full_bbox/9) = r.bbox * 9 / full_bbox
So it looks like you are missing the division?
-
The results of Vision are always provided as normalized coordinates, meaning the full box is always 1.
For drawing, you then have to multiply by the image width/height. Since the perspective correction both fixes the perspective and crops the image, 1/9 is roughly the size of a single cell.
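A minimal sketch of that conversion, with illustrative names (norm_point is any of the normalized corner points Vision returns, img_w/img_h the image size in points):

def to_pixels(norm_point, img_w, img_h):
    # Vision's origin is the bottom-left corner, ui drawing uses the top-left,
    # hence the (1 - y) flip; the full image is 1.0 x 1.0 in normalized terms
    return (norm_point.x * img_w, (1 - norm_point.y) * img_h)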
-
@JonB, now I understand, thank you.
-
I have a quick question regarding the original OCR post: how do I print the text as one single CSV list? I tried, but I have been getting a list of lists instead of one single list.
This is the snippet of the code example that I think needs to be altered:
success = handler.performRequests_error_([req], None)
if success:
    for result in req.results():
        print(result.text())
else:
    print('Problem recognizing anything')
-
results = [str(result.text()) for result in req.results()]
print(results)
Or maybe
print(','.join(results))
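And if the goal is an actual .csv file rather than printed output, a minimal sketch building on the results list above (the file name is just an example):

import csv

with open('recognized.csv', 'w', newline='') as f:
    csv.writer(f).writerow(results)  # one row, one column per recognized string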
-
There is a lot of good code here... It would be really awesome if there was a GitHub repo to stitch it all together into an app.
-
Hi,
I'm aware that this thread is about a year old, but maybe someone can nevertheless enlighten me. I'm trying to do a similar thing in JavaScript for Automation (JXA), and I see this line in your example:

for result in req.results():
    print(result.text())

Translated to JavaScript, that's

results.forEach(r => { console.log(r.text); })

and that works like a charm. I'm just wondering why, since according to Apple's documentation, the results object doesn't even have a text property, only string (cf. https://developer.apple.com/documentation/vision/vnrecognizedtext?language=objc). I was first wondering if text is perhaps a nice Python thing, but since the same works in JavaScript, I'm sure I'm missing something obvious in Apple's documentation. Does anyone know what (and where I should be looking)?
Thanks a lot in advance
Christian -
Does string not work?
Often there are undocumented or deprecated features available in ObjC objects. Often we just poke around using autocomplete (which ultimately uses some of the introspection features of the ObjC runtime, which let you get a list of methods or instance vars, etc.).
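For what it's worth, the documented route also works from Pythonista; a minimal sketch, assuming a req that has already been performed as in the earlier snippets:

for result in req.results():
    # topCandidates_ returns an NSArray of VNRecognizedText; string() is the documented accessor
    top = result.topCandidates_(1).objectAtIndex_(0)
    print(top.string())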
-
Does string not work?
It does, but only in a very convoluted way, like so:

results.forEach(r => { console.log(r.topCandidates(1).js[0].string.js) })

The js in the middle is required to convert the ObjC array returned by topCandidates to a JavaScript array (and again to convert the NSString returned by string to a JS string). But using string directly on r does not work.

we just poke around using autocomplete
I guess that happens in Xcode (the poking around)?
-
No, the exploration happens in Pythonista, in the console. Once you have an object, dir(variable) lists the methods and such, or frankly just typing a letter and letting the autocomplete suggestions do their thing.
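For example, in the Pythonista console (the substring filter is just to keep the output short):

from objc_util import ObjCClass

req = ObjCClass('VNRecognizeTextRequest').alloc().init()
# dir() on an ObjC object lists the selectors the runtime reports, documented or not
print([name for name in dir(req) if 'recognition' in name.lower()])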
If you're not using a bridging library like
https://github.com/TooTallNate/NodObjC
I'd suggest that you do, since it might take care of a lot of the annoying bits like converting every type to its JS equivalent, and it lets you access some of the dynamic introspection stuff that makes ObjC pretty neat. Under the hood, there are ObjC runtime functions that let you get lists of method names. For instance, see
https://github.com/jsbain/objc_hacks/blob/master/print_objc.py
for how you can do it in Python. Or, look at the NodObjC code for class.js and core.js: it looks like it does something similar, using the ObjC runtime's class_copyMethodList etc., and adds those as JS-callable functions to the prototype. Then your favorite JS debugger ought to show you what is there... -
In this case, looking at the headers for VNTextObservation shows the text attribute. -
Thanks a lot for that. Apple's documentation doesn't mention any of these properties :-(