Recognize text from picture
-
There are a few edits in this thread, and I don't know how to piece them together. Which one is the best edited version?
-
@sodoku, the one on GitHub is the latest version.
-
I will test it for sudoku in the console. That is, I will try to convert it for use with the console sudoku solver; if I need help, I'll post a message.
-
I need help adapting this script to read the numbers from a picture of a sudoku and insert the starting numbers into a console script. It does not work well for recognizing ones and sevens.
Example of a sudoku solver:
In my version I want the board to start as all zeros; when you take a picture, it will add the numbers and then solve. This combines the two programs (sudoku solver and OCR text recognition).
    board=[
        [5,8,4,1,0,0,0,0,0],
        [0,0,6,8,0,0,5,1,0],
        [0,0,0,0,5,4,7,0,6],
        [0,5,3,0,1,0,0,6,7],
        [0,0,0,0,2,0,0,0,0],
        [4,6,0,0,9,0,8,3,0],
        [7,0,8,5,4,0,0,0,0],
        [0,2,9,0,0,3,4,0,0],
        [0,0,0,0,0,1,3,7,9]
    ]

    def solve(bo):
        find = find_empty(bo)
        if not find:
            return True
        else:
            row,col = find

        for i in range(1,10):
            if valid(bo,i,(row,col)):
                bo[row][col] = i

                if solve(bo):
                    return True

                bo[row][col] = 0
        return False

    def valid(bo,num,pos):
        #check row
        for i in range(len(bo[0])):
            if bo[pos[0]][i] == num and pos[1] != i:
                return False

        #check column
        for i in range(len(bo[0])):
            if bo[i][pos[1]] == num and pos[0] != i:
                return False

        #check quadrant
        box_x = pos[1] // 3
        box_y = pos[0] // 3

        for i in range(box_y * 3, box_y * 3 + 3):
            for j in range(box_x * 3, box_x * 3 + 3):
                if bo[i][j] == num and (i,j) != pos:
                    return False

        return True

    def print_board(bo):
        for i in range(len(bo)):
            if i % 3 == 0 and i != 0:
                print('------+-------+------')
            for j in range(len(bo[0])):
                if j % 3 == 0 and j != 0:
                    print('|',end=' ')
                if j == 8:
                    print(bo[i][j])
                else:
                    print(str(bo[i][j])+ ' ', end='')

    def find_empty(bo):
        for i in range(len(bo)):
            for j in range(len(bo[0])):
                if bo[i][j] == 0:
                    return (i,j) # row, col
        return None

    print_board(board)
    solve(board)
    print('=====================')
    print_board(board)
    language_preference = ['fi','en','se']

    import photos, ui, dialogs
    import io
    from objc_util import *

    load_framework('Vision')
    VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
    VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

    def pil2ui(pil_image):
        buffer = io.BytesIO()
        pil_image.save(buffer, format='PNG')
        return ui.Image.from_data(buffer.getvalue())

    selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

    ui_image = None

    if selection == 1:
        pil_image = photos.capture_image()
        if pil_image is not None:
            ui_image = pil2ui(pil_image)
    elif selection == 2:
        ui_image = photos.pick_asset().get_ui_image()

    if ui_image is not None:
        print('Recognizing...\n')

        req = VNRecognizeTextRequest.alloc().init().autorelease()
        req.setRecognitionLanguages_(language_preference)
        handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

        success = handler.performRequests_error_([req], None)
        if success:
            for result in req.results():
                print(result.text())
        else:
            print('Problem recognizing anything')
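A minimal sketch of the glue between the two scripts, assuming the recognition step can be reduced to (row, col, digit) triples. The names empty_board and fill_board and the sample triples are made up for illustration; they are not part of either script above.

```python
def empty_board():
    """Return a 9x9 board of zeros (0 means an empty cell, as in the solver)."""
    return [[0] * 9 for _ in range(9)]


def fill_board(cells):
    """Place recognized digits on an empty board.

    cells is an iterable of (row, col, digit) triples; entries that fall
    outside the grid or are not 1-9 are ignored rather than trusted.
    """
    board = empty_board()
    for row, col, digit in cells:
        if 0 <= row < 9 and 0 <= col < 9 and 1 <= digit <= 9:
            board[row][col] = digit
    return board


# Hypothetical recognition output, for illustration only.
recognized = [(0, 0, 5), (0, 1, 8), (8, 8, 9), (9, 9, 3)]
board = fill_board(recognized)
print(board[0])
```

The filled board has the same list-of-lists shape as board in the solver, so it could be passed to solve(board) directly.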
-
@sodoku, I tried something similar as well a while ago, first recognizing rectangles and then trying to recognize the numbers, but I hit the same issue of very poor recognition of the numbers. I wonder if we would need a number-specific recognizer for that.
-
Hi, since the sudoku is a square made of many squares, I think it is more robust to slice the cells evenly so that each small image contains only one number.
The downside, of course, is that you call the recognition service once per image, and you go from 1 image to 81, which can get expensive.
But it would be more robust.
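A rough sketch of the even slicing, assuming the picture has already been cropped to the puzzle itself. The function name cell_boxes and the margin parameter (which trims the grid lines off each cell) are illustrative choices, not something from the scripts in this thread.

```python
def cell_boxes(width, height, n=9, margin=0.1):
    """Return {(row, col): (left, top, right, bottom)} pixel boxes.

    The image is split into an n x n grid of equal cells; margin is the
    fraction of a cell trimmed from every side to avoid the grid lines.
    """
    cell_w = width / n
    cell_h = height / n
    boxes = {}
    for row in range(n):
        for col in range(n):
            left = (col + margin) * cell_w
            top = (row + margin) * cell_h
            right = (col + 1 - margin) * cell_w
            bottom = (row + 1 - margin) * cell_h
            boxes[(row, col)] = (round(left), round(top),
                                 round(right), round(bottom))
    return boxes


boxes = cell_boxes(900, 900)
print(len(boxes))     # 81
print(boxes[(0, 0)])  # (10, 10, 90, 90)
```

Each box is in (left, upper, right, lower) order, which is the order PIL's Image.crop expects, so every cell could be cropped and recognized on its own.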
Best regards, Tommy
-
@Spitfire, thanks. I did try all kinds of approaches, finally resorting to manual cropping, and it still was not reliable enough.
-
@mikael Could you share some pictures of sudoku where recognition fails?
-
Multiple sample Sudoku puzzles would help to achieve a robust solution.
-
I have a few questions about the very first text recognition code posted in this thread.
The example video I am referring to is https://developer.apple.com/videos/play/wwdc2019/234
Question 1
How do you change the recognition level from fast to accurate?
Example code from the Apple website (I am not sure whether it is written in Swift or Objective-C) looks like this:
    myTextRegcognitionRequest.recognitionLevel = VNRequestTextRecognitionLevel.accurate
Another example, shown in the Apple video, sets the recognition level like this:
    request.recognitionLevel = .fast
Question 2
How do you ensure that numbers don't get mistaken for letters when the language corrector is not active, for example the number 5 for an S, or a 1 for an I?
An example of this from the video is:
    extension Character {
        func GetSimilarCharacterIfNotIn(allowedChars: String -> Character {
            let conversionTable = [
                's':'5',
                'S':'5',
                'i':'1',
                'I':'1',
            ]
Question 3
Do you know how to set up the special-words detection feature mentioned in the video?
-
I gather they are looking for words you'd find in an English dictionary. So perhaps façade, or tête-à-tête might be recognized, while other examples wouldn't?
-
@sodoku
See https://developer.apple.com/documentation/vision/vnrequesttextrecognitionlevel/fast
Try
    req.recognitionLevel = 1
for fast, or 0 for accurate.
Re fixing characters... I gather you might set
    req.usesLanguageCorrection = False
(or maybe 0), then make your own replacement map and use str.translate.
Custom words are handled by
    req.customWords = ['customword1', 'etc']
See the Apple docs for VNRecognizeTextRequest.
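A small sketch of the replacement-map idea using str.translate. The S/s and I/i entries follow the conversion table quoted from the video; the l and O entries, and the names LOOKALIKES and to_digits, are assumptions added for illustration.

```python
# Letters that tend to be confused with digits. 'S', 's', 'I', 'i' come from
# the table quoted from the video; 'l' and 'O' are assumed extras.
LOOKALIKES = str.maketrans({
    'S': '5', 's': '5',
    'I': '1', 'i': '1',
    'l': '1',
    'O': '0',
})


def to_digits(text):
    """Swap look-alike letters for digits, then drop everything else."""
    fixed = text.translate(LOOKALIKES)
    return ''.join(ch for ch in fixed if ch in '0123456789')


print(to_digits('S I 7'))  # 517
```

This would be applied to str(result.text()) for each result; anything that is neither a digit nor in the map is silently dropped, which may or may not be what you want.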
-
I've seen the Apple documentation for the Vision framework; I just don't know how to convert it to Python.
Question 1
What about setting the minimum text height? How do you translate either of these declarations to Python?
    @property(readwrite, nonatomic, assign) float minimumTextHeight;
written in Objective-C
    var minimumTextHeight: Float { get set }
written in Swift
Question 2
I am also interested in learning how to recognize the individual boxes of a sudoku puzzle to extract the numbers. Is there a way to do that, possibly with
VNRecognizedTextObservation A request that detects and recognizes regions of text in an image.
or possibly with the bounding box technique shown in the video https://developer.apple.com/videos/play/wwdc2019/234 ? Also, can you use multiple bounding boxes to recognize text from a sudoku card?
This is Mikael's code that I am trying to modify, but I don't know how to convert the code shown in the Apple documentation into Python.
    language_preference = ['fi','en','se']

    import photos, ui, dialogs
    import io
    from objc_util import *

    load_framework('Vision')
    VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
    VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

    def pil2ui(pil_image):
        buffer = io.BytesIO()
        pil_image.save(buffer, format='PNG')
        return ui.Image.from_data(buffer.getvalue())

    selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

    ui_image = None

    if selection == 1:
        pil_image = photos.capture_image()
        if pil_image is not None:
            ui_image = pil2ui(pil_image)
    elif selection == 2:
        ui_image = photos.pick_asset().get_ui_image()

    if ui_image is not None:
        print('Recognizing...\n')

        req = VNRecognizeTextRequest.alloc().init().autorelease()
        req.recognitionLevel=1
        req.setRecognitionLanguages_(language_preference)
        handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

        success = handler.performRequests_error_([req], None)
        if success:
            for result in req.results():
                print(result.text())
        else:
            print('Problem recognizing anything')
-
@sodoku For things like enumerations, you can usually check the Swift version of the docs, which tells you the value. Otherwise, you can often look up source code.
For minimumTextHeight, both Swift and ObjC say this is a float. The fact that it is readwrite/nonatomic/assign is not important.
So, usually this would just be
    req.minimumTextHeight = 0.05
or whatever you want (per the Apple docs, the value is relative to the image height).
It is often helpful to explore objects in the console, since this can tell you what you're working with. For instance, if you type
    req.
in the console, you will see autocomplete of all known attributes. Usually you need to treat objc properties as function calls, so to check minimumTextHeight, you'd use req.minimumTextHeight(). But to set, you can treat the property as a Python attribute and assign directly. In some cases, you may need to use the setPropertyName_(value) convention.
Where things get tricky is where the declared type is another object (in which case you have to provide the right type of object), or a structure. Structures can be tricky because objc_util often screws up the type encodings, and you have to manually override. Structures get turned into Python Structures, and you access fields normally like you would with a Python object (no () needed).
Re question 2:
Per the docs, the results of a request will be VNRecognizedTextObservation objects. This is a subclass of VNRectangleObservation.
    @interface VNRecognizedTextObservation : VNRectangleObservation
<-- the colon here means "inherits from"
If you look up VNRectangleObservation, you will see it has the following attributes
bottomLeft
bottomRight
topLeft
topRight
which are declared as CGPoint, which is a structure that has .x and .y fields.
    for result in req.results():
        x = result.bottomLeft().x
        y = result.bottomLeft().y
        w = result.topRight().x-x
        h = result.topRight().y-y
        print('({},{},{},{}) {}'.format(x,y,w,h, result.text()))
You could draw the image into an image context, and then also stroke a rectangle.. something like this...(not tried).
    with ui.ImageContext(ui_image.size()) as ctx:
        ui_image.draw()
        for result in req.results():
            vertecies = [(p.x, p.y) for p in [result.bottomLeft() result.TopLeft() result.TopRight() result.BottomRight() result.bottomLeft()]
            pth = ui.Path.moveTo(*vertecies[0]) %initial point
            for p in vertecies[1:]:
                pth.line_to(*p)
            ui.set_color('red')
            pth.stroke()
            x,y = vertecies[0]
            w,h =(vertecies[2].x-x), (vertecies[2].y-y)
            ui.draw_string(result.text(), rect=(x,y,w,h), font=('<system>', 12), color='red')
        marked_img = ctx.get_image()
        marked_img.show()
-
I realized that result will also have a .boundingBox() attribute, which would make some of this a little simpler.
That is a CGRect, consisting of .origin (in turn consisting of .x and .y) and .size (containing .width and .height).
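Vision reports this box in normalized coordinates (0 to 1) with the origin at the lower left, while ui drawing uses pixels with the origin at the top left. A minimal sketch of the conversion, assuming you pass in the four numbers from the bounding box (origin.x, origin.y, size.width, size.height) plus the image size; the helper name bbox_to_rect is made up for illustration.

```python
def bbox_to_rect(x, y, width, height, image_w, image_h):
    """Convert a normalized, bottom-left-origin box to a pixel rect.

    Returns (left, top, w, h) with the origin at the top left of the image.
    """
    left = x * image_w
    w = width * image_w
    h = height * image_h
    # The top edge of the box is at y + height in bottom-left coordinates;
    # flip it so it is measured from the top of the image instead.
    top = (1 - y - height) * image_h
    return (left, top, w, h)


print(bbox_to_rect(0.25, 0.5, 0.5, 0.25, 400, 200))  # (100.0, 50.0, 200.0, 50.0)
```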
In that case you could use ui.Path.rect.
-
Okay, my previous reply was full of errors... here is a working version, which adds red boxes around each result, along with the text
    language_preference = ['fi','en','se']

    import photos, ui, dialogs
    import io
    from objc_util import *

    load_framework('Vision')
    VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
    VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

    ACCURATE=0
    FAST=1

    def pil2ui(pil_image):
        buffer = io.BytesIO()
        pil_image.save(buffer, format='PNG')
        return ui.Image.from_data(buffer.getvalue())

    selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

    ui_image = None

    if selection == 1:
        pil_image = photos.capture_image()
        if pil_image is not None:
            ui_image = pil2ui(pil_image)
    elif selection == 2:
        ui_image = photos.pick_asset().get_ui_image()

    if ui_image is not None:
        print('Recognizing...\n')

        req = VNRecognizeTextRequest.alloc().init().autorelease()
        req.recognitionLevel= ACCURATE# accurate
        req.setRecognitionLanguages_(language_preference)
        handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

        success = handler.performRequests_error_([req], None)
        if success:
            for result in req.results():
                print(result.text())
        else:
            print('Problem recognizing anything')

        with ui.ImageContext(*tuple(ui_image.size) ) as ctx:
            ui_image.draw()
            for result in req.results():
                cgpts=[ result.bottomLeft(),
                        result.topLeft(),
                        result.topRight(),
                        result.bottomRight(),
                        result.bottomLeft() ]
                vertecies = [(p.x*ui_image.size.w, (1-p.y)*ui_image.size.h) for p in cgpts]
                pth = ui.Path()
                pth.move_to(*vertecies[0])
                for p in vertecies[1:]:
                    pth.line_to(*p)
                ui.set_color('red')
                pth.stroke()
                x,y = vertecies[0]
                w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
                ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
            marked_img = ctx.get_image()
            marked_img.show()
-
Here is another solution... I use rectangle detection and perspective correction to crop the puzzle. This gives much better detection, though not perfect. The recognition is pretty good, though it has trouble with 1's on their own, which turn into Ts of all things. Some additional work in the clean function might fix common problems.
I’m using images from https://github.com/prajwalkr/SnapSudoku/tree/master/train
I suspect doing some CIFiltering first will probably improve things.
    from objc_util import *
    import ui

    VNImagePointForNormalizedPoint=c.VNImagePointForNormalizedPoint
    VNImagePointForNormalizedPoint.argTypes=[CGPoint, c_int, c_int]
    VNImagePointForNormalizedPoint.restype=CGPoint

    ui_image=ui.Image.named('image2.jpg')
    ui_image.show()

    CIImage=ObjCClass('CIImage')
    ci_image=CIImage.imageWithCGImage_(ui_image.objc_instance.CGImage())
    CIPerspectiveCorrection=ObjCClass('CIPerspectiveCorrection')
    f=CIPerspectiveCorrection.perspectiveCorrectionFilter()
    f.inputImage=ci_image
    o=f.outputImage()

    load_framework('Vision')
    VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
    VNDetectRectanglesRequest = ObjCClass('VNDetectRectanglesRequest')
    VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

    req=VNDetectRectanglesRequest.alloc().init().autorelease()
    req.maximumObservations=2
    req.minimumSize=0.5
    req.minimumAspectRatio=0.7
    req.quadratureTolerance=30

    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
    success = handler.performRequests_error_([req], None)

    try:
        result=req.results()[0]
        nm=lambda p :VNImagePointForNormalizedPoint(p,int(ui_image.size.w),int(ui_image.size.h))
        f.topLeft = nm(result.topLeft())
        f.topRight = nm(result.topRight())
        f.bottomLeft = nm(result.bottomLeft())
        f.bottomRight = nm(result.bottomRight())
        o=f.outputImage()
        with ui.ImageContext(o.extent().size.width, o.extent().size.height) as ctx:
            UIImage.imageWithCIImage_(o).drawAtPoint_( CGPoint(0,0))
            ui_image2=ctx.get_image()
        ui_image2.show()
    except:
        print('bounding rec not found...results wont work')
        ui_image2=ui_image

    '''now, detect rectangles again...'''
    handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image2.to_png(), None).autorelease()

    req0 = VNRecognizeTextRequest.alloc().init().autorelease()
    req0.recognitionLevel= 0# accurate
    req0.usesLanguageCorrection=True
    req0.customWords=[str(a) for a in range(10)]
    #req0.maximumObservations=81
    #req0.minimumSize=.1
    success = handler.performRequests_error_([req0], None)

    with ui.ImageContext(*tuple(ui_image2.size) ) as ctx:
        ui_image2.draw()
        for result in req0.results():
            cgpts=[result.bottomLeft(),
                   result.topLeft(),
                   result.topRight(),
                   result.bottomRight(),
                   result.bottomLeft()]
            vertecies = [(p.x*ui_image2.size.w, (1-p.y)*ui_image2.size.h) for p in cgpts]
            pth = ui.Path()
            pth.move_to(*vertecies[0])
            for p in vertecies[1:]:
                pth.line_to(*p)
            ui.set_color('red')
            pth.stroke()
            x,y = vertecies[0]
            w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
            ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
        marked_img = ctx.get_image()
        marked_img.show()

    def bbcenter(bb):
        return((9*(bb.origin.x+bb.size.width/2)-0.5),
               (9*(bb.origin.y+bb.size.height/2)-0.5) )

    def clean(results):
        cleaned=[]
        for r in results:
            col,row=bbcenter(r.boundingBox())
            approx_num_ch=(r.boundingBox().size.width*9)
            txt=str(r.text()).replace(' ','')
            if approx_num_ch<=1:
                if len(txt) == 1:
                    cleaned.append(((round(col),round(row)),txt))
                else:
                    cleaned.append(((round(col),round(row)),'-1'))
            else: #more than one char
                col-=(len(txt)-1)/2
                col=round(col)
                row=round(row)
                for ch in txt:
                    if ch in [str(a) for a in range(10)]:
                        cleaned.append(((col,row),ch))
                    else:
                        cleaned.append(((col,row),'-1'))
                    col+=1
        return cleaned

    import numpy as np
    puzzle=np.zeros([9,9])
    for c,v in clean(req0.results()):
        puzzle[c]=int(v)
    print(np.flipud(puzzle.T))
-
@JonB, thanks, very nice. I have noted and wondered about how difficult the number 1 is to recognize... Not very exotic, is it? But in my experiments it looked like the simple heuristic of "if the result is something other than 1-9, assume it is a 1" would work pretty well for Sudoku.
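A minimal sketch of that heuristic, assuming it is applied only to cells where the recognizer returned some text (empty cells would stay 0). The function name to_cell_value is made up for illustration.

```python
def to_cell_value(text):
    """Map one recognized cell string to a digit.

    A single character 1-9 is taken at face value; anything else
    (a letter, a 0, several characters) is assumed to be a misread 1.
    """
    text = text.strip()
    if len(text) == 1 and text in '123456789':
        return int(text)
    return 1


print([to_cell_value(t) for t in ['7', 'T', ' 4 ', 'l']])  # [7, 1, 4, 1]
```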