Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Text detection with Vision CoreML
-
@sodoku This is the shortest code I can give: it does not first search for the rectangles containing text, but assumes the image is a square grid with one digit per cell.
You have to set (or prompt for, up to you) the number of cells per row/column (here 9), and the percentage of the cell dimensions taken up by the grid lines (here 15). Even though the image appears to be displayed in full, it is not: it is overlaid by buttons whose background_image is the cropped cell, and that crop is what gets passed to the VNCoreMLRequest to try to recognize the digit.
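The per-cell cropping described above (cells per row/column plus a grid-line margin) can be sketched on its own; `cell_box` is a hypothetical helper name, not part of the script, and it trims the margin per axis rather than reusing the horizontal margin as the full script does:

```python
# Compute the crop box for cell (ix, iy) of an n x n grid image,
# trimming margin_pct percent of each cell dimension to skip grid lines.
def cell_box(img_w, img_h, ix, iy, n=9, margin_pct=15):
    dx, dy = img_w / n, img_h / n          # cell size in pixels
    mx, my = dx * margin_pct / 100, dy * margin_pct / 100
    return (int(ix * dx + mx), int(iy * dy + my),
            int((ix + 1) * dx - mx), int((iy + 1) * dy - my))

# For a 900x900 image with 9 cells per side, the top-left cell
# (minus a 15% margin) is cropped from (15, 15) to (85, 85):
print(cell_box(900, 900, 0, 0))
```

The resulting 4-tuple is in the `(left, upper, right, lower)` order that `PIL.Image.crop` expects.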
And, as you can see in my example, it does not work correctly on my iPad mini 4 under iOS 13 and the latest Pythonista beta. You could try it on your iDevice, but I can't help more, sorry. Hoping it will be better for you 😢

```python
import os
import io
import photos
from objc_util import ObjCClass, ObjCInstance, nsurl, ns
import ui

MODEL_FILENAME = 'MNIST.mlmodel'
#MODEL_FILENAME = 'MNISTClassifier.mlmodel'
#MODEL_FILENAME = 'OCR.mlmodel'
#MODEL_FILENAME = 'Alphanum_28x28.mlmodel'

# Use a local path for caching the model file
MODEL_PATH = os.path.join(os.path.expanduser('~/Documents/'), MODEL_FILENAME)

# Declare/import ObjC classes:
MLModel = ObjCClass('MLModel')
VNCoreMLModel = ObjCClass('VNCoreMLModel')
VNCoreMLRequest = ObjCClass('VNCoreMLRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')


def pil2ui(imgIn):
    # Convert a PIL image to a ui.Image via an in-memory PNG
    with io.BytesIO() as bIO:
        imgIn.save(bIO, 'PNG')
        imgOut = ui.Image.from_data(bIO.getvalue())
    return imgOut


def load_model():
    global vn_model
    ml_model_url = nsurl(MODEL_PATH)
    # Compile the model:
    c_model_url = MLModel.compileModelAtURL_error_(ml_model_url, None)
    # Load model from the compiled model file:
    ml_model = MLModel.modelWithContentsOfURL_error_(c_model_url, None)
    # Create a VNCoreMLModel from the MLModel for use with the Vision framework:
    vn_model = VNCoreMLModel.modelForMLModel_error_(ml_model, None)
    return vn_model


def _classify_img_data(img_data):
    global vn_model
    # Create and perform the recognition request:
    req = VNCoreMLRequest.alloc().initWithModel_(vn_model).autorelease()
    handler = VNImageRequestHandler.alloc().initWithData_options_(img_data, None).autorelease()
    success = handler.performRequests_error_([req], None)
    if success:
        best_result = req.results()[0]
        label = str(best_result.identifier())
        confidence = best_result.confidence()
        return {'label': label, 'confidence': confidence}
    else:
        return None


def classify_image(img):
    buffer = io.BytesIO()
    img.save(buffer, 'JPEG')
    img_data = ns(buffer.getvalue())
    return _classify_img_data(img_data)


def classify_asset(asset):
    mv = ui.View()
    mv.background_color = 'white'
    im = ui.ImageView()
    pil_image = asset.get_image()
    print(pil_image.size)
    ui_image = asset.get_ui_image()
    n_squares = 9
    d_grid = 15  # % of cell size taken by grid lines
    wim, him = pil_image.size
    ws, hs = ui.get_screen_size()
    # Fit the image on screen, preserving its aspect ratio:
    if (ws / hs) < (wim / him):
        h = ws * him / wim
        im.frame = (0, (hs - h) / 2, ws, h)
    else:
        w = hs * wim / him
        im.frame = ((ws - w) / 2, 0, w, hs)
    print(wim, him, ws, hs)
    mv.add_subview(im)
    wi = im.width
    hi = im.height
    im.image = ui_image
    im.content_mode = 1
    mv.frame = (0, 0, ws, hs)
    mv.present('fullscreen')
    dx = wim / n_squares
    dy = him / n_squares
    d = dx * d_grid / 100
    dl = int((wi / n_squares) * d_grid / 100)
    for ix in range(n_squares):
        x = ix * dx
        for iy in range(n_squares):
            y = iy * dy
            # Crop one cell (minus the grid-line margin) from the full image:
            pil_char = pil_image.crop((int(x + d), int(y + d),
                                       int(x + dx - d), int(y + dy - d)))
            l = ui.Button()
            l.frame = (int(ix * wi / n_squares) + dl,
                       int(iy * hi / n_squares) + dl,
                       int(wi / n_squares) - 2 * dl,
                       int(hi / n_squares) - 2 * dl)
            l.border_width = 1
            l.border_color = 'red'
            l.tint_color = 'red'
            ObjCInstance(l).button().contentHorizontalAlignment = 1  # left
            l.background_image = pil2ui(pil_char)
            im.add_subview(l)
            # Show the recognized digit as the button title:
            l.title = classify_image(pil_char)['label']


def main():
    global vn_model
    vn_model = load_model()
    all_assets = photos.get_assets()
    asset = photos.pick_asset(assets=all_assets)
    if asset is None:
        return
    classify_asset(asset)


if __name__ == '__main__':
    main()
```
-
@cvp Cool looking output! I am trying to follow along at home but have a few questions:
- What image (of a sudoku puzzle) did you start with?
- How did you constrain the labels to just 0-9?
- Could we have a GitHub repo for this effort? With goals:
- Recognize sudoku digits from a still image
- Recognize sudoku digits from a real-time image
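On constraining labels to 0-9: the MNIST model used in the script above already outputs only digit classes, but with a general-purpose classifier (e.g. the Alphanum model mentioned in the comments) you could post-filter the results. A minimal sketch, where `best_digit` is a hypothetical helper operating on `(label, confidence)` pairs rather than on Vision objects:

```python
# Keep only digit labels from classification results and
# return the one with the highest confidence (or None).
def best_digit(results):
    digits = [r for r in results if r[0].isdigit()]
    return max(digits, key=lambda r: r[1]) if digits else None

# 'a' is discarded even though it scored highest; '7' wins among digits:
print(best_digit([('a', 0.9), ('7', 0.6), ('1', 0.3)]))
```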
-
@ccc Do you have an iDevice running iOS < 13? Just to know whether digit recognition works there, because my test above does not work under iOS 13.
-
For non-Pythonista platforms... https://github.com/neeru1207/AI_Sudoku
-
Vision OCR produces pretty good results.
But has anyone succeeded in OCRing text with subscripts or exponents?
(https://commons.wikimedia.org/wiki/File:Quadratic-formula.jpg)
-
What about OCR in real-time video mode? Is it possible?
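Real-time OCR is feasible in principle, but running a recognition request on every video frame will overwhelm the device; a common approach is to throttle how often the recognizer runs while the preview keeps updating. A minimal, framework-agnostic sketch (the `Throttler` class and the 0.5 s interval are illustrative assumptions; `run_ocr` would be any single-image recognizer like the one above):

```python
import time


# Decide, per frame, whether enough time has passed to run OCR again,
# so the video loop stays responsive between recognition calls.
class Throttler:
    def __init__(self, min_interval=0.5):
        self.min_interval = min_interval
        self.last = -float('inf')  # so the first frame always runs

    def should_run(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self.last >= self.min_interval:
            self.last = now
            return True
        return False


t = Throttler(0.5)
print(t.should_run(now=0.0))  # first frame: OCR runs
print(t.should_run(now=0.2))  # too soon: skipped
print(t.should_run(now=0.6))  # interval elapsed: OCR runs again
```

In a capture loop you would call `should_run()` with no argument for each frame and invoke the recognizer only when it returns True.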
-
Hey there! I'm new to this forum, and I saw your post about implementing text detection with Vision + coreML in Pythonista.