Text detection with Vision Core ML

cvp

@pavlinb Super. Do you think I have to try to intercept real frames of the camera or are you happy with this way of taking still photos?

cvp

@pavlinb On my (old) iPad mini 4, the script takes 2 photos per second, which is far from real-time frame processing...

pavlinb

@cvp For my current needs it’s perfect.

cvp

@pavlinb 👍

pavlinb

@cvp But it would be good if I could avoid the shutter sound. I assume it is caused by the method used to capture frames?

cvp

@pavlinb For info, AVCaptureStillImageOutput is deprecated; we should use AVCapturePhotoOutput instead.

cvp

@pavlinb said:

But it would be good if I could avoid the shutter sound. I assume it is caused by the method used to capture frames?

Following this,
I tried this without success

      c.AudioServicesDisposeSystemSoundID(1108,restype=None, argtypes=[c_int32]) # 1108 = shutter sound

pavlinb

@cvp said:

@pavlinb Next step is to integrate AVCaptureVideoDataOutput from example here

If you follow the logic of this code, you will end up at the UIImageFromSampleBuffer function,
which is very, very complex (for me, at least)

Hi, did you manage to run the given code? It seems to contain a number of functions (ready to use or not).

Regards.

cvp

@pavlinb No, sorry.

sodoku

Which of the code examples in this thread is the best one to copy and paste onto my iPad mini 2? I want to use text detection to get the starting numbers of a sudoku puzzle by taking a picture of the puzzle in a newspaper.

ccc

https://forum.omz-software.com/topic/3050/snapsudoku

cvp

@sodoku I'm sorry, but my iPad mini 4 is on iOS 13 with the latest Pythonista beta, and my little script does not work very well any more...

cvp

@sodoku This is the shortest code I can give. It does not first search for the rectangles containing text; it assumes the image is a square grid with only one digit per cell.
You have to set (or ask for, up to you) the number of cells per row/column, here 9, and the percentage (here 15) of the cell dimensions taken up by the grid lines. Even if the image seems to be entirely displayed, it is not: it is covered by buttons whose background_image is the cropped part that is passed to the VNCoreMLRequest to try to recognize the digit.
And, as you can see in my example, it does not work correctly on my iPad mini 4 under iOS 13 and the latest Pythonista beta. You could try it on your iDevice, but I can't help more, sorry. Hoping it will be better for you 😢


import os
import io
import photos
import dialogs
from PIL import Image
from objc_util import ObjCClass, ObjCInstance, nsurl, ns
import ui

MODEL_FILENAME = 'MNIST.mlmodel'
#MODEL_FILENAME = 'MNISTClassifier.mlmodel'
#MODEL_FILENAME = 'OCR.mlmodel'
#MODEL_FILENAME = 'Alphanum_28x28.mlmodel'

# Use a local path for caching the model file
MODEL_PATH = os.path.join(os.path.expanduser('~/Documents/'), MODEL_FILENAME)

# Declare/import ObjC classes:
MLModel = ObjCClass('MLModel')
VNCoreMLModel = ObjCClass('VNCoreMLModel')
VNCoreMLRequest = ObjCClass('VNCoreMLRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(imgIn):
	# Convert a PIL image to a ui.Image via an in-memory PNG
	with io.BytesIO() as bIO:
		imgIn.save(bIO, 'PNG')
		imgOut = ui.Image.from_data(bIO.getvalue())
	return imgOut

def load_model():
	global vn_model
	ml_model_url = nsurl(MODEL_PATH)
	# Compile the model:
	c_model_url = MLModel.compileModelAtURL_error_(ml_model_url, None)
	# Load model from the compiled model file:
	ml_model = MLModel.modelWithContentsOfURL_error_(c_model_url, None)
	# Create a VNCoreMLModel from the MLModel for use with the Vision framework:
	vn_model = VNCoreMLModel.modelForMLModel_error_(ml_model, None)
	return vn_model

def _classify_img_data(img_data):
	global vn_model
	# Create and perform the recognition request:
	req = VNCoreMLRequest.alloc().initWithModel_(vn_model).autorelease()
	handler = VNImageRequestHandler.alloc().initWithData_options_(img_data, None).autorelease()
	success = handler.performRequests_error_([req], None)
	if success:
		best_result = req.results()[0]
		label = str(best_result.identifier())
		confidence = best_result.confidence()
		return {'label': label, 'confidence': confidence}
	else:
		return None

def classify_image(img):
	buffer = io.BytesIO()
	# JPEG has no alpha channel, so convert first (saving an RGBA image as JPEG fails)
	img.convert('RGB').save(buffer, 'JPEG')
	img_data = ns(buffer.getvalue())
	return _classify_img_data(img_data)

def classify_asset(asset):
  mv = ui.View()
  mv.background_color = 'white'
  im = ui.ImageView()
  pil_image = asset.get_image()
  print(pil_image.size)
  ui_image = asset.get_ui_image()
  n_squares = 9
  d_grid = 15 # % around the digit
  wim,him = pil_image.size
  ws,hs = ui.get_screen_size()
  if (ws/hs) < (wim/him):
    h = ws*him/wim
    im.frame = (0,(hs-h)/2,ws,h)
  else:
    w = hs*wim/him
    im.frame = ((ws-w)/2,0,w,hs)
  print(wim,him,ws,hs)
  mv.add_subview(im)
  wi = im.width
  hi = im.height
  im.image = ui_image
  im.content_mode = 1 # 1 = ui.CONTENT_SCALE_ASPECT_FIT
  mv.frame = (0,0,ws,hs)	 
  mv.present('fullscreen')
  dx = wim/n_squares
  dy = him/n_squares
  d = dx*d_grid/100
  dl = int((wi/n_squares)*d_grid/100)
  for ix in range(n_squares):
    x = ix*dx
    for iy in range(n_squares):
      y = iy*dy
      pil_char = pil_image.crop((int(x+d),int(y+d),int(x+dx-d),int(y+dy-d)))
      l = ui.Button()
      l.frame = (int(ix*wi/n_squares)+dl, int(iy*hi/n_squares)+dl, int(wi/n_squares)-2*dl, int(hi/n_squares)-2*dl)
      l.border_width = 1
      l.border_color = 'red'
      l.tint_color = 'red'
      ObjCInstance(l).button().contentHorizontalAlignment= 1 # left
      l.background_image = pil2ui(pil_char)
      im.add_subview(l)
      # classify_image returns None when the request fails
      result = classify_image(pil_char)
      l.title = result['label'] if result else '?'

def main():
    global vn_model
    vn_model = load_model()
    all_assets = photos.get_assets()
    asset = photos.pick_asset(assets=all_assets)
    if asset is None:
      return
    classify_asset(asset)

if __name__ == '__main__':
	main()
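
For readers who want to check the cropping arithmetic without running the UI, here is a minimal sketch of the same cell geometry used in classify_asset above. The function name cell_crop_boxes and its parameter names are hypothetical (not part of the script); the formulas mirror the script's dx, dy and d values, with the margin derived from the cell width as the script does.

```python
def cell_crop_boxes(width, height, n_cells=9, margin_pct=15):
    """Return {(col, row): (left, top, right, bottom)} crop boxes in pixels.

    Hypothetical helper: mirrors the dx/dy/d arithmetic of classify_asset.
    """
    dx = width / n_cells          # cell width
    dy = height / n_cells         # cell height
    d = dx * margin_pct / 100     # margin trimmed on every side to skip grid lines
    boxes = {}
    for col in range(n_cells):
        x = col * dx
        for row in range(n_cells):
            y = row * dy
            boxes[(col, row)] = (int(x + d), int(y + d),
                                 int(x + dx - d), int(y + dy - d))
    return boxes


boxes = cell_crop_boxes(900, 900)
print(boxes[(0, 0)])   # top-left cell
print(boxes[(8, 8)])   # bottom-right cell
```

Each tuple can be passed directly to PIL's Image.crop. Increasing margin_pct trims more of the grid lines but also risks cutting into the digit.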

ccc

@cvp Cool looking output! I am trying to follow along at home but have a few questions:

What image (of a sudoku puzzle) did you start with?
How did you constrain the labels to just 0-9?
Could we have a GitHub repo for this effort? With goals:
1. Recognize sudoku digits from a still image
2. Recognize sudoku digits from a real-time image

cvp

@ccc

I used an image of a sudoku grid from a Google search here
I used MNIST.mlmodel, which only recognizes digits
I'm not sure it would be of interest, because iOS 13 now offers a better way via VNRecognizeTextRequest; this was only to answer @sodoku's question

See the topic with @mikael's code
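
For reference, the iOS 13 route via VNRecognizeTextRequest might look roughly like the sketch below in Pythonista. This is not @mikael's code: recognize_text and digits_only are hypothetical names, and the Vision calls are assumed to behave as in Apple's documentation (iOS 13 or later only). The objc_util import is done inside the function because that module only exists in Pythonista.

```python
DIGITS = set('123456789')


def digits_only(strings):
    """Keep only the recognized strings that are a single digit 1-9."""
    return [s.strip() for s in strings if s.strip() in DIGITS]


def recognize_text(image_bytes):
    """Return the strings Vision finds in encoded image data (e.g. JPEG bytes).

    Sketch only: needs Pythonista on iOS 13+.
    """
    from objc_util import ObjCClass, ns  # lazy import: Pythonista only
    VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
    VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

    req = VNRecognizeTextRequest.alloc().init().autorelease()
    req.setRecognitionLevel_(0)  # 0 = accurate, 1 = fast
    handler = VNImageRequestHandler.alloc().initWithData_options_(
        ns(image_bytes), None).autorelease()
    if not handler.performRequests_error_([req], None):
        return []
    results = req.results()
    if results is None:
        return []
    texts = []
    for observation in results:
        candidates = observation.topCandidates_(1)  # best guess only
        if len(candidates) > 0:
            texts.append(str(candidates[0].string()))
    return texts


print(digits_only(['5', ' 7 ', '12', 'a', '']))
```

For a sudoku cell, one would pass the cropped cell's JPEG bytes to recognize_text and then filter the result with digits_only.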

cvp

@ccc Do you have an iDevice running iOS < 13? I'd just like to know whether digit recognition works there, because my test above does not work under iOS 13.

cvp

@ccc Only because you asked 😀 GitHub

ccc

For non-Pythonista platforms... https://github.com/neeru1207/AI_Sudoku

pavlinb

Vision OCR produces pretty good results.

But has anyone succeeded in OCRing text with subscripts or exponents?

(https://commons.wikimedia.org/wiki/File:Quadratic-formula.jpg)

pavlinb

What about OCR in realtime video mode? Is it possible?