omz:forum




    Text detection with Vision CoreML

    Pythonista
    • ccc
      ccc last edited by

      https://forum.omz-software.com/topic/3050/snapsudoku

      • cvp
        cvp @sodoku last edited by

        @sodoku I'm sorry, but my iPad mini 4 is on iOS 13 with the latest beta of Pythonista, and my little script does not work very well any more...

        • cvp
          cvp @sodoku last edited by cvp

          @sodoku This is the shortest code I can give, without first searching for the rectangles containing text, but assuming the image is a square grid with only one digit per cell.
          You have to set (or ask the user for, up to you) the number of cells per row/column, here 9, and the percentage (here 15) of the cell dimensions taken up by the grid lines. Even if the image seems to be entirely displayed, it is not fully visible, because it is overlaid with buttons whose background_image is the cropped cell that will be passed to the VNCoreMLRequest to try to recognize the digit.
          And, as you can see in my example, it does not work correctly on my iPad mini 4 under iOS 13 and the latest Pythonista beta. You could try on your iDevice, but I can't help more, sorry. Hoping it will be better for you 😢

          
          import os
          import io
          import photos
          import dialogs
          from PIL import Image
          from objc_util import ObjCClass, ObjCInstance, nsurl, ns
          import ui
          
          MODEL_FILENAME = 'MNIST.mlmodel'
          #MODEL_FILENAME = 'MNISTClassifier.mlmodel'
          #MODEL_FILENAME = 'OCR.mlmodel'
          #MODEL_FILENAME = 'Alphanum_28x28.mlmodel'
          
          # Use a local path for caching the model file
          MODEL_PATH = os.path.join(os.path.expanduser('~/Documents/'), MODEL_FILENAME)
          
          # Declare/import ObjC classes:
          MLModel = ObjCClass('MLModel')
          VNCoreMLModel = ObjCClass('VNCoreMLModel')
          VNCoreMLRequest = ObjCClass('VNCoreMLRequest')
          VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
          
          def pil2ui(imgIn):
          	# Convert a PIL image to a ui.Image via an in-memory PNG
          	with io.BytesIO() as bIO:
          		imgIn.save(bIO, 'PNG')
          		return ui.Image.from_data(bIO.getvalue())
          
          def load_model():
          	global vn_model
          	ml_model_url = nsurl(MODEL_PATH)
          	# Compile the model:
          	c_model_url = MLModel.compileModelAtURL_error_(ml_model_url, None)
          	# Load model from the compiled model file:
          	ml_model = MLModel.modelWithContentsOfURL_error_(c_model_url, None)
          	# Create a VNCoreMLModel from the MLModel for use with the Vision framework:
          	vn_model = VNCoreMLModel.modelForMLModel_error_(ml_model, None)
          	return vn_model
          
          def _classify_img_data(img_data):
          	global vn_model
          	# Create and perform the recognition request:
          	req = VNCoreMLRequest.alloc().initWithModel_(vn_model).autorelease()
          	handler = VNImageRequestHandler.alloc().initWithData_options_(img_data, None).autorelease()
          	success = handler.performRequests_error_([req], None)
          	if success:
          		best_result = req.results()[0]
          		label = str(best_result.identifier())
          		confidence = best_result.confidence()
          		return {'label': label, 'confidence': confidence}
          	else:
          		return None
          
          def classify_image(img):
          	buffer = io.BytesIO()
          	img.save(buffer, 'JPEG')
          	img_data = ns(buffer.getvalue())
          	return _classify_img_data(img_data)
          
          def classify_asset(asset):
            mv = ui.View()
            mv.background_color = 'white'
            im = ui.ImageView()
            pil_image = asset.get_image()
            print(pil_image.size)
            ui_image = asset.get_ui_image()
            n_squares = 9
            d_grid = 15 # % around the digit
            wim,him = pil_image.size
            ws,hs = ui.get_screen_size()
            if (ws/hs) < (wim/him):
              h = ws*him/wim
              im.frame = (0,(hs-h)/2,ws,h)
            else:
              w = hs*wim/him
              im.frame = ((ws-w)/2,0,w,hs)
            print(wim,him,ws,hs)
            mv.add_subview(im)
            wi = im.width
            hi = im.height
            im.image = ui_image
            im.content_mode = 1  # ui.CONTENT_SCALE_ASPECT_FIT
            mv.frame = (0,0,ws,hs)	 
            mv.present('fullscreen')
            dx = wim/n_squares
            dy = him/n_squares
            d = dx*d_grid/100
            dl = int((wi/n_squares)*d_grid/100)
            for ix in range(n_squares):
              x = ix*dx
              for iy in range(n_squares):
                y = iy*dy
                pil_char = pil_image.crop((int(x+d),int(y+d),int(x+dx-d),int(y+dy-d)))
                l = ui.Button()
                l.frame = (int(ix*wi/n_squares)+dl, int(iy*hi/n_squares)+dl, int(wi/n_squares)-2*dl, int(hi/n_squares)-2*dl)
                l.border_width = 1
                l.border_color = 'red'
                l.tint_color = 'red'
                ObjCInstance(l).button().contentHorizontalAlignment = 1  # left
                l.background_image = pil2ui(pil_char)
                im.add_subview(l)
                res = classify_image(pil_char)
                l.title = res['label'] if res else '?'
          
          def main():
              global vn_model
              vn_model = load_model()
              all_assets = photos.get_assets()
              asset = photos.pick_asset(assets=all_assets)
              if asset is None:
                return
              classify_asset(asset)
          
          if __name__ == '__main__':
          	main()
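
The cell-cropping arithmetic buried in classify_asset above can be isolated into a small, testable helper. This is just a sketch for clarity (cell_boxes is an illustrative name, not part of the original script); n_squares and d_grid are the same parameters as in the post:

```python
def cell_boxes(width, height, n_squares=9, d_grid=15):
	"""Crop boxes (left, top, right, bottom) for every cell of a square
	grid image, shrunk by d_grid percent of a cell on each side so the
	grid lines around each digit are skipped."""
	dx = width / n_squares   # cell width in pixels
	dy = height / n_squares  # cell height in pixels
	d = dx * d_grid / 100    # margin trimmed on each side
	return [(int(ix * dx + d), int(iy * dy + d),
	         int(ix * dx + dx - d), int(iy * dy + dy - d))
	        for ix in range(n_squares)
	        for iy in range(n_squares)]
```

Each box can be passed directly to PIL's Image.crop(), exactly as the inline crop() call in classify_asset does.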
          

          • ccc
            ccc last edited by

            @cvp Cool looking output! I am trying to follow along at home but have a few questions:

            1. What image (of a sudoku puzzle) did you start with?
            2. How did you constrain the labels to just 0-9?
            3. Could we have a GitHub repo for this effort? With goals:
              1. Recognize sudoku digits from a still image
              2. Recognize sudoku digits from a real-time image
            • cvp
              cvp @ccc last edited by cvp

              @ccc

              1. I used an image of a sudoku grid from a Google search here
              2. I used MNIST.mlmodel, which covers only digits
              3. Not sure it would be that interesting, because iOS 13 now offers a better way via VNRecognizeTextRequest, but this was only to answer @sodoku's question

              See the topic with @mikael's code
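
On point 2: MNIST-style models are trained on 28×28 grayscale images, so a cropped cell is best normalized before classification. A hedged sketch with PIL (Vision can also scale the input itself; preprocess_for_mnist is an illustrative name, not part of the original script):

```python
from PIL import Image

def preprocess_for_mnist(img):
	"""Convert a cropped cell to the 28x28 grayscale format that
	MNIST-style digit classifiers expect."""
	return img.convert('L').resize((28, 28), Image.BILINEAR)
```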

              • cvp
                cvp @ccc last edited by cvp

                @ccc Do you have an iDevice running iOS < 13? Just to know whether digit recognition works there, because my test above is not ok under iOS 13.

                • cvp
                  cvp @ccc last edited by cvp

                  @ccc Really, only because you asked 😀 GitHub

                  • ccc
                    ccc last edited by

                    For non-Pythonista platforms... https://github.com/neeru1207/AI_Sudoku

                    • pavlinb
                      pavlinb last edited by pavlinb

                      Vision OCR produces pretty good results.

                      But has anyone succeeded in OCRing text that contains subscripts or exponents (power signs)?

                      (https://commons.wikimedia.org/wiki/File:Quadratic-formula.jpg)

                      • pavlinb
                        pavlinb last edited by

                        What about OCR in real-time video mode? Is it possible?

                        • twinsant
                          twinsant last edited by

                          See this topic: https://forum.omz-software.com/topic/6016/recognize-text-from-picture

                          • enginsur
                            enginsur last edited by enginsur

                            Can you help me with one issue I can't fix on iPhone 12 Pro?

                                • N
                                  Norman56 last edited by

                                  Hey there! I'm new to this forum, and I saw your post about implementing text detection with Vision + coreML in Pythonista.
