omz:forum

    Recognize text from picture

    Pythonista
    • JonB
      JonB @sodoku last edited by

      @sodoku For things like enumerations, you can usually check the Swift version of the docs, which tells you the value. Otherwise, you can often look up the source code.

      For minimumTextHeight, both the Swift and ObjC docs say this is a float. The fact that it is readwrite/nonatomic/assign is not important.
      So, usually this would just be
      req.minimumTextHeight = 32.5
      or whatever you want...

      It is often helpful to explore objects in the console, since this can tell you what you're working with. For instance, if you type req. in the console, you will see autocomplete of all known attributes. Usually you need to treat objc properties as function calls, so to check minimumTextHeight, you'd use req.minimumTextHeight(). But to set, you can treat the property as a Python attribute and assign directly. In some cases, you may need to use the setPropertyName_(value) convention instead, e.g. req.setMinimumTextHeight_(32.5).

      Where things get tricky is where the declared type is another object (in which case you have to provide the right type of object), or a structure. Structures can be tricky because objc_util often screws up the type encodings, and you have to manually override them. Structures get turned into ctypes Structure objects, and you access fields like you would attributes of a Python object (no () needed).
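
      To illustrate the field access described above, here is a minimal sketch using plain ctypes. The CGPoint, CGSize and CGRect classes below are stand-ins defined only so the snippet runs anywhere; the assumption is that the structures objc_util hands back expose the same field names.

```python
import ctypes

# Stand-in structures (assumption: same field names as the ones objc_util returns).
class CGPoint(ctypes.Structure):
    _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]

class CGSize(ctypes.Structure):
    _fields_ = [('width', ctypes.c_double), ('height', ctypes.c_double)]

class CGRect(ctypes.Structure):
    _fields_ = [('origin', CGPoint), ('size', CGSize)]

rect = CGRect(CGPoint(0.25, 0.5), CGSize(0.5, 0.25))

# Structure fields are plain attributes: no () as with ObjC properties.
right_edge = rect.origin.x + rect.size.width
print(rect.origin.x, rect.origin.y, right_edge)
```

      Nested structures work the same way, so rect.size.width reads a field of a field without any call syntax.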

      Re question 2:
      Per the docs, the results of a request will be VNRecognizedTextObservation objects. This is a subclass of VNRectangleObservation.

      @interface VNRecognizedTextObservation : VNRectangleObservation <-- colon here means inherits from

      If you look up VNRectangleObservation, you will see it has the following attributes
      bottomLeft
      bottomRight
      topLeft
      topRight
      These are declared as CGPoint, which is a structure with .x and .y fields.

      for result in req.results():
          x = result.bottomLeft().x
          y = result.bottomLeft().y
          w = result.topRight().x-x
          h = result.topRight().y-y
          print('({},{},{},{}) {}'.format(x,y,w,h, result.text()))
      

      You could draw the image into an image context, and then also stroke a rectangle, something like this (not tried):

      with ui.ImageContext(*tuple(ui_image.size)) as ctx:
         ui_image.draw()
         for result in req.results():
            # Vision points are normalized (0..1) with the origin at the bottom left,
            # so scale by the image size and flip y for drawing
            vertecies = [(p.x*ui_image.size.w, (1-p.y)*ui_image.size.h)
                         for p in [result.bottomLeft(),
                                   result.topLeft(),
                                   result.topRight(),
                                   result.bottomRight(),
                                   result.bottomLeft()]]
            pth = ui.Path()
            pth.move_to(*vertecies[0])  # initial point
            for p in vertecies[1:]:
               pth.line_to(*p)
            ui.set_color('red')
            pth.stroke()
            x,y = vertecies[0]
            w,h = (vertecies[2][0]-x), (vertecies[2][1]-y)
            ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
         marked_img = ctx.get_image()
         marked_img.show()
      
      • JonB
        JonB last edited by

        I realized that result will also have a .boundingBox() attribute, which would make some of this a little simpler.
        That is a CGRect, consisting of .origin (in turn consisting of .x and .y) and .size (containing .width and .height).
        In that case you could use ui.Path.rect.

        • JonB
          JonB last edited by JonB

          Okay, my previous reply was full of errors. Here is a working version, which adds red boxes around each result, along with the text:

          language_preference = ['fi','en','se']
          
          import photos, ui, dialogs
          import io
          from objc_util import *
          
          load_framework('Vision')
          VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
          VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
          
          ACCURATE=0
          FAST=1
          
          def pil2ui(pil_image):
              buffer = io.BytesIO()
              pil_image.save(buffer, format='PNG')
              return ui.Image.from_data(buffer.getvalue())
          
          selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')
          
          ui_image = None
          
          if selection == 1:
              pil_image = photos.capture_image()
              if pil_image is not None:
                  ui_image = pil2ui(pil_image)
          elif selection == 2:
              ui_image = photos.pick_asset().get_ui_image()
          
          if ui_image is not None:
              print('Recognizing...\n')
          
              req = VNRecognizeTextRequest.alloc().init().autorelease()
              req.recognitionLevel= ACCURATE# accurate
              req.setRecognitionLanguages_(language_preference)
              handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
          
              success = handler.performRequests_error_([req], None)
              if success:
                  for result in req.results():
                      print(result.text())
              else:
                  print('Problem recognizing anything')
          
              # Draw the image and outline each result. This stays inside the
              # `if ui_image is not None:` block so nothing runs when no image was picked.
              with ui.ImageContext(*tuple(ui_image.size)) as ctx:
                 ui_image.draw()
                 for result in req.results():
                    cgpts=[   result.bottomLeft(),
                              result.topLeft(),
                              result.topRight(),
                              result.bottomRight(),
                              result.bottomLeft()  ]
                    # Vision points are normalized (0..1, origin at bottom left):
                    # scale by the image size and flip y for drawing
                    vertecies = [(p.x*ui_image.size.w, (1-p.y)*ui_image.size.h) for p in cgpts]
                    pth = ui.Path()
                    pth.move_to(*vertecies[0])
                    for p in vertecies[1:]:
                       pth.line_to(*p)
                    ui.set_color('red')
                    pth.stroke()
                    x,y = vertecies[0]
                    w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
                    ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
                 marked_img = ctx.get_image()
                 marked_img.show()
          
          • mikael
            mikael @JonB last edited by

            @JonB and @sodoku, just a note that I tried a different route, where I first used a rectangle recognizer to isolate the numbers, and only then used text recognition. The results were not impressive, but I can try to find the code for reference, if you think it might be useful.

            • JonB
              JonB last edited by

              Here is another solution. I use rectangle detection and a perspective correction to crop the puzzle. This gives much better detection, though not perfect. The recognition is pretty good, though it has trouble with 1’s on their own, which turn into Ts of all things. Some additional work in the clean function might fix common problems.

              I’m using images from https://github.com/prajwalkr/SnapSudoku/tree/master/train

              I suspect doing some CIFiltering first will probably improve things.

              from objc_util import *
              import ui
              
              VNImagePointForNormalizedPoint=c.VNImagePointForNormalizedPoint
              VNImagePointForNormalizedPoint.argTypes=[CGPoint, c_int, c_int]
              VNImagePointForNormalizedPoint.restype=CGPoint
              
              
              ui_image=ui.Image.named('image2.jpg')
              ui_image.show()
              
              CIImage=ObjCClass('CIImage')
              ci_image=CIImage.imageWithCGImage_(ui_image.objc_instance.CGImage())
              
              CIPerspectiveCorrection=ObjCClass('CIPerspectiveCorrection')
              f=CIPerspectiveCorrection.perspectiveCorrectionFilter()
              f.inputImage=ci_image
              o=f.outputImage()
              
              load_framework('Vision')
              VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
              VNDetectRectanglesRequest = ObjCClass('VNDetectRectanglesRequest')
              VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
              
              req=VNDetectRectanglesRequest.alloc().init().autorelease()
              req.maximumObservations=2
              req.minimumSize=0.5
              req.minimumAspectRatio=0.7
              req.quadratureTolerance=30
              
              handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
              
              success = handler.performRequests_error_([req], None)
              try:
                 result=req.results()[0]
                 nm=lambda p :VNImagePointForNormalizedPoint(p,int(ui_image.size.w),int(ui_image.size.h))
                 f.topLeft = nm(result.topLeft())
                 f.topRight = nm(result.topRight())
                 f.bottomLeft = nm(result.bottomLeft())
                 f.bottomRight = nm(result.bottomRight())
                 o=f.outputImage()
              
                 with ui.ImageContext(o.extent().size.width, o.extent().size.height) as ctx:
                   UIImage.imageWithCIImage_(o).drawAtPoint_( CGPoint(0,0))
                   ui_image2=ctx.get_image()
                 ui_image2.show()
              except Exception:
                 print("bounding rect not found... results won't work")
                 ui_image2=ui_image
              # now, recognize text in the (cropped) image...
              handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image2.to_png(), None).autorelease()
              req0 = VNRecognizeTextRequest.alloc().init().autorelease()
              req0.recognitionLevel= 0# accurate
              req0.usesLanguageCorrection=True
              req0.customWords=[str(a) for a in range(10)]
              
              #req0.maximumObservations=81
              #req0.minimumSize=.1
              success = handler.performRequests_error_([req0], None)
              with ui.ImageContext(*tuple(ui_image2.size) ) as ctx:
                 ui_image2.draw()
                 for result in req0.results():
                    cgpts=[result.bottomLeft(),
                                                      result.topLeft(),
                                                      result.topRight(),
                                                      result.bottomRight(),
                                                      result.bottomLeft()] 
                    vertecies = [(p.x*ui_image2.size.w, (1-p.y)*ui_image2.size.h) for p in cgpts]
                    pth = ui.Path()
                    pth.move_to(*vertecies[0]) 
                    for p in vertecies[1:]:
                       pth.line_to(*p)  
                    ui.set_color('red')
                    pth.stroke()
                    x,y = vertecies[0]
                    w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
              
                    ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
                 marked_img = ctx.get_image()
                 marked_img.show()
                 
              def bbcenter(bb):
                 return((9*(bb.origin.x+bb.size.width/2)-0.5), 
                        (9*(bb.origin.y+bb.size.height/2)-0.5) )
              def clean(results):
                 cleaned=[]
                 for r in results:
                    col,row=bbcenter(r.boundingBox())
                    approx_num_ch=(r.boundingBox().size.width*9)
                    txt=str(r.text()).replace(' ','')
                    if approx_num_ch<=1:
                       if len(txt) == 1:
                           cleaned.append(((round(col),round(row)),txt))
                       else:
                           cleaned.append(((round(col),round(row)),'-1'))
                    else: #more than one char
                       col-=(len(txt)-1)/2
                       col=round(col)
                       row=round(row)
                       for ch in txt:
                         if ch in [str(a) for a in range(10)]:
                           cleaned.append(((col,row),ch))
                         else:
                           cleaned.append(((col,row),'-1'))
                         col+=1
                 return cleaned
              
              
              import numpy as np
              puzzle=np.zeros([9,9])
              for c,v in clean(req0.results()):
                 puzzle[c]=int(v)
              print(np.flipud(puzzle.T))
              
              • mikael
                mikael @JonB last edited by

                @JonB, thanks, very nice. I have noted and wondered about how difficult the number 1 is to recognize... Not very exotic, is it? But in my experiments it looked like the simple heuristic of "if the result is anything other than 1-9, assume it is a 1" would work pretty well for Sudoku.
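
                A minimal sketch of that heuristic, for reference. The function name to_digit and the sample strings are made up for illustration, and it assumes each input is the recognized text of a single Sudoku cell.

```python
def to_digit(text):
    """Map recognized cell text to a Sudoku digit.

    Anything that is not a single character in 1-9 is assumed to be a 1.
    """
    cleaned = text.replace(' ', '')
    if len(cleaned) == 1 and cleaned in '123456789':
        return int(cleaned)
    return 1  # heuristic: misread cells are most often a 1

print([to_digit(t) for t in ['7', 'T', 'Te', ' 3 ', '0']])
```

                Note that this also turns genuinely wrong readings into a 1, so it only makes sense when 1 is by far the most common misread.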

                • mikael
                  mikael @JonB last edited by

                  @JonB, can you explain this one a little bit?

                  approx_num_ch=(r.boundingBox().size.width*9)
                  
                  • JonB
                    JonB last edited by

                    The *9 is there because, if the initial rectangle detection and crop work, each square is approximately 1/9 of the width. So the approximate number of squares a rectangle covers tells us how many characters it should have. I was getting many cases where 1 got read as Te, or some other two-character value, even though the width was less than one box. So I wanted special handling for narrow boxes, as that is probably a 1, while wide boxes could have multiple characters because the bounding box legitimately spans adjacent boxes.
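
                    As a rough sketch of that arithmetic in plain Python: cell_span is a hypothetical helper, and x and width stand for the normalized origin.x and size.width of a bounding box on an image that has already been cropped to the puzzle.

```python
def cell_span(x, width, cells=9):
    """Return (center column index, approximate number of cells covered).

    x and width are normalized (0..1), so one cell is about 1/cells wide.
    """
    center_col = round(cells * (x + width / 2) - 0.5)
    approx_cells = width * cells
    return center_col, approx_cells

# A narrow box inside the third column (index 2): treat it as one character.
col, n = cell_span(0.24, 0.08)
print(col, n <= 1)

# A wide box spanning about two cells: it may hold two characters.
col, n = cell_span(0.35, 0.2)
print(col, n > 1)
```

                    The same calculation applied to origin.y and size.height gives the row.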

                    • mikael
                      mikael @JonB last edited by

                      @JonB, there’s something in the math here I do not quite get. I would expect something like:

                      num_char = r.bbox/(full_bbox/9) = r.bbox * 9 / full_bbox
                      

                      So it looks like you are missing the division?

                      • JonB
                        JonB last edited by JonB

                        The results of Vision are always provided as normalized coordinates, meaning the full box is always 1.
                        For drawing, you then have to multiply by the image width/height.

                        Since the perspective correction both fixes perspective and crops, 1/9 is roughly the size of a single cell.
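
                        A minimal sketch of that conversion, assuming a normalized rect with its origin at the bottom left (as Vision reports it) and a drawing context with its origin at the top left. The function name to_pixel_rect and the 800x600 image size are just for illustration.

```python
def to_pixel_rect(nx, ny, nw, nh, img_w, img_h):
    """Convert a normalized, bottom-left-origin rect to pixel, top-left-origin."""
    x = nx * img_w
    w = nw * img_w
    h = nh * img_h
    # The top edge of the normalized rect is at ny + nh; flip it for drawing.
    y = (1 - ny - nh) * img_h
    return x, y, w, h

print(to_pixel_rect(0.25, 0.5, 0.5, 0.25, 800, 600))
# → (200.0, 150.0, 400.0, 150.0)
```

                        The resulting tuple has a positive height, so it can be passed straight to a rect-based drawing call.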

                        • mikael
                          mikael @JonB last edited by

                          @JonB, now I understand, thank you.

                          • sodoku
                            sodoku last edited by sodoku

                            I have a quick question regarding the original OCR post: how do I print the text as one single CSV list? I tried, but I have been getting a list of lists instead of one single list.

                            This is the snippet of the code example that I think needs to be altered:

                            success = handler.performRequests_error_([req], None)
                            if success:
                                for result in req.results():
                                    print(result.text())
                            else:
                                print('Problem recognizing anything')
                            
                            • JonB
                              JonB last edited by

                              results=[str(result.text()) for result in req.results()]
                              
                              print(results)
                              

                              Or maybe

                              print(','.join(results))
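
                              One caveat with a plain join is that recognized text may itself contain commas. A hedged alternative is the standard csv module, which quotes such fields. The helper name to_csv_line and the sample results list are hypothetical.

```python
import csv
import io

def to_csv_line(items):
    """Format items as one CSV line, quoting fields that contain commas."""
    buffer = io.StringIO()
    csv.writer(buffer).writerow(items)
    return buffer.getvalue().rstrip('\r\n')

results = ['Hello, world', '42', 'plain']  # stand-in for the recognized strings
print(to_csv_line(results))
```

                              For text without commas or quotes this gives the same output as ','.join(results).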
                              
                              • ccc
                                ccc last edited by

                                There is a lot of good code here... It would be really awesome if there was a GitHub repo to stitch it all together into an app.

                                • mikael
                                  mikael @ccc last edited by

                                  @ccc, do you mean this one? PRs are always welcome.

                                  • chrillek
                                    chrillek last edited by chrillek

                                    Hi,
                                    I'm aware that this thread is about a year old. But maybe someone can nevertheless enlighten me. I'm trying to do a similar thing in JavaScript for Automation (JXA), and I see this line in your example:

                                    for result in req.results():
                                                print(result.text())
                                    

                                    translated to JavaScript, that's

                                    results.forEach(r => {
                                          console.log(r.text);
                                     })
                                    

                                    and that works like a charm. I'm just wondering why, since according to Apple's documentation, the results object doesn't even have a text property, only string (cf. https://developer.apple.com/documentation/vision/vnrecognizedtext?language=objc)

                                    I was first wondering if text is perhaps a nice Python thing, but since the same works in JavaScript, I'm sure that I'm missing something obvious in Apple's documentation. Does anyone know what (and where I should be looking)?

                                    Thanks a lot in advance
                                    Christian

                                    • JonB
                                      JonB last edited by

                                      Does string not work?

                                      Often there are undocumented or deprecated features available in objc objects. Often we just poke around using autocomplete, which ultimately uses some of the introspection features of the objc runtime (which let you get a list of methods, instance vars, etc.).

                                      • chrillek
                                        chrillek last edited by

                                        Does string not work?

                                        It does, but only in a very convoluted way, like so:

                                        results.forEach(r => {
                                              console.log(r.topCandidates(1).js[0].string.js)
                                        })
                                        

                                        The js in the middle is required to convert the ObjC array returned by topCandidates to a JavaScript array (and again to convert the NSString returned by string to a JS string). But using string directly at r does not work.

                                        we just poke around using autocomplete

                                        I guess that happens in Xcode (the poking around)?

                                        • JonB
                                          JonB last edited by

                                          No, the exploration happens in Pythonista, in the console. Once you have an object, dir(variable) lists the methods and such, or frankly you can just type a letter and let the autocomplete suggestions do their thing.
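
                                          As a small illustration of that kind of poking around, here is a sketch that filters dir() output by a substring. find_attrs is a made-up helper, and it is shown on a plain Python string so it runs anywhere; the assumption is that dir() on a bridged objc object behaves similarly.

```python
def find_attrs(obj, needle):
    """List attribute names of obj that contain needle, ignoring case."""
    needle = needle.lower()
    return [name for name in dir(obj) if needle in name.lower()]

# Example on an ordinary Python object; in the console you would pass
# the object you are exploring instead.
print(find_attrs('some string', 'strip'))
```

                                          Searching for a fragment such as 'text' or 'string' this way is a quick check for whether an object offers something beyond what the docs list.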

                                          If you're not using a bridging library like
                                          https://github.com/TooTallNate/NodObjC
                                          I'd suggest that you do, since it might take care of a lot of the annoying bits like converting every type to JS equivalents, and lets you access some of the dynamic introspection stuff that makes objc pretty neat.

                                          Under the hood, there are objc runtime functions that let you get lists of method names. For instance, see
                                          https://github.com/jsbain/objc_hacks/blob/master/print_objc.py
                                          for how you can do it in Python. Or, look at the NodObjC code for class.js and core.js. It looks like it does something similar, using the objc runtime's class_copyMethodList, etc., and adds those as JS callable functions to the prototype. Then your favorite JS debugger ought to show you what is there...

                                          • JonB
                                            JonB last edited by

                                            In this case, looking at the headers for VNRecognizedTextObservation shows the text attribute.

                                            https://github.com/xybp888/iOS-Header/blob/master/13.0/Frameworks/Vision.framework/VNRecognizedTextObservation.h
