omz:forum

    Recognize text from picture

    Pythonista
    • mikael
      mikael @Spitfire last edited by

      @Spitfire, thanks. I did try all kinds of approaches, finally resorting to manual cropping, and it still was not reliable enough.

      • pavlinb
        pavlinb @mikael last edited by

        @mikael Could you share some picture of sudoku, where recognition fails?

        • ccc
          ccc last edited by

          Multiple sample Sudoku puzzles would help to achieve a robust solution.

          • sodoku
            sodoku last edited by sodoku

            I have a few questions about the very first text recognition code posted in this thread.

            The example video I am referring to is https://developer.apple.com/videos/play/wwdc2019/234

            1. How do you change the recognition level from fast to accurate?
            Example code from the Apple website (I am not sure if it is written in Swift or Objective-C) looks like this:

            myTextRecognitionRequest.recognitionLevel = VNRequestTextRecognitionLevel.accurate
            

            And another example of this, shown in the Apple video, for setting the recognition level:

            request.recognitionLevel = .fast
            

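            Question 2
            How do you make sure that digits do not get mistaken for letters when the language corrector is switched off, for example a 5 read as an S, or a 1 read as an I?
            An example of this from the video is:

            extension Character {
                // Fragment from the video; the last two statements are an assumed completion.
                func getSimilarCharacterIfNotIn(allowedChars: String) -> Character {
                    let conversionTable: [Character: Character] = [
                        "s": "5",
                        "S": "5",
                        "i": "1",
                        "I": "1",
                    ]
                    if allowedChars.contains(self) { return self }
                    return conversionTable[self] ?? self
                }
            }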
            

            Question 3
            Do you know how to set up the special (custom) words feature mentioned in the video?

            • westjensontexas
              westjensontexas last edited by westjensontexas

              I gather they are looking for words you'd find in an English dictionary. So perhaps façade or tête-à-tête might be recognized, while other examples wouldn't?

              • JonB
                JonB @sodoku last edited by

                @sodoku
                See https://developer.apple.com/documentation/vision/vnrequesttextrecognitionlevel/fast

                Try req.recognitionLevel=1 for fast, or 0 for accurate.

                Re fixing characters... I gather you might set req.usesLanguageCorrection=False (or maybe 0), then make your own replacement map and use str.translate.

                Custom words is handled by
                req.customWords = ['customword1', 'etc']

                See apple docs for VNRecognizeTextRequest
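
The replacement-map idea above can be sketched in plain Python. This is only a minimal illustration, assuming the four substitutions from the snippet quoted earlier (s/S to 5, i/I to 1); the names `LOOKALIKES` and `fix_digits` are made up for this example and are not part of Vision or objc_util.

```python
# Hypothetical helper: map letters that OCR tends to confuse with digits.
# Only the four pairs quoted earlier in the thread are included.
LOOKALIKES = str.maketrans({'s': '5', 'S': '5', 'i': '1', 'I': '1'})


def fix_digits(text):
    """Return text with look-alike letters replaced by digits."""
    return text.translate(LOOKALIKES)


print(fix_digits('S3i7'))  # prints 5317
```

In Pythonista you would presumably apply this to `str(result.text())` after turning language correction off, and extend the table with whatever misreadings show up in your own images.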

                • sodoku
                  sodoku last edited by sodoku

                  I've seen the Apple documentation for the Vision framework; I just don't know how to convert it to Python.

                  Question 1
                  What about setting the minimum text height? How do you translate either of these declarations to Python?
                  @property(readwrite, nonatomic, assign) float minimumTextHeight; (Objective-C)
                  var minimumTextHeight: Float { get set } (Swift)

                  Question 2
                  I am also interested in learning how to recognize the individual boxes of a Sudoku puzzle to extract the numbers. Is there a way to do that, possibly with
                  VNRecognizedTextObservation (described in the docs as "A request that detects and recognizes regions of text in an image."),
                  or possibly with the bounding box technique shown in the video https://developer.apple.com/videos/play/wwdc2019/234 ? Also, can you use multiple bounding boxes to recognize text from a Sudoku card?

                  This is mikael's code that I am trying to extend, but I don't know how to convert the code shown in the Apple documentation into Python:

                  language_preference = ['fi','en','se']
                  
                  import photos, ui, dialogs
                  import io
                  from objc_util import *
                  
                  load_framework('Vision')
                  VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
                  VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
                  
                  def pil2ui(pil_image):
                      buffer = io.BytesIO()
                      pil_image.save(buffer, format='PNG')
                      return ui.Image.from_data(buffer.getvalue())
                  
                  selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')
                  
                  ui_image = None
                  
                  if selection == 1:
                      pil_image = photos.capture_image()
                      if pil_image is not None:
                          ui_image = pil2ui(pil_image)
                  elif selection == 2:
                      ui_image = photos.pick_asset().get_ui_image()
                  
                  if ui_image is not None:
                      print('Recognizing...\n')
                  
                      req = VNRecognizeTextRequest.alloc().init().autorelease()
                      req.recognitionLevel=1
                      req.setRecognitionLanguages_(language_preference)
                      handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
                  
                      success = handler.performRequests_error_([req], None)
                      if success:
                          for result in req.results():
                              print(result.text())
                      else:
                          print('Problem recognizing anything')
                  
                  • JonB
                    JonB @sodoku last edited by

                    @sodoku For things like enumerations, you can usually check the swift version of docs, which tells you the value. Otherwise, you can often look up source code.

                    For minimumTextHeight, both swift and ObjC say this is a float. The fact that it is readwrite/nonatomic/assign is not important.
                    So, usually this would just be
                    req.minimumTextHeight = 32.5
                    or whatever you want...

                    It is often helpful to explore objects in the console, since this can tell you what you're working with. For instance, if you type req. in the console, you will see autocomplete of all known attributes. Usually you need to treat ObjC properties as function calls, so to check minimumTextHeight you'd use req.minimumTextHeight(). But to set a value, you can treat the property as a Python attribute and assign directly. In some cases, you may need to use the setPropertyName_(value) convention, e.g. req.setMinimumTextHeight_(32.5).

                    Where things get tricky is where the declared type is another object (in which case you have to provide the right type of object), or a structure. Structures can be tricky because objc_util often screws up the type encodings, and you have to manually override. Structures get turned into ctypes Structure objects, and you access fields like you would with a normal Python object (no () needed).
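
To illustrate the structure point without any Objective-C involved, here is a small stand-alone sketch using plain ctypes. The `Point` class is a stand-in defined only for this example (it mimics the x/y layout of a CGPoint, but it is not the type objc_util provides).

```python
import ctypes


class Point(ctypes.Structure):
    # Stand-in for a CGPoint-like struct: two double fields, x and y.
    _fields_ = [('x', ctypes.c_double), ('y', ctypes.c_double)]


p = Point(0.25, 0.75)
# Fields are plain attributes: no () as with ObjC property getters.
print(p.x, p.y)  # prints 0.25 0.75
p.x = 0.5        # fields can be assigned directly too
```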

                    Re question 2:
                    Per the docs, the results of a request will be VNRecognizedTextObservation objects. This is a subclass of VNRectangleObservation.

                    @interface VNRecognizedTextObservation : VNRectangleObservation <-- colon here means inherits from

                    If you look up VNRectangleObservation, you will see it has the following attributes
                    bottomLeft
                    bottomRight
                    topLeft
                    topRight
                    These are declared as CGPoint, a structure with .x and .y fields.

                    for result in req.results():
                        x = result.bottomLeft().x
                        y = result.bottomLeft().y
                        w = result.topRight().x-x
                        h = result.topRight().y-y
                        print('({},{},{},{}) {}'.format(x, y, w, h, result.text()))
                    

                    You could draw the image into an image context, and then also stroke a rectangle.. something like this...(not tried).

                    with ui.ImageContext(*tuple(ui_image.size)) as ctx:
                       ui_image.draw()
                       for result in req.results():
                          # Untested sketch: Vision points are normalized (0..1),
                          # so they still need scaling to image pixels before drawing.
                          vertices = [(p.x, p.y)
                                      for p in [result.bottomLeft(),
                                                result.topLeft(),
                                                result.topRight(),
                                                result.bottomRight(),
                                                result.bottomLeft()]]
                          pth = ui.Path()
                          pth.move_to(*vertices[0])  # initial point
                          for p in vertices[1:]:
                             pth.line_to(*p)
                          ui.set_color('red')
                          pth.stroke()
                          x, y = vertices[0]
                          w, h = (vertices[2][0] - x), (vertices[2][1] - y)
                          ui.draw_string(str(result.text()), rect=(x, y, w, h), font=('<system>', 12), color='red')
                       marked_img = ctx.get_image()
                       marked_img.show()
                    
                    • JonB
                      JonB last edited by

                      I realized that result will also have a .boundingBox() method, which would make some of this a little simpler.
                      That returns a CGRect, consisting of .origin (in turn consisting of .x and .y) and .size (containing .width and .height).
                      In that case you could use ui.Path.rect.
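
A rough sketch of the arithmetic involved, in plain Python so it runs anywhere. It assumes the bounding box is normalized (0..1) with its origin at the bottom left, which matches the `(1 - p.y)` flip used elsewhere in this thread; the function name `normalized_to_pixel_rect` is invented for this example.

```python
def normalized_to_pixel_rect(x, y, w, h, image_w, image_h):
    """Convert a normalized rect (origin bottom-left) to pixels (origin top-left)."""
    px = x * image_w
    py = (1 - y - h) * image_h   # flip the y axis; the top edge is at y + h
    return (px, py, w * image_w, h * image_h)


print(normalized_to_pixel_rect(0.5, 0.25, 0.25, 0.5, 400, 200))
# prints (200.0, 50.0, 100.0, 100.0)
```

In Pythonista the four inputs would come from `bb.origin.x`, `bb.origin.y`, `bb.size.width` and `bb.size.height`, and the resulting tuple could be passed on as `ui.Path.rect(*rect)`.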

                      • JonB
                        JonB last edited by JonB

                        Okay, my previous reply was full of errors... here is a working version, which adds red boxes around each result, along with the text

                        language_preference = ['fi','en','se']
                        
                        import photos, ui, dialogs
                        import io
                        from objc_util import *
                        
                        load_framework('Vision')
                        VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
                        VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
                        
                        ACCURATE=0
                        FAST=1
                        
                        def pil2ui(pil_image):
                            buffer = io.BytesIO()
                            pil_image.save(buffer, format='PNG')
                            return ui.Image.from_data(buffer.getvalue())
                        
                        selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')
                        
                        ui_image = None
                        
                        if selection == 1:
                            pil_image = photos.capture_image()
                            if pil_image is not None:
                                ui_image = pil2ui(pil_image)
                        elif selection == 2:
                            ui_image = photos.pick_asset().get_ui_image()
                        
                        if ui_image is not None:
                            print('Recognizing...\n')
                        
                            req = VNRecognizeTextRequest.alloc().init().autorelease()
                            req.recognitionLevel = ACCURATE
                            req.setRecognitionLanguages_(language_preference)
                            handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
                        
                            success = handler.performRequests_error_([req], None)
                            if success:
                                for result in req.results():
                                    print(result.text())
                            else:
                                print('Problem recognizing anything')
                        
                        with ui.ImageContext(*tuple(ui_image.size) ) as ctx:
                           ui_image.draw()
                           for result in req.results():
                              cgpts=[   result.bottomLeft(),
                                        result.topLeft(),
                                        result.topRight(),
                                        result.bottomRight(),
                                        result.bottomLeft()  ] 
                              vertecies = [(p.x*ui_image.size.w, (1-p.y)*ui_image.size.h) for p in cgpts]
                              pth = ui.Path()
                              pth.move_to(*vertecies[0]) 
                              for p in vertecies[1:]:
                                 pth.line_to(*p)  
                              ui.set_color('red')
                              pth.stroke()
                              x,y = vertecies[0]
                              w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
                              ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
                           marked_img = ctx.get_image()
                           marked_img.show()
                        
                        • mikael
                          mikael @JonB last edited by

                          @JonB and @sodoku, just a note that I tried a different route, where I first used a rectangle recognizer to isolate the numbers, and only then used text recognition. The results were not impressive, but I can try to find the code for reference, if you think it might be useful.

                          • JonB
                            JonB last edited by

                            Here is another solution... I use rectangle detection and perspective correction to crop the puzzle. This gives much better detection, though not perfect. The recognition is pretty good, though it has trouble with 1’s on their own, which turn into Ts of all things. Some additional work in the clean function might fix common problems.

                            I’m using images from https://github.com/prajwalkr/SnapSudoku/tree/master/train

                            I suspect doing some CIFiltering first will probably improve things.

                            from objc_util import *
                            import ui
                            
                            # Declare the C function signature for ctypes (the attribute is argtypes, all lowercase)
                            VNImagePointForNormalizedPoint=c.VNImagePointForNormalizedPoint
                            VNImagePointForNormalizedPoint.argtypes=[CGPoint, c_int, c_int]
                            VNImagePointForNormalizedPoint.restype=CGPoint
                            
                            
                            ui_image=ui.Image.named('image2.jpg')
                            ui_image.show()
                            
                            CIImage=ObjCClass('CIImage')
                            ci_image=CIImage.imageWithCGImage_(ui_image.objc_instance.CGImage())
                            
                            CIPerspectiveCorrection=ObjCClass('CIPerspectiveCorrection')
                            f=CIPerspectiveCorrection.perspectiveCorrectionFilter()
                            f.inputImage=ci_image
                            o=f.outputImage()
                            
                            load_framework('Vision')
                            VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
                            VNDetectRectanglesRequest = ObjCClass('VNDetectRectanglesRequest')
                            VNImageRequestHandler = ObjCClass('VNImageRequestHandler')
                            
                            req=VNDetectRectanglesRequest.alloc().init().autorelease()
                            req.maximumObservations=2
                            req.minimumSize=0.5
                            req.minimumAspectRatio=0.7
                            req.quadratureTolerance=30
                            
                            handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()
                            
                            success = handler.performRequests_error_([req], None)
                            try:
                               result=req.results()[0]
                               nm=lambda p :VNImagePointForNormalizedPoint(p,int(ui_image.size.w),int(ui_image.size.h))
                               f.topLeft = nm(result.topLeft())
                               f.topRight = nm(result.topRight())
                               f.bottomLeft = nm(result.bottomLeft())
                               f.bottomRight = nm(result.bottomRight())
                               o=f.outputImage()
                            
                               with ui.ImageContext(o.extent().size.width, o.extent().size.height) as ctx:
                                 UIImage.imageWithCIImage_(o).drawAtPoint_( CGPoint(0,0))
                                 ui_image2=ctx.get_image()
                               ui_image2.show()
                            except Exception:
                               print('bounding rect not found... results will not work')
                               ui_image2=ui_image
                            # now run text recognition on the (cropped) image
                            handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image2.to_png(), None).autorelease()
                            req0 = VNRecognizeTextRequest.alloc().init().autorelease()
                            req0.recognitionLevel= 0# accurate
                            req0.usesLanguageCorrection=True
                            req0.customWords=[str(a) for a in range(10)]
                            
                            #req0.maximumObservations=81
                            #req0.minimumSize=.1
                            success = handler.performRequests_error_([req0], None)
                            with ui.ImageContext(*tuple(ui_image2.size) ) as ctx:
                               ui_image2.draw()
                               for result in req0.results():
                                  cgpts=[result.bottomLeft(),
                                         result.topLeft(),
                                         result.topRight(),
                                         result.bottomRight(),
                                         result.bottomLeft()]
                                  vertecies = [(p.x*ui_image2.size.w, (1-p.y)*ui_image2.size.h) for p in cgpts]
                                  pth = ui.Path()
                                  pth.move_to(*vertecies[0]) 
                                  for p in vertecies[1:]:
                                     pth.line_to(*p)  
                                  ui.set_color('red')
                                  pth.stroke()
                                  x,y = vertecies[0]
                                  w,h =(vertecies[2][0]-x), (vertecies[2][1]-y)
                            
                                  ui.draw_string(str(result.text()), rect=(x,y,w,h), font=('<system>', 12), color='red')
                               marked_img = ctx.get_image()
                               marked_img.show()
                               
                            def bbcenter(bb):
                               return((9*(bb.origin.x+bb.size.width/2)-0.5), 
                                      (9*(bb.origin.y+bb.size.height/2)-0.5) )
                            def clean(results):
                               cleaned=[]
                               for r in results:
                                  col,row=bbcenter(r.boundingBox())
                                  approx_num_ch=(r.boundingBox().size.width*9)
                                  txt=str(r.text()).replace(' ','')
                                  if approx_num_ch<=1:
                                     if len(txt) == 1:
                                         cleaned.append(((round(col),round(row)),txt))
                                     else:
                                         cleaned.append(((round(col),round(row)),'-1'))
                                  else: #more than one char
                                     col-=(len(txt)-1)/2
                                     col=round(col)
                                     row=round(row)
                                     for ch in txt:
                                       if ch in [str(a) for a in range(10)]:
                                         cleaned.append(((col,row),ch))
                                       else:
                                         cleaned.append(((col,row),'-1'))
                                       col+=1
                               return cleaned
                            
                            
                            import numpy as np
                            puzzle=np.zeros([9,9])
                            for c,v in clean(req0.results()):
                               puzzle[c]=int(v)
                            print(np.flipud(puzzle.T))
                            
                            • mikael
                              mikael @JonB last edited by

                              @JonB, thanks, very nice. I have noted and wondered about how difficult number 1 is to recognize... Not very exotic, is it? But in my experiments it looked like the simple heuristic of ”if the result is something else than 1-9, assume it is a 1” would work pretty well for Sudoku.
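
The heuristic described above could look roughly like this in plain Python. It is only a sketch of the idea as stated in the post; the function name `to_sudoku_digit` is hypothetical, and it has not been checked against real recognition output.

```python
def to_sudoku_digit(text):
    """Return text as an int if it is a single digit 1-9, else assume 1."""
    text = text.strip()
    if len(text) == 1 and text in '123456789':
        return int(text)
    return 1  # heuristic from the post above: anything else is probably a 1


print([to_sudoku_digit(t) for t in ['7', 'T', 'Te', '0']])  # prints [7, 1, 1, 1]
```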

                              • mikael
                                mikael @JonB last edited by

                                @JonB, can you open up this one a little bit?

                                approx_num_ch=(r.boundingBox().size.width*9)
                                
                                • JonB
                                  JonB last edited by

                                  The *9 is because if the initial rectangle detection and crop works, then each square is approx 1/9 width. So the approx number of squares a rectangle covers tells us how many characters it should have... I was getting many cases where 1 got read as Te, or some other two character value, even though the width was less than one box... so I wanted to have special handling for narrow boxes, as that is probably a 1, while wide boxes could have multiple characters because the bounding box legitimately spans adjacent boxes.

                                  • mikael
                                    mikael @JonB last edited by

                                    @JonB, there’s something in the math here I do not quite get. I would expect something like:

                                    num_char = r.bbox/(full_bbox/9) = r.bbox * 9 / full_bbox
                                    

                                    So it looks like you are missing the division?

                                    • JonB
                                      JonB last edited by JonB

                                      The results of vision are always provided as normalized coordinates — meaning the full box is always 1.
                                      For drawing, you have to then multiply by image width/height.

                                      Since the perspective correction both fixes perspective and crops — 1/9 is the size, roughly, of a single cell.
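
To make the normalized-coordinate arithmetic concrete, here is a small stand-alone sketch. It assumes the image has already been cropped to the puzzle, so that each cell is roughly 1/9 wide; the helper names `cell_of` and `cells_spanned` are made up for this example and mirror the `bbcenter` and `approx_num_ch` calculations posted earlier.

```python
def cell_of(x, y, w, h, cells=9):
    """Return (col, row) of the cell containing the centre of a normalized box."""
    col = round(cells * (x + w / 2) - 0.5)
    row = round(cells * (y + h / 2) - 0.5)
    return col, row


def cells_spanned(w, cells=9):
    """Approximate number of cells a normalized width covers."""
    return w * cells


print(cell_of(0.25, 0.5, 0.125, 0.125))  # prints (2, 5)
print(cells_spanned(0.125))              # prints 1.125
```

Note that rows count from the bottom here, since the normalized origin is at the bottom left; that is presumably why the earlier code flips the array before printing it.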

                                      • mikael
                                        mikael @JonB last edited by

                                        @JonB, now I understand, thank you.

                                        • sodoku
                                          sodoku last edited by sodoku

                                          I have a quick question regarding the original OCR post: how do I print the text as one single CSV list? I tried, but I keep getting a list of lists instead of one single list.

                                          This is the snippet of the code that I think needs to be altered:

                                          success = handler.performRequests_error_([req], None)
                                          if success:
                                              for result in req.results():
                                                  print(result.text())
                                          else:
                                              print('Problem recognizing anything')
                                          
                                          • JonB
                                            JonB last edited by

                                            results=[str(result.text()) for result in req.results()]
                                            
                                            print(results)
                                            

                                            Or maybe

                                            print(','.join(results))
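
The plain join works as long as none of the recognized strings contains a comma itself. If that might happen, one option is the standard csv module, which quotes such fields. A minimal sketch, with `to_csv_line` being a name invented for this example:

```python
import csv
import io


def to_csv_line(texts):
    """Join recognized strings into one CSV line, quoting where needed."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator='').writerow(texts)
    return buf.getvalue()


print(to_csv_line(['5', '3', 'a,b']))  # prints 5,3,"a,b"
```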
                                            