Recognize text from picture

ccc

def pil2ui(pil_image):
    buffer = io.BytesIO()
    pil_image.save(buffer, format='PNG')
    return ui.Image.from_data(buffer.getvalue())

leaks the buffer's memory, which has been shown to crash Pythonista when multiple images are processed. A better approach is to use a context manager to force the .close().

def pil2ui(pil_image):
    with io.BytesIO() as buffer:
        pil_image.save(buffer, format='PNG')
        return ui.Image.from_data(buffer.getvalue())
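
To see why returning from inside the with block is safe, here is a minimal sketch that uses only the standard library. The function encode_with_context is a hypothetical stand-in for pil2ui (no PIL or ui needed); it relies on getvalue() returning an independent bytes copy before the buffer is closed.

```python
import io

def encode_with_context(payload):
    # Hypothetical stand-in for pil2ui: write bytes, then copy them out.
    with io.BytesIO() as buffer:
        buffer.write(payload)
        data = buffer.getvalue()  # independent bytes copy, taken before close()
    # Leaving the with block has called buffer.close() for us.
    return data, buffer

data, buffer = encode_with_context(b'fake PNG bytes')
print(buffer.closed)  # True: the buffer was closed on exit
print(data)           # the copied bytes are still usable
```

The copied bytes outlive the buffer, so nothing is lost by closing it as soon as the data has been extracted.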

mikael

Revisiting this.

Regardless of language restrictions, I have found the simple and reliable ability to pick text from paper to be useful for me almost weekly - URLs, email addresses, reservation codes, laptop serial numbers, etc.

With use, I noticed that the original script had some issues:

1. Difficult to find and open when quickly needed.
2. Slow to get from the picked photo to recognized text.
3. Results are a pain to copy from the Console, as it likes to jump around just as you’ve selected the text to copy.

Point #1 was fixed with a simple Apple Shortcuts shortcut to make the script easy to run.

Point #3 was resolved by presenting the recognized text in a TableView, with tap to copy.

Point #2 took a bit more doing.

Pythonista's photos module wants to return PIL images, and that results in two very slow conversions: first the module converts the UIImage to PIL, and then I converted that back to a PNG image for recognition. I found some @cvp code in this thread and replaced the photos module with objc_util pickers, which return PNG data almost directly.

And hey presto! Not just faster recognition, but instantaneous - and with much better quality than with the only contender app I could find (Prizmo Go).

Updated script here.

cvp

@mikael Thanks for your great 🎁 for New Year

cvp

Question for a specialist of this forum.
In the last post of @mikael, I see my user name as @cvp, but in black text rather than clickable blue, although I've received a notification "Mikael mentioned you...".
How is that possible?

mikael

Happy last day of the decade to everyone who shares my calendar!

I finessed the script a bit with the ability to select, copy or share multiple items, and nicer icons.

@cvp, I noticed the missing link for your handle and wondered about it too; no idea why.

sodoku

Does this work with the new iPadOS?

mikael

@sodoku, do you mean if the latest versions have included robust support for non-English characters? Not to my knowledge.

sodoku

Good news: I got an iPhone 11 and I’m testing this on it for sudoku. Are the pictures it takes saved anywhere? I'm curious whether I have to delete them, because after I use it the pictures don’t show up in my pictures app.

sodoku

Also, what is the updated code that Mikael posted on GitHub used for? Is it the same as the one posted here? It is so much longer than the code posted on the forum. Is it a better version than the one on the forum?

mikael

@sodoku, the code on GitHub is more of a tool, and much faster than the version at the beginning of this thread. For your purposes, you probably just want pieces of it.

It supports taking a picture normally and then selecting it from the photo library when you use the tool, or just snapping a quick ”disposable” in-tool image which is not saved.

sodoku

There are a few edits in this thread, and I don’t know how to piece them together to get the best version of this.

mikael

@sodoku, the one on GitHub is the latest version.

sodoku

I will test it for sudoku in the console. That is, I will try to convert it for use in the console with the sudoku solver; if I need help, I’ll post a message.

sodoku

I need help adapting this script to read the numbers from a picture of a sudoku and insert the starting numbers into a console script. It does not work well for recognizing ones and sevens.

Example of sudoku solver

In my version I want the board to start as all zeros; when you take a picture, it will add the numbers and then solve. This combines the two programs (sudoku solver and OCR text recognition).

board = [
    [5, 8, 4, 1, 0, 0, 0, 0, 0],
    [0, 0, 6, 8, 0, 0, 5, 1, 0],
    [0, 0, 0, 0, 5, 4, 7, 0, 6],
    [0, 5, 3, 0, 1, 0, 0, 6, 7],
    [0, 0, 0, 0, 2, 0, 0, 0, 0],
    [4, 6, 0, 0, 9, 0, 8, 3, 0],
    [7, 0, 8, 5, 4, 0, 0, 0, 0],
    [0, 2, 9, 0, 0, 3, 4, 0, 0],
    [0, 0, 0, 0, 0, 1, 3, 7, 9]
]


def solve(bo):
    find = find_empty(bo)
    if not find:
        return True
    else:
        row, col = find

    for i in range(1, 10):
        if valid(bo, i, (row, col)):
            bo[row][col] = i

            if solve(bo):
                return True

            bo[row][col] = 0

    return False


def valid(bo, num, pos):
    # check row
    for i in range(len(bo[0])):
        if bo[pos[0]][i] == num and pos[1] != i:
            return False
    # check column
    for i in range(len(bo[0])):
        if bo[i][pos[1]] == num and pos[0] != i:
            return False
    # check quadrant
    box_x = pos[1] // 3
    box_y = pos[0] // 3

    for i in range(box_y * 3, box_y * 3 + 3):
        for j in range(box_x * 3, box_x * 3 + 3):
            if bo[i][j] == num and (i, j) != pos:
                return False

    return True


def print_board(bo):
    for i in range(len(bo)):
        if i % 3 == 0 and i != 0:
            print('------+-------+------')

        for j in range(len(bo[0])):
            if j % 3 == 0 and j != 0:
                print('|', end=' ')
            if j == 8:
                print(bo[i][j])
            else:
                print(str(bo[i][j]) + ' ', end='')


def find_empty(bo):
    for i in range(len(bo)):
        for j in range(len(bo[0])):
            if bo[i][j] == 0:
                return (i, j)  # row, col
    return None


print_board(board)
solve(board)
print('=====================')
print_board(board)

language_preference = ['fi','en','se']

import photos, ui, dialogs
import io
from objc_util import *

load_framework('Vision')
VNRecognizeTextRequest = ObjCClass('VNRecognizeTextRequest')
VNImageRequestHandler = ObjCClass('VNImageRequestHandler')

def pil2ui(pil_image):
   buffer = io.BytesIO()
   pil_image.save(buffer, format='PNG')
   return ui.Image.from_data(buffer.getvalue())

selection = dialogs.alert('Get pic', button1='Camera', button2='Photos')

ui_image = None

if selection == 1:
   pil_image = photos.capture_image()
   if pil_image is not None:
       ui_image = pil2ui(pil_image)
elif selection == 2:
   ui_image = photos.pick_asset().get_ui_image()

if ui_image is not None:
   print('Recognizing...\n')

   req = VNRecognizeTextRequest.alloc().init().autorelease()
   req.setRecognitionLanguages_(language_preference)
   handler = VNImageRequestHandler.alloc().initWithData_options_(ui_image.to_png(), None).autorelease()

   success = handler.performRequests_error_([req], None)
   if success:
       for result in req.results():
           print(result.text())
   else:
       print('Problem recognizing anything')
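
One possible way to glue the two scripts together is sketched below. It assumes the recognition step can somehow be made to yield one text string per cell as (row, col, text) tuples; that format, and the helper names empty_board and fill_board, are hypothetical and not something the script above produces as written. Anything that is not a single digit from 1 to 9 is left as 0, so a misread cell stays empty instead of corrupting the puzzle.

```python
def empty_board():
    # 9x9 grid of zeros; 0 marks an empty cell, as in the solver above.
    return [[0] * 9 for _ in range(9)]

def fill_board(board, cells):
    """Place recognized digits on the board.

    cells: iterable of (row, col, text) tuples. This is a hypothetical
    per-cell OCR output format, assumed for illustration only.
    """
    for row, col, text in cells:
        text = text.strip()
        # Accept only a single digit 1-9; skip everything else.
        if len(text) == 1 and text in '123456789':
            board[row][col] = int(text)
    return board

# Made-up sample results, including a padded digit and a misread 'l'.
recognized = [(0, 0, '5'), (0, 1, '8'), (1, 2, ' 6 '), (4, 4, 'l'), (8, 8, '9')]
board = fill_board(empty_board(), recognized)
print(board[0])
```

The filled board could then be passed to solve() and print_board() unchanged.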

mikael

@sodoku, I tried something similar as well a while ago, first recognizing rectangles and then trying to recognize the numbers, but I hit the same issue of very poor recognition of the numbers. I wonder if we would need a number-specific recognizer for that.

Spitfire

Hi, since the sudoku is a square made of many squares, I think it is more robust to slice the cells evenly so that each small image contains only one number.

Of course, the downside is that you call the recognition service once per image and go from 1 image to 81, which can get expensive.
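
As a rough sketch of the slicing idea, the function below computes the 81 crop rectangles. It assumes the picture has already been cropped and straightened to the puzzle's outer border; the name cell_boxes and the margin parameter are hypothetical choices for illustration. Each box is a (left, top, right, bottom) tuple, the shape PIL's Image.crop accepts.

```python
def cell_boxes(width, height, margin=0.1):
    """Return 81 (left, top, right, bottom) boxes in row-major order.

    margin is the fraction of each cell trimmed on every side, meant to
    keep the grid lines out of the cropped image (an assumed default).
    """
    cell_w = width / 9
    cell_h = height / 9
    boxes = []
    for row in range(9):
        for col in range(9):
            left = (col + margin) * cell_w
            top = (row + margin) * cell_h
            right = (col + 1 - margin) * cell_w
            bottom = (row + 1 - margin) * cell_h
            boxes.append((round(left), round(top), round(right), round(bottom)))
    return boxes

boxes = cell_boxes(900, 900)
print(len(boxes))  # 81
print(boxes[0])    # (10, 10, 90, 90)
```

The cell at (row, col) is boxes[row * 9 + col], so each recognition result can be mapped straight back to a board position.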

But it would be more robust.
Best regards, Tommy

mikael

@Spitfire, thanks. I did try all kinds of approaches, finally resorting to manual cropping, and it still was not reliable enough.

pavlinb

@mikael Could you share a picture of a sudoku where recognition fails?

ccc

Multiple sample Sudoku puzzles would help to achieve a robust solution.