Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Script crashes when making bulk upload into local server (Synology NAS)
-
I am writing a script to upload files and photos into my Synology NAS server and during testing the script crashes after a period of time. I have around 1400 photos in my phone and it always crashes between 900-1000. At first I thought that it might be some corrupt file that makes this crash and tested starting from file 800 and onwards but it does not crash.
Any idea what causes this crash when trying to upload files between 900-1000? There are no error codes, the app just closes.
-
Pythonista might not be deleting memory that you are freeing up. You might add a deliberate
del variable_name
to suggest that the garbage collector free up large memory allocations like images and videos. -
@ccc tried but still crashes. This is the code:
import appex import photos import requests import os import editor import sys from objc_util import * from datetime import datetime class Synology(object): def __init__(self, url, port, account, passwd): """ :param url: IP/URL or Synology. So far it only supports HTTP. :param port: :param account: :param passwd: :returnv: """ self.url = url self.port = port self.account = account self.passwd = passwd self.sid = None # self.session = requests.Session() def login(self): print("Logging in...") param = { "version": "2", "method": "login", "account": self.account, "passwd": self.passwd, "session": "FileStation", "format": "cookie" } url = f"http://{self.url}:{self.port}/webapi/auth.cgi?api=SYNO.API.Auth" r = requests.get(url, param, verify=False) r = r.json() if not r["success"]: raise Exception("Authenticatoin failed. Note that account with 2FA enabled does not work.") print("Logging successful.") self.sid = r["data"]["sid"] return self.sid def upload_file(self, files): upload_url = f"http://{self.url}:{self.port}/webapi/entry.cgi?api=SYNO.FileStation.Upload&version=2&method=upload&_sid={self.sid}" args = { "path": "/home/b", "create_parents": "true", "overwrite": "true" } # Ignpre this ugly code. I am trying to figure out if I can do bulk upload. file = {'file': (files[0]["fname"], files[0]["payload"], 'application/octet-stream')} # file = [('files', (x["fname"], x["payload"], "application/octet-stream")) for x in files] r = requests.post(upload_url, data=args, files=file, verify=False) del upload_url del args del file del r del files def main(): # if not appex.is_running_extension(): # print("Sript is intended to be un from sharing extension.") # return print("Starting...") s = Synology(url="1.1.1.1", port="1000", account="1", passwd="1") s.login() startTime = datetime.now() all_photos = photos.get_assets() # images = [] for idx, asset in enumerate(all_photos): if idx % 100 == 0: print(f"Uploading... {idx}/{len(all_photos)}") #fname = str(ObjCInstance(asset).valueForKey_('filename')) #payload = asset.get_image_data(original=True) # images = [{"fname": fname, "payload": payload}] s.upload_file([{"fname": str(ObjCInstance(asset).valueForKey_('filename')), "payload": asset.get_image_data(original=True)}]) del asset # print(fname) print(datetime.now() - startTime) if __name__ == "__main__": main()
-
I believe you will need to do the following:
-
ObjCInstance(asset) should be on its own line, assigned to a variable, so that you can del the instance after it is created. You might also need to delete the instance method (if I recall, the
release
method has a reference cycle of some sort). -
when dealing with many images, you MUST have an
with objc_util.autorelease_pool():
wrapping the code that grabs the asset and data, otherwise the asset won't actually get released (even if you del it).
-
-
Aha,
https://forum.omz-software.com/topic/3844/photos-asset-get_image_data-is-there-a-memory-leak/6I am not on my iPad, so not sure if this but was ever fixed.... But when the ObjCInstance is
retain
ed, the retain cached method creates a reference cycle. It is necessary to modify objc_util to avoid the leak by deleting the cached methods of the objc instance.So,what I think you need to do is:
A) wrap your upload line in an autorelease_pool() context handler
B) inside that handler, also add:ObjCInstance(asset)._cached_methods = {} del asset # probably not needed
I think it is possible that making the change to your objc_util.py effectively does the same thing, but I have not tried it recently.
-
ok. turns out that autoreleasepool() does not work -- it works once, but not upon repeated script run. This appears to be fairly robust, and does not leak:
from objc_util import * #from objc_util import autoreleasepool #this doesnt work properly import photos import sys, os try: sys.path+=os.path.expanduser('~') from objc_hacks.memstatus import _get_taskinfo except: pass # if memstatus is not installed,or not supported device import gc from PIL import Image all_photos=photos.get_assets() for idx, asset in enumerate(all_photos): pool=ObjCClass('NSAutoreleasePool').new() try: if idx % 10 == 0: print(f"Uploading... {idx}/{len(all_photos)}") if _get_taskinfo: print(_get_taskinfo().resident_size/1024/1024) fname = str(ObjCInstance(asset).valueForKey_('filename')) payload = asset.get_image_data(original=True) #do the actual upload.... # upload(....) # this is the important bit, all objcinstances must have cached methods cleared manually in order to get gc'd ObjCInstance(asset)._cached_methods.clear() finally: pool.drain() pool._cached_methods.clear() del pool ObjCClass('NSAutoreleasePool').releaseAllPools()
They keys:
-
using NSReleasePool manually, which requires a context manager or try/finally to ensure it gets releaseAllPools at the end. (the releaseAllPools could go at the end if the script, maybe, but it needs to happen before gc gets called, or script is run again, i think)
-
inside the pool, any ObjCInstance's that get created must have
._cached_methods.clear() called to break the ref cycle.
I tried with and without del the asset and payload, that is not necessary, but not harmful either
-
-
Just to close out on my investigations here, here is a cleaned up version with a proper context manager to replace the objc_util version, and some extra fluff removed. .
from objc_util import * #from objc_util import autoreleasepool #this doesnt work properly import photos import sys, os,gc import contextlib try: sys.path+=os.path.expanduser('~') from objc_hacks.memstatus import _get_taskinfo except: pass # if memstatus is not installed,or not supported device from PIL import Image all_photos=photos.get_assets() '''Replacement for objc_util.autoreleasepool that doesnt crash when rerunning script. objc_util's has two issues: first, a bug in ObcCInstance.__del__ means that NSAutoreleasePool will have release called, which is never proper for NSAutoreleasePool. This wouldnt be an issue if the object is alive, since those calls get ignired, it seems. The second issue affects all ObjCInstances: the _method_cache prevents timely garbage collection due to a ref cycle. So ObjCInstances stick around -- and thus never get release called -- until gc is run. Problem is, once the pool has been drained, we are not guaranteed to have it around later. Since objc_util doesnt retain it, after draining, it might get released, and then when gc finally breaks the ref cycle, release gets called, and that pointer might be a new object at that point, so release causes a crash''' @contextlib.contextmanager def autoreleasepool(): pool = ObjCClass('NSAutoreleasePool').new() try: yield finally: pool.drain() pool._cached_methods.clear() #break ref cycle pool.__del__=None #prevent release from being called for idx, asset in enumerate(all_photos): with autoreleasepool(): ''' debugging ''' if _get_taskinfo: print(f'\nSTART {_get_taskinfo().resident_size/1024/1024:3.2f} MB') ''' get the data''' fname = str(ObjCInstance(asset).valueForKey_('filename')) payload = asset.get_image_data(original=True) ''' Do some actual work here ... insert upload, save, etc, code ''' '''Clean up''' del payload # not actually needed ObjCInstance(asset)._cached_methods.clear() #required !!! del asset # not actually needed ''' debugging ''' if _get_taskinfo: print(f'END. {_get_taskinfo().resident_size/1024/1024:3.2f} MB')
As written, the memory is basically constant each cycle:
START 76.20 MB END. 76.52 MB START 76.20 MB END. 76.52 MB START 76.20 MB END. 76.94 MB START 76.20 MB END. 76.79 MB START 76.20 MB END. 76.85 MB
If i comment out the
with autoreleasepool()
line, essentially not using a release pool, it should now be obvious why looping phassets never works -- despite deleting the objc objects, the actual PHAsset's data does not get cleared without draining the pool, and we gain several MB per cycleSTART 94.90 MB END. 96.81 MB START 96.81 MB END. 98.68 MB START 98.68 MB END. 99.94 MB START 99.94 MB END. 100.99 MB START 101.09 MB END. 102.03 MB START 102.02 MB END. 102.89 MB
If I use the pool, but comment out the two
del
lines:START 63.28 MB END. 65.14 MB START 63.70 MB END. 64.77 MB START 63.51 MB END. 62.46 MB START 62.36 MB END. 62.43 MB START 62.35 MB END. 66.07 MB START 64.17 MB END. 66.01 MB
we see that there is no net growth, although there is somewhat more variation from cycle to cycle (because the bytesIO object payload is still in scope at the start -- if the meat of the loop is moved to a function, the results are the same with or without the del) . This shows that reference counting gc does do its job -- deleting the bytesIO object (payload) is not necessary (this was a point of contention with @ccc in a previous thread) -- when it falls out of scope, the memory is cleared.
Finally, to see the effect of the objc_util bug that prevents ObjCInstances from getting cleared when they fall out of scope, I comment out the
ObjCInstance(asset)._cached_methods.clear() !
line:since ObjCInstance objects themselves are not that big, we see a slow leak, growing maybe a 0.1 MB every few hundred images. I seemed to get more crashes if starting/stopping the fhe script many times, but that could be my imagination. So I conclude, this probably not needed unless you are creating many tens of thousands of ObjCInstances, or you need to ensure that the objects release is called at a specific time for some other reason.
-
Awesome detailed work @JonB . The small leak might become significant after hundreds of loops in an
appex
environment.