Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
dropbox.put_file() chopping off bytes
-
I don't understand what's going on here so hopefully one of you can enlighten me.
I subscribe to the RSS feed for http://kupu.maori.nz in MrReader and have a service set up to send a post to a Pythonista script to extract the word of the day and append it to a text file in my Dropbox. But every time it runs, the file shrinks by a number of bytes, usually between 2 and 25. I can't see any correlation between the size of the appended string and the size of the shrinkage.
What am I doing wrong? I'm betting it's something simple I've missed but I dunno.
```python
# coding: utf-8
import sys
import dropboxlogin
from dropbox import rest
import console
import locale
import webbrowser

dropboxlogin.app_key = '...'
dropboxlogin.app_secret = '...'

DB_FOLDER = '/flashcards/'
DB_FILE = 'Māori.cards.txt'


def main():
    locale.setlocale(locale.LC_ALL, '')
    # extract the word of the day from the RSS text
    rss = sys.argv[1]
    new_word = rss.split('.')[0]
    new_word = new_word.replace(': ', ' :: ')
    # this gives us text like "ngeru :: cat"
    try:
        db = dropboxlogin.get_client()
        ff, md = db.get_file_and_metadata(DB_FOLDER + DB_FILE)
        wordlist = ff.read().decode('utf-8').splitlines()
        ff.close()
        print(md)  # to check our starting size
        wordlist.append(new_word)
        wordlist = list(set(wordlist))
        wordlist.sort(cmp=locale.strcoll)
        md = db.put_file(DB_FOLDER + DB_FILE, u'\n'.join(wordlist), overwrite=True)
        print(md)  # to see how much we've shrunk
        console.hud_alert('added {}'.format(new_word))
    except rest.ErrorResponse as e:
        console.alert('Error - add_maori.py', message='{}\n'.format(e))
    webbrowser.open('mrreader://')


if __name__ == '__main__':
    main()
```
-
I would try...
```python
DB_FOLDER = '/flashcards/'
DB_FILE = 'Māori.cards.txt'
DB_FILEPATH = DB_FOLDER + DB_FILE
# use DB_FILEPATH in your main program
# ...
print('before', len(wordlist), len(''.join(wordlist)))
wordlist = list(set(wordlist))  # is the set() operation removing data?
print(' after', len(wordlist), len(''.join(wordlist)))
```
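Note that `len(''.join(wordlist))` counts characters and leaves out the newlines that `u'\n'.join(wordlist)` puts between lines, so these numbers can't be compared directly with the `bytes` field in the Dropbox metadata, which is a size in bytes. A small helper that measures exactly the string being uploaded, both in characters and in UTF-8 bytes, closes that gap. This is only a sketch: `upload_sizes` is a hypothetical name, and it's written to run on Python 3.

```python
def upload_sizes(wordlist):
    """Return (characters, UTF-8 bytes) for the text the script uploads."""
    # Build exactly the same string that is passed to put_file in the script.
    text = u'\n'.join(wordlist)
    return len(text), len(text.encode('utf-8'))


words = [u'awa :: river', u'āporo :: apple']
print(upload_sizes(words))  # (27, 28): 'ā' is one character but two UTF-8 bytes
```

Printing this next to `md['bytes']` after the upload shows whether Dropbox stored the byte count you expected.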
-
Here are the results of running that several times, with the string I added and the bytes returned from the Dropbox metadata.
```
# "whatitoka :: door"
('bytes - before', 1157)
('before', 60, 1091)
('after', 60, 1091)
('bytes - after', 1150)
# 17 characters, -7 bytes

# "awa :: river"
('bytes - before', 1150)
('before', 60, 1083)
('after', 60, 1083)
('bytes - after', 1142)
# 12 characters, -8 bytes

# "hapa :: dinner"
('bytes - before', 1142)
('before', 60, 1078)
('after', 60, 1078)
('bytes - after', 1137)
# 14 characters, -5 bytes

# "āporo :: apple"
('bytes - before', 1137)
('before', 60, 1073)
('after', 60, 1073)
('bytes - after', 1132)
# 14 characters, -5 bytes

# "hgtj :: gfrd" - random characters
('bytes - before', 1132)
('before', 60, 1066)
('after', 60, 1066)
('bytes - after', 1125)
# 12 characters, -7 bytes

# "hgtjfj :: hytgfrd" - random characters
('bytes - before', 1125)
('before', 59, 1064)
('after', 59, 1064)
('bytes - after', 1122)
# 17 characters, -3 bytes
```
I can't really see any pattern here. After seeing a byte difference of -5 for both 14-character strings, I tested strings of the same lengths as the earlier examples (17 and 12 characters) but got different results.
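Lining the numbers up does suggest a pattern: in every run, 'bytes - after' is exactly the number of characters in the uploaded text, i.e. the 'after' character count plus one newline between each pair of lines. A correctly stored UTF-8 file would be larger than that whenever the list contains macrons such as ā. A quick check over the six runs above (the figures are copied from the output; reading them this way is a hypothesis, not something confirmed in this thread):

```python
# (lines, characters excluding newlines, 'bytes - after') for each run above
runs = [
    (60, 1091, 1150),  # whatitoka :: door
    (60, 1083, 1142),  # awa :: river
    (60, 1078, 1137),  # hapa :: dinner
    (60, 1073, 1132),  # āporo :: apple
    (60, 1066, 1125),  # hgtj :: gfrd
    (59, 1064, 1122),  # hgtjfj :: hytgfrd
]

for lines, chars, stored in runs:
    # u'\n'.join() puts one newline between each pair of lines
    uploaded_chars = chars + (lines - 1)
    print(uploaded_chars, stored, uploaded_chars == stored)  # True for every run
```

That would also explain why the shrinkage looked random: each run adds the new line (its length plus a newline) but loses one byte for every macron vowel already in the file, since each is two bytes in UTF-8 and presumably the tail of the text is what gets cut off. In the first run, the existing 59 lines held 1091 - 17 = 1074 characters plus 58 newlines, 1132 characters in all, against 1157 bytes on Dropbox: 25 extra bytes. Adding the 18-character line and dropping those 25 bytes gives 1157 + 18 - 25 = 1150.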
-
Yeah, now I'm really confused. I woke up this morning thinking "hmm, maybe it's the sort", added in just one print statement and now my output looks like this:
```
# "hgtjfj :: hytgfrd"
('bytes - before', 1122)
('before', 59, 1062)
('after', 58, 1045)
('sorted', 58, 1045)
('bytes - after', 1102)
# 17 characters, -20 bytes

# "hgtjfj :: hytgfrd"
('bytes - before', 1102)
('before', 58, 1044)
('after', 57, 1027)
('sorted', 57, 1027)
('bytes - after', 1083)
# 17 characters, -19 bytes

# "hgtjfj :: hytgfrd"
('bytes - before', 1083)
('before', 57, 1027)
('after', 56, 1010)
('sorted', 56, 1010)
('bytes - after', 1065)
# 17 characters, -18 bytes

# "hgtjfj :: hytgfrd"
('bytes - before', 1065)
('before', 57, 1010)
('after', 56, 993)
('sorted', 56, 993)
('bytes - after', 1048)
# 17 characters, -17 bytes
```
Now the `wordlist = list(set(wordlist))` line is removing data! And it looks like the byte count of the lost data is dropping by one each time. I just don't understand.
-
And that's why I shouldn't code first thing out of bed in the morning. Of course it's doing that; I'm reusing the same input each time so changing it to a set strips out the dupe. Fixing that gives me this output:
```
# "12hgtjfj :: 34hytgfrd"
('bytes - before', 1048)
('before', 56, 999)
('after', 56, 999)
('sorted', 56, 999)
('bytes - after', 1054)
# 21 characters, 6 bytes
```
So it's not the sort.
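The `set()` call is doing exactly what it should, but it means a word that's already in the list gets dropped without any sign. A minimal sketch of reporting that instead (the function name `add_word` is hypothetical, and plain `sorted()` stands in for the locale-aware sort; on Python 3, `key=locale.strxfrm` is the usual replacement for `cmp=locale.strcoll`):

```python
def add_word(wordlist, new_word):
    """Return (sorted unique list, whether new_word was actually new)."""
    already_there = new_word in wordlist
    # set() removes duplicates, including a second copy of new_word
    words = sorted(set(wordlist) | {new_word})
    return words, not already_there


words, added = add_word([u'awa :: river', u'hapa :: dinner'], u'awa :: river')
print(added)  # False: the word was already in the list
```

The script could then show a different `hud_alert` message when `added` is false, so a repeated test input is obvious.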
But I just noticed that my byte count increased this time!
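The increase fits the same arithmetic as the earlier runs. 'bytes - after' (1054) is again the character count of the uploaded text (999 + 55 newlines). Before the run, the file held 1048 bytes but only 999 - 21 = 978 characters of words plus 54 newlines, i.e. 1032 characters, so 16 of its bytes came from two-byte characters. The new line adds 22 (21 characters plus a newline), and 1048 + 22 - 16 = 1054: a net gain, because this time the new line outweighed the lost bytes. If the cause is that the upload is sized by character count rather than byte count (a hypothesis, not confirmed in this thread), one likely fix is to encode the text to UTF-8 before handing it to `put_file`. A minimal sketch, with `build_upload_body` as a hypothetical helper:

```python
def build_upload_body(wordlist):
    """Join the lines and encode them, so the upload is a byte string."""
    return u'\n'.join(wordlist).encode('utf-8')


body = build_upload_body([u'āporo :: apple', u'awa :: river'])
print(len(body))  # 28 bytes for 27 characters

# In the script this would replace the unicode argument (not run here):
# md = db.put_file(DB_FOLDER + DB_FILE, build_upload_body(wordlist), overwrite=True)
```

After that change, 'bytes - after' should match the UTF-8 byte count rather than the character count.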