Binary files read and write
-
The App is SPLnFFT for iOS. Your script works fine in Array mode with the only exception of this instruction: floats_in_the_file = os.path.getsize(filename) / struct.calcsize('f')
-
I just noticed that you changed the source code of the script. Now it reads 5MB binary data in about 4 seconds using the Array Mode. I'll check the Structure Mode and report results. Congratulations
-
How can I access the data in the 2D array to make plots and numeric calculations of aggregated data? The App saves 5MB of data for a 24-hour recording. If the recording time is shorter, it pads the data with zeroes. Is there a way to filter out these values while reading the file? The total number of values saved in float format is computed as follows: count=2436008*2;
-
There was an error in my last post. It should read count = 24 * 3600 * 8 * 2. Sorry.
-
Here is a short sample of the values read by your script in Array Mode. Observe the zero values at the end.
(50.56001281738281, 53.25138854980469), (51.46320724487305, 59.16133117675781), (53.85163116455078, 56.33137512207031), (54.70978546142578, 54.6609001159668), (55.20241165161133, 47.02310562133789), (55.262977600097656, 40.11175537109375), (54.45186996459961, 43.808773040771484), (54.05076217651367, 42.50151824951172), (53.665225982666016, 54.5957145690918), (53.8406867980957, 65.96211242675781), (58.009944915771484, 65.0105972290039), (59.889801025390625, 59.607154846191406), (60.222530364990234, 56.390960693359375), (60.41679763793945, 54.57362747192383), (60.55101776123047, 41.55317687988281), (60.54636001586914, 40.97874450683594), (60.54383850097656, 46.896514892578125), (60.42774200439453, 45.11729049682617), (57.88348388671875, 49.7500114440918), (53.61359786987305, 46.01418685913086), (50.81377410888672, 36.51190185546875), (48.24237823486328, 31.083229064941406), (44.919986724853516, 32.971107482910156), (44.69907760620117, 46.87627410888672), (45.31845474243164, 39.48908233642578), (44.62737274169922, 35.4039192199707), (44.04755401611328, 34.72141647338867), (41.45051956176758, 34.2315673828125), (39.6866569519043, 33.11891174316406), (39.542598724365234, 50.92300796508789), (43.85606002807617, 34.235633850097656), (43.871002197265625, 40.384735107421875), (42.935977935791016, 52.999149322509766), (46.3834228515625, 52.6627311706543), (48.203895568847656, 57.949790954589844), (51.57520294189453, 52.208770751953125), (52.15311050415039, 45.712528228759766), (52.26800537109375, 50.851497650146484), (52.261497497558594, 46.12493133544922), (52.383358001708984, 84.88561248779297), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)
-
How can I access the data in the 2D array to make plots and numeric calculation of aggregated data?
I am unclear what you mean. Your last post looks to me like it is a list of (x, y) tuples. What else do you need?
To remove all (0.0, 0.0) elements from your list...
my_list = [(x[0], x[1]) for x in my_list if x[0] and x[1]] # keep pairs where both values are nonzero; this removes every (0.0, 0.0) element
count=24 * 3600 * 8 * 2
count = 24 (hours in a day) * 60 (minutes in an hour) * 60 (seconds in a minute) * 8 (what is this? (samples per second?)) * 2 (values (fast and slow?))
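A minimal sketch of working with such a list of tuples, using a few made-up values rather than the real recording. It shows the difference between dropping only the (0.0, 0.0) padding and dropping any pair that contains a zero, and one way to split the pairs into two columns for plotting or statistics. The names pairs, no_padding, both_nonzero, first_col and second_col are illustrative only.

```python
# Hypothetical sample in the same two-value layout as the data posted above.
pairs = [(50.56, 53.25), (51.46, 59.16), (0.0, 47.02), (0.0, 0.0), (0.0, 0.0)]

# Drop only the padding, i.e. pairs where BOTH values are exactly zero.
no_padding = [p for p in pairs if p[0] or p[1]]

# Stricter filter: drop any pair in which EITHER value is zero.
both_nonzero = [p for p in pairs if p[0] and p[1]]

# Split the remaining pairs into two columns for plotting or statistics.
first_col, second_col = zip(*both_nonzero)

print(len(no_padding), len(both_nonzero))  # 3 2
print(max(first_col), min(second_col))     # 51.46 53.25
```

Which of the two filters is appropriate depends on whether a single 0.0 reading can be a legitimate value in the recording.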
-
There are two weighting times for SPL meters: SLOW = one reading per second; FAST = one reading every 1/8 of a second. That means that you have eight pairs of data points every second. One minute has 60 seconds, so you have 60 * 60 = 3600 seconds per hour. One hour has 3600 * 8 * 2 = 57600 data points in float format that are exported to Dropbox. Another problem is the NaN and infinite values generated for many reasons, which have to be replaced by the previous SLOW or FAST recorded values. They are mostly negative values. As you can observe, there is a post-processing job to be done before plotting or computing aggregated data to get reliable results.
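The replacement step described above could be sketched roughly as follows. This is only an illustration, not code from the app or from the reader script: fill_invalid and its default argument are hypothetical names, and it assumes each channel (FAST or SLOW) is processed separately as a flat list.

```python
import math

def fill_invalid(values, default=0.0):
    """Replace NaN/inf with the most recent finite value.

    'default' is used if an invalid value appears before any valid one.
    """
    cleaned = []
    last_good = default
    for v in values:
        if math.isnan(v) or math.isinf(v):
            cleaned.append(last_good)
        else:
            cleaned.append(v)
            last_good = v
    return cleaned

# Made-up readings for one channel.
readings = [50.5, float('nan'), 51.2, float('-inf'), float('inf'), 49.9]
print(fill_invalid(readings))  # [50.5, 50.5, 51.2, 51.2, 51.2, 49.9]
```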
-
The total size of any file exported to Dropbox by the App SPLnFFT Noise Meter is 5529600 bytes, and each data point in float format uses 4 bytes: (24 * 3600 * 8 * 2 = 1382400) * 4 = 5529600. I've observed that the instruction floats_in_the_file = os.path.getsize(filename) / struct.calcsize('f') reads a lot of garbage where it is supposed to read zeroes.
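The arithmetic can be checked with a short sketch. The constant names are illustrative, and the script works from the expected size instead of a real file.

```python
import struct

HOURS = 24
SECONDS_PER_HOUR = 3600
READINGS_PER_SECOND = 8   # FAST rate, as described in this thread
VALUES_PER_READING = 2    # one FAST and one SLOW value

expected_floats = HOURS * SECONDS_PER_HOUR * READINGS_PER_SECOND * VALUES_PER_READING
bytes_per_float = struct.calcsize('f')  # size of one C float
expected_bytes = expected_floats * bytes_per_float

print(expected_floats, expected_bytes)  # 1382400 5529600

# Integer division keeps the count an int on both Python 2 and Python 3.
floats_in_the_file = expected_bytes // bytes_per_float
```

With a real file, os.path.getsize(filename) would take the place of expected_bytes in the last line.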
-
Do the array approach and struct approach generate the same list?
What is printed if you add print(floats_in_the_file) when you run the script against a 5529600 byte file? I would expect 1382400.
You could try removing bogus values by post-processing the list with:
my_list = [(x[0], x[1]) for x in my_list if x[0] > 0 and x[1] > 0] # remove invalid elements
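One way to check whether the two approaches agree is to run both against a tiny test file. The sketch below is not the actual reader script; it writes six made-up floats to a temporary file and reads them back with struct and with array, assuming native byte order and 4-byte floats.

```python
import array
import os
import struct
import tempfile

# Write a small hypothetical test file of 4-byte floats (native byte order).
sample = [50.5, 53.25, 51.5, 59.0, 0.0, 0.0]
fd, filename = tempfile.mkstemp(suffix='.bin')
with os.fdopen(fd, 'wb') as out_file:
    out_file.write(struct.pack('%df' % len(sample), *sample))

floats_in_the_file = os.path.getsize(filename) // struct.calcsize('f')

# struct approach: unpack all floats in a single call.
with open(filename, 'rb') as in_file:
    via_struct = list(struct.unpack('%df' % floats_in_the_file, in_file.read()))

# array approach: let the array module read the same number of floats.
via_array = array.array('f')
with open(filename, 'rb') as in_file:
    via_array.fromfile(in_file, floats_in_the_file)

os.remove(filename)

print(floats_in_the_file)             # 6
print(via_struct == list(via_array))  # True
```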
-
You are right, the maximum number of data points in the binary file is 1382400. I'm new to Pythonista and I'll have to read up on how to detect and remove NaN and infinite values in Python and what the available array functions are. I have an Academic Apple Developer License and I'm exploring all the available options to process the noise data within the iOS environment with a Universal standalone App. As far as I know, Pythonista seems to be the only one that can import SPL data with a script from Dropbox to its sandbox, bypassing the cumbersome iTunes File Sharing. The project is part of an epidemiological investigation on Environmental Noise and Health which includes, among other challenges, the simultaneous recording of an ECG.
Thanks for your valuable help.
-
OK... In just over 1 second SPLnFFT_Reader.py reads 1,382,400 floats out of the binary file, converts that into a 2d list of 691,200 fast_slow pairs and cleans that down to a 2d list of 2,786 valid fast_slow pairs and prints out the first 50 pairs.
My cleansing step might not be right for your purposes. You can use math.isnan() and math.isinf() to find those values but I do not believe that it is required anymore because the author of the SPLnFFT app told me in an email that "In the matlab [example] script there is some processing to get rid of NaN data. But I thought I had solved this in latest release of SPLnFFT".
-
Calling all numpy gurus... Why does this not work as expected?
import numpy
data = numpy.fromfile('SPLnFFT_2015_07_21.bin', dtype=float)
print(len(data))  # 691200 :-( this is half of the expected number
-
You can try this:
data = numpy.fromfile('SPLnFFT_2015_07_21.bin', dtype=numpy.dtype('f4'))
The Python float data type is usually implemented as a double (8 bytes), so this specifies the number of bytes explicitly.
-
Do issues still exist with byte order on different platforms? I really don't know. A long time ago, we used to have to consider this: big-endian versus little-endian when reading binary/memory files without an API that took care of the translation.
-
Yes. Complexity is preserved but it is better hidden. The fortunate thing here is that the file in question was written out by one iOS app (SPLnFFT) and read in by another iOS app (Pythonista) so byte order is not an issue.
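For anyone who does need to read such a file on another platform, both the element size and the byte order can be stated in the numpy dtype string. The sketch below works on an in-memory buffer with made-up values and assumes the data is stored as little-endian 4-byte floats; numpy.fromfile accepts the same dtype strings.

```python
import numpy

# Element size: Python's float maps to an 8-byte double, 'f4' is a 4-byte float.
print(numpy.dtype(float).itemsize, numpy.dtype('f4').itemsize)  # 8 4

# Byte order can be stated explicitly: '<' is little-endian, '>' is big-endian.
raw = numpy.array([1.0, 2.5], dtype='<f4').tobytes()

little = numpy.frombuffer(raw, dtype='<f4')  # matches how the bytes were written
big = numpy.frombuffer(raw, dtype='>f4')     # same bytes, wrong byte order

print(little.tolist())                       # [1.0, 2.5]
print(numpy.array_equal(big, [1.0, 2.5]))    # False
```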
-
@ccc OK, understood. Honestly, I was not even sure these issues still existed. Normally they have no impact as long as you go through APIs that handle the translation; the trouble starts when we get tricky and implement our own functions/methods for reading so-called cross-platform files. In this environment, I think it's food for thought. But as you say, in this case the file is both written and read on iOS, so it's not a problem.
-
Now I understand why numpy is all the rage with data scientists!!!
3 lines of numpy do the whole thing!! Import, read, transform, and cleanse. Much faster execution time too.
import numpy
data = numpy.fromfile('SPLnFFT_2015_07_21.bin', dtype=numpy.float32).reshape(-1, 2)
data = data[numpy.all(data > 0, axis=1)]  # cleanse
print(type(data), len(data))  # numpy.ndarray, 2786
print(data[:20])  # print first 20 fast, slow pairs
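Once the data is in a 2D array, each column can be pulled out with slicing. The sketch below uses a small made-up array in place of the real file, also screens out NaN and infinite values with numpy.isfinite, and assumes the column order is fast, slow as in the comment above.

```python
import numpy

# Hypothetical stand-in for the array returned by numpy.fromfile(...).reshape(-1, 2).
data = numpy.array([[50.5, 53.25], [numpy.nan, 59.0], [51.5, 47.0],
                    [0.0, 0.0], [0.0, 0.0]], dtype=numpy.float32)

# Keep rows where both values are finite and greater than zero.
valid = numpy.isfinite(data).all(axis=1) & (data > 0).all(axis=1)
clean = data[valid]

# Column order (fast, slow) is an assumption taken from the post above.
fast, slow = clean[:, 0], clean[:, 1]

print(len(clean))              # 2
print(fast.max(), slow.min())  # 51.5 47.0
```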
-
Hi CCC. I tried your script with an edited version of a SPLnFFT binary file made before the last two iOS updates. Last night I made some random noise measurements. To my surprise, the exported files had many chunks of zeroes alternating with random chunks of normal SPL values. That is not normal behavior. No NaN or infinite values were detected this time. If you give me a mail address I can send you the link to some test files in my Dropbox account. The struct approach and the array approach give the same results. You JUST gave another present to SPLnFFT users with your SPLnFFT_Reader.py. I'll download and try it right away. Best Regards
-
Use the numpy version instead. It is simpler, faster, and easier to mess around with. If you have a computer with an iPython notebook, that would be a great environment for exploring the dataset.
To send a Dropbox file, you can check it directly into the Github repo above via a pull request or you can go into your Dropbox client and tap once on the file to select it and then tap the share icon (a box with an arrow pointing up out of it) and share as email. Cut the URL out of that draft email, and paste it into a comment on the repo or here.
-
Does the SPLnFFT guy have matlab scripts that read and plot the data? The screenshots show such an .m file. If you have a copy of that, it would explain how to parse and interpret the data.