omz:forum

    Binary files read and write

    Pythonista
    • dgelessus
      dgelessus last edited by

      To open files in binary mode (instead of text mode, which is the default), use the "rb" (read) and "wb" (write) modes. To convert the floating-point data to Python's float type, have a look at the struct module. Make sure to explicitly set the byte order to what is used in your file, otherwise the resulting data might be bogus.
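A minimal sketch of the advice above, using only the standard library. It assumes the file is nothing but packed 32-bit floats in little-endian order; the helper name `read_floats` and the throwaway demo file are hypothetical and do not come from the thread.

```python
import os
import struct
import tempfile


def read_floats(filename, byte_order='<'):
    """Read a file of packed 32-bit floats into a tuple of Python floats."""
    with open(filename, 'rb') as in_file:  # 'rb' = read, binary mode
        data = in_file.read()
    count = len(data) // 4  # each 32-bit float occupies 4 bytes
    # The byte-order prefix ('<' little-endian, '>' big-endian) makes the
    # result independent of the machine that runs the script.
    return struct.unpack('{}{}f'.format(byte_order, count), data[:count * 4])


# Round-trip demonstration with a throwaway file (assumed little-endian).
values = (1.0, 2.5, -3.25)  # exactly representable as 32-bit floats
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')
with open(path, 'wb') as out_file:  # 'wb' = write, binary mode
    out_file.write(struct.pack('<3f', *values))

print(read_floats(path))
```

If the real file was written big-endian, passing `byte_order='>'` is the only change needed.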

      For the "further processing" part, you might find the numpy module useful. I haven't used it much myself - it's possible that it also provides ways of reading C-style arrays from a bytestring or a file.

      • ccc
        ccc last edited by ccc

        EDIT: https://github.com/cclauss/SPLnFFT_tools contains the code created based on this thread. Please open issues and/or submit pull requests to improve that code.

        I created read() and write() with the array module and again with the struct module. It seems like much more work than just reading and writing JSON files, which are far more portable and avoid all the machine-specific quirks. https://github.com/cclauss/SPLnFFT_tools/blob/master/old_code/binary_file_of_2d_matrix.py
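For comparison, here is a rough sketch of what an array-module round trip can look like. It is not the code from the linked repository; the file name and values are made up for illustration, and the array module always uses the machine's native byte order.

```python
import array
import os
import tempfile

values = [1.0, 2.5, -3.25]  # exactly representable as 32-bit floats
path = os.path.join(tempfile.mkdtemp(), 'demo.bin')

# Write: typecode 'f' stores each value as a 4-byte float.
with open(path, 'wb') as out_file:
    array.array('f', values).tofile(out_file)

# Read: the item count is the file size divided by the size of one item.
floats = array.array('f')
with open(path, 'rb') as in_file:
    floats.fromfile(in_file, os.path.getsize(path) // floats.itemsize)

print(list(floats))
```

If a file came from a machine with the opposite byte order, `floats.byteswap()` would be needed after reading.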

        • ManuelU
          ManuelU last edited by

          Thanks to ccc for this valuable piece of code, which teaches a lot about dealing with binary data imported from other apps. In my case, it is raw float data of SPL measurements, organized in two columns for the Slow and Fast time weightings, that was imported from Dropbox with a script available in this forum. I wonder why 2D or nD arrays have to be dealt with as structures in Python.
          I'll test it and report the results. Thanks again.
          ManuelU

          • ccc
            ccc last edited by ccc

            Which SPL are we talking about? https://en.wikipedia.org/wiki/SPL ?

            EDIT: SPL stands for Sound Pressure Level. The binary files in question are generated by the iOS apps SPLnFFT and SPLnWatch with a required in-app purchase.

            • ManuelU
              ManuelU last edited by

              The sample script works fine and now I understand the process. Just one further question: to read only my binary file of floating-point data, such as 'MYSPLDATA.BIN', how should I define the 2D array? And do I need to initialize it with some value, as in the sample script?

              • ccc
                ccc last edited by

                What is SPL? What kind of computer writes the SPL datafile? What program on that computer writes the SPL datafile? Is there any documentation on the SPL format? Is there a sample SPL file with known values?

                I would start with:

                print(read_floats_via_array('MYSPLDATA.BIN'))
                

                And see if the values look good.

                • ManuelU
                  ManuelU last edited by

                  The App is SPLnFFT for iOS. Your script works fine in Array mode with the only exception of this instruction: floats_in_the_file = os.path.getsize(filename) / struct.calcsize('f')

                  • ManuelU
                    ManuelU last edited by

                    I just noticed that you changed the source code of the script. Now it reads 5MB binary data in about 4 seconds using the Array Mode. I'll check the Structure Mode and report results. Congratulations

                    • ManuelU
                      ManuelU last edited by

                      How can I access the data in the 2D array to make plots and numeric calculations of aggregated data? The app saves 5 MB of data for a 24-hour recording. If the recording time is shorter, it pads the data with zeroes. Is there a way to filter out these values while reading the file? The total data saved in float format is computed as follows: count=2436008*2;

                      • ManuelU
                        ManuelU last edited by

                        There was an error in my previous post. It should read: count = 24 * 3600 * 8 * 2. Sorry.

                        • ManuelU
                          ManuelU last edited by ccc

                          Here is a short sample of the values read by your script in Array Mode. Observe the zero values at the end.

                          (50.56001281738281, 53.25138854980469), (51.46320724487305, 59.16133117675781), (53.85163116455078, 56.33137512207031), (54.70978546142578, 54.6609001159668), (55.20241165161133, 47.02310562133789), (55.262977600097656, 40.11175537109375), (54.45186996459961, 43.808773040771484), (54.05076217651367, 42.50151824951172), (53.665225982666016, 54.5957145690918), (53.8406867980957, 65.96211242675781), (58.009944915771484, 65.0105972290039), (59.889801025390625, 59.607154846191406), (60.222530364990234, 56.390960693359375), (60.41679763793945, 54.57362747192383), (60.55101776123047, 41.55317687988281), (60.54636001586914, 40.97874450683594), (60.54383850097656, 46.896514892578125), (60.42774200439453, 45.11729049682617), (57.88348388671875, 49.7500114440918), (53.61359786987305, 46.01418685913086), (50.81377410888672, 36.51190185546875), (48.24237823486328, 31.083229064941406), (44.919986724853516, 32.971107482910156), (44.69907760620117, 46.87627410888672), (45.31845474243164, 39.48908233642578), (44.62737274169922, 35.4039192199707), (44.04755401611328, 34.72141647338867), (41.45051956176758, 34.2315673828125), (39.6866569519043, 33.11891174316406), (39.542598724365234, 50.92300796508789), (43.85606002807617, 34.235633850097656), (43.871002197265625, 40.384735107421875), (42.935977935791016, 52.999149322509766), (46.3834228515625, 52.6627311706543), (48.203895568847656, 57.949790954589844), (51.57520294189453, 52.208770751953125), (52.15311050415039, 45.712528228759766), (52.26800537109375, 50.851497650146484), (52.261497497558594, 46.12493133544922), (52.383358001708984, 84.88561248779297), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0), (0.0, 0.0)

                          • ccc
                            ccc last edited by

                            How can I access the data in the 2D array to make plots and numeric calculation of aggregated data?

                            I am unclear what you mean. Your last post looks to me like it is a list of (x, y) tuples. What else do you need?

                            To remove all (0.0,0.0) elements from your list...

                            my_list = [(x[0], x[1]) for x in my_list if x[0] or x[1]]  # remove only the (0.0, 0.0) elements; "and" would also drop pairs with a single zero
                            

                            count=24 * 3600 * 8 * 2

                            count = 24 (hours in a day) * 60 (minutes in an hour) * 60 (seconds in a minute) * 8 (what is this? (samples per second?)) * 2 (values (fast and slow?))
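For the plotting and aggregation question quoted above, one possible approach is to split the list of pairs into two columns with `zip`. The sample numbers are invented, and the plain arithmetic mean is used purely as an illustration of an aggregate, not as a recommendation for how sound levels should be averaged.

```python
pairs = [(50.0, 53.0), (52.0, 59.0), (54.0, 56.0), (0.0, 0.0), (0.0, 0.0)]

# Drop the zero padding first, then transpose the pairs into two columns.
recorded = [pair for pair in pairs if pair != (0.0, 0.0)]
first_column, second_column = zip(*recorded)

mean_first = sum(first_column) / len(first_column)
max_second = max(second_column)
print(mean_first, max_second)
```

Each column is then an ordinary sequence that can be handed to a plotting library or indexed directly, for example `first_column[0]`.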

                            • ManuelU
                              ManuelU last edited by

                              There are two time weightings for SPL meters: SLOW = one reading per second; FAST = one reading every 1/8 of a second. That means you get eight pairs of data points every second. One hour has 60 * 60 = 3600 seconds, so one hour has 3600 * 8 * 2 = 57600 data points in float format that are exported to Dropbox. Another problem is the NaN and infinite values that are generated for many reasons and have to be replaced by the previously recorded SLOW or FAST value. They are mostly negative values. As you can see, there is a post-processing job to be done before plotting or computing aggregated data, so that the results are reliable.
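The replacement rule described above (substitute the previous valid reading for NaN or infinite values) could be sketched as follows. The function name `fill_invalid` is hypothetical, and the choice to drop invalid values that appear before any valid one is an assumption; note that this shortens the list in that case.

```python
import math


def fill_invalid(values):
    """Replace NaN and infinite entries with the last valid value seen."""
    cleaned = []
    previous = None
    for value in values:
        if math.isnan(value) or math.isinf(value):
            if previous is None:
                continue  # nothing valid yet to copy from, so skip it
            value = previous
        cleaned.append(value)
        previous = value
    return cleaned


slow = [50.5, float('nan'), 51.0, float('-inf'), 52.25]
print(fill_invalid(slow))
```

The same function would be applied separately to the SLOW column and the FAST column.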

                              • ManuelU
                                ManuelU last edited by

                                The total size of any file exported to Dropbox by the SPLnFFT Noise Meter app is 5529600 bytes, so each data point in float format uses 4 bytes: (24 * 3600 * 8 * 2 = 1382400) * 4 = 5529600. I've observed that the instruction floats_in_the_file = os.path.getsize(filename) / struct.calcsize('f') reads a lot of garbage where it is supposed to read zeroes.
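The arithmetic above can be checked with a few lines. The constant names and the helper `floats_in_file` are illustrative only; floor division (`//`) is used so that the count stays an integer under both Python 2 and Python 3.

```python
SECONDS_PER_DAY = 24 * 3600
READINGS_PER_SECOND = 8
VALUES_PER_READING = 2
BYTES_PER_FLOAT = 4  # a 32-bit float

expected_floats = SECONDS_PER_DAY * READINGS_PER_SECOND * VALUES_PER_READING
expected_bytes = expected_floats * BYTES_PER_FLOAT
print(expected_floats, expected_bytes)


def floats_in_file(size_in_bytes):
    """Number of 32-bit floats that fit in a file of the given size."""
    return size_in_bytes // BYTES_PER_FLOAT
```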

                                • ccc
                                  ccc last edited by

                                  Do the array approach and struct approach generate the same list?

                                  What is printed if you add print(floats_in_the_file) when you run the script against a 5529600 byte file? I would expect 1382400.

                                  You could try removing bogus values by post-processing the list with:

                                  my_list = [(x[0], x[1]) for x in my_list if x[0] > 0 and x[1] > 0]  # remove invalid elements
                                  
                                  • ManuelU
                                    ManuelU last edited by

                                    You are right, the maximum number of data points in the binary file is 1382400. I'm new to Pythonista, and I'll have to read up on how to detect and remove NaN and infinite values in Python and on which array functions are available. I have an Academic Apple Developer License and I'm exploring all the available options for processing the noise data within the iOS environment with a universal standalone app. As far as I know, Pythonista seems to be the only one that can import SPL data from Dropbox into its sandbox with a script, bypassing the cumbersome iTunes File Sharing. The project is part of an epidemiological investigation on environmental noise and health which includes, among other challenges, the simultaneous recording of an ECG.
                                    Thanks for your valuable help.

                                    • ccc
                                      ccc last edited by ccc

                                      OK... In just over 1 second SPLnFFT_Reader.py reads 1,382,400 floats out of the binary file, converts that into a 2d list of 691,200 fast_slow pairs and cleans that down to a 2d list of 2,786 valid fast_slow pairs and prints out the first 50 pairs.

                                      My cleansing step might not be right for your purposes. You can use math.isnan() and math.isinf() to find those values but I do not believe that it is required anymore because the author of the SPLnFFT app told me in an email that "In the matlab [example] script there is some processing to get rid of NaN data. But I thought I had solved this in latest release of SPLnFFT".
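The pairing and cleansing steps described above might look roughly like this. It is a sketch, not the code in SPLnFFT_Reader.py: the helper names are hypothetical, and it assumes the file interleaves the two values of each pair and that a valid pair is finite and greater than zero.

```python
import math


def to_pairs(floats):
    """Group a flat sequence [a0, b0, a1, b1, ...] into (a, b) tuples."""
    return list(zip(floats[0::2], floats[1::2]))


def is_valid(pair):
    """True when both values are finite and greater than zero."""
    return all(not math.isnan(v) and not math.isinf(v) and v > 0
               for v in pair)


flat = [50.5, 53.25, float('nan'), 59.0, 51.0, 56.5, 0.0, 0.0]
pairs = to_pairs(flat)
valid = [pair for pair in pairs if is_valid(pair)]
print(len(pairs), valid)
```

Dropping a whole pair when either value is bad keeps the two columns aligned, at the cost of losing the good value in that pair.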

                                      • ccc
                                        ccc last edited by

                                        Calling all numpy gurus... Why does this not work as expected?

                                        import numpy
                                        data = numpy.fromfile('SPLnFFT_2015_07_21.bin', dtype=float)
                                        print(len(data))  # 691200  :-( this is half of the expected number
                                        
                                        • omz
                                          omz last edited by

                                          You can try this:

                                          data = numpy.fromfile('SPLnFFT_2015_07_21.bin', dtype=numpy.dtype('f4'))
                                          

                                          The Python float data type is usually implemented as a double (8 bytes), so this specifies the number of bytes explicitly.
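The halved count follows directly from the item sizes, which can be checked with the standard library alone. The file size is the one reported earlier in the thread; the variable names are illustrative.

```python
import struct

FILE_SIZE = 5529600  # bytes, as reported earlier in the thread

single = struct.calcsize('<f')  # 32-bit float, like numpy's 'f4'
double = struct.calcsize('<d')  # 64-bit float, what Python's float maps to

# Reading the same bytes as 8-byte items yields half as many elements.
print(FILE_SIZE // double)
print(FILE_SIZE // single)
```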

                                          • Phuket2
                                            Phuket2 last edited by

                                            Do issues still exist with byte order on different platforms? I really don't know. A long time ago, we had to consider this: big- and little-endian byte order when reading binary or memory files without an API that took care of the translation.
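Byte order still matters whenever raw bytes are interpreted by hand. As a small illustration using only the standard library, the same 32-bit float packs to reversed byte sequences under the two orders, and reading it with the wrong one gives a different number without raising an error.

```python
import struct
import sys

print(sys.byteorder)  # byte order of the machine running this script

little = struct.pack('<f', 1.0)  # little-endian encoding of 1.0
big = struct.pack('>f', 1.0)     # big-endian encoding of 1.0
print(little == big[::-1])       # the two encodings are byte-reversed

# Interpreting little-endian bytes as big-endian silently gives a wrong value.
misread = struct.unpack('>f', little)[0]
correct = struct.unpack('<f', little)[0]
print(misread, correct)
```

Stating the order explicitly in the `struct` format string, as suggested at the top of the thread, avoids depending on `sys.byteorder`.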
