Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Binary files read and write
-
I posted an algorithm 2 hours ago to compute the X-axis values for a full SPLnFFT binary file, with no data cleansing. Try it, but using the values that come in the MATLAB script. It should work. Regards
-
ManuelU, you need to be a little more specific about what you want in the end. Do you want a single number (average SPL) over 24 hours? Or a plot of 24 hours where the resolution is dropped by using a moving average, say down to one value per minute, or per 10 minutes, showing the average and peak over each interval?
Note that periods where SPL is 0 will stay 0 after averaging!
Or do you want the original resolution, but with the ability to zoom the plot in on each region where you actually have data?
Or, detect how many recording sessions there were, and show N subplots, each with only the data from that session, but showing the correct original timestamps?
-
You want plt.xlim(0, 24) and plt.xticks(numpy.arange(0, 24)).
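The moving-average downsampling suggested above could be sketched with numpy like this; the `downsample` helper and the block size are illustrative, not from any posted script:

```python
import numpy as np

def downsample(spl, block):
    """Average consecutive blocks of SPL samples (illustrative helper).
    Trailing samples that do not fill a whole block are dropped."""
    n = len(spl) // block
    return spl[:n * block].reshape(n, block).mean(axis=1)

# toy data: 60 samples, e.g. one per second
spl = np.arange(60, dtype=np.float32)
per_10s = downsample(spl, 10)  # six 10-sample averages; per_10s[0] == 4.5
```

A peak-per-block variant would use `.max(axis=1)` instead of `.mean(axis=1)`.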
-
JonB, in my program I have an option to process the whole data file in chunks of one hour or more, or even arbitrary chunks. The processing power for graphics is amazing. You just need to build up a pair of X, Y (SLOW-FAST) arrays and a few instructions to render a graphic output that responds to finger gestures like any other professional graphics app for iOS.
A 24-hour overview is crucial for spotting the segments that need to be analyzed in detail, when searching for peaks or a repeating pattern over time. Using this program and SPLnFFT on an iPhone, I detected some asymptomatic people with dangerous periods of sleep apnea that otherwise couldn't have been detected. Many people work without any protection in noisy environments; ambulance workers, for example, are subjected to dangerous acoustic dB levels with the risk of permanent ear damage.
Regarding average values over a time period, the most common is the LEQ, which is a logarithmic average for which you need to filter out zero values. That is easily done just by reading the arrays that hold the time and SPL values for a given period. For serious epidemiological investigation you need to synchronize with other biometric values.
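A sketch of that LEQ calculation: the formula is the standard energy average of dB values, and the zero-filtering follows the description above (treating 0.0 as a silent-period placeholder):

```python
import numpy as np

def leq(spl_db):
    """Logarithmic (energy) average of SPL values in dB,
    ignoring the 0.0 placeholders for silent periods."""
    spl_db = np.asarray(spl_db, dtype=np.float64)
    valid = spl_db[spl_db > 0]        # filter out the zero values
    if valid.size == 0:
        return 0.0
    return 10.0 * np.log10(np.mean(10.0 ** (valid / 10.0)))

print(leq([60.0, 60.0, 0.0, 0.0]))   # 60.0: zeros are excluded, not averaged in
```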
My intention is to use a Holter-like ECG recorder. There is evidence of a relationship between noise and coronary heart disease, but little of it comes from simultaneous recording of noise and ECG. To handle the statistics, some years ago I developed an app for Mac OS X that uses binary logistic regression and survival analysis with parametric regression models for assessing the risk of diseases that might be related to noise. Statistics render numerical data, but sometimes I used common sense instead of hypothesis-testing methods to make decisions. As you know, no hypothesis can be demonstrated; the most you obtain is evidence. That can only be achieved with a team and the necessary tools.
One idea in mind is to analyze the noise data in the frequency domain with the FFT methods of the numpy library. This feature, and the option to directly import the binary file from Dropbox, is what made me choose Pythonista as a supporting app. Thanks for the valuable support from everyone in this excellent forum.
-
The best way for a new user to learn the basics of data processing with Pythonista is to read and try the excellent examples posted to this forum by CCC and JonB. The general Python documentation that comes with the app is both obscure and almost devoid of examples.
-
There are tons of cool examples to look at in Pythonista-Tools. I often consult @humberry's ui-tutorial when I am stumped with something in the ui module. Ole's gists also make for great reading.
-
CCC, thanks a lot for this information. Do you know a good and extensive electronic book, for Kindle or iBooks, about the Python language? I bought two and was disappointed by their poor content. Best regards
-
Hi JonB, ccc, the last update works just fine. I process the total SPL file in chunks of one hour each. The code in standard BASIC is simple. Just create two vectors with the start and end times you want to process. Something like:

FOR i = 1 TO 24
  timeSTART(i) = (limit * (i - 1)) + 1
  timeEND(i) = limit * i
NEXT i

I create an input file and another output file as BINARY, where I save the data filtered of NaN and infinite SPL values for further graphic processing. I use the FTP server of Pythonista on one device and an FTP client on another device in the same local Wi-Fi network. I use a FOR...NEXT loop that goes from timeSTART(i) to timeEND(i).
Best regards.
-
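A rough numpy equivalent of that clean-up step: read the float32 binary file, replace NaN and infinite values, and write a cleaned copy. The file names are placeholders, and the 33.33333 flag value follows the convention described later in the thread:

```python
import numpy as np
import os, tempfile

def clean_spl_file(src, dst, flag=33.33333):
    """Read float32 SPL values, replace NaN/Inf with a flag value,
    write the cleaned data to dst, and return the bad-value count."""
    data = np.fromfile(src, dtype=np.float32)
    bad = ~np.isfinite(data)        # True for NaN, +Inf and -Inf
    data[bad] = flag
    data.tofile(dst)
    return int(bad.sum())

# tiny self-check with a temporary file (names are hypothetical)
tmp = tempfile.mkdtemp()
raw = os.path.join(tmp, 'SPLnFFT_raw.bin')
out = os.path.join(tmp, 'SPLnFFT_clean.bin')
np.array([50.0, np.nan, np.inf, 60.0], dtype=np.float32).tofile(raw)
n_bad = clean_spl_file(raw, out)    # 2 values replaced
cleaned = np.fromfile(out, dtype=np.float32)
```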
I cannot really understand what you wrote. Perhaps edit your text above to put in a few blank lines to make it easier to follow. Your workflow is unclear to me. I think you want to create 24 binary files, one for each hour in the day. You want to filter out the NaN and INF values. You want to FTP them from one iOS device to another iOS device. Is there something else that you need? What is the "ask"?
@JonB did create a fast, pinchable matplotlib view https://forum.omz-software.com/topic/2007/matplotlib-pinch-pan-dynamic-view but you need the current Beta version of Pythonista to run it. I don't quite understand it all myself but it is quite cool.
-
SPLnFFT_hourly_split.py takes about 0.3 seconds to read in 1,382,400 float32 values and write them back out to 24 binary files that each contain 57,600 float32 values that represent one hour of that day.
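The hourly split can be sketched with plain slicing. This is a guess at the approach, since SPLnFFT_hourly_split.py itself is not shown in the thread:

```python
import numpy as np

SAMPLES_PER_HOUR = 57600   # 1,382,400 float32 values / 24 hours

def split_hourly(day):
    """Split one day of float32 SPL data into 24 hourly arrays
    (a sketch of what SPLnFFT_hourly_split.py presumably does)."""
    assert len(day) == 24 * SAMPLES_PER_HOUR
    return [day[h * SAMPLES_PER_HOUR:(h + 1) * SAMPLES_PER_HOUR]
            for h in range(24)]

day = np.zeros(24 * SAMPLES_PER_HOUR, dtype=np.float32)  # toy stand-in for SPLnFFT.bin
hours = split_hourly(day)
```

Each hourly array could then be written out with `arr.tofile('hour_08.bin')` or similar.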
-
With the BASIC code above I only generate the starting and ending data points for a given time interval. Suppose you observe that the relevant data is in the time interval that goes from 8 to 14 hours. Then your loop would go from start(8) to end(14). I only save one binary file for that interval, to compute aggregated data and graphics, which look less cluttered over a shorter interval than in a full 24-hour plot.
I wish I knew how to send images to the forum, so you could appreciate the differences in quality. "limit" is an integer variable holding the 57600 data points per hour. Here are the values for hours 1 to 8, with the start point on the left and the end point on the right:
1 55760
55761 111520
111521 167280
167281 223040
223041 278800
278801 334560
334561 390320
390321 446080
-
The numbers at the end of your post are not correct. The correct (zero-based) numbers are generated by:

floats_per_file = 1382400 // 24
print('{} floats per file'.format(floats_per_file))  # 57600
for i in range(24):
    print(i * floats_per_file, (i + 1) * floats_per_file - 1)
This code is similar to the code in SPLnFFT_hourly_split.py. Python's slice syntax
my_list[start_index : end_index]
means that we do not need to loop over all elements but can instead directly grab a block of data elements (such as an hour's worth of data) in a single operation. This helps to explain how we can break one binary file of 1M+ floats into 24 binary files in 1/3 of a second.

How did you know that 08h00 to 14h00 were the hours of interest?
Is there a mathematical test that I could run to determine which hours have useful data and which do not? Or must this decision be made by a human looking at a graph of the full day? Numpy supports a dizzying number of operations that can be applied to an ndarray, if you can tell me what to look for. Alternatively, a human can tell the program which hours (minutes, seconds) are of interest, and the file can be divided that way and rewritten as a smaller binary file.
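One simple "mathematical test" along those lines: reshape the day into 24 hourly rows and keep the hours whose maximum SPL exceeds a threshold. The threshold and the `active_hours` helper name are illustrative:

```python
import numpy as np

SAMPLES_PER_HOUR = 57600   # per the earlier post

def active_hours(day, threshold=0.0):
    """Return the hour indices (0-23) that contain any SPL value
    above `threshold` -- one candidate test for 'useful data'."""
    hours = day.reshape(24, SAMPLES_PER_HOUR)
    return np.where(hours.max(axis=1) > threshold)[0]

# toy day: fake recordings only between 08h00 and 14h00
day = np.zeros(24 * SAMPLES_PER_HOUR, dtype=np.float32)
day[8 * SAMPLES_PER_HOUR:14 * SAMPLES_PER_HOUR] = 60.0
active = active_hours(day)   # hours 8 through 13
```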
The way to post an image to the forum is to put the image on an accessible web page (GitHub, Dropbox, etc.) and then add
?raw=1
to the end of the publicly accessible URL. Like: ![](https://www.dropbox.com/s/00e5iyfealnuzxs/my_image.png?raw=1)
-
When you save raw data from the SPLnFFT app, you can save a 24-hour plot as well. You see the chunks of data as cluttered vertical lines, with no valleys at all. This graph can be used as a hint about which time interval deserves to be analyzed in detail. The arrays are one-based. Thanks for the info about sending images through the forum. I'll try it.
Regards.
By the way, the time used for loading, saving and processing data is about 2 minutes per hour of data. I have 10 MB Wi-Fi, but a 300 MB fiber connection is coming.
-
OK. So there could be a Pythonista UI where the user would specify a start time and an end time. The script could use those times to determine exactly which floats to copy from the original, full-size binary file to the new, smaller binary file. Is that what you want? Do you want the matplotlib graph too? Are start hour and end hour good enough, or would you want to be able to specify minutes too?
I am still unclear on why you transfer the files from one iOS device to another iOS device on the local Wi-Fi network. Why not just do all the data capture and visualization on the first iOS device? The speed of the local Wi-Fi network will not be improved by your move to fiber, unless a new Wi-Fi hub is included in the fiber upgrade.
-
Please try this. I hope I understood your instructions
https://www.dropbox.com/s/00e5iyfealnuzxs/spl_graphplot_test1.png?raw=1
-
It didn't work out. It gives a 404 error message. I wonder if I should save to a specific folder in Dropbox.
Regarding your question, it's simple. The Pythonista interpreter is the only app that can import from Dropbox the binary files saved by the SPLnFFT app. To avoid the obsolete iTunes File Sharing, which needs a desktop computer, cable connections, etc., I use your FTP server to upload the files from Pythonista's sandbox to my BASIC interpreter's sandbox. I'm pretty stone-headed and I don't understand why Apple doesn't allow the easiest way, the "Open in..." option available in other programs.
-
What is the name of the Basic Interpreter app that you use?
-
TechBASIC
-
SPLnFFT_strip.py removes any hours which only contain (0, 0) values from the start and end of a SPLnFFT.bin file. Much like ' 1 2 3 '.strip() returns '1 2 3'. It finds the first hour that has sound and the last hour that has sound and writes a new file that only has the data between those hours. This will result in smaller file sizes which should reduce the FTP transfer time and provide more focused plots/graphs.
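A sketch of that strip idea in numpy, assuming 57,600 float32 values per hour as stated earlier in the thread (SPLnFFT_strip.py itself is not shown here):

```python
import numpy as np

SAMPLES_PER_HOUR = 57600

def strip_silent_hours(day):
    """Drop all-zero hours from the start and end of a day of SPL data,
    much like ' 1 2 3 '.strip() -- a sketch of SPLnFFT_strip.py's idea."""
    hours = day.reshape(24, SAMPLES_PER_HOUR)
    loud = np.flatnonzero(hours.any(axis=1))   # hours with any nonzero sample
    if loud.size == 0:
        return day[:0]                          # entirely silent day
    first, last = loud[0], loud[-1]
    return day[first * SAMPLES_PER_HOUR:(last + 1) * SAMPLES_PER_HOUR]

# toy day: sound only between 10h00 and 12h00
day = np.zeros(24 * SAMPLES_PER_HOUR, dtype=np.float32)
day[10 * SAMPLES_PER_HOUR:12 * SAMPLES_PER_HOUR] = 55.0
stripped = strip_silent_hours(day)   # keeps only the two loud hours
```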
-
OPEN "test_tr.bin" FOR OUTPUT AS #2
REM BEGIN THE FOR-NEXT LOOP TO READ HOUR CHUNKS
count = 1
startm = timeSTART(1)
endtm = timeEND(1)
PRINT "START DATA POINT "; startm
PRINT "END DATA POINT "; endtm
REM LOOP UNTIL THE START DATA POINT IS FOUND. NO WAY TO ACCESS IT DIRECTLY
FOR k = 1 TO endtm
  GET #1,, a
  IF k < startm THEN GOTO 200 : REM JUMP TO LOOP END
  REM *** LOOK FOR INFINITE AND NAN SPL VALUES
  isinf# = a
  isnan# = a
  i = math.isInf(isinf#)
  j = math.IsNaN(isnan#)
  IF (i <> 0) OR (j = 1) THEN
    PUT #2,, v : REM dB VALUE v = 33.33333 WHEN NAN OR INFINITES ARE DETECTED
    badspl = badspl + 1
    GOTO 100
  END IF
This is the BASIC code I use to clean up NaN and infinite values. Cleaning value by value prevents the order of the SLOW and FAST values from being changed (which can happen if you clean up in a block) when you read the data as one column and then transform it into a 2D matrix.
In a second pass, before transforming the sequential values into a 2D matrix, I look for the flag value 33.333333 and replace it with the logarithmic average of the preceding 10 SPL values if n > 10, or of the following values when n <= 10.
-
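That replacement scheme could be sketched like so; the window-selection rule follows the description above, and `patch_flags` is an illustrative name, not code from the thread:

```python
import math

FLAG = 33.333333   # placeholder written where NaN/Inf values were found

def patch_flags(spl):
    """Replace FLAG placeholders with the logarithmic (energy) average
    of the preceding 10 values when at least 10 precede, otherwise of
    the following 10 values -- a sketch of the scheme described above."""
    out = list(spl)
    for n, v in enumerate(out):
        if abs(v - FLAG) > 1e-6:
            continue                               # a real SPL value, keep it
        window = out[n - 10:n] if n >= 10 else out[n + 1:n + 11]
        window = [w for w in window if abs(w - FLAG) > 1e-6 and w > 0]
        if window:
            out[n] = 10 * math.log10(
                sum(10 ** (w / 10) for w in window) / len(window))
    return out
```

For example, a flag preceded by ten 60 dB readings is patched back to 60 dB.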
Did you run SPLnFFT_strip.py? Does that do something useful for you or not?
Do you still get INFs and NaNs in files generated by the current SPLnFFT app? Can you please add an INF_counter and a NaN_counter to your BASIC program and tell me how many of each you are seeing in files generated by the current SPLnFFT app?