Binary files read and write
-
In your last script, when the data cleanse is set to False, the app crashes. By the way, your former script with the array and structure options read all 5.2 MB without a problem. I think the problem is related to the numpy library and the memory used by its float data type, since by default it uses 8 bytes per value.
-
This version of your script works fine
#coding: utf-8

# cleanse
import numpy
data = numpy.fromfile('SPLnFFT_2015_07_23.bin', dtype=numpy.float32).reshape(-1, 2)
#data = data[numpy.all(data > 0, axis=1)]  # cleanse
print(type(data), len(data))  # numpy.ndarray, 2786
print(data[:20])  # print first 20 fast, slow pairs

# HELP: A scatter plot is all wrong for this dataset
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
x = data[:,0]
y = data[:,1]
ax.scatter(x, y)
plt.title('SPLnFFT Noise data')
plt.show()
-
the problem with the matlab example is that it doesn't seem to use the right data size relative to the actual file structure.
if you know that the datafile always contains exactly 24 hours worth of samples, the numpy equivalent to the matlab is
t=numpy.linspace(0.0,24.0, N)
and of course you would not cleanse!
you would want the ability to zoom, however, which now requires a little more trickery, and probably you would only plot a given time range to keep the number of points down. while matlab can plot a million points, matplotlib on ios is strained!
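A minimal sketch of the time axis and of plotting only a given time range, assuming the file covers exactly 24 hours. The helper names `time_axis` and `select_range` are made up for illustration, and the zero-filled array only stands in for the real (fast, slow) pairs.

```python
import numpy

def time_axis(n_samples, hours=24.0):
    """Return n_samples evenly spaced times (in hours) from 0 to `hours`."""
    return numpy.linspace(0.0, hours, n_samples)

def select_range(t, data, start_hour, end_hour):
    """Return only the times and rows that fall inside [start_hour, end_hour]."""
    mask = (t >= start_hour) & (t <= end_hour)
    return t[mask], data[mask]

# Stand-in for the real pairs read with numpy.fromfile(...).reshape(-1, 2)
data = numpy.zeros((97, 2), dtype=numpy.float32)
t = time_axis(len(data))
t_sub, data_sub = select_range(t, data, 8.0, 14.0)
print(len(t), len(t_sub))
```

Passing `t_sub` and `data_sub[:, 0]` to matplotlib instead of the full arrays keeps the number of plotted points down.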
-
I'm not quite sure, but I think that what the MATLAB script does is simple arithmetic.
Suppose that the file has a total of 635 pairs of data points;
635 × 24 = 15240, hence
634 ÷ 15240 = 0.041601049869
Therefore the point intervals would be 0.041601049869 times 0, 1, 2, 3, 4, 5, ..., 635.
After plotting the SLOW and FAST point values, you just need to label the time axis from 0 to 24 hours at one-hour intervals.
Hope it helps
-
Thanks again JonB ... I am learning.
I updated the code to remove all the cleansing, add the zero hour to 24h x axis label as you suggested, and add elapsed_time(). The script takes about 2min 20sec to produce a full 24h day plot on my iPad.
I am still not satisfied with the x axis, which currently starts at -5h (!) and ends at 30h with ticks every 5 hours. My goal is to have it start at 0h and end at 24h with a tick every hour. My attempts to do so result in crashes, so they are commented out.
There is yet another new release (v5.6) of the SPLnFFT app.
-
This link shows how to compute aggregated data (LEQ) from non-zero SPL values for a given time period
-
I sent an algorithm 2 hours ago to compute the X axis values for a full SPLnFFT binary file with no data cleansing. Try it, but using the values that come in the MATLAB script. It should work. Regards
-
ManuelU, you need to be a little more specific as to what you want in the end. Do you want a single number (average SPL) over 24 hours? Or a plot of 24 hours where the resolution is reduced using a moving average, say down to once per minute or once per 10 minutes, showing the average and peak over each interval?
note that periods where spl is 0 will stay 0 after averaging!
or do you want the original resolution, but the ability to zoom the plot on each region where you actually have data?
or, detect how many recording sessions there were, and show N subplots, with only the data from that session, but showing the correct original timestamp in each subplot?
ccc, you want plt.xlim(0,24), and plt.xticks(numpy.arange(0,24))
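A rough sketch of the reduced-resolution option together with the axis settings, assuming matplotlib is available as it is in Pythonista. The function names and the sample values are hypothetical. Note that numpy.arange(0,24) stops at 23, so the sketch uses numpy.arange(0, 25) to get a tick at 24h as well.

```python
import numpy

def downsample(values, block):
    """Reduce resolution: return (mean, peak) of each run of `block` samples."""
    usable = len(values) - len(values) % block   # drop a trailing partial block
    blocks = numpy.asarray(values[:usable], dtype=float).reshape(-1, block)
    return blocks.mean(axis=1), blocks.max(axis=1)

def hour_ticks():
    """Ticks at 0, 1, ..., 24. arange excludes its stop value, hence 25."""
    return numpy.arange(0, 25)

def plot_day(mean, peak):
    """Plot reduced data against a 0h-24h axis with one tick per hour."""
    import matplotlib.pyplot as plt   # imported here so the helpers above work without it
    t = numpy.linspace(0.0, 24.0, len(mean))
    plt.plot(t, peak, label='peak')
    plt.plot(t, mean, label='average')
    plt.xlim(0, 24)
    plt.xticks(hour_ticks())
    plt.legend()
    plt.show()

# Made-up samples; the first block is all zeros and stays zero after averaging
mean, peak = downsample([0, 0, 60, 70, 50, 80, 40], block=2)
print(mean, peak)
```

With 8 pairs per second (an assumption derived from the file size), `block=480` would give one point per minute.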
-
JonB, in my program I have an option to process the whole data file in chunks of one hour or more, or even in arbitrary chunks. The processing power for graphics is amazing. You just need to build up a pair of X,Y (SLOW-FAST) arrays and a few instructions to render a graphic output that responds to finger gestures like any other professional graphics app for iOS.
A 24-hour overview is crucial for spotting the segments that need to be analyzed in detail, in the search for spikes or a pattern repeating through time. By using this program and SPLnFFT on an iPhone, I detected some asymptomatic people with dangerous periods of sleep apnea that otherwise couldn't have been detected. Many people work without any protection in noisy environments, like ambulance workers who are exposed to dangerous acoustic levels with the risk of permanent ear damage.
With respect to average values during a time period, the most common is the LEQ, which is a logarithmic average, and you need to filter out zero values. That's easily done just by reading the arrays that hold the time values and SPL values for a given time period. For serious epidemiological investigation you need to synchronize with other biometric values.
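As a sketch of the LEQ calculation, assuming equally spaced samples so that each value carries the same weight: every level is converted to relative energy, the energies are averaged, and the mean is converted back to dB. The function name `leq` is only illustrative, and the filter drops zero, negative, NaN and infinite values.

```python
import math

def leq(spl_values):
    """Equivalent continuous level (dB) of the non-zero SPL values.

    Each level is converted to relative energy (10 ** (L / 10)), the energies
    are averaged, and the mean is converted back to decibels.
    """
    levels = [v for v in spl_values if v > 0 and math.isfinite(v)]
    if not levels:
        return float('nan')   # nothing recorded in this period
    mean_energy = sum(10.0 ** (v / 10.0) for v in levels) / len(levels)
    return 10.0 * math.log10(mean_energy)

print(round(leq([60.0, 0.0, 60.0]), 2))   # the zero is ignored
print(round(leq([50.0, 60.0]), 2))        # louder samples dominate the average
```

Because the average is taken over energies, the result for 50 dB and 60 dB is well above the arithmetic mean of 55 dB.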
My intention is to use a Holter-like ECG recorder. There is evidence of a relationship between noise and coronary heart disease, but few studies with simultaneous recording of noise and ECG. To handle statistics, some years ago I developed an app for Mac OS X that uses binary logistic regression and survival analysis with parametric regression models for assessing the risk of diseases that might be related to noise. Statistics render numerical data, but sometimes I have used common sense instead of hypothesis testing methods to make decisions. As you know, no hypothesis can be demonstrated; the most you can obtain is evidence. That can only be achieved with a team and the necessary tools.
One idea I have in mind is to analyze the noise data in the frequency domain with the FFT methods of the numpy library. This feature and the option to import the binary file directly from Dropbox are what made me choose Pythonista as a supporting app. Thanks for the valuable support from all the people in this excellent forum.
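A minimal sketch of a frequency-domain look at an SPL series with numpy.fft. The rate of 8 samples per second is an assumption (57,600 floats per hour stored as pairs), the signal here is synthetic, and `dominant_frequency` is a made-up helper name.

```python
import numpy

def dominant_frequency(spl, sample_rate_hz):
    """Return the frequency (Hz) of the strongest non-constant component."""
    spl = numpy.asarray(spl, dtype=float)
    spectrum = numpy.abs(numpy.fft.rfft(spl - spl.mean()))  # remove the DC offset
    freqs = numpy.fft.rfftfreq(len(spl), d=1.0 / sample_rate_hz)
    return freqs[numpy.argmax(spectrum)]

# Synthetic test signal: 60 dB level with a 0.5 Hz ripple, 100 s at 8 samples/s
rate = 8.0
t = numpy.arange(800) / rate
signal = 60.0 + 5.0 * numpy.sin(2 * numpy.pi * 0.5 * t)
print(dominant_frequency(signal, rate))
```

On real data, zero-valued gaps between recording sessions would distort the spectrum, so it is probably safer to run this on one continuous session at a time.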
-
The best way for a new user to learn the basics of data processing with Pythonista is to read and try the excellent examples posted to this forum by CCC and JonB. The general Python documentation that comes with the app is both obscure and almost sample-less.
-
There are tons of cool examples to look at in Pythonista-Tools. I often consult @humberry's ui-tutorial when I am stumped with something in the ui module. Ole's gists also make for great reading.
-
CCC, thanks a lot for this information. Do you know of a good, extensive e-book for Kindle or iBooks about the Python language? I bought two and was disappointed by their poor content. Best regards
-
Hi JonB, CCC, the last update works just fine. I process the whole SPL file in chunks of one hour each. The code in standard BASIC is simple. Just create two vectors with the start and end times you want to process. Something like
for i = 1 to 24
  timeSTART(i) = (limit * (i - 1)) + 1
  timeEND(i) = limit * i
next i
I create an input file and an output file as binary, where I save the data filtered of NaN and infinite SPL values for further graphic processing. I use the FTP server of Pythonista on one device and an FTP client on another remote device in the same local WiFi network. I use a FOR NEXT loop that goes from timeSTART(i) to timeEND(i).
Best regards
-
I cannot really understand what you wrote. Perhaps edit your text above to put in a few blank lines to make it easier to understand. Your workflow is unclear to me. I think you want to create 24 binary files, one for each hour in the day. You want to filter out the NaN and INF values. You want to FTP them from one iOS device to another iOS device. Is there something else that you need? What is the "ask"?
@JonB did create a fast, pinchable matplotlib view https://forum.omz-software.com/topic/2007/matplotlib-pinch-pan-dynamic-view but you need the current Beta version of Pythonista to run it. I don't quite understand it all myself but it is quite cool.
-
SPLnFFT_hourly_split.py takes about 0.3 seconds to read in 1,382,400 float32 values and write them back out to 24 binary files that each contain 57,600 float32 values that represent one hour of that day.
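SPLnFFT_hourly_split.py itself is not shown in this thread, so the following is only a sketch of how such a split could look with numpy. The function name and the output file names (`hour_00.bin` and so on) are hypothetical.

```python
import os
import numpy

def split_hourly(src_path, out_dir, parts=24):
    """Write `parts` equal-sized float32 files from one day-long file."""
    data = numpy.fromfile(src_path, dtype=numpy.float32)
    per_file = len(data) // parts   # 57600 for a full 1,382,400-float day
    paths = []
    for i in range(parts):
        path = os.path.join(out_dir, 'hour_{:02d}.bin'.format(i))
        data[i * per_file:(i + 1) * per_file].tofile(path)   # one slice per hour
        paths.append(path)
    return paths
```

Calling `split_hourly('SPLnFFT_2015_07_23.bin', '.')` would write the 24 files next to the script. Each hour is copied with a single slice, with no loop over individual values.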
-
With the former BASIC code I only generate the starting and ending data points for a given time interval. Suppose you observe that the relevant data is in the time interval that goes from 8 to 14 hours. Then your loop would go from start(8) to end(14). I only save one binary file for that interval to compute aggregated data and graphics, which look less cluttered over a shorter interval than in a full 24-hour plot.
I wish I knew how to send images to the forum, so you could appreciate the differences in quality. "limit" is an integer variable holding the 57600 data points per hour. Here are the values for hours 1 to 8, with the start time on the left and the end time on the right
1 55760
55761 111520
111521 167280
167281 223040
223041 278800
278801 334560
334561 390320
390321 446080
-
The numbers at the end of your post are not correct. The correct (zero-based) numbers are generated by:
floats_per_file = 1382400 // 24  # integer division
print('{} floats per file'.format(floats_per_file))  # 57600
for i in range(24):
    # first and last zero-based index of hour i
    print(i * floats_per_file, (i + 1) * floats_per_file - 1)
This code is similar to the code in SPLnFFT_hourly_split.py. Python's slice syntax
my_list[start_index : end_index]
means that we do not need to loop over all elements but can instead grab a block of data elements (such as an hour's worth of data) in a single operation. This helps to explain how we can break one binary file of 1m+ floats into 24 binary files in 1/3 of a second.
How did you know that 08h00 to 14h00 were the hours of interest?
Is there a mathematical test that I could run to determine which hours have useful data and which do not? Or must this decision be made by a human looking at a graph of the full day? Numpy supports a dizzying number of operations that can be applied to an ndarray if you can tell me what to look for. Alternatively, a human can tell the program which hours (minutes, seconds) are of interest and the file can be divided that way and rewritten as a smaller binary file.
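One candidate test, offered only as a sketch: treat zero, NaN and infinite values as "no recording" and measure what fraction of each hour holds real samples. The 0.5 threshold, the function names and the synthetic day are assumptions for illustration, not something taken from the SPLnFFT app.

```python
import numpy

def active_fraction_per_hour(values, hours=24):
    """Fraction of non-zero, finite samples in each of `hours` equal blocks."""
    values = numpy.asarray(values, dtype=float)
    per_hour = len(values) // hours
    blocks = values[:per_hour * hours].reshape(hours, per_hour)
    active = numpy.isfinite(blocks) & (blocks > 0)
    return active.mean(axis=1)

def hours_of_interest(values, min_fraction=0.5, hours=24):
    """Zero-based hour numbers whose active fraction reaches `min_fraction`."""
    fractions = active_fraction_per_hour(values, hours)
    return [int(h) for h in numpy.flatnonzero(fractions >= min_fraction)]

# Synthetic day: 4 samples per hour, sound only during hours 8 to 13
day = numpy.zeros(24 * 4)
day[8 * 4:14 * 4] = 55.0
print(hours_of_interest(day))
```

A test like this only says where data exists. Whether those hours are interesting still depends on what the person analyzing them is looking for.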
The way to post an image to the forum is to put the image on an accessible web page (GitHub, Dropbox, etc.) and then add
?raw=1
to the end of the publicly accessible URL, like
![](https://www.dropbox.com/s/00e5iyfealnuzxs/my_image.png?raw=1)
-
When you save raw data from the SPLnFFT app, you can save a 24-hour plot as well. You see the chunks of data as cluttered vertical lines, with no valleys at all. This graph can be used as a hint about which time interval deserves to be analyzed in detail. The arrays are one-based. Thanks for the info about sending images through the forum. I'll try it.
Regards.
By the way, the time used for loading, saving and processing data is about 2 minutes per hour. I have a 10 MB WiFi, but a 300 MB fiber connection is coming.
-
OK. So there could be a Pythonista UI where the user would specify a start time and an end time. The script could use those times to determine exactly which floats to copy from the original, full-size binary file to the new, smaller binary file. Is that what you want? Do you want the Matplotlib graph too? Is start hour and end hour good enough, or would you want to be able to specify minutes too?
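The index arithmetic behind such a start/end selection could look roughly like this, assuming a full-day file of 1,382,400 floats (57,600 per hour). `float_range` is a hypothetical helper, and the UI part is left out.

```python
FLOATS_PER_HOUR = 57600                     # 1,382,400 floats per day / 24 hours
FLOATS_PER_MINUTE = FLOATS_PER_HOUR // 60   # 960

def float_range(start_hm, end_hm):
    """Map (hour, minute) start and end times to a zero-based slice [first, stop)."""
    first = start_hm[0] * FLOATS_PER_HOUR + start_hm[1] * FLOATS_PER_MINUTE
    stop = end_hm[0] * FLOATS_PER_HOUR + end_hm[1] * FLOATS_PER_MINUTE
    if not 0 <= first < stop <= 24 * FLOATS_PER_HOUR:
        raise ValueError('times must satisfy 00:00 <= start < end <= 24:00')
    return first, stop

first, stop = float_range((8, 0), (14, 30))
print(first, stop)
```

With the full file loaded as a numpy array `data`, `data[first:stop].tofile(...)` would then write the smaller binary file.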
I am still unclear why you transfer the files from one iOS device to another iOS device on the local WiFi network. Why not just do all the data capture and visualization on the first iOS device? The speed of the local WiFi network will not be improved by your move to fiber, unless a new WiFi hub is included in the fiber upgrade.
-
Please try this. I hope I understood your instructions
https://www.dropbox.com/s/00e5iyfealnuzxs/spl_graphplot_test1.png?raw=1