Data analysis workflow
It's clear that pandas, SciPy and other data modules are not available in Pythonista. But NumPy and Matplotlib are available, as are pure-Python modules. So are feedparser and Beautiful Soup for web scraping. I've gotten parts of nltk to work as well.
I use the Anaconda stack with pandas, numpy, scipy, keras, tensorflow, nltk etc. on desktop, with Jupyter as my playground. But I haven't tried much in Pythonista, since I'm not sure whether I'd miss the unavailable modules too much, or perhaps that's just me.
Curious how many people do data analysis in Pythonista, which modules they use, what their workflow is, and examples of the kind of analysis they do. Thanks in advance!
I’m new to Python. I have used (and still use) Fortran 90, so I don’t have deep programming knowledge. I’m still learning Python, which is my favorite among the programming languages I know (mainly because of Pythonista, which I discovered some time ago in the App Store, since I have an iPhone).
In my spare time I mainly like to code:
-) data analysis scripts (e.g. rolling averages of different types of data, or optimization of one or more functions with or without constraints),
-) scripts for simple physics processes and for math processing, symbolic and numeric (e.g. plotting graphs of parametric, supposedly non-infinite summations that I'm curious about),
-) simple interactive plots (i.e. plots that run indefinitely until the user stops them, and that let the user change some parameters of the plotted function or of the constraint functions without aborting the calculation).
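As an illustration of the first item, here is a minimal sketch of a rolling (simple moving) average that needs only the standard library, so it should run in Pythonista as well as on a desktop. The function name `rolling_mean` and the sample data are made up for this example.

```python
from collections import deque


def rolling_mean(values, window):
    """Return the simple moving average of `values` over `window` points.

    Only full windows are reported, so the result has
    len(values) - window + 1 entries (or none if there is too little data).
    """
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque(maxlen=window)  # holds the values currently in the window
    total = 0.0
    out = []
    for x in values:
        if len(buf) == window:
            total -= buf[0]  # subtract the value about to leave the window
        buf.append(x)        # deque drops the oldest value automatically
        total += x
        if len(buf) == window:
            out.append(total / window)
    return out


data = [1, 2, 3, 4, 5, 6]  # hypothetical sample data
print(rolling_mean(data, 3))  # [2.0, 3.0, 4.0, 5.0]
```

Keeping a running total avoids re-summing the window at every step, which matters little for short lists but helps on longer series.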
I’m trying to improve my Python programming knowledge with the following Python libraries (not pure Python):
I’m learning to use these libraries with a Miniconda distribution and the PyScripter IDE (I use a Windows PC).
I’d like these libraries to work with Pythonista, but unfortunately they are not fully pure Python, I think for reasons of calculation speed (that is, the developers decided to use existing Fortran and/or C libraries, optimized for certain platforms, for some tasks or calculations, and bundled them into their Python libraries).
I’d like Python users to be able to get alternative versions of all existing non-pure-Python libraries (like SciPy or SfePy), so they can use them without a tedious compilation process for a specific platform.
For example, it would be nice if I could download an official modified version of SciPy (created by the SciPy developers) that uses only pure-Python libraries.
That is, the non-pure-Python internal libraries would be replaced by pure-Python scripts that perform the same tasks. For example, a nonlinear numerical optimization algorithm written in Fortran and wrapped in SciPy could be replaced by an identical pure-Python algorithm. I think that for standard numerical simulations (e.g. as a hobby) the slowdown would be minimal, given the hardware we have today. A user who must do calculations on a large amount of data would be advised to use a compiled version on a specific platform, but in the meantime they could run tests and experiments on the fly on a small device (a phone).
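To give a feel for what a pure-Python optimizer can look like, here is a minimal sketch of golden-section search, a classic method for minimizing a one-dimensional unimodal function on an interval. It is not the algorithm SciPy uses internally and not a replacement for it; the function name, the tolerance and the test function are assumptions made for this example.

```python
import math


def golden_section_minimize(f, a, b, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [a, b].

    Pure-Python golden-section search: each step shrinks the bracket
    by a factor of about 0.618 and needs one new function evaluation.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # 1 / golden ratio, about 0.618
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while abs(b - a) > tol:
        if fc < fd:
            # Minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # Minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Hypothetical test function with its minimum at x = 2.
x_min = golden_section_minimize(lambda x: (x - 2) ** 2 + 1, 0.0, 5.0)
print(round(x_min, 4))
```

For a small problem like this the loop finishes in a few dozen iterations, which supports the point that hobby-scale calculations do not necessarily need compiled code.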
In conclusion: I'm still looking for a simple, easy-to-use solution (a set of pure-Python scripts) that works on any platform with Python (computers, any smartphone), so that I can use Python as a computing language for any task.
In the absence of pandas, I previously used agate to do some basic analysis. The gist below shows an example of creating a test dataset of employee data, constructing a data table and assigning each record to a random group according to a specific frequency distribution. A summary table is then produced to provide some descriptive salary statistics.
This was a start for a Monte Carlo simulation model. I completed the rest of the project using pandas on my Mac.
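For readers without agate installed, the same kind of workflow can be sketched with the standard library alone. This is not the gist's code: the salary range, group labels, frequencies and function names below are all made up for illustration.

```python
import random
import statistics
from collections import defaultdict


def make_employees(n, rng):
    """Build a hypothetical test dataset of n employees with random salaries."""
    return [
        {"id": i, "salary": round(rng.uniform(30000, 90000), 2)}
        for i in range(n)
    ]


def assign_groups(employees, groups, weights, rng):
    """Assign each record to a random group following the given frequencies."""
    for emp in employees:
        emp["group"] = rng.choices(groups, weights=weights, k=1)[0]


def summarize(employees):
    """Return {group: descriptive salary statistics} for the dataset."""
    by_group = defaultdict(list)
    for emp in employees:
        by_group[emp["group"]].append(emp["salary"])
    summary = {}
    for group, salaries in by_group.items():
        summary[group] = {
            "n": len(salaries),
            "mean": statistics.mean(salaries),
            "median": statistics.median(salaries),
            # stdev needs at least two data points
            "stdev": statistics.stdev(salaries) if len(salaries) > 1 else 0.0,
        }
    return summary


rng = random.Random(42)        # fixed seed so runs are repeatable
groups = ["A", "B", "C"]       # hypothetical group labels
weights = [0.5, 0.3, 0.2]      # hypothetical target frequencies

employees = make_employees(200, rng)
assign_groups(employees, groups, weights, rng)
summary = summarize(employees)

print(f"{'group':<6}{'n':>5}{'mean':>12}{'median':>12}{'stdev':>12}")
for group in sorted(summary):
    s = summary[group]
    print(f"{group:<6}{s['n']:>5}{s['mean']:>12.2f}"
          f"{s['median']:>12.2f}{s['stdev']:>12.2f}")
```

Repeating the assignment and summary steps in a loop with different seeds would be one way to grow this into a small Monte Carlo experiment.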