omz:forum



    Regarding NLTK on Pythonista

    Pythonista
    • ltddev
      ltddev last edited by

      Hi, I'm new to both Pythonista and this forum. I'm also fairly new to Python, having spent the last 15 years or so in the enterprise Java world. I am very interested in AI in general, and in computational linguistics and natural language analysis at the hobby or after-work level.
      I have seen the discussions about NLTK in this forum, and I too would like to see it supported. Since it is a pure Python library, I thought I would try it out in Pythonista, so I put together a little test harness. You can see it or grab it here: NLTK Test script

      The reason I post this (my first post to this forum) here in New Discussion rather than in the Share Code section is that the general discussion about NLTK is here in this section of the forum.

      Here is what I found of interest from my little test script:

      1 - You can run the NLTK data set downloader in non-graphical, command-line/interactive mode right from Pythonista, and that's how I downloaded my data.

      2 - You don't need to download all of the hundreds of corpora and data sets, only the ones you are interested in, and most are only a few megabytes. The ENTIRE set is about 1.5 GB when unarchived. I have a 120 GB iPad 4, so this was not really an issue.

      3 - You can put the data sets anywhere you like, provided you set the NLTK_DATA environment variable to the location of nltk_data. That means even on a non-jailbroken device there should be somewhere you can put them. For my test, I put the data inside the Pythonista app itself, since my device is jailbroken.

      4 - I noticed that I only needed to run the script that sets the NLTK_DATA environment variable once. On subsequent runs I could comment out that section of my test case and NLTK was still able to find the data. I even shut down the Pythonista process, started it again, and ran the script without explicitly setting the variable, and it still worked. This leads me to believe that NLTK is persisting the data path somewhere, so it seems feasible to keep a separate script in your library just to set the data path whenever you move nltk_data or add to it in a new location.

      5 - I tried some other more involved sample scripts that used the same (Brown) data set and I found the load and execute times very reasonable. I did not see any of the 30s load times described elsewhere in the forum.

      6 - Support for NumPy, SciPy, and a number of other libraries would clearly extend the utility of NLTK, and we can currently run only pure Python libraries. Even so, NLTK by itself gives me the tools I need to build some serious applications in Pythonista rather than mere toy apps. NumPy and SciPy will be welcome additions, however.
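
Points 1 to 3 above could be sketched roughly as follows. This is a minimal sketch, not the test script from the post: the helper names `configure_nltk_data` and `download_corpus` are hypothetical, and it assumes NLTK is already importable in Pythonista. `NLTK_DATA`, `nltk.data.path`, and `nltk.download(..., download_dir=...)` are NLTK's own names.

```python
import os


def configure_nltk_data(base_dir):
    """Create <base_dir>/nltk_data if needed and point NLTK_DATA at it."""
    data_dir = os.path.join(base_dir, "nltk_data")
    if not os.path.isdir(data_dir):
        os.makedirs(data_dir)
    # NLTK reads NLTK_DATA when it is first imported,
    # so set the variable before importing nltk.
    os.environ["NLTK_DATA"] = data_dir
    return data_dir


def download_corpus(name, data_dir):
    """Download one corpus (for example "brown") into data_dir."""
    import nltk  # lazy import: the helper above works without NLTK installed

    # Register the directory explicitly in case nltk was imported earlier.
    if data_dir not in nltk.data.path:
        nltk.data.path.append(data_dir)
    return nltk.download(name, download_dir=data_dir)
```

A possible use, assuming `~/Documents` is writable from inside the app (not verified here): call `configure_nltk_data(os.path.expanduser("~/Documents"))`, then pass the returned directory to `download_corpus("brown", data_dir)`. Downloading a single corpus this way avoids pulling the full 1.5 GB set.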

      • omz
        omz last edited by

        I think the iPad 4 wasn't out yet when I experimented with this, so performance on recent devices is probably much better than what I saw.

        I guess it might be interesting to build some sort of NLTK installer script for Pythonista (that perhaps downloads common corpora etc. as well and configures the data path correctly)...

        Aside: While NumPy will be part of the next update, it's very unlikely that I'll be able to get SciPy to work. It contains a lot of Fortran code, and I frankly have no idea how to cross-compile that for iOS...

        • ltddev
          ltddev last edited by

          NLTK + NumPy would be a great combination for a lot of general AI work besides just NL.

          • ltddev
            ltddev last edited by

            As for your comment about "some sort of NLTK installer script for Pythonista", I wonder to what degree the NLTK downloader itself is extensible. As I said, it has a non-graphical version, which is in fact how I got the data, specifically the "brown" corpus. See my wrapper for the downloader at https://gist.github.com/swosnick/10702869

            It's very simple to use, but I wonder whether it is programmable itself. That way an installer would not have to start from scratch or duplicate available open-source code. I am investigating, and if I find anything I will report back.

            • pvanallen
              pvanallen last edited by

              Great work ltddev! And omz, a tested and documented NLTK installer script sounds like a great solution! I think many of us would like NLTK accessible in Pythonista, but understand that it doesn't make sense as a part of the standard install. Please add my vote to a solution like this.

              • ccc
                ccc last edited by

                So... Who is willing to volunteer to create the GitHub repository (not a gist!) and merge in pull requests so this community can collaborate to build "a tested and documented NLTK installer script"?

                • Avisual68
                  Avisual68 last edited by

                  Now I just need a NoSQL DB and I could play with data mining on the iPad. Has anyone tried to get CodernityDB running?

                  • ltddev
                    ltddev last edited by

                    As I said, I think a good place to start for pulling the data sets or corpora is the nltk.downloader module. Once you have downloaded NLTK itself, brought it into Pythonista, and can start to use it, the nltk.downloader module offers a fairly rich API to search, list, and download selected individual corpora, or to download logical groupings of corpora. For more information about what I'm getting at, see the API doc for the scriptable downloader here:

                    http://www.nltk.org/api/nltk.html#module-nltk.downloader
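
As a rough illustration of scripting the downloader, here is a minimal sketch. The helper names `matching_ids`, `find_corpora`, and `fetch` are hypothetical. `Downloader`, its `corpora()` and `download()` methods, and the package attributes `id` and `name` come from `nltk.downloader`. The two functions that touch `Downloader` need NLTK installed and a network connection.

```python
def matching_ids(packages, keyword):
    """Return the sorted ids of packages whose id or name contains keyword."""
    keyword = keyword.lower()
    return sorted(
        p.id for p in packages
        if keyword in p.id.lower() or keyword in p.name.lower()
    )


def find_corpora(keyword):
    """Search the remote NLTK index for corpora matching keyword."""
    from nltk.downloader import Downloader  # lazy import, needs NLTK

    return matching_ids(Downloader().corpora(), keyword)


def fetch(package_id, data_dir):
    """Download a single package or collection id into data_dir."""
    from nltk.downloader import Downloader  # lazy import, needs NLTK

    return Downloader().download(package_id, download_dir=data_dir)
```

An installer script along the lines discussed above could call `find_corpora("brown")` to confirm the id and then `fetch("brown", data_dir)`, rather than reimplementing any download logic.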

                    • ltddev
                      ltddev last edited by

                      @Avisual, I have played around with CodernityDB, ironically together with NLTK. You have piqued my interest in demonstrating NLTK + corpora + CodernityDB, all from Pythonista. I will report back :)

                      • ltddev
                        ltddev last edited by

                        @Avisual,

                        It appears straightforward to run CodernityDB on Pythonista because, like NLTK, it is pure Python, and CodernityDB has absolutely no third-party dependencies. See my test code here, based on one of their examples meant to highlight easy support for insert/save/store. It stores 15 objects in a database: https://gist.github.com/swosnick/11065623
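
For readers who cannot open the gist, the insert-and-read-back pattern looks roughly like the sketch below. This is not the gist's code: the helpers `make_records` and `store_records` and the record shape `{"x": i}` are hypothetical. `Database`, `create()`, `insert()`, `all("id")`, and `close()` are CodernityDB calls. It assumes CodernityDB is importable and that no database exists yet at `db_path`.

```python
def make_records(count):
    """Build `count` simple dict records to store."""
    return [{"x": i} for i in range(count)]


def store_records(db_path, records):
    """Create a CodernityDB database at db_path and insert the records."""
    from CodernityDB.database import Database  # lazy import, needs CodernityDB

    db = Database(db_path)
    db.create()
    for record in records:
        db.insert(record)
    # Walk the built-in "id" index to count what was stored.
    stored = sum(1 for _ in db.all("id"))
    db.close()
    return stored
```

Calling `store_records(path, make_records(15))` would mirror the 15-object test described above.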

                        • ihf
                          ihf last edited by

                          Before I attempt to get NLTK working, has anyone already done what @omz mentioned above ("I guess it might be interesting to build some sort of NLTK installer script for Pythonista (that perhaps downloads common corpora etc. as well and configures the data path correctly)...")?

                          • karthikmaiya
                            karthikmaiya last edited by

                            On a non-jailbroken device, has anyone figured out how to correctly set the data paths so that you can download and run the brown corpus?
