omz:forum



    Get webpage content

    Pythonista
    • Drizzel
      Drizzel last edited by Drizzel

      I’m trying to get the content of this website; it simply shows the lessons (I’m still in school) that aren’t going to take place.
      When I first log in manually here and then manually open the website mentioned above, I get some usable source code.

      But if I then close Safari, reopen it, and repeat these steps without logging in, there is no source code at all.

      I haven’t managed to log in with requests first and then scrape the content of the other website, but I’m confident it’s possible. How could I do that?

      • eddo888
        eddo888 last edited by

        Two excellent modules to use are:

        • requests, to retrieve HTML content
        • Beautiful Soup (bs4), to parse HTML content

        You can install these from StaSh with "pip install requests bs4".
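To make the two modules concrete, here is a rough sketch of the "log in first, then scrape" flow asked about above. It is only an illustration under assumptions: `LOGIN_URL`, `PAGE_URL`, and the form field names `username` and `password` are hypothetical placeholders, and the real site may need different fields, a hidden token, or extra headers. It also assumes the lessons are shown in an HTML table, which may not be the case.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical placeholders: replace with the real login and content URLs.
LOGIN_URL = "https://example.com/login"
PAGE_URL = "https://example.com/cancelled-lessons"


def fetch_after_login(session, username, password):
    """Log in through the session, then return the protected page's HTML."""
    # Field names are assumptions; check the site's login form for the real ones.
    login = session.post(LOGIN_URL, data={"username": username, "password": password})
    login.raise_for_status()
    # The same session re-sends any cookies set by the login response.
    page = session.get(PAGE_URL)
    page.raise_for_status()
    return page.text


def extract_rows(html):
    """Return each non-empty table row as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


# Possible usage (needs network access and real URLs and credentials):
#     with requests.Session() as session:
#         html = fetch_after_login(session, "my_user", "my_password")
#         for row in extract_rows(html):
#             print(row)
```

The important part is reusing one `requests.Session` for both calls, so the cookies from the login are sent with the second request. Separate `requests.get` calls would not share them.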

        • JonB
          JonB last edited by

          bs4 and requests come preinstalled, so there is no reason to update them; updating usually only causes issues.

          • mikael
            mikael @Drizzel last edited by

            @Drizzel, some sites use so much JS that they are hard to scrape with just requests and bs4. If this is the case here, you can use a WebView to act as a browser and run the JS. I have a small helper class for this, discussed here.

            In any case, web scraping tends to involve a lot of detective work and trial and error.
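As part of that detective work, one rough way to guess whether a page is mostly built by JS is to look at the raw HTML that requests returns and compare the number of script tags with the amount of visible text. The sketch below uses only the standard library `html.parser`; the class and function names and the 200-character threshold are arbitrary assumptions for illustration, not a reliable test.

```python
from html.parser import HTMLParser


class TextVsScriptCounter(HTMLParser):
    """Count <script> tags and visible text characters in raw HTML."""

    def __init__(self):
        super().__init__()
        self.script_count = 0
        self.text_chars = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only count text that a reader would actually see on the page.
        if self._skip_depth == 0:
            self.text_chars += len(data.strip())


def looks_js_rendered(html, min_text_chars=200):
    """Heuristic: the page has scripts but very little visible text."""
    counter = TextVsScriptCounter()
    counter.feed(html)
    counter.close()
    return counter.script_count > 0 and counter.text_chars < min_text_chars
```

If this returns True for the HTML you get after logging in, the content is probably filled in by JS after the page loads, and a WebView (or finding the underlying data request) is likely the better route.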
