Get webpage content
-
I’m trying to get the content of this website. It simply shows the lessons (I’m still in school) that aren’t going to take place.
When I first manually log in here, and then manually open the previously mentioned website, I get some usable source code. But if I then close Safari, reopen it, and repeat these steps without logging in, there is no source code whatsoever.
I didn’t manage to log in with requests first and then scrape the content of the other website, but I’m confident it’s possible. How could I do that?
-
Two excellent modules to use are:
- requests, to retrieve HTML content
- Beautiful Soup (bs4), to parse HTML content
You can install these with StaSh using "pip install requests bs4".
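A rough sketch of how the two could fit together for a login-then-scrape flow. Everything specific here is an assumption: the URLs, the form field names (`username`, `password`), and the idea that the page lists the lessons in an HTML table are placeholders, so check the real login form and page source first. The key point is that a `requests.Session` keeps the cookies from the login and sends them again on the next request.

```python
import requests
from bs4 import BeautifulSoup

# Hypothetical URLs: replace with the real login and target pages
LOGIN_URL = "https://example.com/login"
TARGET_URL = "https://example.com/substitutions"


def fetch_after_login(session, username, password):
    """Log in with the given session, then fetch the protected page."""
    # The form field names are an assumption; inspect the real login form
    payload = {"username": username, "password": password}
    response = session.post(LOGIN_URL, data=payload)
    response.raise_for_status()
    # The session stored the login cookies and sends them again here
    page = session.get(TARGET_URL)
    page.raise_for_status()
    return page.text


def extract_rows(html):
    """Return every non-empty table row as a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["td", "th"])]
        if cells:
            rows.append(cells)
    return rows


# Usage (needs network access and real credentials):
# with requests.Session() as session:
#     html = fetch_after_login(session, "my_user", "my_password")
#     for row in extract_rows(html):
#         print(row)
```

If the login form also sends hidden fields (a CSRF token, for example), this simple version will probably not be enough; in that case you would have to fetch the login page first and copy those fields into the payload.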
-
bs4 and requests come preinstalled, so there is no reason to update them. Updating usually only causes issues.
-
@Drizzel, some sites use so much JS that they are hard to scrape with just requests and bs4. If this is the case here, you can use WebView to act as a browser and run the JS. I have a small helper class for this, discussed here.
In all cases, web scraping seems to be a lot of detective work, trial and error.