Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Web scraping
-
I made this Python script for web scraping. It can map a website into a nested dictionary, for example:
{url:[url_type,{url1_in_url:[url1_in_url_type,{...}],url2_in_url:[...],...}]}
If you also want to download all the files on that website, just set the variable 'descargar' (download) to True:

    url = 'https://en.wikipedia.org/wiki/Roy_Clark'
    descargar = True         # also download the files that are found
    profundidad = 2          # profundidad (depth): how many levels of links to follow
    archivo = 'clark.json'   # archivo (file): output file for the map
    s = Scraper()
    s.lineal(url, profundidad, descargar, archivo)
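A minimal sketch of how a nested map like the one above could be built, assuming a depth-limited recursive crawl. This is not the script's actual code: `FAKE_SITE` is a hypothetical in-memory link graph used instead of real HTTP requests, and `build_map` is an illustrative name. The result is serialized with the standard `json` module, which is presumably what a file like 'clark.json' would hold.

```python
import json

# Hypothetical in-memory link graph standing in for real HTTP requests:
# each URL maps to (url_type, list of links found on that page).
FAKE_SITE = {
    "https://example.com/": ("html", ["https://example.com/a", "https://example.com/logo.png"]),
    "https://example.com/a": ("html", ["https://example.com/"]),
    "https://example.com/logo.png": ("image", []),
}


def build_map(url, depth, seen=None):
    """Return {url: [url_type, {child_url: [...], ...}]}, following links up to `depth` levels."""
    if seen is None:
        seen = set()
    seen.add(url)
    url_type, links = FAKE_SITE.get(url, ("unknown", []))
    children = {}
    if depth > 0:
        for link in links:
            if link not in seen:  # skip pages already mapped (e.g. links back to the start page)
                children.update(build_map(link, depth - 1, seen))
    return {url: [url_type, children]}


site_map = build_map("https://example.com/", depth=2)
print(json.dumps(site_map, indent=2))
```

Because `seen` is shared across the whole crawl, each URL appears only once in the tree, under the first page that links to it. Dropping that check would list every link under every page, at the cost of re-crawling pages that link to each other until the depth limit is reached.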
I'd like to add threading to speed up the analysis.
Note: it sometimes throws an error when downloading files.
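For the threading idea, a minimal sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`. `fake_download` is a hypothetical stand-in that sleeps instead of hitting the network (a real one might use `urllib.request.urlopen`), and `download_all` is an illustrative name. Each download's exception is caught and recorded, so a single failing file does not abort the whole batch.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def fake_download(url):
    """Hypothetical stand-in for an HTTP download."""
    time.sleep(0.1)  # simulate network latency
    if url.endswith(".broken"):
        raise IOError("download failed: " + url)
    return b"data from " + url.encode()


def download_all(urls, max_workers=8):
    """Download URLs concurrently; collect failures instead of letting one error stop the batch."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fake_download, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()  # re-raises any exception from the worker thread
            except Exception as exc:  # record the failure and keep going
                errors[url] = exc
    return results, errors


urls = [f"https://example.com/file{i}.txt" for i in range(10)] + ["https://example.com/bad.broken"]
start = time.perf_counter()
results, errors = download_all(urls)
print(len(results), "ok,", len(errors), "failed in", round(time.perf_counter() - start, 2), "s")
```

Since downloading is mostly waiting on I/O, threads help here despite Python's GIL: while one thread waits for a response, the others keep working.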
-
@sulcud, it would be interesting to know whether you have considered the pros and cons of threading vs. asyncio, and why you would pick one over the other.
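Roughly, threads work with ordinary blocking libraries but each one costs an OS thread, while asyncio runs many requests on a single thread and only helps if every network call is `await`-able (the standard library has no async HTTP client, so that usually means a third-party package). A minimal asyncio sketch, assuming Python 3.7+ for `asyncio.run`; `fake_fetch` is a hypothetical stand-in for a real async HTTP request and `fetch_all` is an illustrative name:

```python
import asyncio


async def fake_fetch(url):
    """Hypothetical stand-in for a non-blocking HTTP request."""
    await asyncio.sleep(0.1)  # simulate waiting on the network without blocking the event loop
    return f"<html>{url}</html>"


async def fetch_all(urls, limit=10):
    """Fetch all URLs concurrently on one thread, with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url):
        async with semaphore:  # wait here if `limit` requests are already running
            return url, await fake_fetch(url)

    pairs = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(pairs)


pages = asyncio.run(fetch_all([f"https://example.com/{i}" for i in range(20)]))
print(len(pages))
```

The `asyncio.Semaphore` caps how many requests are in flight at once, which keeps a crawler from opening hundreds of connections to the same site.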
-
@mikael I tried to implement the async function. I don't really know if I did it right, but it now downloads everything much faster than before. I also fixed the link-extraction function: sometimes (most of the time ☹️) it only returned 20-50 URLs, and now it extracts all, or nearly all, of the links on the page. I also made a setup.py file, but I honestly don't know if it works.
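The post does not show how the link extraction was fixed. One common way to catch every link on a page, sketched here as an assumption rather than the scrapthor code, is to parse the HTML with the standard library's `html.parser.HTMLParser`, read every `href` and `src` attribute, resolve relative links with `urllib.parse.urljoin`, and drop `#fragment` duplicates with `urldefrag`. `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collect every href/src attribute on a page, resolved to absolute URLs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Make the link absolute, then strip any "#fragment" part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                if absolute not in self.links:  # keep first-seen order, drop duplicates
                    self.links.append(absolute)


def extract_links(html, base_url):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<a href="/wiki/Guitar">x</a> <img src="pic.png"> <a href="/wiki/Guitar#top">y</a>'
print(extract_links(page, "https://en.wikipedia.org/wiki/Roy_Clark"))
```

A real crawler would probably also filter out `mailto:` and `javascript:` links and skip URLs outside the starting domain.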
Now the way to use it is:
    from scrapthor import scrap
    url = "some url"
    scrap(url)
Could you please check it?
-
@mikael Wow, with that package the speed increased a lot, thanks! Now I know the real power of async programming.