I made this Python script for web scraping. It can map a website into a dictionary, for example:
If you also want to download all the files on that website, you only need to set the variable 'descargar' to True:
url = 'https://en.wikipedia.org/wiki/Roy_Clark'
descargar = True
profundidad = 2
archivo = 'clark.json'
s = Scraper()
s.lineal(url, profundidad, descargar, archivo)
I want to implement some kind of threading to speed up the analysis.
It also sometimes raises an error when downloading files.
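For anyone following along, here is a minimal sketch of what a threaded download step could look like, using only the standard library. The names `fetch` and `fetch_all` are hypothetical and are not part of the script above; I am assuming the downloads are independent of each other. A failed download is recorded instead of stopping the whole run, which may help with the occasional download error.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from urllib.request import urlopen


def fetch(url, timeout=10):
    """Download one URL and return its raw bytes (hypothetical helper)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_all(urls, fetcher=fetch, max_workers=8):
    """Fetch many URLs in a thread pool; collect failures instead of crashing."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the URL it was submitted for.
        futures = {pool.submit(fetcher, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one bad download should not stop the rest
                errors[url] = exc
    return results, errors
```

Calling `fetch_all(list_of_urls)` returns two dictionaries, one with the downloaded bytes per URL and one with the exception raised per failed URL, so the failures can be inspected or retried afterwards.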
@sulcud, it would be interesting to know whether you have considered the pros and cons of threading vs. asyncio, and why you would pick one or the other.
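To make the comparison concrete, here is a rough asyncio sketch of the same "download many URLs at once" idea. It is only an illustration: `fetch_all_async` is a hypothetical name, and `fetcher` is assumed to be a coroutine function you supply yourself, since the standard library has no async HTTP client (a third-party one such as aiohttp would be needed for real requests). The semaphore caps how many downloads run at the same time.

```python
import asyncio


async def fetch_all_async(urls, fetcher, limit=8):
    """Run `await fetcher(url)` for every URL, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(url):
        # Wait for a free slot, then record either the body or the error.
        async with semaphore:
            try:
                return url, await fetcher(url), None
            except Exception as exc:
                return url, None, exc

    outcomes = await asyncio.gather(*(guarded(url) for url in urls))
    results = {url: body for url, body, exc in outcomes if exc is None}
    errors = {url: exc for url, body, exc in outcomes if exc is not None}
    return results, errors
```

With threads, the operating system switches between blocking calls for you; with asyncio, everything runs in one thread and tasks hand over control at each `await`, which tends to scale better to many simultaneous connections but requires async-aware libraries.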
@mikael I tried to implement the async function. I really don't know if I did it well, but it now downloads everything much faster than before. I also fixed the link extraction function, because most of the time ☹️ it only returned 20-50 URLs; now it extracts all, or nearly all, of the links on a page. I also made a setup.py file, but I truly don't know if it works.
Now the way to use it is:
from scrapthor import scrap
url = "some url"
scrap(url)
Could you please check it?
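On the link extraction point, this is a minimal standard-library sketch of one way to collect the links on a page. It is not the scrapthor implementation, which is not shown in this thread; `LinkParser` and `extract_links` are hypothetical names. It assumes links live in `href` and `src` attributes, resolves relative URLs against the page URL, drops fragments and keeps only http(s) links.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect absolute http(s) URLs from href/src attributes, without duplicates."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative links and strip any '#fragment' part.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                is_web_link = absolute.startswith(("http://", "https://"))
                if is_web_link and absolute not in self.links:
                    self.links.append(absolute)


def extract_links(html, base_url):
    """Return the links found in `html`, in order of first appearance."""
    parser = LinkParser(base_url)
    parser.feed(html)
    return parser.links
```

Something like `extract_links(page_source, url)` would then return a list of absolute URLs ready to be queued for the next depth level.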
@mikael WOW, with that package the speed increased a lot, thanks! Now I know the real power of async programming.