Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Web scraping script not working
-
The following script returns the current weather model from the URL in the script. It works fine on my Mac, but I'm getting socket errors in Pythonista 3.
```python
from bs4 import BeautifulSoup as bs
import urllib
import re
import urllib2 as ul
import html5lib

base_url = 'https://weather.cod.edu/forecast/menus/gfs_menu.php?type=2018110518-AK-700-spd-0-0'
all_str = ''

def extract_names(filename, td):
    text = re.findall(td, str(filename))
    data_list = []
    for line in text:
        data_list.append(line)
    return data_list

source = ul.urlopen(base_url).read()
tree = bs(source, 'html5lib')
filename = tree.find_all('td')
i = '\*'
td = re.compile(r'(\d+)Z' + i)
line_str = extract_names(filename, td)
line_str = str(line_str).strip('[]')
line_str = str(line_str).strip("'")
#all_str = all_str + line_str + '\n\n'
print line_str
```
I have used Python 2.7, and I also tried Python 3.6 after changing line 17 to:

```python
source = urllib.request.urlopen(base_url).read()
```
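For comparison, a Python 3 sketch of the fetch step using only the standard library. In Python 3 `urllib2` is gone, `urllib.request` has to be imported explicitly (a bare `import urllib` does not guarantee the submodule is loaded), and `print` is a function, so those lines of the script need changing as well. The helper name `fetch_page`, the 10-second timeout and the UTF-8 fallback are my own choices. This only ports the code; it still relies on the same certificate verification, so it is not by itself a fix for the socket errors.

```python
import urllib.request


def fetch_page(url, timeout=10):
    """Fetch url and return its body as text (Python 3, standard library only)."""
    # Python 2's urllib2.urlopen lives at urllib.request.urlopen in Python 3.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        raw = response.read()  # the body as bytes
        # Use the charset from the Content-Type header if the server sent one.
        charset = response.headers.get_content_charset() or 'utf-8'
    return raw.decode(charset)
```

With it, line 17 becomes `source = fetch_page(base_url)`, and `source` can be passed to `bs(source, 'html5lib')` as before.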
I'm just starting out with Pythonista and trying to adapt some scripts that scrape weather sites for information that will be used to build custom URLs for accessing those sites.
Any help is appreciated,
Thanks - Jerry
-
@cubflier, are the errors intermittent or consistent?
In any case, I would suggest using the `requests` module; it comes standard with Pythonista and usually works with no fuss.
-
To use requests:

```python
source = requests.get(base_url).content
```
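A slightly fuller version of that call, as a sketch: passing a `timeout` (in seconds) keeps a stalled connection from hanging the script, and `raise_for_status()` turns an HTTP error page into an exception instead of letting it be parsed silently. `response.text` is the decoded string and `response.content` the raw bytes; BeautifulSoup accepts either. The name `fetch_html` and the 10-second default are arbitrary choices, not part of requests.

```python
import requests


def fetch_html(url, timeout=10):
    """Download url with requests and return the page as decoded text."""
    # timeout is in seconds; without one, a stalled connection can hang the script.
    response = requests.get(url, timeout=timeout)
    # Raise requests.HTTPError for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    # .text is the body decoded to str; .content would be the raw bytes.
    return response.text
```

Usage would look like `tree = bs(fetch_html(base_url), 'html5lib')`.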
-
Also, a basic question: is your network available? If you're on cellular, make sure cellular data is turned on and that Pythonista is allowed to use it.
-
Requests worked.
Changed code to:
```python
from bs4 import BeautifulSoup as bs
import requests
import re

base_url = 'https://weather.cod.edu/forecast/menus/gfs_menu.php?type=2018110518-AK-700-spd-0-0'
all_str = ''

def extract_names(filename, td):
    text = re.findall(td, str(filename))
    data_list = []
    for line in text:
        data_list.append(line)
    return data_list

source = bs(requests.get(base_url).text)
filename = source.find_all('td')
i = '\*'
td = re.compile(r'(\d+)Z' + i)
line_str = extract_names(filename, td)
line_str = str(line_str).strip('[]')
line_str = str(line_str).strip("'")
print(line_str)
```
The network was on and available. I'm still not sure why the old code, which has worked on every other platform I've used (Linux and Mac), failed here, but then again my Python skills are minimal.
It now returns the two-digit number for the current weather model run, which I need in order to go forward.
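The `str(...).strip('[]')` and `.strip("'")` calls in the script turn the list returned by `re.findall` back into text by hand, which only produces a single clean number when there is exactly one match. A sketch of a more direct approach, assuming (as the regex implies) that the page marks the current run with an asterisk after the hour, such as `18Z*`; `current_model_hour` is a hypothetical helper name:

```python
import re

# Digits followed by a literal "Z*": the same pattern the script builds
# with r'(\d+)Z' + '\*'. The digits are captured as group 1.
CURRENT_RUN = re.compile(r'(\d+)Z\*')


def current_model_hour(html):
    """Return the digits before the first 'Z*' in html, or None if there is none."""
    match = CURRENT_RUN.search(html)
    return match.group(1) if match else None
```

Calling `current_model_hour(str(filename))`, where `filename` is the `find_all('td')` result, keeps the search limited to table cells and returns a plain string (or `None`) instead of a hand-stripped list.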
I sure do appreciate the help.
Thanks - Jerry
-
Were you getting `SSL: CERTIFICATE_VERIFY_FAILED` socket errors?
I recall some issues with the version of OpenSSL used by Pythonista: either it doesn't support all of the latest protocols, it uses an old set of root CAs, or it is otherwise unable to validate certificates.
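If you would rather keep using urllib, one option is to give `urlopen` an SSL context built from an up-to-date CA bundle instead of whatever the bundled OpenSSL finds by default. A minimal Python 3 sketch, assuming the third-party `certifi` package is importable (it falls back to the default store if not); `make_verified_context` and `fetch_verified` are hypothetical names, and I have not confirmed that this clears the error in Pythonista itself. Turning verification off would also make the error go away, but that removes the protection HTTPS exists to provide.

```python
import ssl
import urllib.request

try:
    # certifi provides the Mozilla root-CA bundle that requests relies on.
    import certifi
    CA_FILE = certifi.where()
except ImportError:
    # Without certifi, fall back to the platform's default CA store.
    CA_FILE = None


def make_verified_context(cafile=CA_FILE):
    """Return an SSLContext that requires certificates signed by a CA in cafile."""
    # create_default_context turns on CERT_REQUIRED and hostname checking;
    # with cafile=None it loads the system default CA certificates instead.
    return ssl.create_default_context(cafile=cafile)


def fetch_verified(url, timeout=10):
    """Fetch url with urllib, verifying the server certificate against CA_FILE."""
    context = make_verified_context()
    with urllib.request.urlopen(url, timeout=timeout, context=context) as response:
        return response.read()
```

In a Python 3 version of the original script, `source = fetch_verified(base_url)` would take the place of the `urlopen(...).read()` line.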
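I think requests gets around this because it uses its own bundle of root CA certificates (via the `certifi` package in current versions) rather than relying on the system's. It does not simply ignore the issue: requests verifies certificates by default.
-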
Yes - that was the error that I was getting with the original code. The error was consistent and the code did not execute.
Jerry