BeautifulSoup Bug
-
I have a script that is running perfectly on my Mac, but giving me an error in Pythonista.
BeautifulSoup is throwing an `AttributeError: 'NoneType' object has no attribute 'next_element'` on finding all data points in an HTML table: `soup.find('table').find_all('td')`.
I can verify that `soup` appears correct and has the `td` that I'm looking for. I can print `soup.find('table')` in the console and it is correct. I can break it down to `table = soup.find('table'); table.find_all('td');` and it still doesn't work. I've tried changing to the old `.findAll` instead of `.find_all` and that doesn't work either.
In fact, even `soup.find('table').find('td')` works correctly, but gives the error when changing `.find('td')` to `.find_all('td')`.
`.find_all` seems to work in some contexts, e.g. `bs4.BeautifulSoup(requests.get('http://omz-software.com').content).find('p').find_all('a')` works fine.
I can verify the identical code (synced by Dropbox) works fine on Python 2.7.8 in OS X.
Has anyone run into this?
-
Have you tried saving off the soup in Pythonista and loading it on OS X? The user agent might be different, so you might be comparing different soups.
Also, it is possible that bs4 is an older version in Pythonista.
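One way to make that comparison, as a minimal sketch: pickle the raw HTML string on one machine, copy the file across, and diff it against the HTML fetched on the other. The helper names (`save_html`, `load_html`, `diff_html`) and the file labels are made up for illustration, and the code is written for Python 3.

```python
import difflib
import pickle


def save_html(html, path):
    """Pickle the raw HTML string so the exact same text can be reloaded elsewhere."""
    with open(path, "wb") as f:
        pickle.dump(html, f, protocol=2)  # protocol 2 can be read by Python 2 and 3


def load_html(path):
    """Load an HTML string previously written by save_html."""
    with open(path, "rb") as f:
        return pickle.load(f)


def diff_html(html_a, html_b, name_a="pythonista", name_b="osx"):
    """Return unified-diff lines between two HTML strings (empty list if identical)."""
    return list(difflib.unified_diff(
        html_a.splitlines(), html_b.splitlines(),
        fromfile=name_a, tofile=name_b, lineterm="",
    ))
```

If `diff_html` comes back empty (or shows only things like timestamps), both machines are parsing the same input and the difference has to be on the parsing side.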
-
Thanks for the response.
Same version on both.
$ python -c 'import bs4; print(bs4.__version__)'
4.3.2
I think the thing that seals it as a bug is that `soup.find('table').find('td')` works, but `soup.find('table').find_all('td')` throws an error, on the same `soup` object.
-
I'd make sure the web page you are getting in Pythonista is the same one you are getting on your Mac. Maybe you are pulling a mobile version on your iPad. I think this is what JonB means by a different user agent.
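To rule that out, both machines could send the same User-Agent header. A rough sketch using Python 3's standard `urllib.request` (the thread itself uses `requests`); the `DESKTOP_UA` string and the function names are placeholders for illustration, not values taken from either device.

```python
import urllib.request

# Hypothetical desktop User-Agent string, for illustration only.
DESKTOP_UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10) AppleWebKit/600.1.25"


def build_request(url, user_agent=DESKTOP_UA):
    """Build a request that announces a fixed User-Agent regardless of the device."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=DESKTOP_UA):
    """Download the page body as bytes using the fixed User-Agent."""
    with urllib.request.urlopen(build_request(url, user_agent)) as resp:
        return resp.read()
```

With `requests`, the equivalent is passing `headers={'User-Agent': ...}` to `requests.get`.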
-
I understand what he means, and I'll check, but I don't think that would explain in any way why `.find('td')` would have a result but `.find_all('td')` would cause an error. It wouldn't even make sense if it came up empty (it should at least find the result that `.find()` found), but it should definitely not cause an error.
-
As suspected, I wrote `html` from Pythonista to a `pickle` file, loaded it on OS X, converted to `soup`, and had no problem using `find_all('td')` on OS X.
I also used `difflib` to inspect the differences between the HTML content of the Pythonista file and that downloaded on OS X, and as far as I can tell the only differences are timestamps (as the content was downloaded minutes apart).
-
That's really odd. I've been using bs4 for a while with no issues. Can you set up a gist of the page so I can try?
-
Unfortunately, I would have done that already except that it's a password-protected site I use for work. I haven't been able to replicate yet on a couple of other sites, but I'll try to find a public site that has the same bug.
-
Getting all `td` from the `table` at w3schools.com/html/html_tables.asp works fine.
Here's my traceback.
2014-10-27 13:12:43 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR There was an error.
2014-10-27 13:12:44 /var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py :: __main__ ERROR 'NoneType' object has no attribute 'next_element'
Traceback (most recent call last):
  File "/var/mobile/Containers/Data/Application/3664C317-2455-4F95-AFC5-EAF05BC6B8BF/Documents/scratchpad.py", line 28, in <module>
    print(len(table.find('td').find_all('td')))
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1180, in find_all
    return self._find_all(name, attrs, text, limit, generator, **kwargs)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 497, in _find_all
    return ResultSet(strainer, result)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1610, in __init__
    super(ResultSet, self).__init__(result)
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 494, in <genexpr>
    result = (element for element in generator
  File "/private/var/mobile/Containers/Bundle/Application/B8E731EE-F2DC-466D-BEEA-D2EF5E76AEAC/Pythonista.app/pylib/site-packages/bs4/element.py", line 1198, in descendants
    current = current.next_element
AttributeError: 'NoneType' object has no attribute 'next_element'
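The traceback ends in the `descendants` loop at `current = current.next_element`, so `current` was already `None` while the loop still expected more nodes. A hypothetical helper for seeing where that chain stops, assuming only that nodes expose a `next_element` attribute as the traceback shows; `walk_next_elements` is not part of bs4.

```python
def walk_next_elements(start, limit=100000):
    """Follow .next_element links from `start`.

    Returns (last_node, steps): the last non-None node reached and how many
    nodes were visited. `limit` guards against accidental cycles.
    """
    last, steps, node = None, 0, start
    while node is not None and steps < limit:
        last = node
        node = node.next_element
        steps += 1
    return last, steps
```

Calling `walk_next_elements(table)` on both machines and comparing `repr(last)` and `steps` might show whether the chain ends earlier in Pythonista than on OS X.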
-
Is it possible for you to "sanitize" the HTML so it no longer contains any work info? I.e., just strip out the text and replace it with random text?
Have you tried pickling the soup itself (mmmm, pickle soup), going either from OS X to Pythonista or vice versa?
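The sanitizing idea could be sketched roughly like this: keep every tag exactly as it is and replace letters and digits in the text with random letters, so the structure that triggers the error survives but the content does not. This uses Python 3's standard `html.parser` rather than bs4, so it does not depend on the parser under suspicion; `TextScrambler` and `sanitize_html` are made-up names.

```python
import html
import random
import string
from html.parser import HTMLParser


class TextScrambler(HTMLParser):
    """Re-emit the markup unchanged but replace text content with random letters."""

    def __init__(self, seed=None):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.rng = random.Random(seed)

    def _scramble(self, text):
        # Keep whitespace and punctuation so text nodes keep their shape.
        return "".join(
            self.rng.choice(string.ascii_lowercase) if ch.isalnum() else ch
            for ch in text
        )

    def handle_starttag(self, tag, attrs):
        # Original tag text, attributes included (attributes are NOT scrambled).
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Re-escape so characters like & and < stay valid in the output.
        self.out.append(html.escape(self._scramble(data), quote=False))

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % self._scramble(data))

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def sanitize_html(markup, seed=None):
    """Return `markup` with the same tags but scrambled text."""
    parser = TextScrambler(seed)
    parser.feed(markup)
    parser.close()
    return "".join(parser.out)
```

Attribute values (for example `href` or `title`) are passed through untouched, so they would still need a manual check before posting, and it is worth confirming that the sanitized page still reproduces the error.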