Welcome!
This is the community forum for my apps Pythonista and Editorial.
For individual support questions, you can also send an email. If you have a very short question or just want to say hello — I'm @olemoritz on Twitter.
Webpage Slices are Different from what is There
-
CVP, so a printed string slice does not take the HTML one character at a time. It combines the characters into groups and adds apostrophes.
-
@TomD You can see that the string is enclosed in b' ', which means it is a bytes object rather than a str.
Sequences that start with \ stand for non-printable characters, for example \n = new line.
Thus b'\n' is only one character, the "new line".
-
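To illustrate the point about b' ' and \n, here is a minimal sketch. The value `sample` is made up and only stands in for the raw page data; it is not taken from the actual site.

```python
# Hypothetical sample standing in for the raw bytes read from a web page.
sample = b'\n\n<html>'

# The repr of a bytes object shows the b'...' wrapper and the escape sequences.
shown = repr(sample)

# b'\n' is a single byte, even though it is displayed as two characters.
newline_length = len(b'\n')

# Slicing bytes returns bytes, so a printed slice keeps the b'...' wrapper.
first_three = sample[:3]

# Indexing a single position returns an integer (the byte value), not a character.
first_byte = sample[0]

print(shown)
print(newline_length, first_three, first_byte)
```

This is why a printed slice of the page data looks different from the page source: the slice is still bytes, and print shows its repr.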
Thanks CVP. That has put me onto something.
I am scraping data. Maybe I would be better off using a package like BeautifulSoup?
-
@TomD try this
st = tda.decode('utf8')
print(st)
And you will see that there are empty lines at the beginning, which are the \n characters.
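As a rough sketch of what decoding does, assuming the page really is UTF-8 encoded: the bytes below are invented and only stand in for tda, and the names `leading_newlines` and `cleaned` are illustrative.

```python
# Hypothetical bytes standing in for tda, the data returned by response.read().
tda = b'\n\n\n<html>\n<body>caf\xc3\xa9</body>\n</html>'

# decode() turns the bytes into a str; utf8 is assumed to be the page encoding.
st = tda.decode('utf8')

# Count the blank lines at the beginning: each one is a single '\n' character.
leading_newlines = len(st) - len(st.lstrip('\n'))

# lstrip() removes leading whitespace, including those newlines.
cleaned = st.lstrip()

print(leading_newlines)
print(cleaned)
```

Once the data is a str, slices of it print as plain text without the b' ' wrapper.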
-
It doesn't like
print (st)
-
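@TomD try this script
import urllib.request

# Download the page; read() returns the raw content as bytes
with urllib.request.urlopen("https://www.asx.com.au/asx/statistics/todayAnns.do") as response:
    tda = response.read()

# Convert the bytes to a str before printing
st = tda.decode('utf8')
print(st)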
-
I see, so I could work with that utf8 string more easily
-
@TomD st contains a string, so yes. Good luck
-
Much appreciated. You have helped me get around an obstacle
-
@TomD, I definitely recommend using BeautifulSoup or a webview with JavaScript, the latter especially if you are trying to scrape pages with dynamic content.
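To give an idea of what parsing instead of slicing looks like, here is a minimal sketch. It uses html.parser from the standard library as a stand-in, not BeautifulSoup itself, which would likely need less code. The class name `LinkCollector` and the HTML snippet are made up for illustration; a real page would come from tda.decode('utf8').

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (href, text) pairs for every <a href=...> tag in the document."""

    def __init__(self):
        super().__init__()
        self.links = []      # finished (href, text) pairs
        self._href = None    # href of the <a> tag currently open, if any
        self._text = []      # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            self._href = dict(attrs).get('href')
            self._text = []

    def handle_data(self, data):
        # Only keep text that appears inside an open link.
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == 'a' and self._href is not None:
            self.links.append((self._href, ''.join(self._text).strip()))
            self._href = None


# Hypothetical HTML snippet, not the real page content.
html = ('<table><tr><td><a href="/a.pdf"> Annual report </a></td></tr>'
        '<tr><td><a href="/b.pdf">Dividend</a></td></tr></table>')

collector = LinkCollector()
collector.feed(html)
print(collector.links)
```

Note that this only sees the HTML as downloaded. Content that a page builds with JavaScript after loading would not be there, which is where the webview approach comes in.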