Webpage Slices are Different from what is There
I downloaded a webpage successfully, and its content looks correct. But when I print slices of it, the characters are different.
Specifically, when I print the first character it shows the first plus the next 3 characters and an apostrophe on the end. So one character becomes 5 characters.
On printing longer slices of the webpage the number of characters is also greater and the apostrophe is always added on the end.
What is happening?
@TomD You can see that the string is enclosed in b' ', which means it is a bytes object rather than a str.
Characters written with a \ are escape sequences for non-printable characters, e.g. \n = newline.
Thus b'\n' is only one character, a newline.
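To make the point above concrete, here is a minimal sketch of why a one-character slice prints as five characters. The sample value of tda is a hypothetical stand-in for the downloaded page, not the real content.

```python
# Hypothetical sample standing in for the downloaded page;
# response.read() returns a bytes object like this one.
tda = b'\n\n<html>'

first = tda[:1]        # a one-byte slice is still a bytes object
shown = repr(first)    # the text that print(first) displays

print(shown)           # the b, the quotes and the \n escape are all display only
print(len(first))      # length of the data itself
print(len(shown))      # length of what appears on screen
print(tda[0])          # indexing (not slicing) bytes gives an integer
```

The data holds a single newline byte; the b prefix, the two quotes and the two-character \n escape only exist in the printed representation.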
Thanks CVP. That has me onto something.
I am data scraping. Maybe better off using a package like beautifulsoup?
@TomD try this
st = tda.decode('utf8')
print(st)
And you will see that there are empty lines at the beginning, which are the \n characters.
It doesn't like
@TomD try this script
import urllib.request
with urllib.request.urlopen("https://www.asx.com.au/asx/statistics/todayAnns.do") as response:
    tda = response.read()
st = tda.decode('utf8')
print(st)
I see, so I could work with that decoded utf8 string more easily.
@TomD st contains a string (str), so yes. Good luck!
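As a rough illustration of working with the decoded string, the sketch below slices st after decoding. The sample bytes and the name clean are made up for the example; the real page content will differ.

```python
# Hypothetical sample; in the real script tda comes from response.read().
tda = b'\n\n<html><title>Announcements</title></html>'

st = tda.decode('utf8')   # bytes -> str
print(repr(st[:1]))       # the first character is still a newline, now as a str

clean = st.lstrip()       # drop the leading blank lines and other whitespace
print(clean[:6])          # slices now show the visible characters
```

Once the text is a str, slices print as ordinary characters with no b' ' wrapper, and leading newlines can be stripped before further processing.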
Much appreciated. You have helped me around an obstacle.