omz:forum

    Webpage Slices are Different from what is There

    Pythonista
    • TomD

      CVP, so no printed slice of the string gives me the HTML one character at a time. It combines characters into groups and adds apostrophes.

      • cvp

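        @TomD You can see the string is between b' ', which marks it as a bytes object.
        Characters written with a \ are escape sequences for non-printable characters, e.g. \n = newline.
        Thus b'\n' is only one character, a newline, even though it is displayed as two.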
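To illustrate the point above, here is a small sketch. The value `raw` is made-up sample data standing in for the page content, not the actual response. It shows that a bytes object is displayed with the b'...' wrapper and escape sequences, that \n counts as a single byte, and that decoding gives an ordinary string.

```python
# Hypothetical sample standing in for what response.read() returns
raw = b'\n\n<html>'

# repr() is what you see when a bytes slice is printed: b'...' with escapes
shown = repr(raw[0:3])

# A newline is displayed as two characters (\n) but is stored as one byte
newline_length = len(b'\n')

# Indexing bytes gives an integer; slicing gives bytes
first_item = raw[0]
first_slice = raw[0:1]

# Decoding turns the bytes into a normal str
text = raw.decode('utf8')

print(shown)
print(newline_length, first_item, first_slice)
print(text[2:])
```

This is likely why the printed slices looked grouped and quoted: each print showed the representation of a bytes object rather than plain text.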

        • TomD

          Thanks CVP. That has me onto something.
          I am data scraping. Maybe I'd be better off using a package like BeautifulSoup?

          • cvp

            @TomD try this

            st = tda.decode('utf8')
            print(st)
            

            And you will see that there are empty lines at the beginning, which are the \n characters.

            • TomD

              It doesn't like
              print (st)

              • cvp

                @TomD try this script

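                import urllib.request

                # Fetch the page; response.read() returns bytes, not str
                with urllib.request.urlopen("https://www.asx.com.au/asx/statistics/todayAnns.do") as response:
                    tda = response.read()
                # Decode the bytes to text so that \n prints as a real line break
                st = tda.decode('utf8')
                print(st)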
                
                • TomD

                  I see, so I could work on that decoded UTF-8 string more easily.

                  • cvp

                    @TomD st contains a string, so yes. Good luck!
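As a rough sketch of why the decoded string is easier to handle: ordinary str methods such as strip() and splitlines() apply to it directly. The sample text assigned to `st` below is invented for illustration and is not the real page.

```python
# Hypothetical decoded text with blank lines at the start, like the real page
st = '\n\n<html>\n<body>\nHello\n</body>\n</html>\n'

# Remove leading and trailing whitespace, including the \n characters
cleaned = st.strip()

# Split into lines and drop any that are empty
lines = [line for line in cleaned.splitlines() if line.strip()]

print(lines[0])
print(len(lines))
```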

                    • TomD

                      Much appreciated. You have helped me around an obstacle.

                      • mikael

                        @TomD, I definitely recommend using BeautifulSoup or a webview with JavaScript, the latter especially if you are trying to scrape pages with dynamic content.
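For a sense of what structured parsing looks like compared with slicing, here is a minimal sketch. It uses Python's built-in html.parser as a stand-in rather than BeautifulSoup, so that it runs without installing anything. The `LinkCollector` class and the sample HTML are made up for illustration and do not reflect the real page.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value is not None:
                    self.links.append(value)


# Invented sample HTML, not the real page
sample = ('<html><body><a href="/one.pdf">One</a>'
          '<p>text</p><a href="/two.pdf">Two</a></body></html>')

parser = LinkCollector()
parser.feed(sample)
print(parser.links)
```

The idea is the same with BeautifulSoup: let a parser find the tags instead of counting characters in the raw text.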
