omz:forum


    ru in re.sub unsupported?

    Pythonista
    • peiriannydd

      I get a syntax error when I try:
      replaced = re.sub(ru'anā\B', u'nā', s)
      It complains about the combined ru prefix. This syntax does work on my Mac. Is there a way to make this work in Pythonista?

      • ccc

        This is about Python 2 vs. Python 3... https://github.com/ymcui/Chinese-ELECTRA/pull/57

        The way to fix this is to check on Py2 whether r"string" == ur"string" or u"string" == ur"string", and then use that plain r or u string on both Py2 and Py3.

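        #!/usr/bin/env python2
        # -*- coding: utf-8 -*-
        # Make unprefixed literals unicode on Py2, so both comparisons are unicode vs. unicode.
        from __future__ import unicode_literals
        print(ur'anā\B' == r'anā\B')  # True
        print(ur'anā\B' == u'anā\B')  # True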
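
For the Py3 side, here is a rough sketch of which string prefixes the interpreter accepts. The helper name is_valid_literal is made up for this illustration; it only asks compile() whether a tiny literal parses. On Python 3 the combined ur and ru prefixes should be rejected, while r and u on their own are fine.

```python
def is_valid_literal(source):
    """Return True if `source` parses as a Python expression (hypothetical helper)."""
    try:
        compile(source, "<prefix-check>", "eval")
        return True
    except SyntaxError:
        return False

# Try each prefix on a literal without backslashes, so only the prefix is tested.
prefix_ok = {p: is_valid_literal(p + "'abc'") for p in ("r", "u", "ur", "ru")}
print(prefix_ok)

# On Python 3 every str is already Unicode, so the raw prefix alone is enough.
pattern = r'anā\B'
print(type(pattern).__name__)
```

So on Py3 the ru in the original call can simply become r.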
        
        • peiriannydd

          @ccc I'm afraid I don't follow. I'm using Py3, and neither u'anā\B' nor r'anā\B' matches the expression I need to replace. Is there anything I can do?

          Thank you very much for your time.

          • ccc

            Perhaps provide 3 strings that you want the expression to match and three similar strings that you don’t want it to match. With that set of test cases, we will see what we can do.

            • JonB

              import re

              re.sub(re.escape('anā\B'), 'nā', 'anā\B')
              # seems to work. Or:
              re.sub('anā\\\\B', 'nā', 'anā\B')
              # or:
              re.sub('anā\\\B', 'nā', 'anā\B')
              # or:
              re.sub(r'anā\\B', 'nā', 'anā\B')
              

              Not sure I fully understand why, but then I don't understand Unicode at a practical level... I guess \B is not a valid escape code in a string literal, so the string really contains \\B. In fact, if you just type '\B' at the console, you will see it gets converted for you. But I guess re doesn't like that: with flags=re.DEBUG it reports a NON_BOUNDARY item, which suggests the pattern's \B is being read as the regex "not at a word boundary" assertion rather than as a literal backslash followed by B.
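
To make the two readings of \B concrete, here is a small sketch for Py3. The subject strings are invented for illustration ('anām' in particular is a made-up word, not one of the original poster's test cases), and the literals are written with doubled backslashes to avoid invalid-escape warnings.

```python
import re

# The subject holds five characters: a, n, ā, a backslash, and B.
subject = 'anā\\B'
print(len(subject))

# Regex r'anā\\B': an escaped backslash followed by B, so it matches the literal text.
literal_match = re.sub(r'anā\\B', 'nā', subject)

# Regex r'anā\B': here \B is the "not at a word boundary" assertion.
# In the subject, ā is followed by a backslash (a non-word character),
# so that position is a word boundary and the pattern does not match.
assertion_on_subject = re.sub(r'anā\B', 'nā', subject)

# Same pattern against a hypothetical word where ā is followed by a letter:
# no boundary after ā, so the assertion holds and the substitution happens.
assertion_inside_word = re.sub(r'anā\B', 'nā', 'anām')

print(literal_match, assertion_on_subject, assertion_inside_word)
```

Which of the two patterns is the right one depends on whether the text to replace really contains a backslash or whether \B was meant as the assertion.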

              • mikael

                @peiriannydd

                This:

                re.sub(r'anā\\B', 'nā', 'anā\\B')
                

                ... is to me the "stereotypical" approach: use r so the regexp does not need too many backslashes, and keep three things apart: backslashes that mean something to the regexp engine, backslashes that denote special characters in string literals, and backslashes that are already in a string and have no special meaning whatever.

                Unicode has no impact here, as long as we are all happily using str in Py3.

                To me, the first paragraphs of the official Python docs for re seem to cover all of this nicely.
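
As a quick check of the points above on Py3, the sketch below compares the raw and the plain spelling of the same pattern and confirms that the u prefix changes nothing. The variable names are only for illustration.

```python
import re

raw_pattern = r'anā\\B'       # raw literal: both backslashes are kept as typed
plain_pattern = 'anā\\\\B'    # plain literal: each doubled backslash becomes one
same_pattern = raw_pattern == plain_pattern

# On Python 3 the u prefix is accepted but has no effect: both are str.
unicode_noop = (u'nā' == 'nā') and (type(u'nā') is str)

# The subject is written with an explicit escaped backslash.
result = re.sub(raw_pattern, 'nā', 'anā\\B')
print(same_pattern, unicode_noop, result)
```

Both spellings hand re the same six-character pattern; the r form is just easier to read.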
