omz:forum

    Regex oddity

    Pythonista
    • userista

      When I use the re module in Pythonista, I get some weird behavior. Specifically, re.sub doesn't work as documented. Here's my code with sample text. The expression has been tested in multiple Python regex "testers" (e.g. http://regex101.com/r/yP7bA9/1).

      gist here

      import re
      
      scores = [[u'Orlando 81   Washington 90 (3:55 IN 4TH)'], [u'Atlanta 59   Cleveland 87 (3:51 IN 3RD)'], [u'Utah 62   Toronto 69 (3:59 IN 3RD)'], [u'Indiana 46   Chicago 42 (0:03 IN 2ND)'], [u'Detroit 50   Memphis 51 (0:18 IN 2ND)'], [u'Minnesota 22   Dallas 28 (0:00 IN 1ST)'], [u'Brooklyn at Portland (10:00 PM ET)'], [u'San Antonio at Sacramento (10:00 PM ET)'], [u'Charlotte at Golden State (10:30 PM ET)'], [u'Phoenix at LA Clippers (10:30 PM ET)']]
      
      for score in scores:
          print score
          print re.sub('([a-zA-Z^ ]+?)(\\d+|at)\\s+?([a-zA-Z^ ]+?)(\\d+)?\\s+?(\\(.+\\))\\s+?', 'whatever replacment', score[0])
      

      The sample text is below (some lines have extra spaces at the end). In the code it's a list of lists:

      Orlando 38   Washington 46 (1:36 IN 2ND) 
      Atlanta 25   Cleveland 37 (0:28 IN 1ST) 
      Utah 25   Toronto 23 (0:00 IN 1ST) 
      Indiana at Chicago (8:00 PM ET) 
      Detroit at Memphis (8:00 PM ET) 
      Minnesota at Dallas (8:30 PM ET) 
      Brooklyn at Portland (10:00 PM ET) 
      San Antonio at Sacramento (10:00 PM ET) 
      Charlotte at Golden State (10:30 PM ET) 
      Phoenix at LA Clippers (10:30 PM ET)
      

      The weird thing is that this seems to work when not in a for loop.....

      • JonB

        Part of the problem is that the expression in your regex101 link does not match the one in your gist... You are missing a few ?'s.

        In general, it is easier to use raw strings for your expressions, that is, prefixed by an r, since you can paste the expression directly from other tools without needing to escape them.
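As a small illustration of the raw-string point, the sketch below writes one fragment of the expression both ways and checks that the two literals are the same pattern. The variable names are made up for this example, and the sample line is taken from the question.

```python
import re

# The same fragment written twice: escaped in a normal string, and as a raw string.
escaped = '(\\d+|at)\\s+'
raw = r'(\d+|at)\s+'

# Both literals produce exactly the same pattern text.
print(escaped == raw)

sample = u'Orlando 81   Washington 90 (3:55 IN 4TH)'  # line from the question
found = re.findall(raw, sample)
print(found)
```

The raw form is what a tool such as regex101 displays, so it can be pasted in without doubling every backslash.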

        Also, personally I find it easier to debug regular expressions by first using one of the match or findall methods, building up the expression as I go using implicit string concatenation across multiple lines, with a comment on each group, e.g.:

        re.findall( (r'([a-zA-Z^ ]+?)'  # first team name: letters, spaces and carets
                     r'(\d+?|at)'       # either a score, or the word "at"
                     # .....
                    ), score[0])
        
        

        then you can comment out entire lines to make sure each group works before enabling the next.
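A runnable variation on this idea, as a sketch only: instead of commenting lines in and out, the pieces are kept in a list and matched one prefix at a time. The names `PARTS` and `groups_for_prefix` are hypothetical, and the pieces are copied from the expression used in this thread.

```python
import re

# Pattern pieces in order. Keeping them in a list is an illustrative variation
# on the multi-line concatenation described above.
PARTS = [
    r'([a-zA-Z^ ]+?)',  # first team name: letters, spaces, literal carets
    r'(\d+?|at)',       # first score, or the word "at"
    r'\s+',             # gap between the two halves
    r'([a-zA-Z^ ]+?)',  # second team name
    r'(\d+)?',          # optional second score
    r'\s+?',            # gap before the status
    r'(\(.+\))',        # status in parentheses
    r'\s*?',            # optional trailing whitespace
]


def groups_for_prefix(n, text):
    """Match only the first n pieces and return the captured groups (or None)."""
    match = re.match(''.join(PARTS[:n]), text)
    return match.groups() if match else None


sample = u'Orlando 81   Washington 90 (3:55 IN 4TH)'  # line from the question
for n in range(1, len(PARTS) + 1):
    print(groups_for_prefix(n, sample))
```

Printing the groups for each prefix shows where a group starts capturing something unexpected, which is the same check as enabling one line at a time.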

        Anyway, here's your code. All I did was copy the expression from regex101 and paste it as a raw string. Well, I also added a findall printout and showed how you can use your groups in a sub call.
        Guessing at what you are doing, I suspect sub might not be what you want... I'm thinking findall might be what you really want, which breaks this up into a table.

        for score in scores:
            print re.findall( r'([a-zA-Z^ ]+?)(\d+?|at)\s+([a-zA-Z^ ]+?)(\d+)?\s+?(\(.+\))\s*?',score[0])
            print re.sub( r'([a-zA-Z^ ]+?)(\d+?|at)\s+([a-zA-Z^ ]+?)(\d+)?\s+?(\(.+\))\s*?',r'\1 --- \3',score[0])
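
To make the "table" idea concrete, here is a minimal sketch that collects the findall results into rows and strips the padding from each field. `SCORE_RE`, `to_rows` and the three-line sample are assumptions made for illustration; the expression itself is the one from the code above.

```python
import re

# Same expression as in the thread, compiled once for reuse.
SCORE_RE = re.compile(
    r'([a-zA-Z^ ]+?)(\d+?|at)\s+([a-zA-Z^ ]+?)(\d+)?\s+?(\(.+\))\s*?')

lines = [
    u'Orlando 81   Washington 90 (3:55 IN 4TH)',
    u'Brooklyn at Portland (10:00 PM ET)',
    u'San Antonio at Sacramento (10:00 PM ET)',
]


def to_rows(lines):
    """Return one list of stripped fields per matched line."""
    rows = []
    for line in lines:
        for groups in SCORE_RE.findall(line):
            # findall reports a group that did not take part in the match as ''
            rows.append([field.strip() for field in groups])
    return rows


table = to_rows(lines)
for row in table:
    print(row)
```

Each row has five fields: first team, first score or "at", second team, second score (empty for games that have not started), and the status in parentheses.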
        
        • userista

          @JonB
          Yes, thank you!! - the raw string tip really helps - I was getting lost in "escape character hell"

          • userista

            Turns out I was having an issue backreferencing an empty (unmatched) group, see http://bugs.python.org/issue1519638. So even though re.findall was returning 5 groups, I wasn't able to use re.sub to match/replace all the groups.

            EDIT: I ended up using this workaround - adding an empty sub-group
            http://bugs.python.org/msg69541
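
One way to read "adding an empty sub-group" is sketched below. This is an assumption about the shape of the workaround, not a copy of the linked message. The optional group `(\d+)?` becomes `(\d+|)`, so the group always takes part in the match and captures an empty string when there is no score. The names `PATTERN`, `REPLACEMENT` and `rewrite` are hypothetical. From Python 3.5 on, re.sub replaces unmatched groups with an empty string by itself, so the change matters mainly on older interpreters.

```python
import re

# Group 4 is written as (\d+|) instead of (\d+)? so that it always takes part
# in the match, capturing '' when the line has no second score.
PATTERN = r'([a-zA-Z^ ]+?)(\d+?|at)\s+([a-zA-Z^ ]+?)(\d+|)\s+?(\(.+\))\s*?'
REPLACEMENT = r'\1|\2|\3|\4|\5'  # references all five groups, including group 4


def rewrite(line):
    """Replace the matched text with its five groups joined by '|'."""
    return re.sub(PATTERN, REPLACEMENT, line)


scheduled = rewrite(u'Brooklyn at Portland (10:00 PM ET)')
in_progress = rewrite(u'Orlando 81   Washington 90 (3:55 IN 4TH)')
print(scheduled)
print(in_progress)
```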
