Python Equivalent Of Ruby's Stringscanner?
Solution 1:
Interestingly, there's an undocumented Scanner class in the re module:
import re

def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))
Following the discussion it looks like it was left in as experimental code/a starting point for others.
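If I read the Scanner class correctly, scan returns a two-element tuple: the list of token values produced by the actions, and whatever tail of the string matched no pattern. A minimal sketch (the lexicon here is my own toy example, not from the answer above):

```python
import re

# A tiny lexicon: each entry pairs a regex with an action callable.
# The action receives (scanner, matched_text); a None action drops
# the match entirely (used here to skip whitespace).
scanner = re.Scanner([
    (r"\d+", lambda s, tok: int(tok)),
    (r"[a-zA-Z_]\w*", lambda s, tok: tok),
    (r"\s+", None),
])

tokens, remainder = scanner.scan("x 42 y!")
print(tokens)     # ['x', 42, 'y']
print(remainder)  # '!' -- the unmatched tail of the input
```

Note that patterns are tried in order, so in the answer's lexicon the float pattern must come before the plain integer pattern.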
Solution 2:
There is nothing exactly like Ruby's StringScanner in Python. It is of course easy to put something together:
import re

class Scanner(object):
    def __init__(self, s):
        self.s = s
        self.offset = 0

    def eos(self):
        return self.offset == len(self.s)

    def scan(self, pattern, flags=0):
        if isinstance(pattern, str):
            pattern = re.compile(pattern, flags)
        match = pattern.match(self.s, self.offset)
        if match is not None:
            self.offset = match.end()
            return match.group(0)
        return None
along with an example of using it interactively:
>>> s = Scanner("Hello there!")
>>> s.scan(r"\w+")
'Hello'
>>> s.scan(r"\s+")
' '
>>> s.scan(r"\w+")
'there'
>>> s.eos()
False
>>> s.scan(r".*")
'!'
>>> s.eos()
True
However, for the work I do I tend to just write those regular expressions in one go and use groups to extract the needed fields. Or for something more complicated I would write a one-off tokenizer or look to PyParsing or PLY to tokenize for me. I don't see myself using something like StringScanner.
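The "one regex with groups" approach mentioned above can be sketched like this (the log-line format and group names are a hypothetical illustration, not from the answer):

```python
import re

# Instead of scanning off pieces one at a time, match the whole line
# once and pull the fields out via named groups.
line = "2021-03-04 ERROR disk full"
m = re.match(r"(?P<date>\d{4}-\d{2}-\d{2}) (?P<level>\w+) (?P<msg>.*)", line)
if m:
    print(m.group("date"))   # 2021-03-04
    print(m.group("level"))  # ERROR
    print(m.group("msg"))    # disk full
```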
Solution 3:
Looks like a variant on re.split(pattern, string).
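For what it's worth, re.split with a capturing group keeps the delimiters in the result, which is loosely similar to peeling tokens off a string one at a time:

```python
import re

# A capturing group in the pattern makes re.split return the
# separators alongside the pieces between them.
parts = re.split(r"(\s+)", "sum = 3*foo")
print(parts)  # ['sum', ' ', '=', ' ', '3*foo']
```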
Solution 4:
https://pypi.python.org/pypi/scanner/
This seems like a more maintained and feature-complete solution, but note that it uses Oniguruma directly.
Solution 5:
Maybe look into the built-in tokenize module. It looks like you can pass a string to it by wrapping it with the StringIO module.