
Python Equivalent of Ruby's StringScanner?

Is there a Python class equivalent to Ruby's StringScanner class? I could hack something together, but I don't want to reinvent the wheel if this already exists.

Solution 1:

Interestingly, there's an undocumented Scanner class in the re module:

import re

# Each callback receives the scanner object and the matched text.
def s_ident(scanner, token): return token
def s_operator(scanner, token): return "op%s" % token
def s_float(scanner, token): return float(token)
def s_int(scanner, token): return int(token)

# Patterns are tried in order; an action of None discards the match.
scanner = re.Scanner([
    (r"[a-zA-Z_]\w*", s_ident),
    (r"\d+\.\d*", s_float),
    (r"\d+", s_int),
    (r"=|\+|-|\*|/", s_operator),
    (r"\s+", None),
])

print(scanner.scan("sum = 3*foo + 312.50 + bar"))

Judging by the discussion around it, it looks like it was left in as experimental code or a starting point for others.
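One detail worth knowing is what scan() hands back. The sketch below uses a deliberately tiny lexicon (my own, not from the answer above) and relies on the behaviour of the current CPython implementation, where scan() returns a pair of the collected tokens and whatever input was left unmatched. Since the class is undocumented, treat that as an assumption that could change.

```python
import re

# re.Scanner is undocumented; this reflects how CPython currently behaves.
scanner = re.Scanner([
    (r"\d+", lambda scanner, token: int(token)),  # convert digit runs to int
    (r"\s+", None),                               # None drops the whitespace
])

# Scanning stops at the first position where no pattern matches.
tokens, remainder = scanner.scan("1 2 three 4")

print(tokens)     # [1, 2]
print(remainder)  # three 4
```

A non-empty remainder is the only signal that the input was not fully consumed, so a caller would probably want to check it before trusting the token list.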

Solution 2:

There is nothing exactly like Ruby's StringScanner in Python. It is of course easy to put something together:

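import re

class Scanner:
    def __init__(self, s):
        self.s = s
        self.offset = 0  # current position in the string

    def eos(self):
        # True once the whole string has been consumed
        return self.offset == len(self.s)

    def scan(self, pattern, flags=0):
        if isinstance(pattern, str):
            pattern = re.compile(pattern, flags)
        # Match only at the current offset, then advance past the match
        match = pattern.match(self.s, self.offset)
        if match is not None:
            self.offset = match.end()
            return match.group(0)
        return None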

Here is an example of using it interactively:

>>> s = Scanner("Hello there!")
>>> s.scan(r"\w+")
'Hello'
>>> s.scan(r"\s+")
' '

However, for the work I do I tend to just write those regular expressions in one go and use groups to extract the needed fields. Or for something more complicated I would write a one-off tokenizer or look to PyParsing or PLY to tokenize for me. I don't see myself using something like StringScanner.
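To illustrate the "one regular expression with groups" approach mentioned above, here is a minimal sketch. The pattern, the group names name and value, and the helper parse_assignment are all made up for this example; they only show the general shape of the technique.

```python
import re

# Hypothetical pattern: an identifier, an equals sign, then a number.
ASSIGNMENT = re.compile(
    r"(?P<name>[A-Za-z_]\w*)\s*=\s*(?P<value>\d+(?:\.\d+)?)"
)

def parse_assignment(text):
    """Return (name, value) for input like 'rate = 312.50', else None."""
    match = ASSIGNMENT.fullmatch(text.strip())
    if match is None:
        return None
    return match.group("name"), float(match.group("value"))

print(parse_assignment("rate = 312.50"))  # ('rate', 312.5)
print(parse_assignment("not valid"))      # None
```

When the input has a fixed shape like this, a single pattern keeps all the fields in one place, whereas a StringScanner-style object is more useful when the next pattern depends on what was just read.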

Solution 3:

Solution 4:

This seems to be a more maintained and feature-complete solution, but it uses oniguruma directly.

Solution 5:

Maybe look into the built-in tokenize module. It looks like you can pass a string into it using the StringIO module.
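A rough sketch of that idea in Python 3, where StringIO lives in the io module. The helper python_tokens is a name invented for this example. Keep in mind that tokenize follows Python's own lexical rules, so it only helps if the text you are scanning looks like Python source.

```python
import io
import tokenize

def python_tokens(source):
    """Return (token type name, token text) pairs for a string of Python source."""
    # generate_tokens wants a readline callable that returns str lines.
    readline = io.StringIO(source).readline
    wanted = (tokenize.NAME, tokenize.NUMBER, tokenize.OP)
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.type in wanted  # skip NEWLINE, ENDMARKER and similar
    ]

tokens = python_tokens("sum = 3*foo + 312.50 + bar")
for kind, text in tokens:
    print(kind, text)
```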
