RegEx Calculator Tokenizer

07.14.2009
Tokenizes a calculator expression string, returning a list of tokens.
Numeric tokens are converted to int or float; everything else stays a string.
For example,  "x = (5.0 - 3) / 4^2"
returns  ['x', '=', '(', 5.0, '-', 3, ')', '/', 4, '^', 2]

The token list could then be parsed into a tree.

#!/usr/bin/env python
'''
Use regex to tokenize a string expression.

adapted from:
http://effbot.org/zone/xml-scanner.htm
'''
import re

reg_token = re.compile(r"""
    \s*                 # skip leading whitespace
    ([0-9\.]+|          # one or more digits or '.', i.e. floats or ints
    \w+|                # words
    [+\-*/!^%&|]{1,2}|  # one or two operator characters
    .)                  # any other single character except newline
    """,
    re.VERBOSE)

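def tokenize(expr):
    '''
    Returns a list of tokens for an expression string.
    Allows operators +-*/!^%&|
    Treats any two adjacent operator characters, e.g. **, ++
    (but also *-), as a single token.
    Numeric tokens are converted to int or float.
    '''
    def v_token(obj):
        # Convert numeric strings; leave anything else (e.g. '1.2.3') as-is.
        try:
            if '.' in obj:
                return float(obj)
            else:
                return int(obj)
        except ValueError:
            return obj

    # group(1) is the token itself; group() would also include the
    # leading whitespace consumed by \s*. strip() keeps trailing
    # spaces from being returned as a token.
    return [v_token(tkn.group(1)) for tkn
                        in reg_token.finditer(expr.strip())]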
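
The description says the token list could then be parsed into a tree. Below is a minimal sketch of one way to do that, using precedence climbing over a token list shaped like the example output. Everything in it is an assumption for illustration and not part of the original snippet: the names `BINARY_OPS` and `parse`, the choice of binary operators, their precedences, and the right-associativity of `^` and `=`. Unary operators and doubled operators such as `**` are not handled.

```python
# Hypothetical sketch: build a nested-tuple tree from a token list such as
# ['x', '=', '(', 5.0, '-', 3, ')', '/', 4, '^', 2].
# The operator table is an assumption, not taken from the original snippet.

# operator -> (precedence, right_associative)
BINARY_OPS = {
    '=': (1, True),
    '+': (2, False), '-': (2, False),
    '*': (3, False), '/': (3, False), '%': (3, False),
    '^': (4, True),
}

def parse(tokens):
    """Return a nested (op, left, right) tuple tree for a token list."""
    pos = [0]  # mutable index shared by the nested helpers

    def peek():
        return tokens[pos[0]] if pos[0] < len(tokens) else None

    def advance():
        tok = peek()
        pos[0] += 1
        return tok

    def parse_atom():
        tok = advance()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        if tok == '(':
            node = parse_expr(1)
            if advance() != ')':
                raise SyntaxError("expected ')'")
            return node
        if tok == ')' or tok in BINARY_OPS:
            raise SyntaxError("unexpected token %r" % (tok,))
        return tok  # a number or an identifier

    def parse_expr(min_prec):
        left = parse_atom()
        while True:
            op = peek()
            if op not in BINARY_OPS:
                break
            prec, right_assoc = BINARY_OPS[op]
            if prec < min_prec:
                break
            advance()
            # Right-associative operators may recurse at the same level.
            right = parse_expr(prec if right_assoc else prec + 1)
            left = (op, left, right)
        return left

    tree = parse_expr(1)
    if peek() is not None:
        raise SyntaxError("unexpected token %r" % (peek(),))
    return tree

print(parse(['x', '=', '(', 5.0, '-', 3, ')', '/', 4, '^', 2]))
# ('=', 'x', ('/', ('-', 5.0, 3), ('^', 4, 2)))
```

Each node is a plain `(operator, left, right)` tuple, so the tree can be evaluated or printed with a short recursive function. Under these assumed rules, `2 ^ 3 ^ 2` groups to the right and `1 - 2 - 3` groups to the left.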