Never been to DZone Snippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

About this user

http://www.rbs.me.uk

« Newer Snippets
Older Snippets »
Showing 1-1 of 1 total  RSS 

Handling Accented Characters with Python Regular Expressions

[A-z] just isn't good enough!

import re
string = 'riché'
print string
riché

richre = re.compile('([A-z]+)')
match = richre.match(string)
print match.groups()
('rich',)

richre = re.compile('(\w+)',re.LOCALE)
match = richre.match(string)
print match.groups()
('rich',)

richre = re.compile('([é\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

richre = re.compile('([\xe9\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

richre = re.compile('([\xe9-\xf8\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

string = 'richéñ'
match = richre.match(string)
print match.groups()
('rich\xe9\xf1',)

richre = re.compile('([\u00E9-\u00F8\w]+)')
print match.groups()
('rich\xe9\xf1',)

matched = match.group(1)
print matched
richéñ


« Newer Snippets
Older Snippets »
Showing 1-1 of 1 total  RSS