About this user

http://www.rbs.me.uk

Handling Accented Characters with Python Regular Expressions

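[A-z] just isn't good enough! It skips accented letters entirely, and the range A (0x41) to z (0x7A) also takes in the six ASCII punctuation characters that sit between Z and a: [ \ ] ^ _ and the backtick.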
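A quick check in Python 3 (an addition for illustration, not part of the original Python 2 snippet) makes both problems with [A-z] concrete: the punctuation it lets through and the accented letter it rejects.

```python
import re

# [A-z] spans code points 0x41..0x7A, which includes everything
# between 'Z' (0x5A) and 'a' (0x61).
ascii_range = re.compile(r'[A-z]')

extras = [chr(c) for c in range(ord('Z') + 1, ord('a'))]
print(extras)  # ['[', '\\', ']', '^', '_', '`']

# Every one of them matches, even though none is a letter.
print(all(ascii_range.fullmatch(ch) for ch in extras))  # True

# An accented letter is outside the range entirely.
print(ascii_range.fullmatch('é'))  # None
```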

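The session below is Python 2 (note the print statement), typed into a terminal with a Latin-1-compatible encoding, so é and ñ are the single bytes \xe9 and \xf1.

```python
>>> import re
>>> string = 'riché'
>>> print string
riché

>>> # [A-z] stops at the first character outside A..z
>>> richre = re.compile(r'([A-z]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich',)

>>> # Under re.LOCALE, \w follows the current C locale; until
>>> # locale.setlocale() selects a Latin-1 locale that is the
>>> # default "C" locale, where é is not alphanumeric
>>> richre = re.compile(r'(\w+)', re.LOCALE)
>>> match = richre.match(string)
>>> print match.groups()
('rich',)

>>> # Listing the accented character explicitly works
>>> richre = re.compile(r'([é\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)

>>> # ...as does writing its byte value as an escape
>>> richre = re.compile(r'([\xe9\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)

>>> # A range covers more letters (é through ø), but it also takes in
>>> # \xf7 (the division sign) and misses à..è and the accented capitals
>>> richre = re.compile(r'([\xe9-\xf8\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)

>>> string = 'richéñ'
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9\xf1',)

>>> # Python 2's re does not recognise \u escapes in byte-string
>>> # patterns, so use a unicode pattern and a unicode subject string
>>> richre = re.compile(u'([\u00E9-\u00F8\w]+)')
>>> match = richre.match(u'rich\xe9\xf1')
>>> print match.groups()
(u'rich\xe9\xf1',)

>>> matched = match.group(1)
>>> print matched
richéñ
```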
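For comparison, here is a minimal sketch of the same task in Python 3.3 or later (an assumption beyond the original Python 2 snippet). In Python 3, str patterns are Unicode-aware by default, so \w already matches é and ñ, and the re.ASCII flag brings back the ASCII-only behaviour seen in the session. The class [^\W\d_] is one common way to say "letters only"; the last lines show why a hand-written range such as é..ø is fragile.

```python
import re

word = 'richéñ'

# Python 3: str patterns are Unicode-aware by default, so \w matches é and ñ.
print(re.match(r'\w+', word).group())  # richéñ

# re.ASCII restricts \w to [a-zA-Z0-9_], like the Python 2 byte-string session.
print(re.match(r'\w+', word, re.ASCII).group())  # rich

# [^\W\d_] = "a word character that is not a digit or underscore",
# i.e. letters only, accented ones included.
letters_only = re.compile(r'[^\W\d_]+')
print(letters_only.findall('riché_42 ñandú'))  # ['riché', 'ñandú']

# A hand-written code-point range is both too wide and too narrow.
accented_range = re.compile(r'[\u00e9-\u00f8]')
print(bool(accented_range.fullmatch('÷')))  # True: ÷ is U+00F7, inside the range
print(bool(accented_range.fullmatch('à')))  # False: à is U+00E0, below the range
```

Accented letters are scattered across many Unicode blocks, so the built-in classes are usually a safer starting point than any range.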
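One more caveat, again sketched in Python 3 as an addition to the original: an accented letter can also arrive decomposed, as a base letter followed by a combining mark such as U+0301 (combining acute accent). Python's re does not count combining marks as word characters, so normalising input to NFC with unicodedata.normalize before matching is a reasonable precaution.

```python
import re
import unicodedata

# 'é' can be one code point (U+00E9) or 'e' followed by U+0301.
# Both render the same on screen.
composed = 'rich\u00e9'
decomposed = 'riche\u0301'

word = re.compile(r'\w+')
print(word.match(composed).group())    # riché
# The combining mark is not a word character, so the match stops before it.
print(word.match(decomposed).group())  # riche

# NFC folds 'e' + U+0301 back into the single code point U+00E9.
normalised = unicodedata.normalize('NFC', decomposed)
print(normalised == composed)          # True
print(word.match(normalised).group())  # riché
```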