Convert cp1252 -> utf-8 character set (Python and Ruby)

Oooh, I hate character sets. Specifically, that there is more than one of them. Here is a Ruby version of a Python script I found to convert cp1252 (aka windows-1252) into utf-8.

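  # Re-encode single-byte text as UTF-8, one byte at a time (Ruby 1.9 or later).
  def clean_up(dirty_text)
    bytes = []
    # each_byte always yields integers; String#[] with an integer index
    # only returned an integer on Ruby 1.8.
    dirty_text.each_byte do |byte|
      if byte < 0x80
        bytes << byte                  # ASCII passes through unchanged
      elsif byte < 0xC0
        bytes << 0xC2 << byte          # 0x80..0xBF: lead byte 0xC2, same trail byte
      else
        bytes << 0xC3 << (byte - 64)   # 0xC0..0xFF: lead byte 0xC3, trail byte minus 0x40
      end
    end
    bytes.pack("C*").force_encoding("UTF-8")
  end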
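
Why the two magic prefixes work: UTF-8 stores code points U+0080 through U+00FF as two bytes, 110xxxxx 10xxxxxx. For U+0080..U+00BF the lead byte is always 0xC2 and the trail byte equals the input byte; for U+00C0..U+00FF the lead byte is 0xC3 and the trail byte is the input byte minus 0x40 (64). Here is a rough Python 3 sketch of the same three branches working on bytes. The function name latin1_bytes_to_utf8 is made up for this example; it is not from the original script.

```python
def latin1_bytes_to_utf8(data: bytes) -> bytes:
    """Re-encode single-byte input as UTF-8 using the same three branches."""
    out = bytearray()
    for byte in data:
        if byte < 0x80:
            out.append(byte)                   # ASCII: copied as-is
        elif byte < 0xC0:
            out += bytes([0xC2, byte])         # U+0080..U+00BF
        else:
            out += bytes([0xC3, byte - 0x40])  # U+00C0..U+00FF
    return bytes(out)


# "café" with é stored as the single byte 0xE9
sample = bytes([0x63, 0x61, 0x66, 0xE9])
print(latin1_bytes_to_utf8(sample))  # b'caf\xc3\xa9'
```

This arithmetic gives the same result as Python's built-in latin-1 codec for every one of the 256 possible byte values, which is a handy way to sanity-check a port.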


The original Python script (from http://miscoranda.com/96, written for Python 2) was:

#!/usr/bin/python
import sys
for c in sys.stdin.read(): 
   if ord(c) < 0x80: sys.stdout.write(c)
   elif ord(c) < 0xC0: sys.stdout.write('\xC2' + c)
   else: sys.stdout.write('\xC3' + chr(ord(c) - 64))
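
One caveat worth knowing: the byte arithmetic above treats every byte value as the Unicode code point with the same number, which is exactly what ISO-8859-1 (Latin-1) does. cp1252 agrees with Latin-1 everywhere except 0x80..0x9F, where it puts the euro sign, curly quotes and friends. Those bytes come out of the arithmetic as C1 control characters instead. If that range matters to you, a minimal Python 3 sketch using the built-in codecs might look like this. The function name cp1252_to_utf8 and the choice of errors="replace" are assumptions for illustration, not part of the original script.

```python
def cp1252_to_utf8(data: bytes) -> bytes:
    """Decode cp1252 bytes with the built-in codec, then encode as UTF-8."""
    # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined; with
    # errors="replace" they become U+FFFD instead of raising an error.
    return data.decode("cp1252", errors="replace").encode("utf-8")


euro = b"\x80"
print(cp1252_to_utf8(euro))                     # b'\xe2\x82\xac' (U+20AC, the euro sign)
print(euro.decode("latin-1").encode("utf-8"))   # b'\xc2\x80' (U+0080, a control character)
```

For text that only uses ASCII and the 0xA0..0xFF range, both approaches give identical output.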