<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DZone Snippets: utf-8 code</title>
    <link>http://snippets.dzone.com/posts</link>
    <pubDate>Sun, 12 Oct 2008 18:13:42 GMT</pubDate>
    <description>DZone Snippets: utf-8 code</description>
    <item>
      <title>Convert cp1252 -&gt; utf-8 character set (Python and Ruby)</title>
      <link>http://snippets.dzone.com/posts/show/5367</link>
      <description>Oooh, I hate character sets. Specifically, that there is more than one of them. Here is a Ruby version of a Python script I found to convert cp1252 (aka windows-1252) into utf-8. Strictly speaking, the mapping is ISO-8859-1 to utf-8: it is correct for bytes 0xA0-0xFF, where the two character sets agree, but cp1252's extra characters in 0x80-0x9F (curly quotes, the euro sign) come out as control characters. This version iterates over bytes, so it runs on Ruby 2.0 and later, where indexing a string no longer returns an integer.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;  def clean_up dirty_text&lt;br /&gt;    newstr = "".b  # binary buffer, so raw bytes can be appended&lt;br /&gt;    dirty_text.each_byte do |byte|&lt;br /&gt;      if byte &lt; 0x80&lt;br /&gt;        newstr &lt;&lt; byte  # ASCII passes through unchanged&lt;br /&gt;      elsif byte &lt; 0xC0&lt;br /&gt;        newstr &lt;&lt; 0xC2 &lt;&lt; byte  # 0x80-0xBF: lead byte 0xC2, same trailing byte&lt;br /&gt;      else&lt;br /&gt;        newstr &lt;&lt; 0xC3 &lt;&lt; (byte - 64)  # 0xC0-0xFF: lead byte 0xC3, trailing byte shifted down&lt;br /&gt;      end&lt;br /&gt;    end&lt;br /&gt;    newstr.force_encoding("UTF-8")&lt;br /&gt;  end&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The original Python 2 script was (http://miscoranda.com/96):&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&lt;br /&gt;#!/usr/bin/python&lt;br /&gt;import sys&lt;br /&gt;for c in sys.stdin.read():&lt;br /&gt;   if ord(c) &lt; 0x80: sys.stdout.write(c)&lt;br /&gt;   elif ord(c) &lt; 0xC0: sys.stdout.write('\xC2' + c)&lt;br /&gt;   else: sys.stdout.write('\xC3' + chr(ord(c) - 64))&lt;br /&gt;&lt;/code&gt;</description>
      <pubDate>Wed, 16 Apr 2008 11:39:47 GMT</pubDate>
      <guid>http://snippets.dzone.com/posts/show/5367</guid>
      <author>nicwilliams (Dr Nic Williams)</author>
    </item>
  </channel>
</rss>
