def clean_up(dirty_text)
  newstr = ""
  dirty_text.length.times do |i|
    character = dirty_text[i]
    newstr += if character < 0x80
      character.chr
    elsif character < 0xC0
      "\xC2" + character.chr
    else
      "\xC3" + (character - 64).chr
    end
  end
  newstr
end
The original Python script was (http://miscoranda.com/96):
#!/usr/bin/python
import sys

for c in sys.stdin.read():
    if ord(c) < 0x80:
        sys.stdout.write(c)
    elif ord(c) < 0xC0:
        sys.stdout.write('\xC2' + c)
    else:
        sys.stdout.write('\xC3' + chr(ord(c) - 64))
Is the dirty_text.length.times loop there for Ruby 1.8 & Ruby 1.9 cross-compatibility?
If not, dirty_text.each_byte do |i| takes 60% less time. (That said, in this case I think you would always want to go per byte rather than per character, because of the conversion.)
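For what it's worth, here is a rough sketch of the each_byte approach suggested above. It is not the original poster's code: the name clean_up_bytes is made up for illustration, and the force_encoding guard is my own assumption about how to keep it loadable on Ruby 1.8 (which has no string encodings) as well as 1.9 and later. The byte mapping is the same as in clean_up and the Python script.

```ruby
# Hypothetical byte-wise variant of clean_up: treats the input as Latin-1
# bytes and emits the matching UTF-8 byte sequence.
def clean_up_bytes(dirty_text)
  out = []
  dirty_text.each_byte do |byte|
    if byte < 0x80
      out << byte                 # ASCII passes through unchanged
    elsif byte < 0xC0
      out << 0xC2 << byte         # 0x80..0xBF -> C2 xx
    else
      out << 0xC3 << (byte - 64)  # 0xC0..0xFF -> C3 (xx - 0x40)
    end
  end
  result = out.pack("C*")
  # Assumption: on 1.9+ tag the packed bytes as UTF-8; 1.8 has no encodings.
  result.force_encoding("UTF-8") if result.respond_to?(:force_encoding)
  result
end
```

With this, the Latin-1 byte 0xE9 (e with acute accent) comes out as the two bytes 0xC3 0xA9. Collecting integers and packing once at the end also avoids building a new string on every iteration, which newstr += does.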