UTF-8 compatible String ranges in Ruby
class String
  def [](*params)
    if params.all? { |p| Integer === p } || params.size == 1 && Range === params[0]
      res = self.unpack("U*").[](*params)
      res = [res] unless Array === res
      return res.pack("U*")
    end
    super
  end
end
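The same idea can be sketched without reopening String. The helper name utf8_slice below is hypothetical, and the nil check for out-of-range indexes is an added assumption that the snippet above does not have. Note that from Ruby 1.9 onward, String#[] already indexes UTF-8 strings by character, so this mainly illustrates the unpack/pack round trip.

```ruby
# Hypothetical helper (not part of the original snippet): slice a UTF-8
# string by code point instead of by byte, without patching String.
def utf8_slice(str, *params)
  codepoints = str.unpack("U*")        # array of Unicode code points
  res = codepoints[*params]            # Integer, Array, or nil
  return nil if res.nil?               # assumption: mirror Array#[] for misses
  res = [res] unless res.is_a?(Array)  # a single index yields one Integer
  res.pack("U*")                       # re-encode the code points as UTF-8
end

word = "rich\u00e9"                    # "riché"
puts utf8_slice(word, 4)               # single index
puts utf8_slice(word, 3, 2)            # start and length
puts utf8_slice(word, 1..2)            # range
```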
It seems that HTTP/Daemon.pm is wrong when it calculates the 'Content-Length' header for data that contains UTF-8 characters: it counts characters rather than bytes. The attached patch calculates the length of the data in bytes.

--
Erwan MAS
mailto:[EMAIL PROTECTED]

--- Daemon.pm.orig 2004-12-11 16:13:22.000000000 +0100
+++ Daemon.pm 2006-05-02 22:53:33.660393022 +0200
@@ -436,7 +436,7 @@
 }
 }
 elsif (length($content)) {
- $res->header("Content-Length" => length($content));
+ $res->header("Content-Length" => bytes::length($content));
 }
 else {
 $self->force_last_request;
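The character/byte distinction behind this patch is not specific to Perl. As a rough Ruby illustration (the variable names and the headers hash are made up for the example and are not part of any HTTP library), the two counts diverge as soon as the body contains a non-ASCII character:

```ruby
body = "rich\u00e9"            # "riché": 5 characters, 6 bytes in UTF-8

char_count = body.length       # counts characters
byte_count = body.bytesize     # counts bytes, which is what goes on the wire

# Content-Length has to carry the byte count, not the character count.
headers = { "Content-Length" => byte_count.to_s }

puts "characters: #{char_count}, bytes: #{byte_count}"
```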
>>> import re
>>> string = 'riché'
>>> print string
riché
>>> richre = re.compile('([A-z]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich',)
>>> richre = re.compile('(\w+)',re.LOCALE)
>>> match = richre.match(string)
>>> print match.groups()
('rich',)
>>> richre = re.compile('([é\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)
>>> richre = re.compile('([\xe9\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)
>>> richre = re.compile('([\xe9-\xf8\w]+)')
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9',)
>>> string = 'richéñ'
>>> match = richre.match(string)
>>> print match.groups()
('rich\xe9\xf1',)
>>> richre = re.compile('([\u00E9-\u00F8\w]+)')
>>> print match.groups()
('rich\xe9\xf1',)
>>> matched = match.group(1)
>>> print matched
richéñ
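For comparison with the Ruby snippets on this page, here is a small sketch of the same matching problem in Ruby 1.9 or later, using the Unicode letter property \p{L} instead of hand-written byte ranges. It also shows a side effect of the [A-z] class used above: the range spans the ASCII punctuation between "Z" and "a", so it matches an underscore as well.

```ruby
word = "rich\u00e9\u00f1"            # "richéñ"

ascii_only = word[/[A-Za-z]+/]       # stops at the first non-ASCII letter
any_letter = word[/\p{L}+/]          # \p{L} matches any Unicode letter

# [A-z] covers every ASCII code from "A" to "z", including [ \ ] ^ _ `
underscore = "_"[/[A-z]/]

puts ascii_only
puts any_letter
```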
escape-control-chars: func [
    "Convert all control chars in string to \uxxxx format"
    s [any-string!] /local ctrl-ch c
][
    ctrl-ch: charset [#"^@" - #"^_"]
    parse/all s [
        any [
            mark: copy c ctrl-ch (
                change/part mark encode-control-char to char! c 1
            ) 5 skip
            | skip
        ]
    ]
    s
]

encode-control-char: func [char [char! integer!]] [
    join "\u" at to-hex to integer! char 5
]

replace-unicode-escapes: func [s [string!] /local c uc] [
    parse s [
        any [
            some chars
            | [mark: #"\"
                #"u" copy c 4 hex-c (
                    change/part mark uc: decode-unicode-char c 6 ; 6 = length "\uxxxx"
                ) -1 skip :mark
                | escaped]
        ]
    ]
]

decode-unicode-char: func [val /local c] [
    c: to-integer debase/base val 16
    rejoin either c < 128 [[to-char c]] [
        either c < 2048 [[
            to-char (192 or to-integer (c / 64))
            to-char (128 or (c and 63))
        ]] [[
            to-char (224 or to-integer (c / 4096))
            to-char (128 or ((to-integer (c / 64)) and 63))
            to-char (128 or (c and 63))
        ]]
    ]
]
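The arithmetic in decode-unicode-char is the standard one-, two- and three-byte UTF-8 layout. A rough Ruby transcription may make the bit patterns easier to follow. The function name encode_utf8 is made up for this sketch, and like the REBOL version it only covers code points up to U+FFFF (the range a four-digit \uxxxx escape can express) and does not reject surrogates.

```ruby
# Hypothetical helper: encode one code point (0..0xFFFF) as UTF-8 by hand.
def encode_utf8(cp)
  bytes =
    if cp < 0x80                      # 1 byte:  0xxxxxxx
      [cp]
    elsif cp < 0x800                  # 2 bytes: 110xxxxx 10xxxxxx
      [0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)]
    else                              # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
      [0xE0 | (cp >> 12), 0x80 | ((cp >> 6) & 0x3F), 0x80 | (cp & 0x3F)]
    end
  bytes.pack("C*").force_encoding("UTF-8")
end

# Ruby's built-in pack("U") should agree with the manual version.
[0x41, 0xE9, 0x2026].each do |cp|
  puts encode_utf8(cp) == [cp].pack("U")
end
```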
def entities( str )
  converted = []
  str.split(//).collect { |c| converted << ( c[0] > 127 ? "&##{c[0]};" : c ) }
  converted.join('')
end
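The entities method above relies on Ruby 1.8 behaviour, where c[0] returns an integer. From Ruby 1.9 onward c[0] returns a one-character string, so the comparison with 127 no longer works. A possible equivalent for newer Rubies, under the hypothetical name entities_by_codepoint, uses each_char and ord:

```ruby
# Hypothetical modern equivalent: replace every non-ASCII character
# with a decimal numeric character reference.
def entities_by_codepoint(str)
  str.each_char.map { |c| c.ord > 127 ? "&##{c.ord};" : c }.join
end

puts entities_by_codepoint("rich\u00e9")
```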
# -*- coding: utf-8 -*-
private class RefreshAction extends AbstractAction {
    private RefreshAction() {
        super("Refresh\u2026");
    }

    public void actionPerformed(ActionEvent e) {
        ...
    }
}
…