Showing 11-20 of 21 total

UTF-8 compatible String ranges in Ruby

As found at http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/123935

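class String
        # Keep a handle on the built-in method: once String#[] is redefined
        # in String itself, `super` has no superclass method to fall back on.
        alias_method :original_index, :[]

        def [](*params)
                if params.all? { |p| Integer===p } ||
                   params.size==1 && Range===params[0]
                        res = unpack("U*")[*params]
                        return nil if res.nil?        # index out of range
                        res = [res] unless Array===res
                        return res.pack("U*")
                end
                original_index(*params)
        end
end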
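
For comparison, here is a minimal Python 3 sketch of the same idea: slicing by character position instead of by byte. The helper name char_slice and the sample text are assumptions made up for this illustration, not part of the original post.

```python
# Python 3 sketch: str is indexed by code point, bytes is indexed by byte.
text = "riché ñandú"
raw = text.encode("utf-8")


def char_slice(data: bytes, start: int, stop: int) -> bytes:
    """Slice UTF-8 encoded bytes by character position (hypothetical helper)."""
    return data.decode("utf-8")[start:stop].encode("utf-8")


char_result = char_slice(raw, 4, 7)  # characters 4..6
byte_result = raw[4:7]               # bytes 4..6, a different piece of the data
```

The two results differ because each accented letter takes two bytes in UTF-8, which is the mismatch the Ruby override works around.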

Fix Content-Length header on UTF8 with HTTP::Daemon in Perl

It seems that HTTP/Daemon.pm computes the 'Content-Length' header
incorrectly when the content contains UTF-8 data: it counts characters
rather than bytes.

Attached is a patch that calculates the length of the data in bytes.

-- Erwan MAS

--- Daemon.pm.orig      2004-12-11 16:13:22.000000000 +0100
+++ Daemon.pm   2006-05-02 22:53:33.660393022 +0200
@@ -436,7 +436,7 @@
            }
        }
        elsif (length($content)) {
-           $res->header("Content-Length" => length($content));
+           $res->header("Content-Length" => bytes::length($content));
        }
        else {
            $self->force_last_request;
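
The difference the patch addresses can be sketched in Python 3. The function name content_length and the sample body are assumptions for illustration only; this is not code from HTTP::Daemon.

```python
def content_length(body: str, encoding: str = "utf-8") -> int:
    """Return the number of bytes the body occupies once encoded."""
    return len(body.encode(encoding))


body = "riché"

char_count = len(body)             # counts characters
byte_count = content_length(body)  # counts encoded bytes

# Content-Length has to carry the byte count, not the character count.
headers = {"Content-Length": str(byte_count)}
```

With non-ASCII data the two counts diverge, so a header built from the character count is too short.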

Handling Accented Characters with Python Regular Expressions

[A-z] just isn't good enough!

import re
string = 'riché'
print string
riché

richre = re.compile('([A-z]+)')
match = richre.match(string)
print match.groups()
('rich',)

richre = re.compile('(\w+)',re.LOCALE)
match = richre.match(string)
print match.groups()
('rich',)

richre = re.compile('([é\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

richre = re.compile('([\xe9\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

richre = re.compile('([\xe9-\xf8\w]+)')
match = richre.match(string)
print match.groups()
('rich\xe9',)

string = 'richéñ'
match = richre.match(string)
print match.groups()
('rich\xe9\xf1',)

richre = re.compile(u'([\u00E9-\u00F8\w]+)')
match = richre.match(u'rich\xe9\xf1')
print match.groups()
(u'rich\xe9\xf1',)

matched = match.group(1)
print matched
richéñ
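
The session above is Python 2. As a rough sketch of how the same checks look in Python 3, where str patterns are Unicode-aware by default (the variable names are made up for this example):

```python
import re

word = "richéñ"

# An explicit ASCII class stops at the first accented letter.
ascii_only = re.match(r"([A-Za-z]+)", word).group(1)

# For str patterns, \w already matches accented letters.
unicode_word = re.match(r"(\w+)", word).group(1)

# [A-z] also covers the punctuation that sits between 'Z' and 'a'.
loose = re.match(r"[A-z]+", "a_b^c").group(0)
```

This suggests a second reason to avoid [A-z]: besides missing accented letters, it accepts characters such as "_" and "^".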


escape-control-chars

    escape-control-chars: func [
        "Convert all control chars in string to \uxxxx format"
        s [any-string!] /local ctrl-ch c
    ][
        ctrl-ch: charset [#"^@" - #"^_"]
        parse/all s [
            any [
                mark: copy c ctrl-ch (
                    change/part mark encode-control-char to char! c 1
                ) 5 skip
                | skip
            ]
        ]
        s
    ]

encode-control-char

    encode-control-char: func [char [char! integer!]] [
         join "\u" at to-hex to integer! char 5
    ]
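
For readers who do not use REBOL, the two functions above can be approximated in Python 3 as follows. This is a sketch under the assumption that "control chars" means code points 0x00 through 0x1F, as in the charset above; the Python names simply mirror the REBOL ones.

```python
import re

# Code points 0x00..0x1F, matching the charset [#"^@" - #"^_"].
CONTROL_CHARS = re.compile(r"[\x00-\x1f]")


def encode_control_char(ch: str) -> str:
    """Format one character as a backslash-u escape with 4 hex digits."""
    return "\\u%04X" % ord(ch)


def escape_control_chars(s: str) -> str:
    """Replace every control character in s with its escaped form."""
    return CONTROL_CHARS.sub(lambda m: encode_control_char(m.group(0)), s)


escaped = escape_control_chars("a\tb\n")
```

Unlike the REBOL version, this returns a new string instead of modifying the input in place.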

replace-unicode-escapes

    replace-unicode-escapes: func [s [string!] /local c uc] [
        parse s [
            any [
                some chars
                | [mark: #"\"
                   #"u" copy c 4 hex-c (
                    change/part mark uc: decode-unicode-char c 6  ; 6 = length "\uxxxx"
                    ) -1 skip :mark
                | escaped]
            ]
        ]
    ]

decode-unicode-char - decode a char value given as 4 hex digits into its UTF-8 encoding (1 to 3 bytes)

    decode-unicode-char: func [val /local c] [
        c: to-integer debase/base val 16
        rejoin either c < 128 [[to-char c]] [
            either c < 2048 [[
                to-char (192 or to-integer (c / 64))
                to-char (128 or (c and 63))
            ]] [[
                to-char (224 or to-integer (c / 4096))
                to-char (128 or ((to-integer (c / 64)) and 63))
                to-char (128 or (c and 63))
            ]]
        ]
    ]
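
The same bit arithmetic, sketched in Python 3 for comparison. The function names mirror the REBOL ones; the regular expression is an assumption standing in for the chars, hex-c and escaped rules, which are not shown above, and it does not treat escaped backslashes specially.

```python
import re


def decode_unicode_char(hex_digits: str) -> bytes:
    """Encode a code point given as 4 hex digits into 1 to 3 UTF-8 bytes."""
    c = int(hex_digits, 16)
    if c < 0x80:        # 1 byte: 0xxxxxxx
        return bytes([c])
    if c < 0x800:       # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (c >> 6), 0x80 | (c & 0x3F)])
    # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
    return bytes([0xE0 | (c >> 12),
                  0x80 | ((c >> 6) & 0x3F),
                  0x80 | (c & 0x3F)])


def replace_unicode_escapes(s: str) -> str:
    """Replace each backslash-u escape in s with the character it names.

    Surrogate values (D800..DFFF) are not valid on their own and will
    raise UnicodeDecodeError here.
    """
    return re.sub(r"\\u([0-9A-Fa-f]{4})",
                  lambda m: decode_unicode_char(m.group(1)).decode("utf-8"),
                  s)
```

Four hex digits can only express code points up to FFFF, which is why three bytes are always enough.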

Convert unicode characters to HTML entities in Ruby

def entities( str )
  # Work on code points, not bytes: c[0] only yields the first byte of a
  # multi-byte UTF-8 character on Ruby 1.8.
  converted = str.unpack("U*").collect { |cp| cp > 127 ? "&##{cp};" : cp.chr }
  converted.join('')
end
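
A Python 3 sketch of the same conversion, for comparison. The function name is carried over from the Ruby snippet; the sample string is made up.

```python
def entities(text: str) -> str:
    """Replace every code point above 127 with a decimal HTML entity."""
    return "".join(ch if ord(ch) < 128 else "&#%d;" % ord(ch) for ch in text)


converted = entities("riché\u2026")

# The standard library offers the same result through an error handler.
builtin = "riché\u2026".encode("ascii", "xmlcharrefreplace").decode("ascii")
```

The built-in xmlcharrefreplace handler is probably the simpler choice when the target is plain ASCII output.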

Write your code using utf-8

# -*- coding: utf-8 -*-


Add this line at the top of your source file (Python only honours it on the first or second line) and save the file as UTF-8. Python will then stop complaining about non-ASCII characters in your code.

Ellipsis in Unicode ("…")

Character U+2026 (0x2026) is the horizontal ellipsis "…", a single character that renders as three dots:

Use it in Java for buttons, e.g. "Refresh…":

    private class RefreshAction extends AbstractAction {

        private RefreshAction() {
            super("Refresh\u2026");
        }

        public void actionPerformed(ActionEvent e) {
            ...
        }
    }


Or in HTML:
&#8230;
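
A quick Python 3 check that the Java escape and the HTML entity refer to the same character (the variable names are made up for this example):

```python
import html
import unicodedata

ELLIPSIS = "\u2026"

label = "Refresh" + ELLIPSIS             # one extra character, not three dots
name = unicodedata.name(ELLIPSIS)        # official Unicode character name
from_entity = html.unescape("&#8230;")   # decimal 8230 is hex 2026
```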