
A simple HTML template for a moved site (DTD XHTML 1.0 Strict) with Russian text

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru">
	<head>
		<title>Сайт tclogic.ru переехал! (301 Site moved)</title>
		<meta http-equiv="content-type" content="text/html; charset=utf-8" />
		<link href="main.css" rel="stylesheet" type="text/css" media="Screen, projection, tv" />
		<style type="text/css">
			body {width:40%;margin:3em auto;}
			img {float:left;margin-top:-1em;}
			h1 {margin:4em 0 0 8em;font-size:120%;font-weight:normal;}
			a {color:green;}
		</style>
	</head>
	<body>
		<div id="header">
			<img src="http://mkorinets.googlepages.com/truck.gif" alt="Грузовик, перевозящий сайт" />
			<h1>Сайт <strong>tclogic.ru</strong> переехал на новый адрес: <a href="http://etrans.ru/">http://etrans.ru/</a></h1>
		</div>
	</body> 
</html>

A general HTML template (DTD XHTML 1.0 Strict) with some Russian text

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ru" lang="ru">
	<head>
		<title>An XHTML 1.0 Strict standard template</title>
		<meta name="keywords" content="ключевые, слова" />
		<meta name="description" content="Описание страницы" />
		<meta http-equiv="content-type" content="text/html; charset=utf-8" />
		<link rel="icon" href="/new/assets/templates/cosy/images/favicon.ico" type="image/x-icon" />
		<link rel="shortcut icon" href="/new/assets/templates/cosy/images/favicon.ico" type="image/x-icon" />
		<link href="main.css" rel="stylesheet" type="text/css" media="Screen, projection, tv" />
		<script type="text/javascript" src="/new/show_hide.js"></script>
		<link href="/new/main.css" rel="stylesheet" type="text/css" media="Screen, projection" />
		<!--[if IE 6]>
			<style type="text/css" media="Screen, projection">
				@import url('/new/main_ie6.css');
			</style>
		<![endif]-->
		<!--[if IE 7]>
			<link href="/new/main_ie7.css" rel="stylesheet" type="text/css" media="Screen, projection" />
		<![endif]-->
	</head>
	<body>
		<div id="header">
			<img id="logo" src="images/logo.jpg" alt="" />
			<h1>Заголовок сайта</h1>
			<p>Слоган, описание, цитата, все что угодно</p>
		</div>
		<div id="subnav">
			<a class="hide" href="/">[Меню]</a>
			<a class="hide" href="/">[На главную]</a>
			<a class="hide" href="/">[Контакты]</a>
		</div>
		<div id="content">
			<h2>Заголовок страницы</h2>
			<p>Атмосферу домашнего уюта создает не только хорошо продуманный интерьер, но и такие необходимые в быту вещи, как полотенца и постельное белье, другими словами домашний текстиль. Человек проводит треть своей жизни во сне, так почему бы не позаботится о том, чтобы эта часть нашей жизни прошла комфортно? Сегодня на прилавках магазинов можно встретить разнообразное постельное белье и полотенца как отечественных, так и зарубежных производителей, однако качественный и недорогой домашний текстиль встречается в магазинах нечасто. Кроме таких популярных отечественных тканей для постельного белья (КПБ), как бязь и сатин, в последнее время широкое распространение получает текстиль (постельное белье, махровые халаты и полотенца) из импортных тканей, гобелен, поликотон, а также такие ткани для спецодежды и рабочей одежды, как плащевые, сорочечные, и ткань Оксфорд (Oxford).</p>

			<p>Компания "I-TEX" предлагает широкий ассортимент домашнего текстиля от лучших производителей. Под такими известными торговыми марками, как "TEKA", "MIRANDA", "FLORENZA" и "ХЛОПКОВЫЙ РАЙ" представлены: бязь, сатин, поликотон, гобелен, фланель, КПБ (комплекты постельного белья), махровые полотенца, ткани для спецодежды и текстиль для рабочей одежды.</p>

			<p>Особого внимания заслуживает ткань Оксфорд (Oxford). Это прочная ткань с нанесенным полиуретановым покрытием, которое обеспечивает высокую водоупорность текстиля и препятствует накоплению грязи между волокнами. Последнее время постельное белье и другой текстиль из этой ткань пользуется огромной популярностью благодаря низкой цене и отменному качеству. Ткань Оксфорд (Oxford) представлена в широкой цветовой гамме, которая включает в себя до 10 расцветок. Мы предлагаем набивную и гладкокрашеную ткань Оксфорд (Oxford) по самым низким ценам.</p>

			<p>Также у нас Вы можете купить постельное белье по очень выгодным ценам. Вся продукция и текстиль оптом продается со склада в Москве. У нас всегда в ассортименте махровые полотенца</p>
		</div>		
		<div id="nav">
			<h2>Меню</h2>
			<ul>
				<li class="first"><a href="/">О компании</a></li>
				<li><a href="/">Новости</a></li>
				<li><a href="/">Прайс-лист</a></li>
				<li><a href="/">Контакты</a></li>
			</ul>
		</div>
		<div id="search">
			<form id="search_form" action="/">
				<fieldset>
					<legend>Поиск</legend>
					<label>Запрос</label>
					<input type="text" name="quiry" />
					<button type="submit">Ок</button>
				</fieldset>
			</form>
		</div>
		<div id="footer" class="vcard">
			<div class="contacts">
				<h2>Контакты</h2>
				<span class="label">Тел.:</span> <span class="tel">139-3127</span>
				<span class="label">Адрес:</span> <span class="adr"><span class="postal-code">109147</span>, <span class="locality">г.Москва</span>, <span class="street-address">Абельмановская ул., д.11</span></span>
			</div>
			<div id="copyright">
				<h2>Копирайт</h2>
				&copy; 2007 <span class="fn org">Автор копирайта</span>
			</div>
			<div id="counters">
				<h2>Счетчики</h2>
				<a href="/"><img src="img/hit.gif" alt="counter" /></a>
				<a href="/"><img src="img/hit.gif" alt="counter" /></a>
			</div>
			<div class="adlinks">
				<h2>Рекламные ссылки</h2>
				Керамзит с доставкой  <a href="http://www.shopvira.ru/catalog-21-2110-0-0.html">керамзит - цена</a>  | Обратите внимание:  <a href="http://mydomain-in.net/">Регистрация домена jobs</a>  |  <a href="http://tenderit.ru/v8/p28/t29/l314/f160/index.html">Тендерит.РУ - IPM GROUP</a>
			</div>
		</div>
	</body>
</html>

Convert cp1252 -> UTF-8 character set (Python and Ruby)

Oooh, I hate character sets. Specifically, that there is more than one of them. Here is a Ruby version of a Python script I found to convert cp1252 (aka windows-1252) into UTF-8.

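  # Works on raw bytes, so it runs on Ruby 1.8 and on later versions,
  # where indexing a String no longer returns an integer.
  def clean_up(dirty_text)
    bytes = []
    dirty_text.each_byte do |byte|
      if byte < 0x80
        bytes << byte                 # ASCII: copied unchanged
      elsif byte < 0xC0
        bytes << 0xC2 << byte         # 0x80-0xBF: lead byte C2, same trail byte
      else
        bytes << 0xC3 << (byte - 64)  # 0xC0-0xFF: lead byte C3, trail byte minus 0x40
      end
    end
    utf8 = bytes.pack("C*")
    # Ruby 1.9+ tracks encodings; label the result as UTF-8 there.
    utf8.respond_to?(:force_encoding) ? utf8.force_encoding("UTF-8") : utf8
  end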


The original Python script (from http://miscoranda.com/96) was:

#!/usr/bin/python
import sys
for c in sys.stdin.read(): 
   if ord(c) < 0x80: sys.stdout.write(c)
   elif ord(c) < 0xC0: sys.stdout.write('\xC2' + c)
   else: sys.stdout.write('\xC3' + chr(ord(c) - 64))
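
Both versions above turn each input byte into the code point with the same number, which is exactly ISO-8859-1 to UTF-8. Real cp1252 differs in the range 0x80-0x9F (for example 0x93 is a left curly quote and 0x80 is the euro sign), so those bytes come out as control characters. A minimal sketch of an alternative, assuming Ruby 2.0 or later and its built-in transcoding; the helper name cp1252_to_utf8 is made up for this example:

```ruby
# Hypothetical helper name; assumes Ruby 2.0 or later (String#b, String#encode).
def cp1252_to_utf8(raw_bytes)
  # dup so the caller's string is not relabelled in place
  raw_bytes.dup.force_encoding("Windows-1252").encode("UTF-8")
end

# Bytes as they would arrive from a cp1252 file: curly quotes, e-acute, euro sign.
sample = "\x93caf\xE9\x94 \x80".b
puts cp1252_to_utf8(sample)
```

Unlike the byte arithmetic above, this raises an error for the few byte values that cp1252 leaves undefined, which may or may not be what you want for dirty input.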

Convert a UTF-8 string to ISO-8859-1

Converts a UTF-8 string to ISO-8859-1. I used this when generating a PDF with pdf-writer in Rails: all my text is UTF-8, but pdf-writer does not support that encoding.

# Add this to environment.rb.
# Call to_iso on any UTF-8 string to get an ISO-8859-1 string back.
# Example: "Cédez le passage aux français".to_iso

class String
  require 'iconv' #this line is not needed in rails !
  def to_iso
    Iconv.conv('ISO-8859-1', 'utf-8', self)
  end
end
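
Iconv was removed from Ruby's standard library in 2.0, so the snippet above needs the separate iconv gem on newer versions. A rough equivalent using String#encode, assuming Ruby 1.9 or later; the helper name utf8_to_iso and the choice to replace unmappable characters with "?" are assumptions of this sketch, not part of the original:

```ruby
# Hypothetical helper name; assumes Ruby 1.9 or later.
def utf8_to_iso(text)
  # Characters with no ISO-8859-1 equivalent become "?" instead of raising.
  text.encode("ISO-8859-1", "UTF-8", undef: :replace, replace: "?")
end

iso = utf8_to_iso("C\u00E9dez le passage aux fran\u00E7ais")
puts iso.encoding
puts iso.bytesize
```

Dropping the undef and replace options makes the call raise on unmappable characters, which is closer to what Iconv.conv does.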

Problems with the UTF-8 charset?

Run this MySQL query before all others to fix them:
...
mysql_query( "SET NAMES 'utf8' " );
...


Source: ab-d.fr
Languages: PHP and MySQL

Match UTF-8 characters

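var string = 'abcde ąbćdę';

// this won't find anything: [a-z] only covers ASCII letters
// (\s is in the class because the test string contains a space)
string.match( /^[a-z\s]*$/i );

// and this one will work fine :)
string.match( /^[a-z\s\u00A1-\uFFFF]*$/i );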
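
The range \u00A1-\uFFFF accepts every non-ASCII character in the Basic Multilingual Plane, including symbols such as the copyright sign. Where the regex engine supports Unicode properties, a letter class is tighter. The same check sketched in Ruby, assuming Ruby 2.4 or later (for String#match?); the constant name is only illustrative:

```ruby
# Illustrative constant name; \p{L} matches any Unicode letter, \s any whitespace.
LETTERS_AND_SPACES = /\A[\p{L}\s]*\z/

puts "abcde \u0105b\u0107d\u0119".match?(LETTERS_AND_SPACES)  # letters and a space only
puts "abcde \u00A9 2007".match?(LETTERS_AND_SPACES)           # contains a symbol and digits
```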

Convert Unicode codepoints to UTF-8 characters with Module#const_missing

From: http://www.davidflanagan.com/blog/2007_08.html#000136
Author: David Flanagan


# This module lazily defines constants of the form Uxxxx for all Unicode
# codepoints from U0000 to U10FFFF. The value of each constant is the
# UTF-8 string for the codepoint.
# Examples:
#   copyright = Unicode::U00A9
#   euro = Unicode::U20AC
#   infinity = Unicode::U221E
#
module Unicode
  def self.const_missing(name)  
    # Check that the constant name is of the right form: U0000 to U10FFFF
    if name.to_s =~ /^U([0-9a-fA-F]{4,5}|10[0-9a-fA-F]{4})$/
      # Convert the codepoint to an immutable UTF-8 string,
      # define a real constant for that value and return the value
      #p name, name.class
      const_set(name, [$1.to_i(16)].pack("U").freeze)
    else  # Raise an error for constants that are not Unicode.
      raise NameError, "Uninitialized constant: Unicode::#{name}"
    end
  end
end


puts copyright = Unicode::U00A9
puts euro = Unicode::U20AC
puts euro = Unicode::U20AC
puts infinity = Unicode::U221E
puts Unicode.const_get(:U221E)
p Unicode.constants
puts Unicode.constants
Unicode.constants.each { |u| puts Unicode.const_get(u) }
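
On Ruby 1.9 and later the same strings can also be written directly, which may make the lazy constants unnecessary for literal use. A small sketch; the variable names are only illustrative:

```ruby
copyright = "\u00A9"                      # \uXXXX escape in a double-quoted literal
euro      = [0x20AC].pack("U")            # the same pack call the module above relies on
infinity  = 0x221E.chr(Encoding::UTF_8)   # Integer#chr with an explicit encoding
last      = "\u{10FFFF}"                  # braces allow code points beyond U+FFFF

puts copyright, euro, infinity
# Show each string's code point in hex to confirm the round trip.
puts [copyright, euro, infinity, last].map { |s| s.unpack("U").first.to_s(16) }.inspect
```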


UTF8-aware string methods in Ruby

Author: ntk
License: The MIT License, Copyright (c) 2007 ntk
Description: some basic UTF8-aware string methods for Ruby's String class (Ruby 1.8.6)
Requirements: save this snippet to a UTF-8 encoded file and set the character set encoding of Terminal.app to UTF-8 (on Mac OS X: Terminal menu -> Window Settings -> Display -> Character Set Encoding; to enable additional features see here)


Further tools:
- rbuconv, a pure Ruby library for Unicode translation
- unicode, a library for Unicode Normalization (sudo gem install unicode); for a Windows version see Unicode in Ruby on Rails
- ICU4R, a Ruby C-extension binding for the ICU library
- Msort, a command-line sorting program
- punycode4r, a pure Ruby implementation of Punycode (RFC 3492; sudo gem install punycode4r)
- utf8proc, library for processing UTF-8 encoded Unicode strings, (sudo gem install utf8proc)
- Oniguruma, Ruby's regular expression engine; cf. Secure UTF-8 Input in Rails and Migrating your Rails application to Unicode
- character-encodings, seamless integration of character encodings into Ruby's String class, (sudo gem install character-encodings)



class String

   require 'iconv' 
   require 'open-uri'      # cf. http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/index.html

   # taken from: http://www.w3.org/International/questions/qa-forms-utf-8
   UTF8REGEX = /\A(?:                               # ?: non-capturing group (grouping with no back references)
                 [\x09\x0A\x0D\x20-\x7E]            # ASCII
               | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
               |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
               | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
               |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
               |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
               | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
               |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
               )*\z/mnx


#  create UTF-8 character arrays (as class instance variables)
#
#  mapping tables: - http://www.unicode.org/Public/UCA/latest/allkeys.txt
#                  - http://unicode.org/Public/UNIDATA/UnicodeData.txt 
#                  - http://unicode.org/Public/UNIDATA/CaseFolding.txt
#                  - http://www.decodeunicode.org 
#                  - ftp://ftp.mars.org/pub/ruby/Unicode.tar.bz2
#                  - http://camomile.sourceforge.net
#                  - Character Palette (Mac OS X)


   # test data
   @small_letters_utf8 = ["U+00F1", "U+00F4", "U+00E6", "U+00F8", "U+00E0", "U+00E1", "U+00E2", "U+00E4", "U+00E5", "U+00E7", "U+00E8", "U+00E9", "U+00EA", "U+00EB", "U+0153"].map { |x| u = [x[2..-1].hex].pack("U*"); u =~ UTF8REGEX ? u : nil }


   @capital_letters_utf8 = ["U+00D1", "U+00D4", "U+00C6", "U+00D8", "U+00C0", "U+00C1", "U+00C2", "U+00C4", "U+00C5", "U+00C7", "U+00C8", "U+00C9", "U+00CA", "U+00CB", "U+0152"].map { |x| u = [x[2..-1].hex].pack("U*"); u =~ UTF8REGEX ? u : nil }


   @other_letters_utf8 = ["U+03A3", "U+0639", "U+0041", "U+F8D0", "U+F8FF", "U+4E2D", "U+F4EE", "U+00FE", "U+10FFFF", "U+00A9", "U+20AC", "U+221E", "U+20AC", "U+FEFF", "U+FFFD", "U+00FF", "U+00FE", "U+FFFE", "U+FEFF"].map { |x| u = [x[2..-1].hex].pack("U*"); u =~ UTF8REGEX ? u : nil }

   if @small_letters_utf8.size != @small_letters_utf8.nitems then raise "Invalid UTF-8 char in @small_letters_utf8!" end
   if @capital_letters_utf8.size != @capital_letters_utf8.nitems then raise "Invalid UTF-8 char in @capital_letters_utf8!" end
   if @other_letters_utf8.size != @other_letters_utf8.nitems then raise "Invalid UTF-8 char in @other_letters_utf8!" end


   @unicode_array = []
   #open('http://unicode.org/Public/UNIDATA/UnicodeData.txt') do |f| f.each(nil) { |line| line.scan(/^[^;]+/) { |u| @unicode_array << u } }  end
   #open('http://unicode.org/Public/UNIDATA/UnicodeData.txt') do |f|                                                                               
   #   f.each do |line| line =~ /LATIN|GREEK|CYRILLIC/  ?  ( line.scan(/^[^;]+/) { |u| @unicode_array << u } )  :  next  end
   #end

   #@letters_utf8 = @unicode_array.map { |x| u = [x.hex].pack("U*"); u =~ UTF8REGEX ? u : nil }.compact   # code points from UnicodeData.txt
   @letters_utf8 = @small_letters_utf8 + @capital_letters_utf8 + @other_letters_utf8                      # test data only

   # Hash[*array_with_keys.zip(array_with_values).flatten]
   @downcase_table_utf8 = Hash[*@capital_letters_utf8.zip(@small_letters_utf8).flatten]
   @upcase_table_utf8 = Hash[*@small_letters_utf8.zip(@capital_letters_utf8).flatten]
   @letters_utf8_hash = Hash[*@letters_utf8.zip([]).flatten]    #=> ... "\341\272\242"=>nil ...

   class << self 
      attr_accessor :small_letters_utf8
      attr_accessor :capital_letters_utf8
      attr_accessor :other_letters_utf8
      attr_accessor :letters_utf8
      attr_accessor :letters_utf8_hash
      attr_accessor :unicode_array
      attr_accessor :downcase_table_utf8
      attr_accessor :upcase_table_utf8
   end


   def each_utf8_char
      scan(/./mu) { |c| yield c }
   end

   def each_utf8_char_with_index
      i = -1
      scan(/./mu) { |c| i+=1; yield(c, i) }
   end

   def length_utf8
      #scan(/./mu).size
      count = 0
      scan(/./mu) { count += 1 }
      count
   end
   alias :size_utf8 :length_utf8

   def reverse_utf8
      split(//mu).reverse.join
   end

   def reverse_utf8!
      replace(split(//mu).reverse.join)   # replace so the receiver itself is reversed
   end

   def swapcase_utf8
     gsub(/./mu) do |char|  
         if !String.downcase_table_utf8[char].nil? then String.downcase_table_utf8[char]
         elsif !String.upcase_table_utf8[char].nil? then String.upcase_table_utf8[char]
         else char.swapcase
         end
      end
   end

   def swapcase_utf8!
      gsub!(/./mu) do |char|  
         if !String.downcase_table_utf8[char].nil? then String.downcase_table_utf8[char]
         elsif !String.upcase_table_utf8[char].nil? then String.upcase_table_utf8[char]
         else char.swapcase
         end
      end
   end

   def downcase_utf8
      gsub(/./mu) do |char|  
         small_char = String.downcase_table_utf8[char]
         small_char.nil? ? char.downcase : small_char
      end
   end

   def downcase_utf8!
      gsub!(/./mu) do |char|  
         small_char = String.downcase_table_utf8[char]
         small_char.nil? ? char.downcase : small_char
      end
   end

   def upcase_utf8
      gsub(/./mu) do |char|  
         capital_char = String.upcase_table_utf8[char]
         capital_char.nil? ? char.upcase : capital_char
      end
   end

   def upcase_utf8!
      gsub!(/./mu) do |char|  
         capital_char = String.upcase_table_utf8[char]
         capital_char.nil? ? char.upcase : capital_char
      end
   end

   def count_utf8(c)
      return nil if c.empty?
      r = %r{[#{c}]}mu
      scan(r).size
   end

   def delete_utf8(c)
      return self if c.empty?
      r = %r{[#{c}]}mu
      gsub(r, '')
   end

   def delete_utf8!(c)
      return self if c.empty?
      r = %r{[#{c}]}mu
      gsub!(r, '')
   end

   def first_utf8
      self[/\A./mu]
   end

   def last_utf8
      self[/.\z/mu]
   end

   def capitalize_utf8
     return self if self =~ /\A[[:space:]]*\z/m
     ret = ""
     split(/\x20/).each do |w| 
         count = 0
         # gsub returns a new string, so keep its result instead of appending the unchanged word
         capitalized = w.gsub(/./mu) do |char|  
            count += 1
            count == 1 ? char.upcase_utf8 : char.downcase_utf8
         end
         ret << capitalized + ' '
     end
     ret.sub(/\x20\z/, '')
   end

   def capitalize_utf8!
     # replace the receiver's contents so the bang method really modifies self
     replace(capitalize_utf8)
   end


   def index_utf8(s)

      return nil unless !self.empty? && (s.class == Regexp || s.class == String)
      #raise(ArgumentError, "Wrong argument for method index_utf8!", caller) unless !self.empty? && (s.class == Regexp || s.class == String)

      if s.class == Regexp
         opts = s.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
         if  opts.count('u') == 0 then opts = opts + "u" end
         str = s.source
         return nil if str.empty?
         str = "%r{#{str}}" + opts
         r = eval(str)
         m = r.match(self)
         # pre_match is empty for a match at the start, so test the match itself: index 0 is a valid result
         m.nil? ? nil : m.pre_match.length_utf8

      else

         return nil if s.empty?
         r = %r{#{Regexp.escape(s)}}mu   # escape so the string is matched literally
         m = r.match(self)
         m.nil? ? nil : m.pre_match.length_utf8

# this would be a non-regex solution
=begin 
         return nil if s.empty?
         return nil unless self =~ %r{#{s}}mu
         indices = []
         s.split(//mu).each do |x|
            ar = []
            self.each_utf8_char_with_index { |c,i| if c == x then ar << i end  }   # first get all matching indices c == x
            indices << ar unless ar.empty?
         end
         if indices.empty?
            return nil
         elsif indices.size == 1 
            indices.first.first
         else 
            #p indices
            ret = []
            a0 = indices.shift
            a0.each do |i|
               ret << i
               indices.each { |a| if a.include?(i+1) then i += 1; ret << i else ret = []; break end  }
               return ret.first unless ret.empty?
            end
            ret.empty? ? nil : ret.first
         end
=end

      end
   end   


   def rindex_utf8(s)

      return nil unless !self.empty? && (s.class == Regexp || s.class == String)
      #raise(ArgumentError, "Wrong argument for method index_utf8!", caller) unless !self.empty? && (s.class == Regexp || s.class == String)

      if s.class == Regexp
         opts = s.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
         if  opts.count('u') == 0 then opts = opts + "u" end
         str = s.source
         return nil if str.empty?
         str = "%r{#{str}}" + opts
         r = eval(str)
         l = nil                # nil means "no match"; an empty pre-match means index 0
         scan(r) { l = $` }  
         #gsub(r) { l = $`; " " }  
         l.nil? ? nil : l.length_utf8
      else
         return nil if s.empty?
         r = %r{#{Regexp.escape(s)}}mu   # escape so the string is matched literally
         l = nil
         scan(r) { l = $` }  
         #gsub(r) { l = $`; " " }
         l.nil? ? nil : l.length_utf8
      end

   end   


   # note that the i option does not work in special cases with back references
   # example: "àÀ".slice_utf8(/(.).*?\1/i) returns nil whereas "aA".slice(/(.).*?\1/i) returns "aA"
   def slice_utf8(regex)   
      opts = regex.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
      if  opts.count('u') == 0 then opts = opts + "u" end
      s = regex.source
      str = "%r{#{s}}" + opts
      r = eval(str)
      slice(r)
   end

   def slice_utf8!(regex)   
      opts = regex.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
      if  opts.count('u') == 0 then opts = opts + "u" end
      s = regex.source
      str = "%r{#{s}}" + opts
      r = eval(str)
      slice!(r)
   end

   def cut_utf8(p,l)    # (index) position, length
      raise(ArgumentError, "Error: argument is not Fixnum", caller) if p.class != Fixnum or l.class != Fixnum
      s = self.length_utf8
      #if p < 0 then p = s - p.abs end
      if p < 0 then p.abs > s ? (p = 0) : (p = s - p.abs) end      #  or:  ... p.abs > s ? (return nil) : ...
      return nil if l > s or p > (s - 1)
      ret = ""
      count = 0
      each_utf8_char_with_index do |c,i| 
         break if count >= l
         if i >= p && count < l then count += 1; ret << c; end
      end
      ret
   end

   def starts_with_utf8?(s)
      return nil if self.empty? or s.empty?
      cut_utf8(0, s.size_utf8) == s 
   end

   def ends_with_utf8?(s)
      return nil if self.empty? or s.empty?
      cut_utf8(-(s.size_utf8), s.size_utf8) == s
   end

   def insert_utf8(i,s)                                  # insert_utf8(index, string)
      return self if s.empty?
      l = self.length_utf8
      if l == 0 then return s end
      if i < 0 then i.abs > l ? (i = 0) : (i = l - i.abs) end          #  or:  ... i.abs > l ? (return nil) : ...
      #return nil if i > (l - 1)                         # return nil ...
      spaces = ""
      if i > (l-1) then spaces = " " * (i - (l-1)) end   # ... or add spaces
      str = self << spaces
      s1 = str.cut_utf8(0, i)
      s2 = str.cut_utf8(i, l - s1.length_utf8)
      s1 << s << s2
   end

   def split_utf8(regex)
      opts = regex.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
      if  opts.count('u') == 0 then opts = opts + "u" end
      s = regex.source
      str = "%r{#{s}}" + opts
      r = eval(str)
      split(r)
   end

   def scan_utf8(regex)
      opts = regex.inspect.gsub(/\A(.).*\1([eimnosux]*)\z/mu, '\2')
      if  opts.count('u') == 0 then opts = opts + "u" end
      s = regex.source
      str = "%r{#{s}}" + opts
      r = eval(str)
      if block_given? then scan(r) { |a,*m| yield(a,*m) } else scan(r) end
   end

   def range_utf8(r)

      return nil if r.class != Range
      #raise(ArgumentError, "No Range object given!", caller) if r.class != Range

      a = r.to_s[/^[\+\-]?\d+/].to_i
      b = r.to_s[/[\+\-]?\d+$/].to_i
      d = r.to_s[/\.+/]

      d = d.size   # 2 for an inclusive range (..), 3 for an exclusive one (...)

      l = self.length_utf8

      return nil if b.abs > l || a.abs > l || d < 2 || d > 3

      if a < 0 then a = l - a.abs end
      if b < 0 then b = l - b.abs end
      
      return nil if a > b

      str = ""

      each_utf8_char_with_index do |c,i|
         break if i > b
         if d == 2
            (i >= a && i <= b) ? str << c : next
         else
            (i >= a && i < b) ? str << c : next
         end
      end

      str

   end
 
   def utf8?
     self =~ UTF8REGEX
   end

   def clean_utf8
       t = ""
       self.scan(/./um) { |c| t << c if c =~ UTF8REGEX }
       t
   end


   def utf8_encoded_file?   # check (or rather guess) if (HTML) file encoding is UTF-8 (experimental, so use at your own risk!)

      file = self
      str = ""

      if file =~ /^http:\/\//

         url = file

         if RUBY_PLATFORM =~ /darwin/i   # Mac OS X 10.4.10
          
            seconds = 30  

            # check if web site is reachable
            # on Windows try to use curb, http://curb.rubyforge.org (sudo gem install curb)
            var = %x{ /usr/bin/curl -I -L --fail --silent --connect-timeout #{seconds} --max-time #{seconds+10} #{url}; /bin/echo -n $? }.to_i

            #return false unless var == 0
            raise "Failed to create connection to web site: #{url}  --  curl error code: #{var}  --  " unless var == 0

            str = %x{ /usr/bin/curl -L --fail --silent --connect-timeout #{seconds} --max-time #{seconds+10} #{url} | \
                      /usr/bin/grep -Eo -m 1 \"(charset|encoding)=[\\"']?[^\\"'>]+\" | /usr/bin/grep -Eo \"[^=\\"'>]+$\" }
            p str
            return true if str =~ /