About this user

Peter Cooper http://www.petercooper.co.uk/

« Newer Snippets
Older Snippets »
Showing 1-5 of 5 total  RSS 

Fast stop word detection in Ruby

Requires BloominSimple (a pure Ruby Bloom filter class).

List of stop words obtained from http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words

    # Detect stop words QUICKLY
    # Uses a Bloom filter instead of searching literally through a list of stop words
    # for > 3x speed increase
    #
    #    using bloom filter: 2.580000   0.030000   2.610000 (  2.698829)
    #  using literal search: 7.850000   0.120000   7.970000 (  8.181684)

    require 'bloominsimple'
    require 'digest/sha1'
    require 'benchmark' # needed for Benchmark.measure below

    # Create a simple bloom filter that uses a SHA1 hash (more effective than BloominSimple's default hashing)
    b = BloominSimple.new(50000) do |word|
      Digest::SHA1.digest(word.downcase.strip).unpack("VVV")
    end

    # Add stopwords to the bloom filter!
    stopwords = []
    File.foreach('stopwords') { |a| b.add(a); stopwords << a.downcase.strip }

    # Read in a whole dictionary of regular words
    words = File.read('/usr/share/dict/words').split.collect { |a| a.downcase.strip }

    # Define two ways to detect stopwords for comparison..
    using_filter = lambda { |word| b.includes?(word) }
    using_array = lambda { |word| stopwords.include?(word.downcase.strip) }
    techniques = [using_filter, using_array]

    # Run stopword comparisons with both techniques
    t = techniques.collect { |l| words.collect { |a| l[a] } }

    # See how effective the bloom filter has been compared to the literal search;
    # any word printed here got a different answer from the two techniques
    if t[0] == t[1]
      puts "GOOD"
    else
      words.zip(t[0], t[1]).each do |x|
        puts x.first if x[1] != x[2]
      end
    end

    # Now do speed benchmarks..
    techniques.each { |l| puts Benchmark.measure { words.each { |a| l[a] } } }
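
If you don't have BloominSimple to hand, the idea behind the filter can be sketched in a few lines. This is only an illustration, assuming a plain array of booleans as the bit field and the same three SHA1-derived values as the block above; the class name TinyBloom is made up here and is not BloominSimple's actual implementation.

```ruby
require 'digest/sha1'

# Hypothetical stand-in for illustration only; not BloominSimple's implementation.
class TinyBloom
  def initialize(size)
    @size = size
    @bits = Array.new(size, false)
  end

  # Three bit positions per word, taken from the first 12 bytes of its SHA1 digest.
  def positions(word)
    Digest::SHA1.digest(word.downcase.strip).unpack("VVV").map { |n| n % @size }
  end

  def add(word)
    positions(word).each { |i| @bits[i] = true }
  end

  # false means "definitely not added"; true means "probably added".
  def includes?(word)
    positions(word).all? { |i| @bits[i] }
  end
end

filter = TinyBloom.new(50_000)
%w[the and of].each { |w| filter.add(w) }
puts filter.includes?("The ")   # true: same bits once downcased and stripped
```

A lookup only checks three array slots no matter how many stop words were added, which is where the speed-up over scanning a list comes from. The price is the occasional false positive, which is why the snippet above compares the filter's answers against the literal search.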

How to create an OpenSearch reference for your site (as used by Firefox 2's search box)

Create a file like this one, which is the one Wikipedia uses, but point it at your own site's search:

    <?xml version="1.0"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
    <ShortName>Wikipedia (English)</ShortName>
    <Description>Wikipedia (English)</Description>
    <Image height="16" width="16" type="image/x-icon">http://en.wikipedia.org/favicon.ico</Image>
    <Url type="text/html" method="get" template="http://en.wikipedia.org/w/index.php?title=Special:Search&amp;search={searchTerms}"/>
    <Url type="application/x-suggestions+json" method="GET" template="http://en.wikipedia.org/w/api.php?action=opensearch&amp;search={searchTerms}"/>
    </OpenSearchDescription>


Then link to it from your pages like so:

    <link rel="search" type="application/opensearchdescription+xml" href="/w/opensearch_desc.php" title="Wikipedia (English)" />
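
If you'd rather generate the description file than edit it by hand, something like the following Ruby sketch might do. The method names and the example.com URLs are placeholders invented for this example, and it only emits the plain HTML search Url, not the suggestions one. The point to notice is that any & in your search URL has to come out as &amp; in the XML, as in the Wikipedia file above.

```ruby
# XML-escape text for use in element content or double-quoted attributes.
XML_ESCAPES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;", '"' => "&quot;" }.freeze

def xml_escape(text)
  text.gsub(/[&<>"]/, XML_ESCAPES)
end

# Build an OpenSearch description; every argument is a placeholder for your own site.
def opensearch_description(name:, favicon_url:, search_template:)
  <<~XML
    <?xml version="1.0"?>
    <OpenSearchDescription xmlns="http://a9.com/-/spec/opensearch/1.1/">
    <ShortName>#{xml_escape(name)}</ShortName>
    <Description>#{xml_escape(name)}</Description>
    <Image height="16" width="16" type="image/x-icon">#{xml_escape(favicon_url)}</Image>
    <Url type="text/html" method="get" template="#{xml_escape(search_template)}"/>
    </OpenSearchDescription>
  XML
end

puts opensearch_description(
  name: "Example Site",
  favicon_url: "http://www.example.com/favicon.ico",
  search_template: "http://www.example.com/search?lang=en&q={searchTerms}"
)
```

Save the output somewhere on your site and point the href of the link tag at it.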

Change default action of non-URLs in Internet Explorer location bar

Turn this into a .reg file for people to use:

    REGEDIT4

    [HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\SearchURL]
    ""="http://www.yoursite.com/search?&q=%s"
    "provider"="x"
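
If you want to hand out a ready-made file for your own search URL, a small Ruby helper along these lines could produce it. The method name is made up for this sketch, and it assumes your URL contains no double quotes or backslashes, since those would need escaping inside a .reg value.

```ruby
# Build the .reg text for a search URL; "%s" marks where the typed text goes.
# Assumes the URL has no double quotes or backslashes in it.
def search_reg_file(search_url)
  raise ArgumentError, "search URL needs a %s placeholder" unless search_url.include?("%s")

  lines = [
    "REGEDIT4",
    "",
    '[HKEY_CURRENT_USER\Software\Microsoft\Internet Explorer\SearchURL]',
    %(""="#{search_url}"),
    '"provider"="x"',
    ""
  ]
  lines.join("\r\n") # Windows-style line endings
end

puts search_reg_file("http://www.yoursite.com/search?&q=%s")
```

To save it, write the string with File.binwrite("search.reg", ...) so the line endings are left exactly as built.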

Let user add your search engine to their Firefox search bar

Built your own queryable search engine? Let Firefox users easily add it to their search box by putting this on an HTML page:

    <a href="javascript:window.sidebar.addSearchEngine('http://yoursrcfile.src','http://yourpngfile.png','Name','Type Of App');">Click here to add my search engine</a>


You also have to make the 'src' file and the PNG icon yourself. The 'src' file has a syntax like this:

    <search name="YourSite" description="" method="GET" 
    action="http://www.yoursite.whatever/search" 
    searchForm="url-to-your-actual-search-form" queryEncoding="UTF-8" 
    queryCharset="UTF-8">
    <input name="sourceid" value="FireFox-Search-Box">
    <input name="q (or whatever your preferred arg is)" user=""><inputprev>
    </search>

    <browser update="URL back to this SRC file" updateIcon="URL to your PNG" 
    updateCheckDays="30">


More info here.

Search and replace over file(s) with Perl

A quick bit of Perl can come in handy if you have an old site to update that has no CMS, or something similar.

To change 'source' to 'destination' in all HTML files in the current directory:

    perl -pi -e 's/source/destination/g' *.html


You could use this to update copyright notices and the like, but bear in mind that 'source' is a Perl regex, so escape any forward slashes and other special characters in it :)
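
If the escaping gets tiresome (URLs are full of slashes and dots), a literal search and replace is easy enough to sketch in Ruby instead. This is a rough equivalent rather than a drop-in for the one-liner: the method name is invented for this example, it reads each file fully into memory, and it treats 'source' as plain text rather than as a regex.

```ruby
# Replace every literal occurrence of `source` with `destination` in the files
# matching `pattern`. Returns the paths of the files that were changed.
def replace_in_files(pattern, source, destination)
  Dir.glob(pattern).sort.select do |path|
    original = File.read(path)
    # A String pattern is matched literally, so nothing in `source` needs escaping.
    # The block form inserts `destination` as-is, with no backreference handling.
    updated = original.gsub(source) { destination }
    next false if updated == original

    File.write(path, updated)
    true
  end
end

# Example (edits files in place, so try it on a copy first):
# replace_in_files("*.html", "http://old.example.com/", "http://new.example.com/")
```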