About this user

Peter Cooperx http://www.petercooper.co.uk/

« Newer Snippets
Older Snippets »
Showing 1-4 of 4 total  RSS 

Fast stop word detection in Ruby

Requires BloominSimple (a pure Ruby Bloom filter class).

List of stop words obtained from http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words

    # Detect stop words QUICKLY
    # Uses a Bloom filter instead of searching literally through a list of stop words
    # for > 3x speed increase
    #
    #    using bloom filter: 2.580000   0.030000   2.610000 (  2.698829)
    #  using literal search: 7.850000   0.120000   7.970000 (  8.181684)

    require 'bloominsimple'
    require 'digest/sha1'
    require 'benchmark'

    # Create a simple Bloom filter that uses a SHA1 hash (more effective than BloominSimple's default hashing)
    b = BloominSimple.new(50000) do |word|
      Digest::SHA1.digest(word.downcase.strip).unpack("VVV")
    end

    # Add stop words to the Bloom filter, and keep a plain array of them for comparison
    stopwords = []
    File.foreach('stopwords') { |a| b.add(a); stopwords << a.downcase.strip }

    # Read in a whole dictionary of regular words
    words = File.read('/usr/share/dict/words').split.collect { |a| a.downcase.strip }

    # Define two ways to detect stop words for comparison..
    using_filter = lambda { |word| b.includes?(word) }
    using_array = lambda { |word| stopwords.include?(word.downcase.strip) }
    techniques = [using_filter, using_array]

    # Run stop word comparisons with both techniques
    t = techniques.collect { |l| words.collect { |a| l[a] } }

    # See how effective the Bloom filter has been compared to the literal search
    if t[0] == t[1]
      puts "GOOD"
    else
      # Print every word on which the two techniques disagree (Bloom filter false positives)
      words.zip(t[0], t[1]).each do |x|
        puts x.first if x[1] != x[2]
      end
    end

    # Now do speed benchmarks..
    techniques.each { |l| puts Benchmark.measure { words.each { |a| l[a] } } }
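
For readers who have not met Bloom filters before, here is a minimal sketch of the idea behind the snippet above. It is not BloominSimple's implementation: the `TinyBloom` class and its method names are hypothetical, and it stores its bits in a plain Ruby array for clarity rather than speed. It reuses the same SHA1-and-`unpack("VVV")` trick to derive three bit positions per word.

```ruby
require 'digest/sha1'

# Hypothetical illustration only; not the BloominSimple class.
class TinyBloom
  def initialize(size)
    @size = size
    @bits = Array.new(size, false)
  end

  # Three bit positions per word, taken from the first 12 bytes of its SHA1 digest
  def positions(word)
    Digest::SHA1.digest(word.downcase.strip).unpack('VVV').map { |n| n % @size }
  end

  # Adding a word sets all of its bit positions
  def add(word)
    positions(word).each { |i| @bits[i] = true }
  end

  # false means "definitely never added"; true means "probably added"
  def include?(word)
    positions(word).all? { |i| @bits[i] }
  end
end

filter = TinyBloom.new(50_000)
%w[the and of].each { |w| filter.add(w) }
puts filter.include?(' The ')   # an added word is always reported as present
```

A lookup only ever touches three positions, however many stop words were added, which is why it can beat a linear search through an array. The price is the occasional false positive, which is what the comparison loop in the snippet above checks for.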

Send and receive SMS text messages with Ruby and a GSM/GPRS modem

    require 'serialport'
    require 'time'

    class GSM

      SMSC = "+447785016005"  # SMSC for Vodafone UK - change for other networks

      def initialize(options = {})
        @port = SerialPort.new(options[:port] || 3, options[:baud] || 38400, options[:bits] || 8, options[:stop] || 1, SerialPort::NONE)
        @debug = options[:debug]
        cmd("AT")
        # Set to text mode
        cmd("AT+CMGF=1")
        # Set SMSC number
        cmd("AT+CSCA=\"#{SMSC}\"")
      end

      def close
        @port.close
      end

      # Send a command to the modem and return whatever it replies
      def cmd(cmd)
        @port.write(cmd + "\r")
        wait
      end

      # Collect the modem's output until it has been silent for 0.25 seconds
      def wait
        buffer = ''
        while IO.select([@port], [], [], 0.25)
          chr = @port.getc.chr
          print chr if @debug == true
          buffer += chr
        end
        buffer
      end

      def send_sms(options)
        cmd("AT+CMGS=\"#{options[:number]}\"")
        # Message body, truncated to 140 characters and terminated with Ctrl-Z (ASCII 26)
        cmd("#{options[:message][0...140]}#{26.chr}\r\r")
        sleep 3
        wait
        cmd("AT")
      end

      class SMS
        attr_accessor :id, :sender, :message, :connection
        attr_writer :time

        def initialize(params)
          @id = params[:id]
          @sender = params[:sender]
          @time = params[:time]
          @message = params[:message]
          @connection = params[:connection]
        end

        # Delete this message from the phone's message store
        def delete
          @connection.cmd("AT+CMGD=#{@id}")
        end

        def time
          # This MAY need to be changed for non-UK situations, I'm not sure
          # how standardized SMS timestamps are..
          Time.parse(@time.sub(/(\d+)\D+(\d+)\D+(\d+)/, '\2/\3/20\1'))
        end
      end

      # Return all stored messages as GSM::SMS objects (an empty array if there are none)
      def messages
        sms = cmd("AT+CMGL=\"ALL\"")
        # Ugly, ugly, ugly!
        msgs = sms.scan(/\+CMGL\:\s*?(\d+)\,.*?\,\"(.+?)\"\,.*?\,\"(.+?)\".*?\n(.*)/)
        msgs.collect { |m| GSM::SMS.new(:connection => self, :id => m[0], :sender => m[1], :time => m[2], :message => m[3].chomp) }
      end
    end


    destination_number = "+44 someone else"

    gsm = GSM.new(:debug => false)

    # Send a text message
    gsm.send_sms(:number => destination_number, :message => "Test at #{Time.now}")

    # Read text messages from phone
    gsm.messages.each do |msg|
      puts "#{msg.id} - #{msg.time} - #{msg.sender} - #{msg.message}"
      # msg.delete
    end

    gsm.close
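
The "ugly" regular expression in `messages` is easier to follow when run against a canned response, with no modem attached. The sketch below does that. The sample response is hand-written (phone numbers and text are invented), and `sample_cmgl_response` and `parse_cmgl` are hypothetical helper names. Real modems may format their `AT+CMGL` output slightly differently, so treat it as an illustration rather than a reference.

```ruby
# A hand-written example of what a modem might return for AT+CMGL="ALL" in text mode
def sample_cmgl_response
  "+CMGL: 1,\"REC READ\",\"+447700900123\",,\"07/03/21,14:05:09+00\"\r\n" \
  "Hello there\r\n" \
  "+CMGL: 2,\"REC UNREAD\",\"+447700900456\",,\"07/03/22,09:30:00+00\"\r\n" \
  "Second message\r\n" \
  "\r\nOK\r\n"
end

# Same pattern as the GSM#messages method above: it captures the index, the
# sender, the timestamp and the line of text that follows each +CMGL header.
def parse_cmgl(response)
  pattern = /\+CMGL\:\s*?(\d+)\,.*?\,\"(.+?)\"\,.*?\,\"(.+?)\".*?\n(.*)/
  response.scan(pattern).map do |id, sender, time, text|
    { :id => id.to_i, :sender => sender, :time => time, :message => text.chomp }
  end
end

parse_cmgl(sample_cmgl_response).each do |msg|
  puts "#{msg[:id]} - #{msg[:time]} - #{msg[:sender]} - #{msg[:message]}"
end
```

Note that the pattern captures only the first line of each message body, so a text containing line breaks would be cut short.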

Text synonymizer in Perl - unintelligent text rewriter

Very scrappy and silly, but you get some funny results. It uses the great Lingua::EN::Tagger for POS (Parts of Speech) tagging.

    use strict;
    use warnings;
    use WordNet::QueryData;
    use Lingua::EN::Tagger;

    my $t  = Lingua::EN::Tagger->new;
    my $wn = WordNet::QueryData->new;

    # Read the whole input file named on the command line
    my $file = $ARGV[0] or die "Usage: $0 FILE\n";
    open(my $fh, '<', $file) or die "Cannot open $file: $!\n";
    my $text = do { local $/; <$fh> };
    close($fh);

    my $tagged = $t->add_tags($text);

    # Walk through every <tag>word</tag> pair produced by the tagger
    while ($tagged =~ /\<(.+?)\>(\w+)\<.+?\>/g) {
            my $sense = $1;
            my $word = $2;
            my $newsense = "";
            $newsense = "n" if ($sense =~ /nn/i);   # noun
            $newsense = "a" if ($sense =~ /jj/i);   # adjective
            $newsense = "v" if ($sense =~ /vb/i);   # verb
            if ($newsense) {
                    # Swap in the first synonym that does not contain the original word
                    foreach ($wn->querySense($word . "#" . $newsense . "#1" , "syns")) {
                            s/\#.+//;
                            next if (/$word/);
                            $text =~ s/$word/$_/;
                            last;
                    }
            }
    }

    print $text;


Or to do it to a Web page / URL, use HTML::Parser like so:

    use strict;
    use warnings;
    use WordNet::QueryData;
    use Lingua::EN::Tagger;
    use HTML::Parser;
    use LWP::Simple;

    my $t  = Lingua::EN::Tagger->new;
    my $wn = WordNet::QueryData->new;
    my $p  = HTML::Parser->new( text_h => [\&text, "text"] );

    # Fetch the page and hand each run of text to the text() handler
    my $html = get("http://www.petercooper.co.uk/");
    die "Could not fetch the page\n" unless defined $html;
    $p->parse($html);
    $p->eof;

    exit;

    # Called by HTML::Parser for each piece of text found between tags
    sub text {
            my $text = shift;
            $text =~ s/\s+/\ /g;
            if ($text =~ /\w{5}/) {
                    print "WAS: " . $text . "\n\n";
                    print "BECOMES: " . synonymize($text) . "\n\n\n\n";
            }
    }

    sub synonymize {
            my $text = shift;

            my $tagged = $t->add_tags($text);

            while ($tagged =~ /\<(.+?)\>(\w+)\<.+?\>/g) {
                    my $sense = $1;
                    my $word = $2;
                    my $newsense = "";
                    $newsense = "n" if ($sense =~ /nn/i);   # noun
                    $newsense = "a" if ($sense =~ /jj/i);   # adjective
                    $newsense = "v" if ($sense =~ /vb/i);   # verb
                    if ($newsense) {
                            # Swap in the first synonym that does not contain the original word
                            foreach ($wn->querySense($word . "#" . $newsense . "#1" , "syns")) {
                                    s/\#.+//;
                                    next if (/$word/);
                                    $text =~ s/$word/$_/;
                                    last;
                            }
                    }
            }
            return $text;
    }
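
The step that turns tagger output into WordNet queries is the least obvious part of both Perl scripts, so here is a rough sketch of just that step, written in Ruby to match the other snippets on this page. It does not call WordNet or the tagger. The tagged sentence is written by hand in the `<tag>word</tag>` style the Perl regex expects, and `wordnet_pos` and `sense_keys` are hypothetical helper names.

```ruby
# Map a tagger tag (e.g. "nn", "jj", "vbd") to a WordNet part-of-speech letter,
# using the same checks, in the same order, as the Perl code above.
def wordnet_pos(tag)
  pos = nil
  pos = 'n' if tag =~ /nn/i   # noun
  pos = 'a' if tag =~ /jj/i   # adjective
  pos = 'v' if tag =~ /vb/i   # verb
  pos
end

# Build the "word#pos#1" strings that the Perl scripts pass to querySense,
# skipping words whose tag is not a noun, adjective or verb.
def sense_keys(tagged)
  tagged.scan(/<(.+?)>(\w+)<.+?>/).map do |tag, word|
    pos = wordnet_pos(tag)
    pos && [word, pos, 1].join('#')
  end.compact
end

# Hand-written sample input (an assumption about what tagged text looks like)
tagged = '<det>The</det> <jj>quick</jj> <nn>fox</nn> <vbd>jumped</vbd>'
puts sense_keys(tagged).inspect
```

Each resulting string asks WordNet for the first sense of the word, which is one reason the rewritten text often comes out silly: the first sense is not necessarily the one the sentence meant.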

Capitalizing titles with Ruby

This (mostly) follows the Microsoft Manual of Style for Technical Publications. Lame, I know, but it's a reference:

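    class String
      def titlecase
        # Small words that stay lowercase unless they are the first or last word
        non_capitalized = %w{of etc and by the for on is at to but nor or a via}
        text = gsub(/\b[a-z]+/) { |w| non_capitalized.include?(w) ? w : w.capitalize }
        # Always capitalize the first letter and the last word
        text.sub(/^[a-z]/) { |l| l.upcase }.sub(/\b[a-z][^\s]*?$/) { |l| l.capitalize }
      end
    end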


Examples:

    "this is a story in the new york times".titlecase # => "This is a Story In the New York Times"
    "what in the world was that for?".titlecase # => "What In the World Was That For?"
    "searching for a CHEAP overhead projector?".titlecase # => "Searching for a CHEAP Overhead Projector?"