Never been to DZone Snippets before?

Snippets is a public source code repository. Easily build up your personal collection of code snippets, categorize them with tags / keywords, and share them with the world

Weighted Random array selection (See related posts)

Say you have an array of ['apples', 'oranges', 'bananas', 'kiwis']

Selecting a random item is simple enough
  arr = ['apples', 'oranges', 'bananas', 'kiwis']
  arr[rand(arr.size)]


But say you want to have apples come up 50% of the time, oranges 25%, bananas 15% and kiwis 10% ...


  class Array
    
    #sum (and mean) found on http://snippets.dzone.com/posts/show/2161
    def sum
      inject( nil ) { |sum,x| sum ? sum+x : x }
    end
    
    def probability(spread = 2)
      z = 1.0
      collect {|x| z = z / spread}
    end
    def weighted_random_index(probability_array = probability(2) )
      size.times do |x|
        return x if rand < probability_array[0..x].sum
      end
      return 0
    end
    
    def get_weighted_random_item(probability_array = probability(2))
      self[weighted_random_index(probability_array)]
    end
    
    def get_weighted_random_indexes(num_items, p = probability(2))
      res = []
      escape = 1000
      while res.size < num_items and escape > 0
        escape -= 1
        tmp = weighted_random_index(p) 
        res << tmp unless res.include?(tmp)
      end
      return res.sort
    end
  
  end


arr = ['apples', 'oranges', 'bananas', 'kiwis']
10.times do |i|
  puts arr.get_weighted_random_item([0.5, 0.25, 0.15, 0.10])
end

oranges
oranges
apples
apples
bananas
apples
apples
bananas
oranges
apples



I'm sure there's some math term for this but I'm calling it 'spread' until I learn otherwise. I put in a default probability array of p = p / 2 which gives you (for an array of 4)

arr.probability
=> [0.5, 0.25, 0.125, 0.0625]


On larger arrays the probabilities will get (for all practicle purposes) zero quickly

arr = [1,2,3,4,5,6,7,8,9]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
arr.probability
=> [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125]


I put in a simple spread concept to spread out the probabilities a bit more... the higher the spread, the more even the probability distribution (this varies a lot with array size - but here's a simple example):

   arr = ['apples', 'oranges', 'bananas', 'guava']
   iterations = 10000
   4.times do |t|
     p = arr.probability(t+2)
     res = Array.new(arr.size).fill(0)
     iterations.times do |x|
       res[arr.weighted_random_index(p)] += 1
     end
     res2 = res.collect {|x| (x/iterations.to_f) * 100}
     puts "For probability spread #{t+2}"
     puts "Results #{res2.join(", ")} = #{res2.sum}"
     puts ""
   end

   For probability spread 2
   Results 49.71, 37.55, 11.35, 1.39 = 100.0

   For probability spread 3
   Results 43.44, 29.99, 17.79, 8.78 = 100.0

   For probability spread 4
   Results 48.3, 23.56, 16.73, 11.41 = 100.0

   For probability spread 5
   Results 54.22, 19.31, 14.65, 11.82 = 100.0


Here is a longer example:

    def weighted_random_index_example
      arr = ['apples', 'oranges', 'bananas', 'guava']
      puts "Sample array = [#{arr.join(",")}]"
      p = [0.5,0.25, 0.15, 0.10]
      puts "Probability that each will show up [#{p.join(', ')}]"
      puts "1000 runs..."
      res = Array.new(arr.size).fill(0)
      1000.times do |t|
        res[arr.weighted_random_index(p)] += 1
      end
      res2 = res.collect {|x| (x/1000.0) * 100}
      puts "Results:"
      4.times do |t|
        puts "    #{arr[t]}: #{res2[t]}%"
      end
      
      puts ""
      puts "You can also use more spread out probability arrays"
      p = arr.probability(3)
      puts "Probability that each will show up with a spread of 3 [#{p.join(', ')}]"
      puts "1000 runs..."
      res = Array.new(arr.size).fill(0)
      1000.times do |t|
        res[arr.weighted_random_index(p)] += 1
      end
      res2 = res.collect {|x| (x/1000.0) * 100}
      puts "Results:"
      4.times do |t|
        puts "    #{arr[t]}: #{res2[t]}%"
      end
      
      puts ""
      puts "You can also just get selected indexes"
      puts "arr.get_weighted_random_indexes(3,p) = [#{arr.get_weighted_random_indexes(3,p).join(', ')}]"
      
      puts "The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:"
      8.times do |t|
        puts "  probability(#{t+2}):  [#{arr.probability(t+2).collect {|x| sprintf('%0.5f',x)}.join(', ')}]"
      end
      return nil
    end
    weighted_random_index_example

Outputs...
Sample array = [apples,oranges,bananas,guava]
Probability that each will show up [0.5, 0.25, 0.15, 0.1]
1000 runs...
Results:
    apples: 49.1%
    oranges: 38.3%
    bananas: 11.3%
    guava: 1.3%

You can also use more spread out probability arrays
Probability that each will show up with a spread of 3 [0.333333333333333, 0.111111111111111, 0.037037037037037, 0.0123456790123457]
1000 runs...
Results:
    apples: 43.6%
    oranges: 29.2%
    bananas: 18.0%
    guava: 9.2%

You can also just get selected indexes
arr.get_weighted_random_indexes(3,p) = [0, 1, 3]
The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:
  probability(2):  [0.50000, 0.25000, 0.12500, 0.06250]


Comments on this post

haakon posts on Sep 26, 2007 at 10:40
This functionality is similar to something I wanted a while back. The main difference was that I did not want to pass in the weighting, but wanted the weighting to be a property of the objects in the array. Here is the code:

class Array
  def random_weighted(method)
    total = self.map(&method).sum
    
    running_total = 0
    index = rand(total) + 1

    #puts "INDEX: #{index}"
    
    # we assume it is already sorted in descending order, although I suppose order does not matter
    self.each do |item|
      running_total += item.send(method)
      return item if index <= running_total
    end
    
    # it is possible to arrive here if all the elements had weight of zero.  Handle this:
    return self[rand(self.size)]
  end
end


So, for a contrived example, I can now do something like:

>> [1,2,4,8].random_weighted(:to_i)
=> 8
>> [1,2,4,8].random_weighted(:to_i)
=> 8
>> [1,2,4,8].random_weighted(:to_i)
=> 8
>> [1,2,4,8].random_weighted(:to_i)
=> 8
>> [1,2,4,8].random_weighted(:to_i)
=> 4
>> [1,2,4,8].random_weighted(:to_i)
=> 4
>> [1,2,4,8].random_weighted(:to_i)
=> 8
>> [1,2,4,8].random_weighted(:to_i)
=> 4
>> [1,2,4,8].random_weighted(:to_i)
=> 4
>> [1,2,4,8].random_weighted(:to_i)
=> 1


Or, I could do:

>> %w(a one a_really_long_word ).random_weighted(:size)


These are of course contrived examples. I am using it to do a random weighted selection of things that get clicked on the most on a web site.

You need to create an account or log in to post comments to this site.


Click here to browse all 4861 code snippets

Related Posts