Selecting a random item is simple enough
arr = ['apples', 'oranges', 'bananas', 'kiwis'] arr[rand(arr.size)]
But say you want to have apples come up 50% of the time, oranges 25%, bananas 15% and kiwis 10% ...
class Array #sum (and mean) found on http://snippets.dzone.com/posts/show/2161 def sum inject( nil ) { |sum,x| sum ? sum+x : x } end def probability(spread = 2) z = 1.0 collect {|x| z = z / spread} end def weighted_random_index(probability_array = probability(2) ) size.times do |x| return x if rand < probability_array[0..x].sum end return 0 end def get_weighted_random_item(probability_array = probability(2)) self[weighted_random_index(probability_array)] end def get_weighted_random_indexes(num_items, p = probability(2)) res = [] escape = 1000 while res.size < num_items and escape > 0 escape -= 1 tmp = weighted_random_index(p) res << tmp unless res.include?(tmp) end return res.sort end end arr = ['apples', 'oranges', 'bananas', 'kiwis'] 10.times do |i| puts arr.get_weighted_random_item([0.5, 0.25, 0.15, 0.10]) end oranges oranges apples apples bananas apples apples bananas oranges apples
I'm sure there's some math term for this but I'm calling it 'spread' until I learn otherwise. I put in a default probability array of p = p / 2 which gives you (for an array of 4)
arr.probability => [0.5, 0.25, 0.125, 0.0625]
On larger arrays the probabilities will get (for all practicle purposes) zero quickly
arr = [1,2,3,4,5,6,7,8,9] => [1, 2, 3, 4, 5, 6, 7, 8, 9] arr.probability => [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125]
I put in a simple spread concept to spread out the probabilities a bit more... the higher the spread, the more even the probability distribution (this varies a lot with array size - but here's a simple example):
arr = ['apples', 'oranges', 'bananas', 'guava'] iterations = 10000 4.times do |t| p = arr.probability(t+2) res = Array.new(arr.size).fill(0) iterations.times do |x| res[arr.weighted_random_index(p)] += 1 end res2 = res.collect {|x| (x/iterations.to_f) * 100} puts "For probability spread #{t+2}" puts "Results #{res2.join(", ")} = #{res2.sum}" puts "" end For probability spread 2 Results 49.71, 37.55, 11.35, 1.39 = 100.0 For probability spread 3 Results 43.44, 29.99, 17.79, 8.78 = 100.0 For probability spread 4 Results 48.3, 23.56, 16.73, 11.41 = 100.0 For probability spread 5 Results 54.22, 19.31, 14.65, 11.82 = 100.0
Here is a longer example:
def weighted_random_index_example
arr = ['apples', 'oranges', 'bananas', 'guava']
puts "Sample array = [#{arr.join(",")}]"
p = [0.5,0.25, 0.15, 0.10]
puts "Probability that each will show up [#{p.join(', ')}]"
puts "1000 runs..."
res = Array.new(arr.size).fill(0)
1000.times do |t|
res[arr.weighted_random_index(p)] += 1
end
res2 = res.collect {|x| (x/1000.0) * 100}
puts "Results:"
4.times do |t|
puts " #{arr[t]}: #{res2[t]}%"
end
puts ""
puts "You can also use more spread out probability arrays"
p = arr.probability(3)
puts "Probability that each will show up with a spread of 3 [#{p.join(', ')}]"
puts "1000 runs..."
res = Array.new(arr.size).fill(0)
1000.times do |t|
res[arr.weighted_random_index(p)] += 1
end
res2 = res.collect {|x| (x/1000.0) * 100}
puts "Results:"
4.times do |t|
puts " #{arr[t]}: #{res2[t]}%"
end
puts ""
puts "You can also just get selected indexes"
puts "arr.get_weighted_random_indexes(3,p) = [#{arr.get_weighted_random_indexes(3,p).join(', ')}]"
puts "The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:"
8.times do |t|
puts " probability(#{t+2}): [#{arr.probability(t+2).collect {|x| sprintf('%0.5f',x)}.join(', ')}]"
end
return nil
end
weighted_random_index_example
Outputs...
Sample array = [apples,oranges,bananas,guava]
Probability that each will show up [0.5, 0.25, 0.15, 0.1]
1000 runs...
Results:
apples: 49.1%
oranges: 38.3%
bananas: 11.3%
guava: 1.3%
You can also use more spread out probability arrays
Probability that each will show up with a spread of 3 [0.333333333333333, 0.111111111111111, 0.037037037037037, 0.0123456790123457]
1000 runs...
Results:
apples: 43.6%
oranges: 29.2%
bananas: 18.0%
guava: 9.2%
You can also just get selected indexes
arr.get_weighted_random_indexes(3,p) = [0, 1, 3]
The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:
probability(2): [0.50000, 0.25000, 0.12500, 0.06250]
So, for a contrived example, I can now do something like:
Or, I could do:
These are of course contrived examples. I am using it to do a random weighted selection of things that get clicked on the most on a web site.