Say you have an array of ['apples', 'oranges', 'bananas', 'kiwis']
Selecting a random item is simple enough
arr = ['apples', 'oranges', 'bananas', 'kiwis']
arr[rand(arr.size)]
But say you want to have apples come up 50% of the time, oranges 25%, bananas 15% and kiwis 10% ...
class Array
def sum
inject( nil ) { |sum,x| sum ? sum+x : x }
end
def probability(spread = 2)
z = 1.0
collect {|x| z = z / spread}
end
def weighted_random_index(probability_array = probability(2) )
size.times do |x|
return x if rand < probability_array[0..x].sum
end
return 0
end
def get_weighted_random_item(probability_array = probability(2))
self[weighted_random_index(probability_array)]
end
def get_weighted_random_indexes(num_items, p = probability(2))
res = []
escape = 1000
while res.size < num_items and escape > 0
escape -= 1
tmp = weighted_random_index(p)
res << tmp unless res.include?(tmp)
end
return res.sort
end
end
arr = ['apples', 'oranges', 'bananas', 'kiwis']
10.times do |i|
puts arr.get_weighted_random_item([0.5, 0.25, 0.15, 0.10])
end
oranges
oranges
apples
apples
bananas
apples
apples
bananas
oranges
apples
I'm sure there's some math term for this but I'm calling it 'spread' until I learn otherwise. I put in a default probability array of p = p / 2 which gives you (for an array of 4)
arr.probability
=> [0.5, 0.25, 0.125, 0.0625]
On larger arrays the probabilities will get (for all practicle purposes) zero quickly
arr = [1,2,3,4,5,6,7,8,9]
=> [1, 2, 3, 4, 5, 6, 7, 8, 9]
arr.probability
=> [0.5, 0.25, 0.125, 0.0625, 0.03125, 0.015625, 0.0078125, 0.00390625, 0.001953125]
I put in a simple spread concept to spread out the probabilities a bit more... the higher the spread, the more even the probability distribution (this varies a lot with array size - but here's a simple example):
arr = ['apples', 'oranges', 'bananas', 'guava']
iterations = 10000
4.times do |t|
p = arr.probability(t+2)
res = Array.new(arr.size).fill(0)
iterations.times do |x|
res[arr.weighted_random_index(p)] += 1
end
res2 = res.collect {|x| (x/iterations.to_f) * 100}
puts "For probability spread #{t+2}"
puts "Results #{res2.join(", ")} = #{res2.sum}"
puts ""
end
For probability spread 2
Results 49.71, 37.55, 11.35, 1.39 = 100.0
For probability spread 3
Results 43.44, 29.99, 17.79, 8.78 = 100.0
For probability spread 4
Results 48.3, 23.56, 16.73, 11.41 = 100.0
For probability spread 5
Results 54.22, 19.31, 14.65, 11.82 = 100.0
Here is a longer example:
def weighted_random_index_example
arr = ['apples', 'oranges', 'bananas', 'guava']
puts "Sample array = [#{arr.join(",")}]"
p = [0.5,0.25, 0.15, 0.10]
puts "Probability that each will show up [#{p.join(', ')}]"
puts "1000 runs..."
res = Array.new(arr.size).fill(0)
1000.times do |t|
res[arr.weighted_random_index(p)] += 1
end
res2 = res.collect {|x| (x/1000.0) * 100}
puts "Results:"
4.times do |t|
puts " #{arr[t]}: #{res2[t]}%"
end
puts ""
puts "You can also use more spread out probability arrays"
p = arr.probability(3)
puts "Probability that each will show up with a spread of 3 [#{p.join(', ')}]"
puts "1000 runs..."
res = Array.new(arr.size).fill(0)
1000.times do |t|
res[arr.weighted_random_index(p)] += 1
end
res2 = res.collect {|x| (x/1000.0) * 100}
puts "Results:"
4.times do |t|
puts " #{arr[t]}: #{res2[t]}%"
end
puts ""
puts "You can also just get selected indexes"
puts "arr.get_weighted_random_indexes(3,p) = [#{arr.get_weighted_random_indexes(3,p).join(', ')}]"
puts "The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:"
8.times do |t|
puts " probability(#{t+2}): [#{arr.probability(t+2).collect {|x| sprintf('%0.5f',x)}.join(', ')}]"
end
return nil
end
weighted_random_index_example
Outputs...
Sample array = [apples,oranges,bananas,guava]
Probability that each will show up [0.5, 0.25, 0.15, 0.1]
1000 runs...
Results:
apples: 49.1%
oranges: 38.3%
bananas: 11.3%
guava: 1.3%
You can also use more spread out probability arrays
Probability that each will show up with a spread of 3 [0.333333333333333, 0.111111111111111, 0.037037037037037, 0.0123456790123457]
1000 runs...
Results:
apples: 43.6%
oranges: 29.2%
bananas: 18.0%
guava: 9.2%
You can also just get selected indexes
arr.get_weighted_random_indexes(3,p) = [0, 1, 3]
The probability spread will depend on the number of items in your array - for an array of 4 it looks like this:
probability(2): [0.50000, 0.25000, 0.12500, 0.06250]