Whilst creating a natural language parser, one of the things I was presented with was multiple merged dictionaries, which needed some processing. I was asked to supply a list of duplicated words back, and after trawling the web finding slow code, I decided that going the fast, but inefficient (in terms of space) was the way to go.
The object is to return an array of just those elements that are duplicated in as little time as possible, from a 50,000 word list. One sort and one lookup per element was what I came up with
1
2 array=["long","array","with","lots","and","lots","and","lots","of","array","duplicates","long","and","array"]
3 arr2=array.sort
4 arr3=[]
5 arr4=[]
6 arr2.each { |a| (arr3[-1]==a) ? (arr4[-1]!=a) ? arr4 << a : "" : arr3 << a }
7 arr3 => ["and","array","duplicates","long","lots","of","with"]
8 arr4 => ["and","array","long","lots"]
If you are sure that there are fewer duplicates in the returned array of just duplicated elements, you can remove the arr4 checks at the expense of a final .uniq! pass. i.e.
1
2 array=["long","array","with","lots","and","lots","and","lots","of","array","duplicates","long","and","array"]
3 arr2=array.sort
4 arr3=[]
5 arr4=[]
6 arr2.each { |a| (arr3[-1]==a) ? arr4 << a : arr3 << a }
7 arr4.uniq!
8
9 arr3 => ["and","array","duplicates","long","lots","of","with"]
10 arr4 => ["and","array","long","lots"]