Sunday, March 15, 2009

Shuffle shuffle

Several weeks ago, I found myself thinking, how I could shuffle a list/array given to me in a random order. This is a typically commonplace thing to do in several applications: online card games, your favourite music player etc.

The interesting thing about this problem is how some naive approaches, even though easy to code and efficient enough, do not achieve the desired randomness.

Lets define the problem: You have an input array which has elements in positions 1 through n. The objective is to produce a random permutation of the array. By a random permutation, we mean to say that each permutation of the array is equally likely. So a truly randomized algorithm will generate each permutation with a probability of 1/n!

The first approach that comes to mind is the naive approach of generating a random number between 1 and n for each element in the array and placing the element at the position indicated by the random number. This could be accomplished by using another auxillary array and placing elements into the new positions generated. The pseudocode could look something like the following:

NaiveShuffle(A[1...n], B[1....n])            //randomly permutes the elements in array A
for i from 1 to n
random <- RandomNumber(1,n)
B[random] <- A[i]
for i from 1 to n
A[i] <- B[i]



The above approach is very crude in the sense it uses an auxillary array and also parses through the array twice instead of just once. Still, the worst case running time of the above algorithm is O(n).

Random(1, n) generates a random number 1 and n (both inclusive). We assume that the random number generator generates truly random numbers in the interval specified. Also, we assume some kind of collision resolution mechanism. We are assuming that this method returns a random number in O(1) time.

All that said, a O(n) algorithm is not bad at all for this purpose. There is only one problem, this algorithm is WRONG! (hah...fat chance, didn't we name it a naive algorithm?). Why is that? This is because in every iteration, we generate n possible choices. Since there are n such iterations, the total number of permutations generated is n.n.n.n.......n times = n^n

The  total number of permutations possible while shuffling an array is n!. Since n^n is not exactly divisible by n!, there have to be some permutations which appear more frequently than the others (basic pigeonhole principle). Thus this naive algorithm does not generate truly random permutations.

Lets try something else. We generate a random priority between 1 and n^3 the Random(1, n^3)  routine, and assign a it to each element in the array. Then sort the array based on these weights.

e.g. if the original array is A<1,2,3,4> and we generate priorities randomly as P<34,56,8,77>, then when we sort array A based on the increasing order of the priorities assigned using array P, then we get the shuffled array as <3, 1, 2, 4>.

We used the interval [1, n^3] to generate priorities so as to reduce collisions. I will not delve into the correctness of this algorithm. The running time of this shuffling by sorting algorithm depends on which sorting algorithm we use. Typically it is Big-Theta(n lg n).

Another approach to solve this problem, is to shuffle by swapping:
Shuffle[1...n]
for i from 2 to n
temp = Generate(x from 1 to i)
swap(i,temp)


Notice that the interval for generating random numbers keeps decreasing. For the first iteration, there are n choices. For the 2nd iteration, there are (n-1) choices and so on.

Hence the total number of choices generated by this algorithm is: n(n-1)(n-2).....3.2.1 = n!

which is exactly equal to the total number of possible permutations. Moreover, since we iterate over the array only once, this algorithm runs in O(n) time.