Min-wise Independent Permutations

We define and study the notion of min-wise independent families of permutations. We say that ${\cal F} \subseteq S_n$ is {\em min-wise independent} if for any set $X \subseteq [n]$ and any $x \in X$, when $\pi$ is chosen at random from ${\cal F}$ we have $$\Pr\bigl(\min\{\pi(X)\} = \pi(x)\bigr) = {1\over |X|}.$$ In other words, we require that every element of any fixed set $X$ has an equal chance of becoming the minimum element of the image of $X$ under $\pi$.
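To make the definition concrete, the following is a minimal brute-force sketch that checks it directly for small $n$. It assumes that $[n]$ is indexed as $\{0, \dots, n-1\}$, that a permutation $\pi$ is stored as a tuple with `pi[i]` $= \pi(i)$, and that $\pi$ is drawn uniformly from the listed family. The function name `is_min_wise_independent` is hypothetical and not from the paper. The full symmetric group passes the check, while the three cyclic shifts of $[3]$ fail it.

```python
from fractions import Fraction
from itertools import combinations, permutations


def is_min_wise_independent(family, n):
    """Brute-force check of the definition (hypothetical helper, small n only).

    For every nonempty X subset of {0, ..., n-1} and every x in X, test whether
    Pr[min pi(X) = pi(x)] == 1/|X| when pi is drawn uniformly from `family`.
    """
    size = len(family)
    for k in range(1, n + 1):
        for X in combinations(range(n), k):
            for x in X:
                # Count permutations whose minimum over pi(X) is attained at x.
                hits = sum(1 for pi in family if min(pi[y] for y in X) == pi[x])
                if Fraction(hits, size) != Fraction(1, k):
                    return False
    return True


# The full symmetric group S_3, each permutation as a tuple (pi(0), pi(1), pi(2)).
S3 = list(permutations(range(3)))
# The three cyclic shifts pi_s(i) = (i + s) mod 3.
cyclic = [tuple((i + s) % 3 for i in range(3)) for s in range(3)]

print(is_min_wise_independent(S3, 3))      # True
print(is_min_wise_independent(cyclic, 3))  # False
```

The cyclic shifts fail already at $X = \{0, 1\}$: element $0$ is the minimum under two of the three shifts, giving probability $2/3$ instead of $1/2$. The exhaustive loop over all subsets is exponential in $n$ and is meant only to illustrate the definition.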

Our research was motivated by the fact that such a family (under some relaxations) is essential to the algorithm used in practice by the AltaVista web index software to detect and filter near-duplicate documents. In the course of our investigation, however, we discovered interesting and challenging theoretical questions related to this concept; we present solutions to some of them and list the rest as open problems.
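One way to see why such a family helps with near-duplicate detection: applying the definition to $X = A \cup B$, the equality $\min\{\pi(A)\} = \min\{\pi(B)\}$ holds exactly when the minimum of $\pi(A \cup B)$ is attained at an element of $A \cap B$, so its probability is $|A \cap B| / |A \cup B|$, the resemblance of $A$ and $B$. The sketch below verifies this identity exactly for small sets. It is not the AltaVista implementation. It assumes the full group $S_n$ as the min-wise independent family (impractical for realistic $n$) and uses hypothetical example sets `A` and `B` in place of document features. The helper names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def collision_probability(family, A, B):
    """Fraction of permutations in `family` with min pi(A) == min pi(B).

    A and B must be nonempty subsets of {0, ..., n-1}; each pi is a tuple
    with pi[i] = pi(i). (Hypothetical helper for illustration.)
    """
    hits = sum(1 for pi in family
               if min(pi[a] for a in A) == min(pi[b] for b in B))
    return Fraction(hits, len(family))


def resemblance(A, B):
    """Resemblance |A & B| / |A | B| of two nonempty sets."""
    return Fraction(len(A & B), len(A | B))


n = 5
S_n = list(permutations(range(n)))  # full symmetric group: min-wise independent
A, B = {0, 1, 2}, {1, 2, 3, 4}      # hypothetical example sets

print(collision_probability(S_n, A, B))  # 2/5
print(resemblance(A, B))                 # 2/5
```

In practice one samples a few permutations from a much smaller (approximately) min-wise independent family and estimates the resemblance by the fraction of samples on which the two minima agree; enumerating $S_n$ here only serves to make the probability exact.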

Originally appeared in the Proceedings of the 30th ACM Symposium on Theory of Computing, pp. 327-336, 1998. Invited to appear in the journal special issue of JCSS.