Reservoir Sampling

08 Oct 2011 | By Mihai Roman | Tags sampling probability algorithms google C/C++

Reservoir sampling is and often disguised problem in Google interviews:

There is a linked list of numbers of length N. N is very large and you don’t know N. You have to write a function that will return k random numbers from the list. Numbers should be completely random. Hint: 1. Use random function rand() (returns a number between 0 and 1) and irand() (return either 0 or 1) 2. It should be done in O(n).

or

You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describe how you would go about finding a good estimate of 1000 samples from this never ending set of data and then write code for it.

And here’s some C/C++ code:

typedef struct list {int val; list* next;} list;

void select(list* l, int* v, int n) {
  int k = 0;
  while (l != NULL) {
    if (k < n) {
      v[k] = l->val;
    } else {
      int c = rand() % (k + 1);
      if (c < n) v[c] = l->val;
    }
    l = l->next;
    k++;
  }
}

blog comments powered by Disqus