Reservoir Sampling
Reservoir sampling is and often disguised problem in Google interviews:
There is a linked list of numbers of length N. N is very large and you don’t know N. You have to write a function that will return k random numbers from the list. Numbers should be completely random. Hint: 1. Use random function rand() (returns a number between 0 and 1) and irand() (return either 0 or 1) 2. It should be done in O(n).
or
You have a stream of infinite queries (ie: real time Google search queries that people are entering). Describe how you would go about finding a good estimate of 1000 samples from this never ending set of data and then write code for it.
And here’s some C/C++ code:
typedef struct list {int val; list* next;} list;
void select(list* l, int* v, int n) {
int k = 0;
while (l != NULL) {
if (k < n) {
v[k] = l->val;
} else {
int c = rand() % (k + 1);
if (c < n) v[c] = l->val;
}
l = l->next;
k++;
}
}