Problem of the Week 1115

Randy the Distracted Researcher

Randy the Researcher is a good graduate student, so he likes his graphs. He would like to graph the relationship between IQ and weight for 9 Chihuahuas with surgically enhanced brains. Randy decides to use a bar chart with three bars. Each bar will display the mean IQ corresponding to the lightest, middle, and heaviest dogs respectively.

If the individual dog weights are unique, the natural binning is obvious. For example, given weights (in lbs) 10, 12, 30, 27, 13, 18, 15, 16, and 17 the natural binning would be:

light: 10, 12, 13
medium: 15, 16, 17
heavy: 18, 27, 30

When multiple Chihuahuas have equal weights, things get trickier. If the dogs have weight 10, 12, 16, 17, 18, 16, 22, 16, and 16, Randy believes the most even binning is:

light: 10, 12
medium: 16, 16, 16, 16
heavy: 17, 18, 22

Randy finds this to be an interesting problem -- much more interesting than designing uber-smart canines. He decides that the appropriate measure of the "evenness" of a binning is entropy, which is Σ(p*log2(p)), where p is the proportion of dogs falling into each of the bins.

For example, the entropy for the second set of bins (with four 16 lb'ers) would be:

[-2/9*log(2/9) - 4/9*log(4/9) - 3/9*log(3/9)] ≅ 1.530.....

Provide an algorithm to solve the general version of Randy's problem: given d dogs and b bins, calculate the assignment of dogs to bins that yields maximum entropy. Your binning must have the following properties:

  1. Each dog in bin i > 1 must have weight strictly less than each dog in bin i + 1.
  2. (A corollary of 1) All dogs with equal weight must be in the same bin.

You can assume that there are more distinct dog weights than bins. An O(m*n2) algorithm is possible. It might be helpful to have an undergraduate-level knowledge of algorithms.

© Copyright 2009 Dave Ehren and Shilad Sen. Reproduced with permission.



25 February 2009