Ph.D. Qual Study Assistant: Closest Pair of Points

1. The Problem

Given an array of `n` points in a 2D plane, `P = {p1, p2, ..., pn}`, where each point `pi` has coordinates `(xi, yi)`, find the pair of points `(pa, pb)` that are closest to each other. The distance is the standard Euclidean distance.

2. The Idea: Divide and Conquer

A brute-force approach would compare every pair of points, taking O(n²) time. To achieve O(n log n), we use a divide-and-conquer strategy.

Pre-computation: Sort the array of points `P` based on their x-coordinates to get `Px`.
Divide: Find a vertical line `L` (based on the median x-coordinate) that splits the points into two equal-sized sets: `Q` (left half) and `R` (right half).
Conquer: Recursively find the closest pair distance in the left half, `d_Q`, and in the right half, `d_R`. Let `d = min(d_Q, d_R)`.
Combine: This is the crucial step. The closest pair could be a "split pair," with one point in `Q` and the other in `R`. We don't need to check all pairs across the divide. A closer pair can only exist within a vertical strip of width `2d` centered on the line `L`. Any pair of points further apart than this cannot be closer than `d`.

The Strip Optimization

1. Create an array `S` of all points that lie within this `2d` strip.
2. Sort the points in `S` by their y-coordinate.
3. For each point `p` in `S`, we only need to check its distance to a small, constant number of its neighbors that follow it in the sorted strip array. It can be proven that we only need to check at most the next 7 points. This is because any two points in one of the `d x d` sub-squares of the strip that are farther apart in the y-sorted list must be more than `d` distance apart. This makes the combine step linear, not quadratic.

3. The Algorithm

Create two copies of the points: `Px` (sorted by x) and `Py` (sorted by y).
Define a recursive function `findClosest(Px, Py)`:
- Base Case: If `|Px| <= 3`, find the closest pair by brute force and return the distance.
- Divide: Split `Px` into left half `Qx` and right half `Rx`. Split `Py` into `Qy` and `Ry` accordingly.
- Conquer: `d = min(findClosest(Qx, Qy), findClosest(Rx, Ry))`.
- Combine:
  - Find the median x-coordinate `x_mid`.
  - Create a strip array `S` containing points from `Py` where `|p.x - x_mid| < d`.
  - Iterate through `S`. For each point `p_i`, compare it with `p_j` for `j > i` as long as `p_j.y - p_i.y < d`. Update `d` if a smaller distance is found.
- Return `d`.

4. Complexity Analysis

Time Complexity: O(n log n): The initial sort takes O(n log n). The recursive part follows the recurrence relation T(n) = 2T(n/2) + O(n). The O(n) term comes from the "combine" step, where creating and iterating through the strip takes linear time. By the Master Theorem, this recurrence solves to O(n log n). Therefore, the total time complexity is dominated by the sorting and recursive steps, resulting in O(n log n).
Space Complexity: O(n): Storing the `Px` and `Py` arrays requires O(n) space. The recursive calls create a call stack of depth O(log n). The strip array `S` can have up to `n` points in the worst case. Thus, the space complexity is O(n).

1. The Problem

2. The Idea: Divide and Conquer

The Strip Optimization

3. The Algorithm

4. Complexity Analysis

Algorithm Visualization