There’s a quiz on Leetcode that asked: “What is the fastest an arbitrary array can be converted into a heap?”
My first response was O(n log n), where n being the number of elements in the array. Each call for heapification costs O(log n), and we need to make n calls corresponding to n nodes, so the overall time complexity is O(n log n), isn’t it?
But the correct answer was O(n). On second thought, the running time above assumes that each node takes the same time to process, O(log n) time. Does it really take that much time?
I’m going to assume that the heap is mmin heap because the analysis of the min heap and max heap are the same. How do we convert an arbitrary array into a heap? There are two ways:
- We create an empty heap. The iterate through each element of the array and insert them one by one into the heap.
- Heapify directly
The first method indeed takes O(n log n) time. But we can do better than that with heapification. This means treating the array as a binary tree and rearrange nodes so that the root node is the smallest element in the array, and each node is smaller than its children nodes. The idea is to compare a node with its children nodes. If it is larger than one child node, swap it with the child node. If it is larger than both of its children nodes, swap it with the smaller child node.
We can observe that the leaf nodes have no children, so we don’t need to do any comparison and heapify them. Knowing this, we start with the last non-leaf node, heapify it and repeat in reverse level order traversal.
Here is an example heapification.

Since not all nodes need to be heapified, the time complexity will be definitely smaller than O(n log n), but how do we derive at O(n) time?
Comparing a node with its children takes constant time, but we may have to swap a node multiple times in order to put it in the right place. So the running time is bounded by the maximum number of times we swap all the nodes in the binary tree. This number is the summation of the maximum number of times we swap nodes at each level.
Notice the word “maximum”, because some nodes may be smaller than both of its children, thus requiring zero swapping.
At height 0 (the leaf node level), we have n/2 nodes, and each node requires 0 swap: n / 2 * 0.
At height 1, we have n/2^2 nodes. Each node can only be swapped with the leaf nodes below, so it requires at most 1 swap. Maximum possible swaps: n/2^2 * 1.
At height 2, we have n/2^3 nodes. Each node can be swaped with nodes in the two levels below, so it requires at most 2 swaps. Maximum possible swaps: n/2^3 * 2.

You get the gist! What is the maximum number of swaps at height i? n/2^(i+1) * i.
Now is the proof.

Justification for each step:
- (1) Represent the sum as an infinite series
- (2) Extract n because it is a constant number
- (3) Multiply the series by 1/2, then 2 so that it is essentially the same
- (4) Calculate 2/2^(i + 1) = 2^i
- (5) Expand sum into an arithmetico-geometric sequence
You can see that in the arithmetico-geometric sequence, the first term is 0, second term is 1/2, thrid term is also half. The next terms get increasingly smaller. When n converges to infinity, the summation of the elements of this sequence is proven to converge to 2. In the end, the time complexity is O(n).