Natural Merge Sort

Adaptive merge sort that detects existing sorted runs and merges them.

Natural Merge Sort

Natural merge sort improves merge sort by exploiting existing order in the input. Instead of splitting blindly, it scans the array to find already sorted runs, then merges those runs.

This approach adapts to partially sorted data and can reduce the number of passes.

Problem

Given an array $A$ of length $n$, reorder it such that:

$$ A[0] \le A[1] \le \dots \le A[n-1] $$

Idea

A run is a maximal nondecreasing subsequence:

$$ A[i] \le A[i+1] \le \dots \le A[j] $$

Natural merge sort first identifies such runs, then merges adjacent runs until only one remains.

Algorithm

  1. Scan the array and partition it into runs
  2. Merge adjacent runs pairwise
  3. Repeat until one run remains
natural_merge_sort(A):
    runs = []

    i = 0
    while i < n:
        j = i
        while j + 1 < n and A[j] <= A[j + 1]:
            j += 1
        runs.append((i, j))
        i = j + 1

    while length(runs) > 1:
        new_runs = []
        for k from 0 to length(runs) step 2:
            if k + 1 < length(runs):
                merged = merge(A, runs[k], runs[k+1])
                new_runs.append(merged)
            else:
                new_runs.append(runs[k])
        runs = new_runs

Example

Let:

$$ A = [1, 3, 5, 2, 4, 6] $$

Detected runs:

  • [1, 3, 5]
  • [2, 4, 6]

Merge:

  • [1, 3, 5] + [2, 4, 6] → [1, 2, 3, 4, 5, 6]

Only one pass is needed.

Correctness

Each run is already sorted. Merging two sorted runs produces a sorted result. Repeated merging eventually produces a single sorted run covering the entire array.

Complexity

Worst case occurs when the input has no structure:

$$ T(n) = O(n \log n) $$

Best case occurs when the array is already sorted:

$$ T(n) = O(n) $$

case time
best $O(n)$
worst $O(n \log n)$

Space:

$$ O(n) $$

Properties

property value
stable yes
adaptive yes
in-place no
practical use foundation of Timsort

Notes

Natural merge sort benefits from real world data, which often contains sorted segments. It reduces unnecessary work compared to fixed splitting strategies.

Modern algorithms such as Timsort extend this idea by using more sophisticated run detection and merging strategies.

Implementation

def natural_merge_sort(a):
    n = len(a)
    temp = [0] * n

    def merge(l, mid, r):
        i, j, k = l, mid + 1, l
        while i <= mid and j <= r:
            if a[i] <= a[j]:
                temp[k] = a[i]
                i += 1
            else:
                temp[k] = a[j]
                j += 1
            k += 1
        while i <= mid:
            temp[k] = a[i]
            i += 1
            k += 1
        while j <= r:
            temp[k] = a[j]
            j += 1
            k += 1
        for i in range(l, r + 1):
            a[i] = temp[i]

    while True:
        runs = []
        i = 0
        while i < n:
            j = i
            while j + 1 < n and a[j] <= a[j + 1]:
                j += 1
            runs.append((i, j))
            i = j + 1

        if len(runs) == 1:
            break

        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                l1, r1 = runs[k]
                l2, r2 = runs[k + 1]
                merge(l1, r1, r2)

    return a
func NaturalMergeSort(a []int) {
	n := len(a)
	temp := make([]int, n)

	var merge func(int, int, int)
	merge = func(l, mid, r int) {
		i, j, k := l, mid+1, l

		for i <= mid && j <= r {
			if a[i] <= a[j] {
				temp[k] = a[i]
				i++
			} else {
				temp[k] = a[j]
				j++
			}
			k++
		}

		for i <= mid {
			temp[k] = a[i]
			i++
			k++
		}

		for j <= r {
			temp[k] = a[j]
			j++
			k++
		}

		for i := l; i <= r; i++ {
			a[i] = temp[i]
		}
	}

	for {
		runs := [][2]int{}
		i := 0

		for i < n {
			j := i
			for j+1 < n && a[j] <= a[j+1] {
				j++
			}
			runs = append(runs, [2]int{i, j})
			i = j + 1
		}

		if len(runs) == 1 {
			break
		}

		for k := 0; k < len(runs); k += 2 {
			if k+1 < len(runs) {
				l1, r1 := runs[k][0], runs[k][1]
				l2, r2 := runs[k+1][0], runs[k+1][1]
				merge(l1, r1, r2)
			}
		}
	}
}