Project Color

Overview

Sergei Mikhailovich Prokudin-Gorskii (1863–1944) was a visionary color photographer of the early 20th century. In 1907, he secured permission from the Tsar to document the Russian Empire in color, using a groundbreaking method of recording three exposures on glass plates through red, green, and blue filters. His collection includes the only color portrait of Leo Tolstoy, along with thousands of images of people, landscapes, and architecture. Although his dream of projecting these images in classrooms across Russia never materialized, his glass plate negatives survived and were later preserved by the Library of Congress, offering a rare glimpse of pre-revolutionary Russia.

In this project, I colorize images from the Prokudin-Gorskii collection by aligning the three glass plate negatives (exposed through red, green, and blue filters) and stacking them into a single color image with as few visual artifacts as possible.

An example of three glass plate images from the Prokudin-Gorskii collection can be seen on the right, and the final aligned image can be seen below.

Final aligned and produced image

Three glass plate images from Prokudin-Gorskii collection

Approach

Initial Approach

The most straightforward way to align the color channels is to fix a reference channel (I chose blue) and exhaustively search a [-15, 15] pixel window of translations for each target channel (green and red), keeping the shift that scores best. This requires a metric that measures similarity between the reference and the shifted target. I experimented with Sum of Squared Differences (SSD), Normalized Cross Correlation (NCC), and Normalized Mutual Information (NMI). NMI performed best in my tests (as can be seen below), likely because it is robust to the differing intensity distributions across channels, so I chose NMI.
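The search described above can be sketched as follows. This is a minimal illustration assuming NumPy and single-channel float images of equal size, not the project's actual code. The function names, the wrap-around shifting via `np.roll`, the 64-bin histogram, and the NMI form (H(A) + H(B)) / H(A, B) are all assumptions. Shifts are returned as (rows, columns), which may not match the convention used for the offsets reported on this page.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences; lower is better, so return the negative."""
    return -np.sum((a - b) ** 2)

def ncc(a, b):
    """Normalized cross-correlation of the mean-centered images."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def nmi(a, b, bins=64):
    """Normalized mutual information, here (H(A) + H(B)) / H(A, B)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return (entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0))) / entropy(pxy.ravel())

def align_exhaustive(target, reference, metric=nmi, radius=15):
    """Try every shift in [-radius, radius]^2 and keep the best-scoring one."""
    best_score, best_shift = -np.inf, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(target, (dy, dx), axis=(0, 1))
            score = metric(shifted, reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift
```

All three metrics are written so that a higher score means a better match, which lets the same search loop be reused when swapping metrics.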

SSD Result — Sum of Squared Differences (SSD)

NCC Result — Normalized Cross Correlation (NCC)

NMI Result with 10% cropped — Normalized Mutual Information (NMI) (with 10% cropped)

To remove border artifacts, I cropped 10% from each side of the final image. I also tried cropping before alignment, but cropping afterward produced better results.
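The post-alignment crop can be sketched as below, assuming NumPy and an image laid out as (height, width) or (height, width, channels). The helper name `crop_border` is hypothetical.

```python
import numpy as np

def crop_border(image, fraction=0.10):
    """Remove `fraction` of the height and width from each side of the image."""
    h, w = image.shape[:2]
    dy, dx = int(round(h * fraction)), int(round(w * fraction))
    return image[dy:h - dy, dx:w - dx]
```

With `fraction=0.10`, a 100 x 200 image would keep its central 80 x 160 region.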

This brute-force strategy works well for smaller images but breaks down for larger ones, where the optimal alignment can fall outside the search window. Enlarging the window is not a practical fix: the number of candidate shifts grows quadratically with the window radius, and each candidate requires a full-image comparison, so processing times quickly become prohibitive. The larger .tif images therefore need a more efficient strategy. For the smaller images from the same collection, here are the results with their optimal alignments for the green and red channels (G, R):

Cathedral Image Result — Cathedral: G(2, 5), R(3, 12)

Monastery Image Result — Monastery: G(2, -3), R(2, 3)

Tobolsk Image Result — Tobolsk: G(2, 3), R(3, 6)

Pyramid Approach

For larger images, I used an image pyramid to handle larger displacements efficiently. The image is repeatedly downscaled by a factor of 2 until its smallest dimension is around 200 pixels. Alignment begins at the coarsest level with a larger search window of [-50, 50] pixels to accommodate potentially significant misalignments. At each finer level, the shift from the previous level is doubled (since the resolution doubles) and refined with a small [-2, 2] pixel window around it.
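A recursive sketch of this coarse-to-fine procedure is shown below, assuming NumPy and single-channel float images. It is illustrative rather than the project's code: the 2x2 block averaging stands in for whatever resize routine was actually used, the stopping rule (stop when another halving would drop below `min_size`) is one reading of "around 200 pixels", and the function names and NMI form are assumptions.

```python
import numpy as np

def nmi(a, b, bins=64):
    """Normalized mutual information, here (H(A) + H(B)) / H(A, B)."""
    joint, _, _ = np.histogram2d(a.ravel(), b.ravel(), bins=bins)
    pxy = joint / joint.sum()

    def entropy(p):
        p = p[p > 0]
        return -np.sum(p * np.log(p))

    return (entropy(pxy.sum(axis=1)) + entropy(pxy.sum(axis=0))) / entropy(pxy.ravel())

def downscale2(img):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def search(target, reference, center, radius, metric):
    """Exhaustive search over shifts within `radius` of `center`."""
    best_score, best_shift = -np.inf, center
    cy, cx = center
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = metric(np.roll(target, (dy, dx), axis=(0, 1)), reference)
            if score > best_score:
                best_score, best_shift = score, (dy, dx)
    return best_shift

def align_pyramid(target, reference, metric=nmi,
                  min_size=200, coarse_radius=50, fine_radius=2):
    """Align at the coarsest level first, then refine level by level."""
    if min(reference.shape) // 2 < min_size:
        # Coarsest level: wide search around zero shift.
        return search(target, reference, (0, 0), coarse_radius, metric)
    coarse = align_pyramid(downscale2(target), downscale2(reference),
                           metric, min_size, coarse_radius, fine_radius)
    # One pixel at the coarser level corresponds to two pixels here.
    center = (2 * coarse[0], 2 * coarse[1])
    return search(target, reference, center, fine_radius, metric)
```

The recursion keeps each level's logic identical: only the search center and radius change between the coarsest level and the refinements.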

This coarse-to-fine strategy significantly reduces computation time while maintaining accuracy. It allows for efficient handling of large displacements that would be computationally prohibitive with the brute-force method. As with the initial approach, I used Normalized Mutual Information (NMI) as the similarity metric due to its robust performance across varying channel intensities.
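To make the savings concrete, the sketch below counts metric evaluations, weighted by how many pixels each evaluation touches, for a brute-force search and for a pyramid covering the same displacement range. The function name is hypothetical, and the cost model (cost proportional to pixel count, one quarter per halving) is a simplifying assumption.

```python
def search_cost(levels, coarse_radius=50, fine_radius=2):
    """Return (brute_force, pyramid) cost in full-resolution metric evaluations.

    `levels` is the number of 2x downscaling steps between the full-resolution
    image and the coarsest pyramid level.
    """
    # Displacement reach of the coarse window, expressed in full-resolution pixels.
    reach = coarse_radius * 2 ** levels
    brute = (2 * reach + 1) ** 2
    # Each halving reduces the pixel count, and so the cost per evaluation, by 4.
    coarse = (2 * coarse_radius + 1) ** 2 / 4 ** levels
    fine = sum((2 * fine_radius + 1) ** 2 / 4 ** k for k in range(levels))
    return brute, coarse + fine
```

For a hypothetical scan that needs four halvings to reach the coarsest level, `search_cost(4)` gives 2,563,201 full-resolution evaluations for brute force against roughly 73 for the pyramid.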

To enhance the alignment quality for these high-resolution images, I implemented additional preprocessing steps. The input image is converted to float32 and normalized to the [0, 1] range before alignment. After alignment, I crop 15% from all sides to remove potential border artifacts. Here are some results from larger .tif images in the collection, showing the effectiveness of this approach:
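The normalization step might look like the sketch below, assuming NumPy. The helper name `to_unit_float` is hypothetical, and the scaling rule (divide integer images by the maximum of their dtype, min-max rescale anything else) is an assumption about how the [0, 1] range is reached.

```python
import numpy as np

def to_unit_float(image):
    """Convert an image to float32 with values in [0, 1]."""
    image = np.asarray(image)
    if np.issubdtype(image.dtype, np.integer):
        # For example, 16-bit scans are divided by 65535.
        return image.astype(np.float32) / np.float32(np.iinfo(image.dtype).max)
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi > lo:
        return (image - lo) / (hi - lo)
    return np.zeros_like(image)
```

Working in a common float range keeps the similarity metric's behavior independent of the bit depth of the scan.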

Train Image Result — Church: G(-6, 24), R(-7, 55)

Harvest Image Result — Sculpture: G(-11, 33), R(-27, 140)

Icon Image Result — Icon: G(17, 41), R(23, 90)

The pyramid approach successfully aligned these large images, handling significant displacements that would have been missed by the initial approach's smaller search window. The results demonstrate the method's ability to produce vibrant, well-aligned color images from Prokudin-Gorskii's glass plate negatives, even for high-resolution scans with large misalignments.

Results

Here are the rest of the results from the Prokudin-Gorskii collection.

Images of the Russian Empire:

Colorizing the Prokudin-Gorskii photo collection

Atahan Ozdemir
