DC's Digital Archives Are Drowning in Duplicate Images — and the Numbers Tell a Costly Story
From the National Mall to Anacostia, Washington's public institutions are spending millions to store the same image files twice, three times, or more.
Washington's cultural and government institutions collectively hold tens of millions of digital image files — and a significant share of those files are exact or near-exact duplicates sitting in redundant storage systems, burning through budget allocations that administrators say they can scarcely afford under the current federal restructuring climate.
The problem has a name in archival circles: duplicate image proliferation. And in a city where the District of Columbia government, the Smithsonian Institution, the Library of Congress, and dozens of federally linked agencies all maintain their own digital asset management systems, the overlap has grown quietly catastrophic. Storage costs for uncompressed high-resolution image files — common in museum and archival work — can run $3,000 to $8,000 per terabyte annually when factoring in redundant backup infrastructure, according to industry benchmarks published by the Digital Preservation Coalition in 2024.
The District's own Office of the Chief Technology Officer, headquartered at 200 I Street SE, has flagged digital redundancy as a cost driver in internal efficiency reviews. Federal agencies clustered along the National Mall corridor, including the Smithsonian's 19 museums and galleries, operate largely siloed content repositories. As a result, near-identical photographs of, say, the Martin Luther King Jr. Memorial taken on the same day by two different agency photographers may exist in four or five separate storage environments by the time they have been uploaded, backed up, and catalogued.
The Library of Congress, which manages one of the largest digital image collections in the world from its Capitol Hill campus, began a deduplication initiative in 2022 under its Digital Futures program. Industry analysts who track federal archival spending note that storage rationalization projects at institutions of that scale typically identify 15 to 30 percent of total image inventory as redundant or near-duplicate. Applied to a collection of 20 million digital objects, that range works out to roughly 3 million to 6 million unnecessary file instances consuming active server space.
For the District government specifically, federal funding uncertainty sharpens the problem. Mayor Muriel Bowser's administration has been navigating a budget environment squeezed by reduced federal grants and the ripple effects of DOGE-driven workforce cuts that reduced contracting pipelines for DC-based tech firms. The DC Department of General Services, which oversees physical and digital infrastructure for municipal agencies, renewed its enterprise cloud storage contract in fiscal year 2025, but budget documents reviewed by reporters indicated pressure to reduce storage expenditures by at least 12 percent before fiscal year 2026 ends on September 30.
The issue isn't academic. The Martin Luther King Jr. Memorial Library at 901 G Street NW — the DC Public Library's central branch, reopened after a $211 million renovation in 2020 — runs a digitization lab that has processed tens of thousands of images related to DC history, including collections documenting Anacostia's pre-gentrification streetscape and the historic Barry Farm neighborhood. Staff there have identified duplicate entry problems stemming from multiple digitization passes performed by different contractors across different grant cycles.
The DC Archives, housed at 1300 Naylor Court NW in the Blagden Alley area, faces the same structural problem. Images digitized under one grant program get re-digitized under a subsequent program without cross-referencing the prior catalog, creating layered duplication that inflates both storage requirements and retrieval complexity.
Solutions do exist. Deduplication software — tools that generate perceptual hash values to identify visually identical or near-identical images regardless of filename — can reduce image storage loads by 20 to 40 percent in mixed institutional environments, according to case studies published by the Federal Agencies Digital Guidelines Initiative. Implementation costs typically run between $50,000 and $200,000 for a mid-size institutional rollout, with ongoing licensing fees.
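For readers curious how perceptual hashing works in practice, here is a minimal sketch of one common variant, the "average hash." It is an illustration only, not the method used by any institution or product named in this story. It assumes each image has already been decoded into a grid of grayscale values (0 to 255) at least 8 pixels on each side, and the function names and the distance threshold of 5 are hypothetical choices made for this example.

```python
from typing import List

Grid = List[List[int]]  # rows of grayscale pixel values, 0-255


def downscale(pixels: Grid, size: int = 8) -> Grid:
    """Shrink an image to size x size by averaging rectangular pixel blocks.

    Assumes the image is at least `size` pixels wide and tall.
    """
    height, width = len(pixels), len(pixels[0])
    small = []
    for row in range(size):
        top, bottom = row * height // size, (row + 1) * height // size
        out_row = []
        for col in range(size):
            left, right = col * width // size, (col + 1) * width // size
            block = [pixels[y][x]
                     for y in range(top, bottom)
                     for x in range(left, right)]
            out_row.append(sum(block) // len(block))
        small.append(out_row)
    return small


def average_hash(pixels: Grid, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in downscale(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Grid, b: Grid, max_distance: int = 5) -> bool:
    """Treat two images as duplicates if their fingerprints differ in few bits.

    max_distance is an illustrative threshold, not a published standard.
    """
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Synthetic 16x16 test images: a left-to-right gradient, a slightly
# brighter copy of it, and a top-to-bottom gradient.
original = [[x * 16 for x in range(16)] for _ in range(16)]
brighter = [[min(255, v + 10) for v in row] for row in original]
different = [[y * 16] * 16 for y in range(16)]

print(is_near_duplicate(original, brighter))   # same picture, re-exposed
print(is_near_duplicate(original, different))  # a different picture
```

Because the fingerprint depends on the picture's brightness pattern rather than its filename or exact bytes, a re-scanned or lightly adjusted copy lands on the same or a nearby hash, while a different image does not. Production tools generally use more robust hash variants and tune the threshold against their own collections.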
For DC-area institutions watching every line item in a politically volatile funding year, the math is straightforward: pay once now for deduplication, or keep paying monthly for storage that is, in a measurable and documentable portion of cases, holding the exact same photograph twice. The next round of DC Technology Office efficiency reviews is scheduled for the fall budget cycle, starting October 1. Institutions that haven't audited their image inventories before then may find the decision made for them.
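That trade-off can be roughed out with a few lines of arithmetic. The sketch below combines the ranges cited in this story (storage at $3,000 to $8,000 per terabyte per year, a 20 to 40 percent reduction, and $50,000 to $200,000 to implement) with one number that is purely hypothetical: a 100-terabyte archive. It also ignores ongoing licensing fees, for which no figures were available, so treat the result as an illustration rather than a forecast.

```python
def annual_savings(stored_tb: float, cost_per_tb_year: float, reduction: float) -> float:
    """Yearly storage spend avoided if `reduction` (0 to 1) of stored data is removed."""
    return stored_tb * cost_per_tb_year * reduction


def payback_years(implementation_cost: float, savings_per_year: float) -> float:
    """Years of savings needed to recover the one-time implementation cost."""
    if savings_per_year <= 0:
        raise ValueError("savings_per_year must be positive")
    return implementation_cost / savings_per_year


STORED_TB = 100  # hypothetical archive size; NOT a figure from the article

# Least favorable ends of the cited ranges: cheap storage, small reduction,
# expensive rollout.
slow = payback_years(200_000, annual_savings(STORED_TB, 3_000, 0.20))

# Most favorable ends: expensive storage, large reduction, cheap rollout.
fast = payback_years(50_000, annual_savings(STORED_TB, 8_000, 0.40))

print(f"Payback period: {fast:.2f} to {slow:.2f} years")
```

Under those assumptions the one-time cost is recovered in somewhere between about two months and a little over three years. A smaller archive stretches that window, and a larger one shortens it.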
Published by The Daily Washington DC