Washington's public institutions are sitting on millions of duplicate digital images — redundant files clogging servers, inflating storage costs, and quietly undermining the city's push to modernize its archives. The problem isn't new, but a combination of federal restructuring under the Trump administration and tightened budgets has forced the conversation into the open in 2026.
At stake is more than hard-drive space. The District government, the National Archives at College Park, and the Library of Congress on Independence Avenue all maintain overlapping digitization programs. When agencies scan the same photograph twice, store competing versions on separate servers, or fail to reconcile databases after departmental mergers, the result is sprawling, redundant collections that cost real money to maintain and make genuine historical research harder.
Why the Problem Is Getting Worse Right Now
The efficiency reviews driven by the Department of Government Efficiency (DOGE), which have been hitting federal agencies since early 2025, have put a spotlight on storage infrastructure. The National Archives and Records Administration, which operates facilities in College Park, Maryland, and maintains reading rooms at the Pennsylvania Avenue building near the Capitol, has faced scrutiny over its data management practices as part of broader federal workforce restructuring. Contracts for cloud storage migration, some valued at tens of millions of dollars, have been examined for redundancies.
Meanwhile, the DC Office of the Chief Technology Officer, headquartered at One Judiciary Square on D Street NW, has been running its own audit of municipal digital assets since March 2026. The office has not published final figures, but the initiative covers records from agencies including the DC Department of General Services and the Office of Planning, both of which maintain image libraries tied to permitting, zoning, and historic preservation work in neighborhoods like Anacostia and NoMa — areas undergoing rapid physical transformation that generate high volumes of documentation.
Archivists and information scientists who work with these institutions describe the duplicate image issue as a structural one, not a staff failure. When departments digitize independently, without a shared metadata standard or a central deduplication protocol, copies multiply. A single photograph of, say, the 11th Street Bridge corridor might exist in three separate agency databases in different resolutions, with different file names and inconsistent date stamps.
What Specialists Are Recommending
The American Library Association and the Society of American Archivists, both of which have members working across DC's dense cluster of federal and municipal institutions, have published guidance in recent years calling for hash-based deduplication tools and unified metadata frameworks. The approach computes a digital fingerprint (a cryptographic hash) for each image file; if two files produce the same fingerprint, they are byte-for-byte copies and one is flagged as redundant. Exact-match hashing does not catch the same photograph saved at a different resolution or in a different format, which is where shared metadata comes in. The technology is mature and widely used in the private sector.
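The fingerprinting idea can be sketched in a few lines of Python. This is a minimal illustration, not any agency's actual tooling: it assumes SHA-256 as the fingerprint, and the function names `file_fingerprint` and `find_duplicates` are hypothetical. It flags only byte-identical files.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Chunked reads keep memory use flat even for very large TIFFs.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group files under `root` by fingerprint; keep groups with 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_fingerprint(path)].append(path)
    # Only fingerprints shared by more than one file indicate redundancy.
    return {h: paths for h, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates(Path("some/archive"))` on a hypothetical directory returns one entry per set of identical files. A real workflow would send each set to an archivist for review rather than delete anything automatically, since file names and date stamps may differ between copies.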
The Smithsonian Institution, whose digitization program on the National Mall has scanned more than four million objects as of its most recent public report, uses proprietary systems to cross-reference files before ingest. That model is increasingly cited by DC-area archivists as a benchmark for what municipal and federal agencies could adopt more broadly.
Cost is the sticking point. Cloud storage for large image files, particularly the high-resolution TIFFs used in preservation-grade digitization, runs roughly $0.02 per gigabyte per month on major platforms, a figure that adds up quickly across collections numbering in the hundreds of terabytes: 100 terabytes at that rate costs about $2,000 a month, or $24,000 a year. Eliminating duplicate files across a mid-size agency archive can reduce storage overhead by 20 to 40 percent, according to published case studies from peer institutions in New York and London.
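To make those figures concrete, the arithmetic can be worked through for a single collection. The sketch below uses the article's $0.02 rate and its 20 to 40 percent range; the 300-terabyte archive size is a hypothetical chosen for illustration, and the function names are likewise invented. It assumes decimal units (1 TB = 1,000 GB).

```python
def monthly_storage_cost(terabytes: float,
                         price_per_gb_month: float = 0.02,
                         gb_per_tb: int = 1000) -> float:
    """Monthly cost in dollars for a collection of the given size."""
    return terabytes * gb_per_tb * price_per_gb_month


def annual_savings_range(terabytes: float,
                         low: float = 0.20,
                         high: float = 0.40) -> tuple[float, float]:
    """Yearly savings if deduplication removes `low` to `high` of storage."""
    yearly = monthly_storage_cost(terabytes) * 12
    return yearly * low, yearly * high


if __name__ == "__main__":
    size_tb = 300  # hypothetical mid-size agency archive
    monthly = monthly_storage_cost(size_tb)
    low, high = annual_savings_range(size_tb)
    print(f"Monthly cost: ${monthly:,.0f}")                    # $6,000
    print(f"Annual savings: ${low:,.0f} to ${high:,.0f}")      # $14,400 to $28,800
```

The estimate covers storage alone; it leaves out retrieval and transfer fees, and the staff time needed to run a deduplication review.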
The DC Public Library system, which operates the Washingtoniana Collection at the Martin Luther King Jr. Memorial Library on G Street NW, completed a partial deduplication review of its digital photograph holdings in late 2025. The library has not released a public report on outcomes, but the review was cited in the Office of the Chief Technology Officer's March 2026 audit framework as a local model worth examining.
For anyone working with DC's digital archives right now — researchers, journalists, historians mapping Anacostia's development or NoMa's transformation — the practical advice is straightforward: cross-reference what you find in one database against at least one other before treating any single record as definitive. The clean-up is underway. It isn't finished.