DC's Digital Archive Problem: The Numbers Behind a City's Duplicate Image Crisis
Washington's public agencies and cultural institutions are sitting on millions of redundant digital files — and the storage bill is quietly climbing.

Washington DC's government agencies and federally linked cultural institutions together store what is estimated to be tens of millions of digital image files, and a significant share of those files are duplicates: identical or near-identical copies that consume server space, slow archival retrieval, and cost taxpayers real money. The scale of the problem is drawing fresh scrutiny in a city already rattled by federal efficiency mandates and budget uncertainty.
The timing matters. The Trump administration's DOGE restructuring effort has pushed federal and District agencies alike to account for every line of IT spending. For a city where the municipal government runs its own digital infrastructure alongside dozens of federally adjacent institutions, the hidden cost of unmanaged digital redundancy has become harder to ignore. Storage is not free, and in a government environment, the cost compounds across contracts, backup systems, and compliance requirements.
Industry benchmarks for large public-sector digital archives suggest that between 20 and 40 percent of stored image files in unmanaged repositories are exact or near-exact duplicates. For the Smithsonian Institution, which operates 21 museums and research centers across the National Mall and beyond and has digitized millions of objects through its Open Access program launched in February 2020, even a conservative duplication rate translates into substantial wasted storage. The Smithsonian's Open Access release alone made 2.8 million images publicly available, according to figures the institution published at the time of the launch. Multiply that by internal working copies, backup generations, and version iterations, and the actual stored file count runs far higher.
At the District government level, the DC Office of the Chief Technology Officer manages data infrastructure for dozens of city agencies operating out of buildings from the Wilson Building on Pennsylvania Avenue NW to the Department of Public Works facilities in Ivy City. Municipal records management guidelines require agencies to retain photographic documentation for code enforcement, permitting, and public safety — categories that generate high volumes of images with significant overlap. A single block of construction activity in NoMa or along the Anacostia waterfront can generate hundreds of near-duplicate site inspection photographs filed across multiple departments.
Cloud storage costs for government contracts typically run between $0.02 and $0.05 per gigabyte per month under standard federal schedule pricing, according to General Services Administration rate structures. A single uncompressed RAW image file can exceed 25 megabytes. At that size, one million redundant images occupy roughly 25 terabytes, which at those rates costs about $6,000 to $15,000 a year before backup copies and replication are counted. The figure is modest per file, but it accumulates into a budget line that DOGE-era auditors are now trained to flag.
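The arithmetic behind that estimate can be sketched in a few lines of Python. This is an illustration rather than an official costing method: the function name is hypothetical, the inputs are the article's own figures (25 megabytes per image, $0.02 to $0.05 per gigabyte per month), and decimal units (1 gigabyte = 1,000 megabytes) are assumed.

```python
def annual_storage_cost(num_files, mb_per_file, usd_per_gb_month, months=12):
    """Return the cost in USD of storing num_files for the given number of months.

    Assumes decimal units: 1 GB = 1,000 MB.
    """
    total_gb = num_files * mb_per_file / 1000
    return total_gb * usd_per_gb_month * months


# One million redundant 25 MB images, priced at both ends of the quoted range.
low = annual_storage_cost(1_000_000, 25, 0.02)
high = annual_storage_cost(1_000_000, 25, 0.05)
print(f"${low:,.0f} to ${high:,.0f} per year")
```

Under these assumptions the range works out to roughly $6,000 to $15,000 a year for a single copy. Each backup generation or replicated copy multiplies that figure.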
Deduplication software has existed for years. Tools that scan repositories, generate hash values for each file, and identify bit-for-bit or perceptual duplicates are widely used in the private sector. The obstacle in government settings is rarely technical. It is procedural. Many DC agencies lack unified digital asset management systems, meaning images live in siloed drives across departments, some hosted locally and some through contracts with vendors including Microsoft Azure and Amazon Web Services. Merging those silos requires legal sign-off on retention schedules, IT coordination across agencies, and in some cases congressional approval for federally administered collections.
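A minimal sketch of the hash-based scan described above might look like the following. It is not any agency's or vendor's actual tool: the function names are hypothetical, it uses only Python's standard library, and it catches only bit-for-bit duplicates. Near-identical images (resized, recompressed, or re-exported copies) would need perceptual hashing, which this sketch does not attempt.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large images are never loaded whole."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root):
    """Return lists of paths under root whose contents are byte-identical."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    # Only hashes shared by two or more files indicate duplicates.
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Calling find_exact_duplicates on a shared drive's root folder returns one group per set of identical files. Such a report only identifies candidates; which copy is the record copy remains a question for the retention schedule, not the script.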
The DC Public Library system, which maintains digital collections across its 26 branch locations including the Martin Luther King Jr. Memorial Library on G Street NW — reopened after a major renovation in 2020 — has moved toward centralized digital asset management in recent years. Smaller agencies have not followed at the same pace.
For District residents and federal watchdogs watching the DOGE process, the practical upshot is straightforward. Agencies that complete deduplication audits before the next budget cycle can credibly document storage cost reductions, which gives them negotiating leverage in a funding environment that is penalizing perceived inefficiency. The Mayor's Office of Budget and Finance is scheduled to present its fiscal year 2027 budget framework to the DC Council later this fall. Agencies that arrive with documented savings will be in a stronger position than those that do not.
Published by The Daily Washington DC