Washington's public institutions are sitting on a quiet, unglamorous problem: their digital image archives are bloated with duplicates, and the bill for storing them keeps climbing. City technology officers and archivists have been raising the issue in recent months, pushing for systematic deduplication policies before federal funding uncertainties make the situation worse.
The timing matters because the Trump administration's restructuring of the federal workforce, along with the DOGE-driven efficiency reviews sweeping agencies along the National Mall corridor, has put every line of municipal IT spending under sharper scrutiny. For DC agencies already navigating reduced federal grant flows, redundant digital storage is the kind of inefficiency that turns up in audits and becomes a budget argument.
What the Experts Are Saying
Archivists at the DC Public Library's Special Collections division on G Street NW have been working for more than a year on an internal review of their digitized photograph holdings. The library's digital preservation staff have described the problem in public-facing documentation as one of the most underestimated drains on storage infrastructure — not just in terms of raw gigabytes, but in cataloguing labor. When the same image exists under three different file names across two separate folders, every future search query pulls redundant results, and staff time gets spent reconciling records that should have been clean from the start.
The Office of the Chief Technology Officer for the District of Columbia, which oversees citywide data infrastructure, has flagged digital asset management as a priority area in its operational planning. While the office has not announced a formal deduplication program, technology policy specialists familiar with municipal IT systems note that Washington's situation mirrors what mid-sized American cities have encountered after rapid, uncoordinated digitization pushes in the 2010s — years when grant money was available and the pressure was to get materials online fast, not clean.
At the National Archives and Records Administration facility in College Park, Maryland — just outside the District boundary but central to the region's professional archivist community — similar conversations have been happening at the federal level. Records management professionals there have long treated deduplication as standard practice, and some city archivists have pointed to NARA's established protocols as a model DC agencies could adapt for municipal-scale collections.
The Cost Problem Is Real
Cloud storage costs have dropped significantly over the past decade, but they have not dropped to zero — and large, poorly managed image collections can still represent tens of thousands of dollars in annual overhead for a mid-sized city agency. Industry benchmarks from digital preservation consortia suggest that duplicate files can account for anywhere from 15 to 30 percent of total image archive volume when no deduplication policy has been enforced. For a library or government agency managing collections in the hundreds of thousands of files, that translates directly to wasted budget.
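To make that arithmetic concrete, here is a minimal sketch of how an agency might estimate what its duplicates cost each year. The 15 and 30 percent shares come from the benchmark range cited above; the archive size and the per-terabyte price are hypothetical placeholders, not figures from any DC agency or storage vendor, and should be replaced with an institution's own numbers.

```python
def duplicate_storage_cost(total_tb, duplicate_share, usd_per_tb_month):
    """Return the annual cost in USD of the archive space taken up by duplicates."""
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be a fraction between 0 and 1")
    return total_tb * duplicate_share * usd_per_tb_month * 12


# Hypothetical inputs for illustration only.
ARCHIVE_TB = 50.0          # assumed total size of the image archive
USD_PER_TB_MONTH = 20.0    # assumed storage price per terabyte per month

# Low and high ends of the 15-30 percent duplicate range cited in the article.
low = duplicate_storage_cost(ARCHIVE_TB, 0.15, USD_PER_TB_MONTH)
high = duplicate_storage_cost(ARCHIVE_TB, 0.30, USD_PER_TB_MONTH)
print(f"Estimated annual cost of duplicates: ${low:,.0f} to ${high:,.0f}")
```

The estimate covers raw storage only. It leaves out the cataloguing labor that archivists describe as part of the real cost.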
The DC Commission on the Arts and Humanities, which funds cultural organizations across all eight wards, has also touched on the issue indirectly through its digital equity grant program. Several grantee organizations in neighborhoods including Anacostia and Columbia Heights have received support for digitization projects over the past three years — and advocates working with those groups say that sustainable digital stewardship, including deduplication, is frequently treated as an afterthought once a digitization project closes out.
Mayor Muriel Bowser's office has not made a public statement specifically addressing duplicate image management in city archives, but the District's current fiscal environment, with its broader push toward government efficiency, makes the issue harder to ignore than it once was.
For institutions looking to get ahead of the problem, archivists and digital preservation specialists recommend a straightforward starting point: auditing existing collections with hash-based file comparison tools, which identify byte-for-byte identical files regardless of filename or folder, before committing to any new storage contracts. Such tools will not flag resized or re-encoded copies of the same picture, so they are a first pass rather than a complete cleanup. Organizations in the District can also consult the Digital Preservation Coalition's published guidance, last updated in 2024, which includes specific recommendations for image collections held by public institutions. The work is not glamorous. But in a city where every IT dollar is getting a second look, cleaning up the archive is increasingly the kind of project that earns its keep.
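For readers curious what a hash-based audit looks like in practice, the sketch below walks a folder tree, computes a SHA-256 digest of each image file, and groups files whose contents match exactly. It is an illustration only, not a tool used by any agency named here. The function names, the list of file extensions, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Assumed set of image extensions; adjust to match the collection being audited.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".gif"}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file's contents in chunks so large scans do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Map each content hash to the image files sharing it (two or more only)."""
    by_hash = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix.lower() in IMAGE_SUFFIXES:
            by_hash[sha256_of(path)].append(path)
    return {h: paths for h, paths in by_hash.items() if len(paths) > 1}


def wasted_bytes(duplicates):
    """Bytes that would be freed by keeping one copy from each duplicate group."""
    return sum(
        paths[0].stat().st_size * (len(paths) - 1)
        for paths in duplicates.values()
    )
```

Calling `find_duplicates("/path/to/archive")` on a hypothetical archive folder returns the groups of matching files, and `wasted_bytes` totals the space the extra copies occupy. The sketch only reports; it deletes nothing, which leaves the decision about which copy to keep with the archivist. Because it compares exact file contents, it catches the same scan saved under different names but not a cropped, resized, or re-saved version of the same photograph.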