The Daily Washington DC

Washington DC news, every day


DC's Digital Archives Are Riddled With Duplicate Images — Here's What the Numbers Actually Show

A quiet but costly data problem is eating into city agency budgets and slowing public records access across Washington DC.

By Washington DC News Desk · Published 4 July 2026, 2:48 pm


Photo: Optical Chemist on Pexels

Somewhere inside the DC Office of the Chief Technology Officer's servers, the same image file exists hundreds of times over. That is not an isolated glitch — it is a systemic pattern that city data managers have been quietly contending with for years, and the scale of the redundancy is only now becoming clear as federal funding pressure forces a harder look at storage costs.

Duplicate image replacement — the process of identifying, consolidating, and clearing redundant digital files across government databases — has moved from a back-office IT chore to a budget line item that matters. With the Trump administration's DOGE-driven restructuring shrinking federal transfers to the District, Mayor Muriel Bowser's office has pushed city agencies to find internal efficiencies. Digital storage waste is one of the more measurable places to cut.

The Scale of the Problem in DC Government Systems

Industry benchmarks from enterprise data management firms suggest that large municipal governments typically carry duplicate file rates of between 20 and 40 percent across unmanaged document repositories. For a city the size of Washington DC (which manages records across more than 80 distinct agencies, from the Department of Motor Vehicles on Brentwood Road NE to the Department of Human Services facilities in Anacostia), that redundancy translates into real dollar figures. Cloud and on-premises storage is not free. Commercial cloud storage tiers for compliance-certified government systems run from roughly $0.02 to $0.05 per gigabyte per month, so a repository carrying several hundred terabytes of duplicated image files accumulates costs ranging from tens of thousands to hundreds of thousands of dollars a year before personnel time is factored in. At the low end, 200 terabytes billed at $0.02 per gigabyte comes to about $48,000 annually.
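The storage arithmetic above can be checked with a short calculation. This is a rough sketch under simplifying assumptions: it treats 1 terabyte as 1,000 gigabytes and applies a flat per-gigabyte monthly price, whereas real government cloud contracts may be tiered or discounted. The function name is illustrative only.

```python
from __future__ import annotations

# Illustrative sketch: yearly cost of keeping duplicated images in cloud storage.
# Assumes decimal units (1 TB = 1,000 GB) and a flat per-GB monthly price.


def annual_storage_cost(duplicate_tb: float, price_per_gb_month: float) -> float:
    """Return the yearly cost in dollars of storing `duplicate_tb` terabytes."""
    gigabytes = duplicate_tb * 1_000
    return gigabytes * price_per_gb_month * 12  # 12 monthly bills per year


if __name__ == "__main__":
    # Price range quoted in the article: $0.02 to $0.05 per GB per month.
    for tb in (200, 500):
        low = annual_storage_cost(tb, 0.02)
        high = annual_storage_cost(tb, 0.05)
        print(f"{tb} TB of duplicates: ${low:,.0f} to ${high:,.0f} per year")
```

Run as a script, it reports $48,000 to $120,000 a year for 200 terabytes and $120,000 to $300,000 for 500 terabytes.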

The DC Public Library system, which operates a digitisation program out of its Martin Luther King Jr. Memorial Library on G Street NW, has dealt with this problem directly. Its digital collections — spanning historical photographs, maps, and documents — grew rapidly after a 2019 expansion of the MLK library's digital infrastructure. Without aggressive deduplication protocols, collections of that kind routinely see 15 to 25 percent of stored image files flagged as near-duplicates or exact copies during audits, according to published standards from the Digital Preservation Coalition, a UK-based nonprofit whose guidelines are widely used by US municipal archives.

The DC Office of Planning, which maintains GIS image layers and aerial photography archives tied to development review in neighbourhoods like NoMa and Congress Heights, faces a parallel challenge. Each zoning review cycle generates new image uploads, and without automated hash-matching — a technique that assigns each file a unique digital fingerprint — the same satellite tile or site photograph can enter the system multiple times through different submission portals. One duplicated high-resolution aerial image can run to 50 megabytes or more. Multiply that by thousands of submissions over a multi-year development boom in NoMa alone, and the storage overhead becomes significant.
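Hash-matching of the kind described above can be sketched in a few lines of Python. This is a minimal illustration, not the Office of Planning's actual tooling: it assumes the files sit in a local directory tree, uses SHA-256 from Python's standard hashlib module as the fingerprint, and the function names and sample file names are hypothetical. It detects byte-for-byte copies only; a photo that has been resized or re-compressed produces a different hash.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict
from pathlib import Path


def file_fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large aerial images never have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path) -> dict[str, list[Path]]:
    """Group every file under `root` by fingerprint; keep groups of 2+ files."""
    groups: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            groups[file_fingerprint(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    import tempfile

    # Demo: the same site photo arriving through two hypothetical portals.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "portal_a").mkdir()
        (root / "portal_b").mkdir()
        (root / "portal_a" / "site_photo.jpg").write_bytes(b"same image bytes")
        (root / "portal_b" / "upload_0412.jpg").write_bytes(b"same image bytes")
        (root / "portal_b" / "other.jpg").write_bytes(b"different bytes")
        for digest, paths in find_exact_duplicates(root).items():
            print(digest[:12], [p.name for p in paths])
```

One common refinement is to compare file sizes first and hash only files whose sizes match, which avoids reading every file in full.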

What Deduplication Actually Costs — and Saves

The economics of fixing the problem depend heavily on whether a city agency handles deduplication manually or deploys automated tools. Manual review by a trained archivist or records technician averages roughly four to six hours per thousand files at a loaded labour cost that, for a DC government-grade position in the IT or library sciences track, runs between $45 and $70 per hour. Automated deduplication software licenses for government use range from around $8,000 to $30,000 annually depending on the volume tier, with one-time implementation costs on top of that.
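Those ranges can be turned into a back-of-envelope comparison. The sketch below is a hypothetical model using only the figures quoted above (4 to 6 hours per thousand files, $45 to $70 an hour, $8,000 to $30,000 a year for a license). It leaves out the one-time implementation costs, and the one-million-file volume in the demo is an illustrative assumption.

```python
from __future__ import annotations

# Hypothetical back-of-envelope model built from the article's ranges.
# One-time implementation costs for automated tools are deliberately excluded.
HOURS_PER_THOUSAND = (4, 6)     # manual review hours per 1,000 files
HOURLY_RATE = (45, 70)          # loaded labour cost, dollars per hour
LICENSE_PER_YEAR = (8_000, 30_000)  # automated tool license, dollars per year


def manual_cost(files: int) -> tuple[float, float]:
    """Low and high labour cost, in dollars, to hand-review `files` files."""
    thousands = files / 1_000
    low = thousands * HOURS_PER_THOUSAND[0] * HOURLY_RATE[0]
    high = thousands * HOURS_PER_THOUSAND[1] * HOURLY_RATE[1]
    return low, high


def breakeven_files(license_cost: float, manual_cost_per_thousand: float) -> float:
    """Yearly file volume at which manual review costs as much as the license."""
    return license_cost / manual_cost_per_thousand * 1_000


if __name__ == "__main__":
    low, high = manual_cost(1_000_000)
    print(f"Manual review of 1M files: ${low:,.0f} to ${high:,.0f}")
    cheapest_manual = HOURS_PER_THOUSAND[0] * HOURLY_RATE[0]  # $180 per 1,000 files
    priciest_manual = HOURS_PER_THOUSAND[1] * HOURLY_RATE[1]  # $420 per 1,000 files
    print(
        f"Break-even volume: {breakeven_files(LICENSE_PER_YEAR[0], priciest_manual):,.0f}"
        f" to {breakeven_files(LICENSE_PER_YEAR[1], cheapest_manual):,.0f} files per year"
    )
```

On these assumptions, hand-reviewing a million files would cost $180,000 to $420,000, and a license pays for itself somewhere between roughly 19,000 and 167,000 files reviewed per year.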

The return on that investment is not trivial. Agencies that have completed deduplication audits typically report storage reductions of 18 to 35 percent in image-heavy repositories, according to case studies published by the National Archives and Records Administration in its 2023 Digital Preservation Framework guidance document. For the District, even a conservative 20 percent reduction across consolidated city repositories could free up storage capacity equivalent to millions of files — and reduce annual cloud costs accordingly.
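To see how a percentage reduction maps onto files and dollars, here is a hedged sketch. The 1,000-terabyte repository and the 5-megabyte average image size are illustrative assumptions, not District figures; only the 18 to 35 percent reduction range and the $0.02 to $0.05 per-gigabyte prices come from the reporting above.

```python
from __future__ import annotations

# Hypothetical sketch: what a given percentage reduction frees up.
# Repository size and average file size below are illustrative assumptions.


def reduction_impact(
    repo_tb: float, reduction: float, avg_file_mb: float, price_per_gb_month: float
) -> tuple[float, int, float]:
    """Return (terabytes freed, approximate files freed, dollars saved per year)."""
    freed_tb = repo_tb * reduction
    files_freed = round(freed_tb * 1_000_000 / avg_file_mb)  # 1 TB = 1,000,000 MB
    yearly_savings = freed_tb * 1_000 * price_per_gb_month * 12  # 1 TB = 1,000 GB
    return freed_tb, files_freed, yearly_savings


if __name__ == "__main__":
    for reduction in (0.18, 0.20, 0.35):
        freed_tb, files, low = reduction_impact(1_000, reduction, 5, 0.02)
        _, _, high = reduction_impact(1_000, reduction, 5, 0.05)
        print(
            f"{reduction:.0%}: {freed_tb:,.0f} TB, ~{files:,} files, "
            f"${low:,.0f} to ${high:,.0f} saved per year"
        )
```

Under these assumptions, a 20 percent reduction frees 200 terabytes, roughly 40 million files, and $48,000 to $120,000 a year in storage charges.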

For DC residents trying to access public records, such as property photos tied to permit applications, historical images from DC Public Library's Peabody Room collection, or neighbourhood survey photographs held by the DC Historic Preservation Office, faster and cleaner databases mean faster responses. Agencies that complete deduplication audits before the end of fiscal year 2026, which closes September 30, will be better positioned for the next budget cycle. The work is unglamorous, but the math makes a case that is hard to ignore.


About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
