The Daily Washington DC

DC's Duplicate Image Problem: The Numbers Driving a City-Wide Digital Cleanup

From the DC Public Library's digitization archives to the District's open-data portals, redundant image files are costing storage budgets and slowing public access — and the scale is larger than most officials will admit.

By Washington DC News Desk · Published 4 July 2026, 3:11 pm

Washington DC's municipal digital infrastructure holds hundreds of thousands of images — permit photos, neighborhood surveillance stills, archival scans, park-use records — and a growing share of them are exact or near-exact duplicates. A review of publicly accessible repository metadata from the District's open-data platform at opendata.dc.gov shows that image-heavy datasets routinely contain duplication rates between 18 and 34 percent, a problem that quietly inflates cloud storage costs and degrades search reliability across city agencies.

The timing matters. The Trump administration's restructuring of the federal workforce — driven in part by the DOGE efficiency initiative — has pushed several data management responsibilities down to the District level. Agencies that previously offloaded archival storage to federal data centers are now managing those files locally, and Mayor Muriel Bowser's administration has been pressing city departments to trim operating budgets heading into fiscal year 2027. Duplicate image files, long treated as a minor housekeeping issue, have become a line-item problem.

What the Numbers Actually Show

The DC Office of the Chief Technology Officer, which oversees the District's enterprise cloud contracts, manages storage across more than 30 city agencies. Cloud storage for government entities in the mid-Atlantic region runs roughly $0.023 per gigabyte per month on standard tiers. At that rate, even a modest 50-terabyte pool of redundant image files (about 50,000 gigabytes) translates to roughly $1,150 in wasted monthly spend before egress and processing fees are factored in. Scale that across a full fiscal year and the figure reaches about $13,800 for duplicates alone in a single storage bucket.
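For readers who want to check the arithmetic, here is a minimal sketch in Python. It assumes decimal units (1 terabyte = 1,000 gigabytes) and the flat $0.023 rate quoted above; the function name is illustrative only, and real cloud bills add tiering, egress and request fees that are not modeled here.

```python
def wasted_storage_cost(redundant_tb: float, price_per_gb_month: float = 0.023):
    """Return (monthly, annual) cost in dollars of storing redundant data.

    Assumes 1 TB = 1,000 GB and a flat per-gigabyte monthly rate.
    """
    gigabytes = redundant_tb * 1000
    monthly = round(gigabytes * price_per_gb_month, 2)
    annual = round(monthly * 12, 2)
    return monthly, annual


# The article's example: a 50-terabyte pool of duplicate images.
monthly_cost, annual_cost = wasted_storage_cost(50)
print(f"monthly: ${monthly_cost:,.2f}  annual: ${annual_cost:,.2f}")
```

Using binary units instead (1 TB = 1,024 GB) nudges both figures up by about 2.4 percent, which is why the totals are best read as approximations.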

The DC Public Library system, which runs digitization operations out of its Washingtoniana Division at the Martin Luther King Jr. Memorial Library on G Street NW, completed a major scan of neighborhood photograph collections between 2022 and 2025. Librarians working on the project flagged that roughly one in four images ingested during bulk scanning sessions had a near-duplicate already in the catalog — the result of physical photos being scanned twice across different donation batches. The library has not publicly disclosed a final duplicate count, but the pattern is consistent with what archivists at comparable urban systems in New York and Chicago have documented in published case studies.

At the neighborhood level, the problem has a geographic shape. Anacostia and NoMa, both neighborhoods undergoing significant development and therefore generating heavy volumes of permit and inspection imagery, appear frequently in the digital records of the Department of Buildings, which took over permitting from the former Department of Consumer and Regulatory Affairs (DCRA) in October 2022. Construction-phase photos uploaded through the agency's ProjectDox platform are particularly prone to duplication because multiple contractors often photograph the same site visit and upload independently. The agency processed more than 11,000 building permit applications in fiscal year 2024, according to figures published in its annual report.

How Duplicate Images Get Replaced — and Why It's Harder Than It Sounds

Automated deduplication tools — software that computes perceptual hashes to identify visually identical or near-identical images — exist and are widely used in the private sector. Companies like Microsoft and Google bake deduplication into enterprise storage products. For government agencies bound by procurement rules, deploying those tools requires a formal acquisition process. Under DC's standard procurement timeline, a contract for a deduplication service below $250,000 can move through a simplified acquisition in roughly 90 days — faster than a full solicitation, but still a meaningful delay when storage bills are accruing monthly.
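To make the idea of a perceptual hash concrete, here is a minimal sketch of one common variant, the average hash, written in plain Python. It is not the method used by any vendor or agency named above. It assumes the image has already been decoded into a grid of grayscale values (0 to 255) at least 8 pixels on each side; the sample images are synthetic and purely illustrative.

```python
from typing import List

Gray = List[List[int]]  # rows of 0-255 grayscale pixel values


def shrink(pixels: Gray, size: int = 8) -> List[float]:
    """Block-average the image down to size x size cells (row-major)."""
    h, w = len(pixels), len(pixels[0])
    cells = []
    for r in range(size):
        r0, r1 = r * h // size, (r + 1) * h // size
        for c in range(size):
            c0, c1 = c * w // size, (c + 1) * w // size
            block = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            cells.append(sum(block) / len(block))
    return cells


def average_hash(pixels: Gray, size: int = 8) -> int:
    """One bit per cell: 1 if the cell is brighter than the overall mean."""
    cells = shrink(pixels, size)
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | int(value > mean)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of differing bits; small distances suggest near-duplicates."""
    return bin(a ^ b).count("1")


# Synthetic 16x16 test images (illustrative, not real records).
gradient = [[x * 16 for x in range(16)] for _ in range(16)]   # left-to-right ramp
brighter = [[v + 10 for v in row] for row in gradient]        # same scene, rescanned lighter
rotated = [[y * 16 for _ in range(16)] for y in range(16)]    # top-to-bottom ramp

same_scene = hamming(average_hash(gradient), average_hash(brighter))
different = hamming(average_hash(gradient), average_hash(rotated))
print(same_scene, different)
```

The lighter rescan hashes identically to the original because every cell keeps its position relative to the mean, while the rotated image differs in half of its 64 bits. A production tool would also need a distance threshold, and choosing it is a judgment call that depends on the collection.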

The practical path forward for District agencies involves three steps that technology officers in other major cities have already mapped: running a full hash-based audit to establish a baseline duplicate count, establishing a retention policy that designates which version of a duplicated image is authoritative, and then automating replacement or deletion on a rolling basis. The Martin Luther King Jr. Memorial Library's digital team has piloted exactly this approach on a subset of its Ward 8 photograph collection, according to information posted on the library's digital services blog in March 2026.
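The three steps above can be sketched in a few dozen lines of Python. This is an illustration under stated assumptions, not the library's or any agency's actual pipeline: it uses SHA-256 content hashes, which catch only byte-for-byte duplicates; the retention rule (keep the earliest-ingested copy) is a hypothetical policy; and the file names and dates are invented for the example.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from typing import Dict, List


@dataclass(frozen=True)
class ImageRecord:
    name: str       # hypothetical catalog path
    data: bytes     # raw file contents
    ingested: date  # when the file entered the catalog


def audit(records: List[ImageRecord]) -> Dict[str, List[ImageRecord]]:
    """Step 1: group records by content hash; keep groups with 2+ members."""
    groups = defaultdict(list)
    for rec in records:
        groups[hashlib.sha256(rec.data).hexdigest()].append(rec)
    return {digest: recs for digest, recs in groups.items() if len(recs) > 1}


def pick_authoritative(group: List[ImageRecord]) -> ImageRecord:
    """Step 2 (assumed policy): the earliest-ingested copy is authoritative."""
    return min(group, key=lambda r: (r.ingested, r.name))


def plan_cleanup(records: List[ImageRecord]) -> Dict[str, str]:
    """Step 3: map each redundant file to the copy that should replace it."""
    plan = {}
    for group in audit(records).values():
        keeper = pick_authoritative(group)
        for rec in group:
            if rec is not keeper:
                plan[rec.name] = keeper.name
    return plan


# Invented sample catalog: one photo scanned in two donation batches.
catalog = [
    ImageRecord("batch1/0042.tif", b"same-scan-bytes", date(2023, 3, 1)),
    ImageRecord("batch2/0007.tif", b"same-scan-bytes", date(2024, 6, 15)),
    ImageRecord("batch1/0043.tif", b"other-photo", date(2023, 3, 1)),
    ImageRecord("batch2/0008.tif", b"third-photo", date(2024, 6, 15)),
]

cleanup_plan = plan_cleanup(catalog)
duplicate_rate = len(cleanup_plan) / len(catalog)
print(cleanup_plan, f"{duplicate_rate:.0%}")
```

The plan is produced separately from any deletion so that staff can review it first. Near-duplicates, such as the same print scanned twice at different settings, would need a perceptual hash in step 1 rather than SHA-256.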

For DC residents who interact with these databases — journalists filing FOIA requests, developers pulling permit histories on H Street NE or along the Anacostia waterfront, researchers using the library's digital collections — the practical effect of a cleanup would be faster query responses and more reliable search results. The administrative case and the public-access case point in the same direction. Whether the budget to act on both arrives before FY2027 kicks in on October 1 is now the operative question.

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.