The Daily Washington DC

Washington DC news, every day

How DC's Public Records Archive Ended Up Flooded With Duplicate Images — And What Comes Next

Years of siloed city agencies, underfunded IT budgets, and a rushed pandemic-era digitization push left Washington's public document systems riddled with redundant files that officials are now scrambling to fix.

By Washington DC News Desk · Published 4 July 2026, 2:58 pm

Photo: Chris on Pexels

Washington DC's Office of the Chief Technology Officer (OCTO) has quietly begun a systemwide audit of duplicate image files embedded across the District's public-facing document repositories. The cleanup effort traces its roots to decisions made, and deferred, over the better part of a decade. The problem is not glamorous, but it is costly. Storage bloat, broken search results, and record lookups that return multiple near-identical scans of the same page have eroded the usability of systems that residents and city staff rely on daily.

The timing matters. With the Trump administration's DOGE-driven federal workforce restructuring squeezing contractor pipelines and grant pools that DC agencies have historically tapped for technology upgrades, the District is under pressure to demonstrate fiscal discipline in its own house. Mayor Muriel Bowser's office has flagged technology modernization as a budget priority for Fiscal Year 2027, and the duplication backlog is now part of that conversation in a way it simply was not two years ago.

How the Backlog Built Up

The story starts well before the pandemic. DC's records infrastructure grew agency by agency, each department procuring its own document management platforms on separate procurement cycles. The Department of Consumer and Regulatory Affairs, headquartered on K Street NW, ran different scanning protocols than the DC Office of Tax and Revenue on Indiana Avenue NW. Neither was talking to the other in any coordinated way about file naming conventions, resolution standards, or deduplication logic. Multiply that dynamic across dozens of agencies and you get a sprawl of overlapping image libraries with no single authority over cleanup.

Then came 2020. The District, like virtually every municipal government in the country, pushed hard to digitize paper records so that remote staff could access them during COVID-19 closures. Speed was the priority. Quality control was not. Scanning contractors were brought in under emergency procurement authorities, and files were uploaded in bulk to shared drives and legacy content management systems. A District government technology review was circulated internally in late 2024. According to that review, some document sets in the DC Land Records system showed duplication rates above 30 percent for certain property transaction records from the 2020-2022 period. The Land Records system is managed through the Recorder of Deeds office on Pennsylvania Avenue NW. The review has not been made public, and this reporter has not independently verified its specific figures.

The DC Public Library's digital collections team encountered similar issues in its Washingtoniana Division materials. Staff there have described, in public presentations at the Martin Luther King Jr. Memorial Library on G Street NW, the challenge of reconciling multiple scans of the same photograph or document that arrived through separate digitization grants funded by the Institute of Museum and Library Services.

What Deduplication Actually Requires

Fixing duplicate image problems is not simply a matter of hitting delete. Archivists and records managers must first establish which version of a duplicated file is the authoritative copy. That judgment typically turns on three things:

- which scan has the highest resolution
- which has the most complete metadata
- whether a version has already been cited in legal or administrative proceedings

Deleting the wrong version can break links in published finding aids. In the case of property or court records, it can also create legal complications.
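
For readers curious how such a rule might be applied in software, here is a minimal Python sketch. It is not OCTO's method, which has not been published. The field names (width, height, metadata, cited_in_proceedings) are assumptions made for illustration. So is the priority order: a prior citation outranks resolution, and resolution outranks metadata completeness.

```python
from dataclasses import dataclass, field


@dataclass
class ScanCopy:
    # Hypothetical record fields for illustration; real records systems differ.
    file_id: str
    width: int
    height: int
    metadata: dict = field(default_factory=dict)
    cited_in_proceedings: bool = False


def metadata_completeness(copy: ScanCopy) -> int:
    # Count metadata fields that actually hold a value.
    return sum(1 for v in copy.metadata.values() if v not in (None, ""))


def pick_authoritative(copies: list[ScanCopy]) -> ScanCopy:
    """Choose which copy in a group of duplicates to keep.

    Assumed priority (illustrative only): a copy already cited in
    proceedings wins outright; ties are broken by pixel count, then by
    how many metadata fields are filled in.
    """
    if not copies:
        raise ValueError("need at least one copy")
    return max(
        copies,
        key=lambda c: (
            c.cited_in_proceedings,
            c.width * c.height,
            metadata_completeness(c),
        ),
    )


# Usage: a sharper uncited scan versus a lower-resolution scan already cited.
group = [
    ScanCopy("a", 2550, 3300, {"date": "2021-03-04", "grantor": ""}),
    ScanCopy("b", 1275, 1650, {"date": "2021-03-04", "grantor": "Smith"},
             cited_in_proceedings=True),
]
print(pick_authoritative(group).file_id)  # prints "b"
```

Under this assumed ordering, the cited copy is kept even though another scan is sharper. That reflects the article's point that deleting a cited version carries the greatest risk. The remaining copies would go to review rather than straight to deletion.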

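The OCTO audit is expected to use perceptual hashing tools to flag near-duplicate images across repositories. Any records tied to pending legal matters will then get manual review. A perceptual hash reduces an image to a short fingerprint that changes little when a scan is resized, re-compressed, or slightly brightened. That lets two scans of the same page be matched even when their files are not byte-for-byte identical. Baltimore and Philadelphia have both undertaken comparable cleanups in recent years. Projects of this kind in large American cities typically take 18 to 24 months to complete, even with dedicated staffing.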
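
The sketch below shows one of the simplest perceptual hashes, the average hash, along with a Hamming-distance comparison for flagging near-duplicates. It assumes scans have already been decoded into grayscale pixel grids, stored as lists of rows of values from 0 to 255. The function names and the 5-bit match threshold are hypothetical. Nothing here reflects the specific tools the District has chosen. A production pipeline would more likely rely on an existing library, such as the Python imagehash package, and on a more robust DCT-based hash.

```python
def downscale(pixels, size=8):
    """Block-average a grayscale image (list of rows of 0-255 values) to size x size."""
    h, w = len(pixels), len(pixels[0])
    if h < size or w < size:
        raise ValueError("image must be at least size x size pixels")
    out = []
    for by in range(size):
        y0, y1 = by * h // size, (by + 1) * h // size
        row = []
        for bx in range(size):
            x0, x1 = bx * w // size, (bx + 1) * w // size
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def average_hash(pixels, size=8):
    """Average hash: one bit per cell, set when the cell is brighter than the mean."""
    small = downscale(pixels, size)
    flat = [v for row in small for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a, b):
    # Number of differing bits between two hashes.
    return bin(a ^ b).count("1")


def near_duplicates(hashes, max_distance=5):
    """Return pairs of file ids whose hashes differ by at most max_distance bits.

    max_distance=5 is an illustrative threshold, not a recommended value.
    """
    ids = sorted(hashes)
    pairs = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if hamming(hashes[a], hashes[b]) <= max_distance:
                pairs.append((a, b))
    return pairs
```

In a quick test, a scan and a uniformly brightened copy of it produce identical hashes. A different image lands many bits away. Comparing every pair of hashes, as near_duplicates does, is workable for a few thousand files. At archive scale, an index such as a BK-tree would avoid the quadratic number of comparisons.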

For DC residents, the practical advice is straightforward. You may pull records through the DC Integrated Tax System or the Recorder of Deeds online portal and get duplicate documents back. If so, note the file metadata dates and flag the issue through the agency's public feedback form. The audit team is reportedly using citizen-reported anomalies as one input for deciding which record sets to address first. The cleanup is underway. Whether its budget survives the next round of federal funding uncertainty will determine how quickly it finishes.

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
