Washington DC's Office of the Chief Technology Officer has been quietly grappling with a problem hiding in plain sight: tens of thousands of duplicate image files clogging the city's digital records infrastructure, a mess that traces back more than a decade to piecemeal digitisation efforts across agencies that rarely spoke to each other. The redundancy has forced repeated manual reviews, slowed public records requests under the DC Freedom of Information Act, and consumed server capacity that city IT managers say could be better deployed elsewhere.
The timing matters. Mayor Muriel Bowser's administration is operating under tighter fiscal constraints than at any point in the past five years, partly because Trump administration restructuring and DOGE-linked federal workforce reductions have hollowed out spending in the metro economy. Contractors and consultants who once billed DC agencies for digitisation projects are facing reduced renewal pipelines. Against that backdrop, cleaning up a legacy data disaster is not a luxury—it is a budget necessity.
A Problem Built One Scan at a Time
The roots of the duplicate-image crisis run back to roughly 2011 and 2012, when individual DC agencies—from the Department of Consumer and Regulatory Affairs on K Street NW to the DC Public Library system's Washingtoniana Division on G Street NW—began independent scanning drives, often using different vendors, different file-naming conventions, and different metadata standards. Nobody mandated a single unified repository. The DC Archives, housed at 1300 Naylor Court NW in the Blagden Alley neighbourhood, received some of those files. Other batches went to agency-specific SharePoint servers. Still others ended up in early cloud storage arrangements that have since migrated two or three times.
Each migration created opportunities for duplication. A single photograph of, say, the 1968 riots along 14th Street NW might exist in four separate folders across three platforms, each with a slightly different file name and resolution. Multiply that by the sheer volume of historical records DC agencies hold—property deeds, permit photographs, court exhibits, planning commission images—and the scale of the problem becomes clear.
The DC Public Library's Special Collections staff flagged the issue formally in internal communications as early as 2019, according to budget documents posted to the city's Open Data portal. A pilot deduplication project launched in fiscal year 2022 under a contract with a Maryland-based records management firm identified more than 340,000 duplicate image files within a sample set covering just three agencies. The full citywide figure has never been publicly audited.
What the City Is Doing—and What Comes Next
OCTO has in recent months been piloting automated deduplication software across select agency drives, prioritising the Department of Buildings (formerly DCRA) and the Office of Planning, both of which handle high volumes of permit and land-use imagery that feeds directly into public-facing portals used by developers, lawyers, and residents in fast-changing neighbourhoods like Anacostia and NoMa. The goal, outlined in the city's fiscal year 2026 technology investment plan, is to reduce redundant storage costs by roughly 30 percent before the end of the calendar year.
For ordinary residents, the practical effect of the cleanup should be faster FOIA responses. DC law requires the city to respond to records requests within 15 business days, a deadline that agency staff have struggled to meet when image-retrieval systems return dozens of duplicate results requiring manual triage. Advocacy groups including the DC Open Government Coalition have documented response-time failures over multiple years, though the city disputes some of those characterisations.
The deduplication work is also setting a precedent for how the city handles future digitisation contracts. New procurement language circulated this spring through the Office of Contracting and Procurement requires vendors to submit a metadata standardisation plan before any imaging work begins—a requirement that did not exist before 2024. Whether that change holds through future budget cycles, particularly as federal grant dollars that have historically subsidised city technology upgrades grow scarcer, is the question that DC's tech and records communities are watching most closely heading into the fall appropriations season.