DC Archives and City Agencies Move This Week to Fix Duplicate Image Problem Plaguing Public Records
A quiet but consequential push to clean up redundant digital files is reshaping how Washington's government stores and shares public imagery.

Washington's Office of the Chief Technology Officer confirmed this week that multiple District agencies have begun a coordinated duplicate-image-replacement effort targeting tens of thousands of redundant files clogging the city's shared digital infrastructure — a problem that has quietly inflated storage costs and slowed public records requests for years.
The timing is not accidental. With the Trump administration's Department of Government Efficiency putting federal contractors and local governments alike under pressure to justify every line of IT spending, Mayor Muriel Bowser's technology office has been moving to demonstrate fiscal discipline on its own terms. Duplicate digital assets — photographs, scanned documents, permit images — have long been an embarrassment in municipal data management circles, but this week marked the first coordinated inter-agency sweep the District has mounted in at least three years.
The effort is concentrated initially at two institutions: the DC Office of Planning, headquartered at 1100 4th Street SW near the Waterfront neighborhood, and the District's Department of Buildings, which in 2022 took over building permit functions from the former Department of Consumer and Regulatory Affairs and manages permit imagery for properties across all eight wards. Both agencies have been working through backlogs of scanned construction photographs, many of which were uploaded two or three times over because of legacy software handoffs during a 2022 platform migration.
The DC Public Library system's digital collections team at the Martin Luther King Jr. Memorial Library on G Street NW is also participating. Library staff have flagged hundreds of duplicate historical photographs of neighborhoods including Anacostia, NoMa, and Shaw — images that appear more than once in the DC Community Archives catalog, confusing researchers and taking up server space that the library pays for through its annual technology budget.
The deduplication process itself relies on perceptual hashing, a technique that identifies visually identical or near-identical images even when their file names or metadata differ. The technology is not new; what is new is applying it across agencies that have historically kept separate IT systems. OCTO is coordinating the cross-agency work under its DC Net infrastructure program.
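The District has not said which perceptual hash it uses. For readers curious how the technique works in principle, here is a minimal sketch of one common variant, the difference hash (dHash): each image is shrunk to a tiny thumbnail, each pixel is compared with its right-hand neighbor to produce a 64-bit fingerprint, and two images count as near-duplicates when their fingerprints differ in only a few bits (the Hamming distance). It assumes images are already decoded into grayscale rows of 0-255 integers; the function names, the catalog IDs in the example, and the 10-bit threshold are all hypothetical, and a real pipeline would use an imaging library to decode and resize files.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Tuple

# Assumption: images are already decoded to grayscale, stored as rows of 0-255 ints.
Image = Sequence[Sequence[int]]


def _shrink(img: Image, width: int, height: int) -> List[List[float]]:
    """Resize to width x height by averaging the source pixels in each target cell."""
    src_h, src_w = len(img), len(img[0])
    small = []
    for ty in range(height):
        y0 = ty * src_h // height
        y1 = max(y0 + 1, (ty + 1) * src_h // height)
        row = []
        for tx in range(width):
            x0 = tx * src_w // width
            x1 = max(x0 + 1, (tx + 1) * src_w // width)
            cell = [img[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            row.append(sum(cell) / len(cell))
        small.append(row)
    return small


def dhash(img: Image, size: int = 8) -> int:
    """Difference hash: one bit per horizontally adjacent pair on a (size+1) x size thumbnail.

    Shrinking discards fine detail, and comparing neighbors instead of absolute
    values makes the hash largely insensitive to uniform brightness shifts.
    """
    small = _shrink(img, size + 1, size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits  # size * size bits, i.e. 64 bits for size=8


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def find_near_duplicates(
    catalog: Dict[str, Image], threshold: int = 10
) -> List[Tuple[str, str]]:
    """Return pairs of catalog IDs whose hashes differ by at most `threshold` bits.

    Pairwise comparison is O(n^2); large catalogs typically index the hashes
    (for example in a BK-tree) instead of comparing every pair.
    """
    hashes = {name: dhash(img) for name, img in catalog.items()}
    return [
        (a, b)
        for a, b in combinations(hashes, 2)
        if hamming(hashes[a], hashes[b]) <= threshold
    ]
```

In this sketch, a scan re-uploaded with slightly different brightness and a new file name still produces the same or a very similar fingerprint, which is why a file-name or checksum comparison alone would miss it while a perceptual hash catches it.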
Storage is not free. Commercial cloud storage rates that the District pays under its enterprise licensing agreements run roughly $0.02 per gigabyte per month, a figure that adds up quickly when agencies collectively hold duplicated image libraries running into the hundreds of terabytes. A 2024 audit by the DC Office of the Inspector General found that unmanaged data redundancy across District agencies contributed to avoidable cloud expenditure, though the office did not publish a specific dollar total for that category alone.
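To put the quoted rate in perspective, the back-of-envelope calculation below is a rough sketch, not a District figure. Only the roughly $0.02 per gigabyte-month rate comes from reporting; the duplicate volumes are hypothetical, and treating 1 TB as 1,000 GB is a simplifying assumption. If, say, 100 TB of stored images were redundant copies, keeping them would cost about $2,000 a month, or $24,000 a year.

```python
# Only the ~$0.02 per GB-month rate is reported; the volumes below are hypothetical.
RATE_USD_PER_GB_MONTH = 0.02
GB_PER_TB = 1_000  # simplifying assumption: decimal terabytes


def duplicate_storage_cost(duplicate_tb: float, months: int = 12,
                           rate: float = RATE_USD_PER_GB_MONTH) -> float:
    """Dollars spent keeping `duplicate_tb` terabytes of redundant copies for `months` months."""
    return round(duplicate_tb * GB_PER_TB * rate * months, 2)


if __name__ == "__main__":
    for tb in (100, 300):  # hypothetical duplicate volumes
        monthly = duplicate_storage_cost(tb, months=1)
        yearly = duplicate_storage_cost(tb)
        print(f"{tb} TB of duplicates: ${monthly:,.2f}/month, ${yearly:,.2f}/year")
```

The cost grows linearly with the volume of redundant data, so every terabyte removed saves the same amount each month for as long as it would otherwise have been kept.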
There is also a practical civic dimension. Freedom of Information Act requests filed with DC agencies — which numbered more than 14,000 in fiscal year 2024 according to the District's annual transparency report — frequently involve photographic records. When duplicate images appear in response packages, they slow processing, inflate PDF file sizes, and sometimes generate legal disputes over whether a complete record was produced. Cleaning up the underlying catalogs directly reduces that friction.
The effort also intersects with ongoing redevelopment pressure in neighborhoods like Anacostia, where permit and inspection photography is central to community review of new construction projects. Accurate, deduplicated records make it easier for Advisory Neighborhood Commissions to track what has been built and what has been approved.
OCTO has set an internal target of September 30, the end of the District's fiscal year, for completing the initial agency sweep. Agencies that finish before the deadline are expected to report their storage savings to the city's budget office as part of the fiscal year 2027 budget preparation cycle.
Residents who use DC's open data portal at opendata.dc.gov should begin seeing cleaner image search results in the planning and permitting datasets within the next four to six weeks, according to guidance the office circulated internally this week. Anyone who spots duplicate entries in public-facing datasets can flag them through the portal's feedback tool.
About this article
Published by The Daily Washington DC