Washington's public agencies are sitting on millions of duplicate digital images, among them scanned permits, property records and historical photographs, and the cost of doing nothing is starting to show up in real budget lines. The DC Office of the Chief Technology Officer (OCTO) flagged the issue in internal planning discussions earlier this year, and city archivists say the backlog has grown substantially since 2022, when a push to digitize paper records across multiple departments outpaced any deduplication protocol.
The timing matters. With the Trump administration's DOGE-driven restructuring squeezing federal agencies that share data pipelines with District offices, the city's own IT infrastructure is absorbing new pressure. When duplicate files pile up across shared servers — the kind maintained jointly by the DC Office of Planning and the DC Historic Preservation Office on E Street NW — retrieval times slow, storage contracts balloon, and staff hours get eaten up by manual reconciliation work that purpose-built software could handle in minutes.
What the Experts Are Recommending
Digital records specialists have been consistent on the solution: automated deduplication tools that use perceptual hashing, a technique that identifies visually identical or near-identical images even when file names or metadata differ. The technology is not new. The Library of Congress, which maintains its main reading rooms on First Street SE on Capitol Hill, has used variants of the approach for its Prints and Photographs Division for several years. What's new is the pressure on municipal governments to adopt the same standard.
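For readers who want to see the idea in miniature, here is a minimal Python sketch of one simple perceptual-hash variant, usually called an average hash. It is illustrative only: nothing in the reporting indicates which algorithm any agency or vendor uses, and the function names, the 8x8 grid size and the distance threshold of 5 are assumptions chosen for the example. Images are represented as plain grids of grayscale values so the sketch needs no imaging library.

```python
from typing import List

Gray = List[List[int]]  # rows of grayscale values, 0-255 (assumed input format)


def shrink(pixels: Gray, size: int = 8) -> List[List[float]]:
    """Downsample to a size x size grid by averaging the pixels in each cell."""
    h, w = len(pixels), len(pixels[0])
    out = []
    for r in range(size):
        r0 = r * h // size
        r1 = max((r + 1) * h // size, r0 + 1)  # every cell covers at least one row
        row = []
        for c in range(size):
            c0 = c * w // size
            c1 = max((c + 1) * w // size, c0 + 1)
            cell = [pixels[y][x] for y in range(r0, r1) for x in range(c0, c1)]
            row.append(sum(cell) / len(cell))
        out.append(row)
    return out


def average_hash(pixels: Gray, size: int = 8) -> int:
    """Return a size*size-bit fingerprint: 1 where a cell is brighter than the mean."""
    flat = [v for row in shrink(pixels, size) for v in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for v in flat:
        bits = (bits << 1) | (1 if v > mean else 0)
    return bits


def hamming(a: int, b: int) -> int:
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: Gray, b: Gray, max_distance: int = 5) -> bool:
    """Treat two images as duplicates if their hashes differ in few bits.

    max_distance=5 is an illustrative threshold, not a recommended standard.
    """
    return hamming(average_hash(a), average_hash(b)) <= max_distance
```

Because the fingerprint depends on the overall pattern of light and dark rather than on file names, metadata or exact pixel values, a rescanned or slightly brightened copy of a photograph lands on the same or a nearby hash, while an unrelated image lands far away. In practice the threshold would need tuning against a sample of the actual archive.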
The DC Public Library system, which operates 26 branches including the central Martin Luther King Jr. Memorial Library on G Street NW — renovated and reopened in 2020 — has its own digitized collection that advocates say would benefit from a systematic deduplication review. Library technology staff have noted publicly that the Washingtoniana collection alone contains thousands of historical neighborhood photographs, some scanned multiple times across different donor batches. Without deduplication, researchers pulling records on Anacostia redevelopment projects or NoMa's pre-gentrification streetscapes sometimes encounter three or four identical files where one would do.
The financial argument is straightforward. Cloud storage for government entities typically runs between $0.02 and $0.05 per gigabyte per month depending on the contract tier. Agencies that have audited similar archives in comparable jurisdictions — including Philadelphia's Office of Innovation and Technology, which completed a deduplication project in fiscal year 2024 — have reported storage footprint reductions of 20 to 35 percent after a single pass with deduplication software. For a mid-size DC agency managing tens of terabytes, that translates to real money at a time when Mayor Muriel Bowser's administration is defending the District's budget against federal funding uncertainty on multiple fronts.
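The arithmetic behind that claim can be sketched in a few lines of Python. The price and reduction ranges come from the figures above; the 50-terabyte archive size is a hypothetical stand-in for "tens of terabytes," the conversion assumes 1 TB = 1,000 GB, and the function name is invented for illustration. The estimate covers storage fees only, not staff hours or retrieval delays.

```python
def annual_storage_savings(archive_tb: float,
                           price_per_gb_month: float,
                           reduction: float) -> float:
    """Estimate yearly storage fees avoided after deduplication.

    archive_tb: archive size in terabytes (1 TB taken as 1,000 GB)
    price_per_gb_month: contract price in dollars per gigabyte per month
    reduction: fraction of the footprint removed, e.g. 0.20 for 20 percent
    """
    gigabytes = archive_tb * 1000
    monthly_bill = gigabytes * price_per_gb_month
    return monthly_bill * reduction * 12


# Hypothetical 50 TB archive at both ends of the ranges cited above.
low = annual_storage_savings(50, 0.02, 0.20)
high = annual_storage_savings(50, 0.05, 0.35)
```

Under those assumptions the avoided storage fees come to roughly $2,400 a year at the low end and roughly $10,500 at the high end for a single agency. Those sums scale with archive size and leave out the labor spent on manual reconciliation.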
The Political Dimension
There's a wrinkle that goes beyond IT housekeeping. Some civic tech advocates and open-government groups point out that duplicate images aren't just a storage problem — they can distort public records responses under DC's Freedom of Information Act. If a FOIA request triggers a file count that includes duplicates, the agency may overstate the burden of compliance, potentially justifying delays or partial responses. That concern has surfaced in discussions at the DC Auditor's office, which has jurisdiction over government transparency and accountability.
People in City Council circles and advisory neighborhood commissions in areas such as Shaw and Columbia Heights have separately raised questions about whether digitization funding appropriated in recent budget cycles has actually improved public access to records or simply turned paper clutter into digital clutter. The fiscal year 2025 budget allocated money for technology modernization across several agencies.
The practical path forward, according to technology policy observers who follow the District government closely, is for the Office of the Chief Technology Officer to issue updated data governance standards that require deduplication checks as a condition of any new digitization contract. Several pending vendor agreements, expected to be finalized before the end of calendar year 2026, could carry that requirement if the agency moves quickly. Without it, the next batch of scanned documents will land in the same overcrowded storage environment that's causing problems today.