The Daily Washington DC

Washington DC news, every day

DC Agencies Race to Purge Duplicate Images From Public Records as Federal Archivists Sound the Alarm

City officials, digital preservation experts, and records managers are weighing in on a quiet but consequential problem eating up storage budgets and clouding public archives across the District.

By Washington DC News Desk · Published 4 July 2026, 3:23 pm

Photo: Optical Chemist on Pexels

Washington's city and federal agencies are sitting on millions of duplicated digital images — scanned permits, inspection photos, court filings, historic photographs — that are draining storage capacity, inflating IT contracts, and in some cases making it harder for residents to find accurate public records. The problem has drawn new scrutiny this summer as budget pressures from federal workforce restructuring force both the District government and federal agencies headquartered here to audit what they actually own.

The timing matters. The Trump administration's Department of Government Efficiency initiative has pushed agencies across the capital to document their digital infrastructure costs. That audit process has, at minimum, forced records managers to confront something archivists have flagged for years: nobody has a clean, deduplicated image library, and the redundancy is costing real money on storage contracts and staff hours spent managing it.

What Officials and Experts Are Saying

The DC Office of the Chief Technology Officer, headquartered at 441 4th Street NW, manages cloud and on-premise storage for dozens of District agencies. Procurement records from fiscal year 2025 show the city spent more than $47 million on data storage and cloud services across its enterprise systems. Records management specialists say that figure could be meaningfully reduced if agencies systematically identified and removed duplicate files before migrating to consolidated platforms.

At the DC Archives on Pennsylvania Avenue SE, staff have been working since early 2025 on a multi-phase digitization project covering historical building permits from neighborhoods including Anacostia and LeDroit Park. Archivists involved in similar projects at peer institutions — the National Archives at College Park and the Library of Congress on Independence Avenue — have publicly noted that deduplication is one of the first and most labor-intensive steps in any large-scale scan-and-ingest workflow. Without it, search indexes bloat, retrieval slows, and identical images get catalogued under different reference numbers, creating confusion for researchers.

The DC Office of Planning, which manages a growing repository of aerial survey images and neighborhood development photographs tied to projects in NoMa and along the Anacostia waterfront, has described deduplication as a prerequisite for its planned open-data portal update, expected later in 2026. Records released under a Freedom of Information Act request show the agency flagged the issue internally as early as October 2024.

A Practical Problem With Practical Stakes

The stakes are not abstract. When the District's Department of Buildings, which took over permitting from the former Department of Consumer and Regulatory Affairs in 2022, processes a building permit appeal, inspectors pull photographic evidence from a shared repository. Duplicate images, often created when the same file is uploaded by different staff members or when legacy systems were merged without deduplication protocols, can slow that retrieval or, worse, result in the wrong version of an image being entered into an administrative record.

Digital preservation professionals point to a specific technical standard: the Federal Agencies Digital Guidelines Initiative, known as FADGI, recommends that any institution managing public photographic records implement hash-based deduplication — essentially a fingerprinting system — before ingesting new material. FADGI published updated guidance on this in 2023. Some DC agencies have adopted it; others have not.
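For readers curious what hash-based "fingerprinting" looks like in practice, the following is a minimal sketch in Python, not any agency's actual tooling. It assumes the images sit in an ordinary folder tree and uses SHA-256 from Python's standard library; the function names `file_digest` and `find_duplicates` are hypothetical, chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 fingerprint of a file, read in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_duplicates(root):
    """Group every file under `root` by fingerprint.

    Returns a dict mapping each digest to the list of paths that share it,
    keeping only digests that appear more than once.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Calling `find_duplicates` on a folder of scans returns the groups of files whose bytes are identical. A sketch like this only catches exact copies: the same photograph re-saved at a different size or in a different format produces a different fingerprint and would need a separate comparison method.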

Storage costs are not trivial at the municipal level. Cloud storage for uncompressed government image files can run between $0.02 and $0.05 per gigabyte per month depending on the contract tier — and agencies managing tens of millions of files are looking at meaningful annual expenditures on data that is, in some cases, stored two, three, or four times over.
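To put those rates in perspective, here is a rough back-of-the-envelope sketch. Only the per-gigabyte price range comes from the figures above; the file count and the 50 MB average file size are hypothetical inputs picked for illustration, not numbers from any District agency.

```python
def annual_storage_cost(total_gb, rate_per_gb_month):
    """Yearly cost of keeping `total_gb` gigabytes at a monthly per-GB rate."""
    return total_gb * rate_per_gb_month * 12


def redundant_annual_cost(unique_gb, copies, rate_per_gb_month):
    """Yearly cost of the extra copies only (everything beyond the first)."""
    return annual_storage_cost(unique_gb * (copies - 1), rate_per_gb_month)


if __name__ == "__main__":
    # Hypothetical archive: 10 million images averaging 50 MB each.
    unique_gb = 10_000_000 * 50 / 1000  # 500,000 GB of unique data
    for rate in (0.02, 0.05):  # price range cited in the article
        for copies in (2, 3, 4):  # data stored two, three, or four times
            wasted = redundant_annual_cost(unique_gb, copies, rate)
            print(f"${rate:.2f}/GB, {copies} copies: ${wasted:,.0f} per year on duplicates")
```

Under those assumed inputs, each extra copy of the archive costs roughly $120,000 a year at the low end of the price range and $300,000 a year at the high end.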

For District residents and businesses dealing with the city's permitting or records offices, the practical advice is straightforward: if you're requesting historical images or photographs through the DC FOIA portal at foia.dc.gov, flag any apparent duplicates you receive, because agencies are actively using public feedback to identify redundancies in their systems. The Office of the Chief Technology Officer has indicated it plans to publish a consolidated deduplication policy for District agencies before the end of fiscal year 2026, which closes September 30.


About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
