The Daily Washington DC

How Washington DC's Public Archives Got Buried Under Thousands of Duplicate Images — and What It Cost to Fix It

Years of decentralized record-keeping across dozens of city agencies left the District's digital holdings riddled with redundant files, eating up server space and taxpayer dollars before a cleanup campaign finally began.

By Washington DC News Desk · Published 4 July 2026, 3:16 pm

Photo: Cox, William Van Zandt, 1852- [from old catalog] / Public domain (Wikimedia Commons)

The District of Columbia's Office of the Chief Technology Officer confirmed earlier this year that a multi-agency audit launched in January 2026 had uncovered more than 400,000 duplicate image files spread across municipal servers — the accumulated debris of nearly two decades of ad-hoc digital record-keeping that no single department fully owned or managed. The problem did not appear overnight. It grew quietly, one redundant file at a time, as agency after agency digitized paper records, migrated platforms, and uploaded the same photographs, scanned permits, and planning documents into systems that rarely talked to each other.

The timing matters. With the Trump administration's DOGE-driven restructuring squeezing federal transfers to the District and Mayor Muriel Bowser's office under pressure to demonstrate fiscal discipline, the roughly $2.3 million that city technology staff estimate has been spent on excess cloud storage for duplicate content over the past five years has become harder to defend. City Council members on the Committee on Technology and the Environment began asking pointed questions about the OCTO budget in late 2025, and the storage audit was one direct response to that scrutiny.

A Problem That Grew With the City's Ambitions

The roots of the duplication crisis trace back to the mid-2000s, when the District began aggressive digitization pushes under multiple administrations. The DC Department of Buildings, one of two agencies created when the Department of Consumer and Regulatory Affairs was split in 2022, scanned permit records going back to the 1980s. The DC Office of Planning built its own image library for neighborhood surveys covering areas from Shaw to Anacostia. The DC Public Library's Washingtoniana Division independently digitized historical photograph collections. None of these efforts was coordinated under a unified asset management protocol, and as cloud storage got cheaper, the incentive to deduplicate files essentially disappeared.

The NoMa neighborhood illustrates the problem in miniature. As development accelerated along New York Avenue NE after 2010, planning documents, environmental assessments, and construction photographs were uploaded separately by the Office of Planning, the District Department of Transportation, and individual developers submitting materials through the ePlans portal. By the time the January 2026 audit began sampling server contents, OCTO staff found that some individual project folders contained the same site photograph stored in four or five separate locations across different agency drives.

The Anacostia waterfront tells a similar story. Historic preservation surveys conducted jointly by the State Historic Preservation Office and the National Park Service between 2015 and 2019 resulted in image sets that were duplicated across at least three separate municipal repositories, according to the audit summary presented to the Council's technology committee in March 2026.

What the Cleanup Actually Involves

OCTO awarded a contract in February 2026 to begin the deduplication work, using hash-matching software to identify identical and near-identical image files before flagging them for human review. The process is not automatic. Staff must verify that a file marked as a duplicate is not, in fact, a slightly different version of a record with independent legal standing — a distinction that matters enormously for building permits and land-use decisions that can be challenged in court.
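For readers curious what hash-matching means in practice, the sketch below shows the general idea for byte-identical files only. It is an illustration under stated assumptions, not the contractor's software: the function names and file paths are hypothetical, and catching near-identical images (resized or recompressed copies) would require a perceptual-hashing step that is not shown here.

```python
import hashlib
from collections import defaultdict
from typing import Dict, List


def sha256_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def find_exact_duplicates(files: Dict[str, bytes]) -> List[List[str]]:
    """Group paths whose contents are byte-for-byte identical.

    `files` maps a path to that file's bytes. Only groups with more
    than one path are returned; these are candidates for human review,
    not files to delete automatically.
    """
    groups = defaultdict(list)
    for path, data in files.items():
        groups[sha256_digest(data)].append(path)
    return sorted(sorted(paths) for paths in groups.values() if len(paths) > 1)


if __name__ == "__main__":
    # Hypothetical paths and contents, invented for illustration.
    sample = {
        "planning/noma/site_photo.jpg": b"same site photo bytes",
        "ddot/noma/site_photo_copy.jpg": b"same site photo bytes",
        "planning/noma/revised_plan.jpg": b"a different revision",
    }
    for group in find_exact_duplicates(sample):
        print("Flag for human review:", group)
```

Two files share a digest only if their bytes match exactly, so a revised permit scan that differs by even one byte lands in its own group. That is one reason a match is treated as a flag for review and not as grounds for deletion.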

The project is budgeted at $870,000 for the first phase, covering roughly 60 percent of the affected servers, with a second phase expected to begin in fiscal year 2027. City officials project the cleanup will free up enough server capacity to avoid a planned $1.1 million storage expansion contract that had been scheduled for renewal in September 2026.

For residents and community organizations that rely on DC's public records systems — from the Mount Pleasant Neighborhood Alliance filing freedom-of-information requests to law firms pulling permit histories for properties on U Street NW — the practical benefit will be faster search results and more reliable file retrieval. The deeper lesson is one the District's technology office has been reluctant to say plainly: years of treating digital storage as infinitely cheap produced costs that were anything but.

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
