The Daily Washington DC

How Washington's Public Record Archives Ended Up Riddled With Duplicate Images — And Why It's Costing the City

A decades-long chain of digitisation contracts, agency mergers, and budget shortcuts left DC's official image databases bloated, redundant, and increasingly expensive to maintain.

By Washington DC News Desk · Published 4 July 2026, 3:06 pm

Photo by Mark Direen on Pexels

Washington DC's Office of the Chief Technology Officer (OCTO) is sitting on a problem that has been quietly compounding since at least 2009: thousands of duplicate images embedded across city agency databases, public records portals, and the DC.gov infrastructure that roughly 700,000 residents depend on for everything from permit applications to court documents. The redundancy isn't accidental. It's the accumulated wreckage of 17 years of piecemeal digitisation efforts, each one launched by a different administration with a different contractor and a different file-naming convention.

The timing matters now because the Trump administration's DOGE-driven federal workforce restructuring has accelerated a parallel crisis at the local level. Mayor Muriel Bowser's office has been under mounting pressure to demonstrate fiscal discipline as federal grant flows to the District have grown less predictable. Every megabyte of redundant storage costs money the city increasingly can't afford to waste. Storage and cloud-hosting contracts for District government systems run into the tens of millions of dollars annually, and duplicated assets inflate those bills directly.

A Paper Trail Back to the Early Digitisation Era

The roots of the problem reach back to the DC government's first major push to scan and upload historical documents, which accelerated after the establishment of the DC Archives' digital initiative on the Southwest Waterfront. Multiple agencies — the Office of Planning, the Department of Consumer and Regulatory Affairs on K Street NW, and the DC Courts system — each contracted separately with vendors to digitise overlapping sets of records. Nobody mandated a shared image registry or a universal hash-checking protocol to catch duplicates before upload. The result was predictable: the same photograph of a zoning map, or a building permit photograph from a Shaw neighbourhood rowhouse, could appear dozens of times under different filenames across different agency servers.
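A pre-upload hash check of the kind described above can be very small. The sketch below is illustrative only and does not describe any system the District runs. It assumes a shared registry keyed by the SHA-256 digest of each file's bytes; the function names, the registry structure, and the file paths are all hypothetical.

```python
import hashlib
from typing import Dict, Optional


def content_digest(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's raw bytes."""
    return hashlib.sha256(data).hexdigest()


def register_upload(registry: Dict[str, str], filename: str, data: bytes) -> Optional[str]:
    """Record a new file, or return the name already holding the same bytes.

    Returns None when the file is new and has been added to the registry.
    """
    digest = content_digest(data)
    existing = registry.get(digest)
    if existing is not None:
        # Same bytes are already stored under another name: a duplicate.
        return existing
    registry[digest] = filename
    return None


# Hypothetical example: two agencies upload the same scan under different names.
registry: Dict[str, str] = {}
scan = b"bytes of a scanned zoning map"
first = register_upload(registry, "planning/zoning_map_0412.jpg", scan)
second = register_upload(registry, "permits/IMG_00031.jpg", scan)
print(first)   # None: the first copy is accepted
print(second)  # the path of the copy that is already registered
```

A check like this catches only byte-for-byte copies. A rescanned or re-compressed version of the same map would produce a different digest and pass through.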

The problem deepened after 2016, when the city launched the open data portal at opendata.dc.gov as part of a broader transparency push. Agencies migrating legacy records to the new platform often batch-uploaded entire existing folders rather than auditing for redundancy first. The DC Department of Transportation's asset image library alone reportedly ballooned during that migration period, though OCTO has not released a precise figure for the total volume of confirmed duplicates across all systems.

What Federal Restructuring Changed

The DOGE efficiency reviews of federal agencies sharing data pipelines with the District introduced a new layer of urgency in early 2026. Those agencies include the National Archives facility in College Park, Maryland, which handles some DC-originated federal records. When they began auditing their own shared-storage arrangements, discrepancies in cross-referenced image files marked Washington DC's contribution to joint databases as disproportionately large relative to the unique content it contained. That finding filtered back to the Wilson Building on Pennsylvania Avenue NW by late March.

Cloud storage costs are not abstract. Amazon Web Services pricing for government-tier storage used by many DC agencies runs in the range of $0.023 per gigabyte per month for standard access tiers, and the bill grows in direct proportion when duplicated files multiply effective storage volume three- or fourfold, as they do in some poorly managed directories. An internal audit framework commissioned by OCTO, referenced in a District budget committee session held in April 2026, identified duplicate image elimination as a priority cost-reduction target for fiscal year 2027.
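The arithmetic behind that cost pressure is simple to sketch. The per-gigabyte rate and the three- and fourfold duplication factors come from the figures quoted above. The 10,000 GB volume of unique images is a hypothetical round number chosen for illustration, not a District figure, and the function names are made up for this example.

```python
RATE_PER_GB_MONTH = 0.023  # USD, the standard-tier rate quoted in the article


def monthly_cost(unique_gb: float, duplication_factor: float) -> float:
    """Monthly bill when each unique gigabyte is stored `duplication_factor` times."""
    return unique_gb * duplication_factor * RATE_PER_GB_MONTH


def annual_waste(unique_gb: float, duplication_factor: float) -> float:
    """Yearly spend attributable to duplicates alone."""
    baseline = monthly_cost(unique_gb, 1.0)
    return 12 * (monthly_cost(unique_gb, duplication_factor) - baseline)


UNIQUE_GB = 10_000  # hypothetical volume of unique image data

for factor in (1, 3, 4):
    print(
        f"factor {factor}: "
        f"${monthly_cost(UNIQUE_GB, factor):,.2f} per month, "
        f"${annual_waste(UNIQUE_GB, factor):,.2f} per year spent on duplicates"
    )
```

Under these assumptions the cost rises linearly with the duplication factor, so a directory holding four copies of everything costs four times what a clean one would.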

Advocates for open government, including staff at the DC Policy Center on 14th Street NW, have long argued that the deeper issue is structural: the city has never enacted a unified data governance ordinance requiring agencies to coordinate before launching independent digitisation projects. Without that framework, each budget cycle risks repeating the same pattern.

For residents, the practical upshot is straightforward. If OCTO moves forward with a deduplication sweep, using automated perceptual hashing tools that can identify visually identical images even when filenames or file formats differ, city portals should become faster and cheaper to run within 12 to 18 months. The harder work is writing the procurement rules that prevent the next generation of vendors from creating the same mess all over again. The DC Council's Committee on Technology and the Environment is expected to take up draft data governance legislation before the end of the 2026 legislative session.
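For readers curious how perceptual hashing differs from a plain file checksum, here is a minimal sketch of one common variant, the average hash. It is not the tool OCTO would use, which the article's sources do not name. It assumes each image has already been decoded, converted to grayscale, and shrunk to a small grid of pixel values, steps a real tool would handle with an image library. The function names and the distance threshold are hypothetical.

```python
from typing import List

Grid = List[List[int]]  # grayscale pixel values, 0 to 255


def average_hash(pixels: Grid) -> int:
    """Build a bit string: 1 where a pixel is brighter than the image mean."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions where two hashes differ."""
    return bin(a ^ b).count("1")


def likely_same_image(a: Grid, b: Grid, max_distance: int = 2) -> bool:
    """Treat two images as duplicates when their hashes differ in few bits."""
    return hamming_distance(average_hash(a), average_hash(b)) <= max_distance


# Hypothetical 4x4 grid: dark on the left, bright on the right.
original = [
    [10, 20, 200, 210],
    [15, 25, 205, 215],
    [12, 22, 202, 212],
    [18, 28, 208, 218],
]
# The same picture scanned slightly brighter: every byte differs.
brightened = [[value + 10 for value in row] for row in original]
# A genuinely different picture: the original mirrored left to right.
mirrored = [list(reversed(row)) for row in original]

print(likely_same_image(original, brightened))
print(likely_same_image(original, mirrored))
```

Because the hash records only which pixels sit above the image's own average brightness, the brightened copy hashes the same as the original even though a byte-level checksum would treat the two as unrelated files.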

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
