The Daily Washington DC

Washington DC news, every day


DC's Digital Archives Are Drowning in Duplicate Images — and the Fix Is Slower Than in London or Seoul

Washington's public records offices and cultural institutions are grappling with redundant digital image files at scale, while peer cities have already moved to automate the cleanup.

By Washington DC News Desk · Published 4 July 2026, 3:16 pm


Photo: Pierre Blaché on Pexels

The DC Office of the Chief Technology Officer has been working through a stubborn institutional problem that librarians and records managers have complained about for years: thousands of duplicate digital images clogging government servers, slowing retrieval for public records requests, and inflating storage costs across multiple agencies. The issue reached a tipping point in 2026, as DOGE-driven budget reviews pushed federal and municipal offices across the DC area to justify their data infrastructure spending line by line.

This is not a niche IT headache. Municipal archives, planning departments, and cultural institutions, from the Smithsonian's digitisation units to the DC Public Library's Washingtoniana Division at the Martin Luther King Jr. Memorial Library, hold tens of thousands of scanned photographs, maps, and official documents. Many were ingested multiple times across separate digitisation projects, leaving the same image stored under different file names and in different resolution variants. The redundancy compounds during grant-funded digitisation sprints, when contractors upload batches without cross-checking existing holdings.
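The simplest case, the same file stored under different names, can be caught with an ordinary checksum. The sketch below is a minimal illustration, not any agency's actual tooling: it assumes the files have already been read into memory as bytes (a real system would stream them from disk or object storage) and groups names by SHA-256 digest. It will not catch resolution variants, because rescaling an image changes its bytes.

```python
import hashlib
from collections import defaultdict


def find_exact_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a file name to its raw bytes (an in-memory stand-in,
    assumed here for illustration).
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        by_digest[digest].append(name)
    # Only digests shared by two or more file names indicate duplicates.
    return [sorted(names) for names in by_digest.values() if len(names) > 1]
```

With hypothetical names such as `scan_0001.tif` and `washingtoniana_1953_042.tif` holding the same bytes, the function returns them together as one group, leaving a records manager to decide which name to keep.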

What Other Cities Are Doing

The London Metropolitan Archives (renamed The London Archives in 2024) completed a deduplication sweep of its photographic holdings in late 2024, using perceptual hashing software to flag near-identical image files before a human archivist made the final deletion call. The result, according to the organisation's published annual report, was a roughly 18 percent reduction in raw storage volume across the collections it manages. Seoul's city government rolled out an AI-assisted duplicate detection layer across its municipal document management platform in early 2025, as part of a broader smart-city infrastructure push. Amsterdam's Stadsarchief, which holds the city's public records going back to the 13th century, built duplicate-image rules directly into its ingest workflow, so the problem is caught at the point of upload rather than cleaned up afterwards.
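Perceptual hashing works differently from a checksum: it summarises what an image looks like, so a rescaled copy lands on the same or a nearby hash. The report does not say which algorithm London used. The sketch below uses difference hashing (dHash), one common perceptual hash, as an assumed stand-in. Images are plain grayscale pixel grids so the example needs no imaging library, and the 10-bit review threshold is an illustrative choice, not a published standard. Close pairs go to a queue for a person to decide, mirroring the human-in-the-loop step described above.

```python
def resize_gray(pixels: list[list[float]], width: int, height: int) -> list[list[float]]:
    """Shrink a grayscale image to width x height by averaging rectangular blocks."""
    src_h, src_w = len(pixels), len(pixels[0])
    out = []
    for y in range(height):
        y0 = y * src_h // height
        y1 = max(y0 + 1, (y + 1) * src_h // height)
        row = []
        for x in range(width):
            x0 = x * src_w // width
            x1 = max(x0 + 1, (x + 1) * src_w // width)
            block = [pixels[j][i] for j in range(y0, y1) for i in range(x0, x1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def dhash(pixels: list[list[float]], hash_size: int = 8) -> int:
    """Difference hash: one bit per horizontal neighbour comparison (64 bits by default)."""
    small = resize_gray(pixels, hash_size + 1, hash_size)
    bits = 0
    for row in small:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | int(left > right)
    return bits


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def review_queue(hashes: dict[str, int], max_distance: int = 10) -> list[tuple[str, str, int]]:
    """Pairs of files whose hashes are close enough to send to a human archivist."""
    names = sorted(hashes)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            distance = hamming(hashes[a], hashes[b])
            if distance <= max_distance:
                pairs.append((a, b, distance))
    return pairs
```

In this sketch, an image and a copy upscaled to exactly twice the resolution produce the same 64-bit hash, so `review_queue` pairs them at distance 0 even though a checksum would treat them as unrelated files. Nothing is deleted automatically; the function only proposes candidates.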

Washington has done none of that systematically. The DC Public Library received a Library of Congress digitisation partnership grant in fiscal year 2024 to expand its local history holdings, but the grant terms did not require automated deduplication tooling as a deliverable. As a result, archivists at the library's collection services unit on G Street NW have described, in budget justification documents reviewed by this paper, a backlog of image review work that stretches across multiple collection areas, with no dedicated staffing line to address it.
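Catching duplicates at the point of upload, as Amsterdam does, means checking each incoming file against existing holdings before it is stored. The class below is a hypothetical sketch of such a gate, assuming an in-memory index of SHA-256 digests; a production version would keep the index in a database and would likely add a perceptual check for resolution variants, which an exact digest cannot see.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IngestGate:
    """Hypothetical upload check that consults existing holdings before storing a file."""

    # SHA-256 hex digest -> name of the copy already held
    holdings: dict[str, str] = field(default_factory=dict)

    def submit(self, name: str, data: bytes) -> tuple[str, Optional[str]]:
        """Return ("accepted", None) or ("duplicate", name_of_existing_copy)."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self.holdings.get(digest)
        if existing is not None:
            # Route to review instead of silently storing a second copy.
            return ("duplicate", existing)
        self.holdings[digest] = name
        return ("accepted", None)
```

A contractor's batch could be run through `submit` file by file, so a re-scan of something already held is flagged before it ever reaches storage rather than adding to a later review backlog.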

The Cost Problem on Local Servers

Cloud storage is not free. The General Services Administration's published rate schedules for fiscal year 2025 put standard government cloud object storage at roughly $0.023 per gigabyte per month under common procurement vehicles. An archive holding 50 terabytes of images, a modest figure for a mid-sized municipal system, pays about $1,150 a month for storage alone, before egress and retrieval costs. If 15 to 20 percent of that volume is duplicated material, a share consistent with findings in published digital preservation literature, roughly $170 to $230 of that bill goes to storing copies every month, or about $2,000 to $2,800 a year.
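The arithmetic behind those figures is simple enough to rerun for any archive size. This sketch takes the $0.023 rate quoted above and assumes decimal units (1 TB = 1,000 GB) and a flat rate with no tiered discounts, egress, or retrieval charges, so it understates a real bill.

```python
TB_TO_GB = 1_000  # assumption: decimal terabytes, as cloud storage bills typically use


def monthly_storage_cost(terabytes: float, usd_per_gb_month: float = 0.023) -> float:
    """Storage-only monthly bill; ignores egress, retrieval and tiered pricing."""
    return terabytes * TB_TO_GB * usd_per_gb_month


def duplicate_waste(terabytes: float, duplicate_share: float,
                    usd_per_gb_month: float = 0.023) -> tuple[float, float]:
    """(monthly, yearly) spend attributable to duplicated material."""
    monthly = monthly_storage_cost(terabytes, usd_per_gb_month) * duplicate_share
    return monthly, monthly * 12


if __name__ == "__main__":
    print(f"50 TB: ${monthly_storage_cost(50):,.2f} per month")
    for share in (0.15, 0.20):
        monthly, yearly = duplicate_waste(50, share)
        print(f"{share:.0%} duplicated: ${monthly:,.2f}/month, ${yearly:,.2f}/year")
```

For the 50-terabyte example it reports $1,150.00 a month, of which $172.50 to $230.00 ($2,070 to $2,760 a year) pays for duplicated material.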

The DC government's Office of Cable Television, Film, Music and Entertainment, based in the Shaw neighbourhood, maintains its own image and video asset libraries separate from the main city archive. Cross-agency deduplication, even if it meant only sharing a common detection tool, would require a data-sharing agreement that does not currently exist between that office and the Office of the Chief Technology Officer, according to a procurement notice posted to the District's contracting portal in March 2026.

Federal restructuring makes the timeline harder to predict. Several technology positions at DC-adjacent federal agencies that would normally collaborate with municipal IT offices have been cut or left vacant since early 2025. The National Archives facility in College Park, Maryland, which supports many DC-based research workflows, has operated with reduced digital services staffing for over a year.

For residents and researchers who rely on public-facing image collections, such as genealogists using the Washingtoniana Division, urban planners pulling historical maps for Anacostia redevelopment review, and journalists requesting photographic records, the practical advice for now is to submit FOIA and public records requests with specific date ranges and file format preferences. That narrows the pool archivists must sort through by hand and reduces the chance that a tangle of duplicate images delays a response past the 15-business-day deadline the District must meet under DC Code § 2-532.
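Requesters who want a rough due date can count business days themselves. The function below is an estimate under stated assumptions, not the District's official calculation: it starts counting the day after receipt, skips Saturdays and Sundays, and skips only the holidays the caller supplies, since it has no built-in District holiday calendar.

```python
from datetime import date, timedelta


def business_day_deadline(received: date, business_days: int = 15,
                          holidays: frozenset = frozenset()) -> date:
    """Estimate a response due date by counting weekdays that are not holidays.

    Assumes counting starts the day after receipt; `holidays` must be
    supplied by the caller.
    """
    day = received
    counted = 0
    while counted < business_days:
        day += timedelta(days=1)
        if day.weekday() < 5 and day not in holidays:  # Monday=0 ... Friday=4
            counted += 1
    return day
```

For example, a request received on Monday 6 July 2026, with no holidays supplied, comes out due on Monday 27 July 2026; each holiday falling inside the window pushes the date back by one business day.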

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.