The Daily Washington DC

Washington DC news, every day

DC's Archives and Libraries Are Drowning in Duplicate Digital Images — Here's What Officials and Experts Are Saying

From the National Archives to the Martin Luther King Jr. Memorial Library, Washington's institutions are grappling with a growing crisis of redundant digital records that wastes storage budgets and buries authentic history.

By Washington DC News Desk · Published 4 July 2026, 3:06 pm

Photo by Arian Fernandez on Pexels

Thousands of duplicate digital images are clogging the collections of Washington DC's major public archives, costing taxpayers money and slowing down researchers who rely on those records. The problem has moved from a back-office headache into a policy conversation — and the people responsible for managing those collections are pushing for coordinated action before backlogs grow worse.

The issue matters right now for a very specific reason. The Trump administration's restructuring of the federal workforce, carried out in part through the Department of Government Efficiency, has shrunk staffing at federal records agencies. Fewer archivists processing larger digital holdings means that manual deduplication — the time-consuming work of identifying and removing copied image files — is falling further behind. At the same time, DC's own municipal institutions are navigating federal funding uncertainty that limits what software tools they can purchase.

What Officials Are Saying on the Ground

At the DC Public Library system, which operates its flagship Martin Luther King Jr. Memorial Library on G Street NW, administrators have acknowledged internally that their digitization push — accelerated during the pandemic years — generated significant image redundancy. Library officials have said they are evaluating automated deduplication software but have not yet committed to a specific procurement timeline or vendor. The library's Special Collections division, which holds photographic records of neighborhoods including Anacostia and Congress Heights, is among the units most affected.

The National Archives and Records Administration, headquartered at the Pennsylvania Avenue building between 7th and 9th Streets NW, oversees billions of digital files across its holdings. A 2024 Government Accountability Office report — publicly available on GAO's website — found that federal agencies collectively stored an estimated 30 to 40 percent more digital data than they actively needed, a figure that archivists cite when arguing the problem is systemic, not isolated. NARA has piloted AI-assisted image comparison tools in limited contexts, though no agencywide rollout date has been announced.

At the Smithsonian Institution, whose digitization office operates out of its National Museum of Natural History on the Mall, staff have described the duplicate image problem as an inherited consequence of migrating older analog collections onto multiple successive digital platforms over the past two decades. Each platform migration left ghost copies. The Smithsonian's Open Access program, launched in February 2020, released 4.7 million digital records to the public — and experts in digital preservation say that release itself surfaced the scale of duplication for the first time.

What Experts Recommend — and What It Will Cost

Digital preservation specialists at the Washington-based Council on Library and Information Resources have argued publicly that institutions need to adopt hash-based deduplication as a baseline standard rather than an optional upgrade. The technique computes a fingerprint from each image file's contents and flags files with matching fingerprints as exact copies, regardless of file name. Commercial tools capable of handling large archival collections typically run between $15,000 and $80,000 annually depending on collection size, a cost that strains budgets already squeezed by DOGE-era federal grant reductions.
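
For readers curious what hash-based deduplication looks like in practice, the following is a minimal Python sketch, not a description of any tool the institutions named here use. It assumes SHA-256 as the fingerprint and a local folder of files; the function names `file_digest` and `find_exact_duplicates` are hypothetical and chosen only for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large image files are never loaded into memory whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def find_exact_duplicates(root):
    """Group files under `root` by content hash.

    Returns a dict mapping digest -> list of paths, keeping only
    digests shared by two or more files (byte-for-byte copies).
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            groups[file_digest(path)].append(path)
    return {digest: paths for digest, paths in groups.items() if len(paths) > 1}
```

Because the fingerprint depends only on file contents, two copies of the same scan saved under different file names land in the same group. A sketch like this catches only byte-for-byte copies; an image that has been resized or re-encoded would produce a different hash and would need a different comparison method.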

Local historians who work regularly at the Washingtoniana Division of the MLK Library say the practical consequence is real: searching for historical photographs of neighborhoods like NoMa or Shaw sometimes surfaces the same image four or five times under different file names, slowing research and occasionally causing confusion about what constitutes an original record versus a copy.

Mayor Muriel Bowser's office has not issued a specific policy directive on municipal archive deduplication, though the city's Office of the Chief Technology Officer has broader data management guidelines that archivists say could be applied to this problem with modest political will and a dedicated budget line.

For researchers and members of the public who use these collections, the practical advice from digital archivists is straightforward: cross-reference file metadata, check acquisition dates on any image retrieved from municipal or federal portals, and flag apparent duplicates through the formal feedback mechanisms each institution maintains. The MLK Library accepts collection feedback through its Ask-a-Librarian portal. Every duplicate reported is one fewer that staff have to find themselves, and right now staff capacity is the scarcest resource of all.

About this article

Published by The Daily Washington DC

This article was produced by The Daily Washington DC editorial desk and covers news in Washington DC. See our editorial standards for how we use AI.
