The DC Public Library's Washingtoniana Division flagged a significant digital cataloguing problem this week after staff discovered that thousands of duplicate image files had accumulated across its online archive, some dating back to scanning projects launched in 2019. The redundant files are slowing public search tools and, in at least some cases, are showing researchers photographs attached to the wrong records.
The timing matters. Across the city, federal workforce reductions tied to the administration's restructuring push have thinned the ranks of digitisation contractors who work on joint projects between institutions such as the Library of Congress on Independence Avenue SE and the National Archives at 700 Pennsylvania Avenue NW. With fewer staff reviewing ingested files, duplicate images can slip past quality checks that would normally catch them before they go live to the public.
What Happened This Week
On Tuesday, July 1, the Washingtoniana Division quietly suspended new uploads to its Digital Collections portal while staff ran a deduplication audit. The pause affected roughly 1,200 recently scanned items from the Peabody Room collection at the Georgetown Branch, according to a notice posted to the library's internal project board and reviewed this week. Librarians said the audit was expected to last through July 11.
Separately, the Historical Society of Washington DC, headquartered in the Carnegie Library building at Mount Vernon Square, disclosed that it had identified approximately 3,400 duplicate or near-duplicate image files in its photographic archive, which holds more than 75,000 images of the city. The files were introduced during a 2024 batch migration. Society staff have begun manually reviewing files flagged by open-source deduplication software, a process that archivists estimate will take six to eight weeks to complete.
The problem is not unique to Washington, but the concentration of major federal and civic collections in a single city amplifies the stakes. The Library of Congress alone holds more than 17 million photographs in its Prints and Photographs Division, so even a fraction of a percent of duplicate or misattributed images represents a meaningful research hazard: at 0.1 percent, that would be more than 17,000 images. The National Archives has separately been dealing with a backlog of digitisation work that predates the current administration's hiring freezes.
Why Duplicate Images Are More Than a Filing Problem
Duplicate image replacement — the process of identifying, removing, and replacing redundant or incorrectly filed digital images — is tedious, unglamorous work that rarely draws attention until something goes wrong. But researchers at institutions from the Anacostia Community Museum on Fort Place SE to the Kiplinger Research Library in Shaw depend on accurate metadata to locate photographs documenting neighbourhood change, particularly as gentrification has accelerated in areas like NoMa and Anacostia itself.
A duplicated or misfiled photograph of a 1970s U Street corridor building, for instance, can appear under the wrong address, skewing property history research that legal teams, developers, and community advocates all rely on. Howard University's Moorland-Spingarn Research Center, which holds extensive photographic documentation of Black Washington, has been running its own internal review of files imported from an external vendor in early 2025.
The DC Office of the Chief Technology Officer has a data governance framework that applies to city agency databases, but it does not extend to library or archival image repositories, which operate under separate institutional policies. That gap has become more visible as digitisation outpaces the oversight structures meant to govern it.
For anyone who uses these archives — genealogists, journalists, historians, architects researching building permits along the H Street NE corridor — the practical advice right now is straightforward: if an image retrieved from a DC Public Library or Historical Society portal looks inconsistent with its listed caption or date, report it using the feedback link on the item's record page. Both institutions say they are processing correction requests within five business days during the current audit window. The Washingtoniana Division expects its Digital Collections portal to resume normal uploads by July 14, though staff acknowledged that date could slip if the audit surfaces a larger backlog than currently estimated.