Thousands of duplicate digital images are sitting in Washington DC's overlapping public record systems — clogging servers, confusing researchers, and running up storage costs that neither city agencies nor federal partners want to absorb. The problem has sharpened this summer as the Trump administration's ongoing restructuring of the federal workforce has left key IT and records positions vacant across departments that share data infrastructure with the District.
The timing matters. Mayor Muriel Bowser's office has been pressing agencies to digitize more records as part of a broader open-government push, while simultaneous DOGE-driven cuts have reduced the number of federal archivists and IT contractors available to help deduplicate shared image databases. The result is a bottleneck that affects everything from property deed photographs stored at the DC Office of Tax and Revenue on Indiana Avenue NW to historic survey images held jointly by the National Archives and Records Administration in College Park, Maryland.
What the Backlog Looks Like on the Ground
The DC Office of the Chief Technology Officer, based at One Judiciary Square, has acknowledged that the deduplication problem is not new, but that its scale has grown. Agencies that digitized paper records in waves between 2018 and 2023 often lacked consistent metadata standards, so the same scanned photograph can appear under multiple file names, across multiple directories, with no automated flag to catch the overlap. Storage costs for government cloud contracts are not trivial: commercial cloud storage for large image archives typically runs between $0.02 and $0.023 per gigabyte per month under GSA schedule rates, and a single poorly managed digitization project can generate terabytes of redundant data within months.
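A quick back-of-the-envelope calculation shows how those rates add up. This minimal Python sketch is illustrative only: it assumes decimal units (1 TB = 1,000 GB) and uses just the $0.02 to $0.023 per-gigabyte monthly range quoted above. The function name and the sample volumes are hypothetical, not figures from any agency.

```python
# Rough monthly and annual cost of keeping redundant image data in cloud storage.
# Assumptions (hypothetical, for illustration): decimal units (1 TB = 1,000 GB)
# and the per-GB monthly rates quoted in the article; no backup, replication,
# or retrieval charges are included.

RATES_PER_GB_MONTH = (0.020, 0.023)  # USD per GB per month, low and high end


def redundant_storage_cost(redundant_tb: float, rate_per_gb_month: float) -> tuple[float, float]:
    """Return (monthly, annual) USD cost of storing `redundant_tb` terabytes."""
    gigabytes = redundant_tb * 1_000
    monthly = gigabytes * rate_per_gb_month
    return monthly, monthly * 12


if __name__ == "__main__":
    # Print the cost range for a few illustrative volumes of duplicate files.
    for tb in (1, 10, 50):
        for rate in RATES_PER_GB_MONTH:
            monthly, annual = redundant_storage_cost(tb, rate)
            print(f"{tb:>3} TB at ${rate:.3f}/GB-month: "
                  f"${monthly:,.2f}/month, ${annual:,.2f}/year")
```

At those rates, every 10 TB of redundant files costs roughly $200 to $230 a month, or about $2,400 to $2,760 a year, and the charge recurs for as long as the copies stay on disk. Real contracts may add replication, backup, or retrieval charges that this estimate ignores.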
The DC History Center on Massachusetts Avenue NW, which digitized portions of its photograph collection in partnership with the Library of Congress between 2020 and 2022, has dealt with its own version of the problem. Duplicate image files create confusion for researchers pulling records remotely — a challenge that has only grown since the center expanded its online portal to outside users in early 2024.
At the federal level, NARA has been working under a mandate, set by the Office of Management and Budget, to shift to fully electronic records management. That mandate carried a target date of December 31, 2022, for most permanent records, a deadline that came and went with many agencies still ingesting paper backlogs. Deduplication was never explicitly required under the mandate's technical standards, leaving agencies to set their own protocols, or none at all.
The Decisions Coming This Fall
Three choices now sit in front of DC and federal decision-makers, and the answers will shape public access to government imagery for years.
First, the DC OCTO must decide whether to fund a dedicated deduplication audit, which comparable city-level projects in Chicago and New York suggest would cost roughly $200,000 to $400,000 depending on data volume, or continue with piecemeal agency-by-agency cleanup. Second, the National Archives must determine which shared datasets with District agencies fall under federal jurisdiction and which are the city's responsibility to maintain. That boundary question has no clean legal answer right now. Third, agencies must choose a deduplication tool: open-source building blocks such as Apache Tika, a toolkit that detects file types and extracts the metadata and text a deduplication workflow can compare, are available, but they require technical staffing that has thinned considerably under the current federal hiring freeze.
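At its simplest, a deduplication tool ignores file names and metadata and compares the files' actual contents. The following sketch is a minimal, hypothetical illustration using only Python's standard library (it is not Apache Tika and not any agency's actual tooling). It groups image files by SHA-256 content hash, hashing only files that share a byte size, so the same scanned photograph saved under different names or in different directories is flagged even when its metadata is inconsistent. The function names and the list of image extensions are assumptions chosen for illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

# Hypothetical set of image extensions to scan; adjust for a real archive.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".tif", ".tiff")


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file's bytes in 1 MiB chunks so large scans are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_exact_duplicates(root: Path, extensions=IMAGE_EXTENSIONS) -> dict[str, list[Path]]:
    """Return {content hash: [paths]} for every group of byte-identical image files under root."""
    # Step 1: bucket files by size; files of different sizes cannot be identical.
    by_size: dict[int, list[Path]] = defaultdict(list)
    for path in root.rglob("*"):
        if path.is_file() and path.suffix.lower() in extensions:
            by_size[path.stat().st_size].append(path)

    # Step 2: hash only files that share a size with at least one other file.
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for same_size in by_size.values():
        if len(same_size) < 2:
            continue
        for path in same_size:
            by_hash[sha256_of(path)].append(path)

    # Keep only hashes seen more than once: those are the duplicate groups.
    return {h: sorted(paths) for h, paths in by_hash.items() if len(paths) > 1}
```

Calling `find_exact_duplicates(Path("/path/to/archive"))` returns one entry per duplicated image, listing every file name it is stored under. The design choice has a clear limit: exact hashing only catches byte-identical copies, so a photograph that was rescanned, resized, or recompressed will not match and would need a similarity technique such as perceptual hashing instead.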
For residents and researchers who rely on public image archives — journalists filing FOIA requests for surveillance footage, historians working out of the MLK Memorial Library on G Street NW, or developers pulling permit photographs for properties in rapidly changing neighborhoods like Anacostia and NoMa — the practical stakes are real. Duplicate records slow retrieval, inflate response times for public records requests, and in some cases produce conflicting versions of the same document.
The next scheduled interagency working group on DC digital records is expected to convene in September 2026. If city and federal officials arrive without an agreed-upon deduplication standard, the backlog will keep growing, and so will the bill for storing files that should have been flagged and removed years ago.