Washington's network of public agencies, libraries and municipal databases is sitting on a problem that keeps getting more expensive to ignore: millions of duplicate digital images clogging storage systems, inflating costs and slowing access to records that residents and researchers depend on every day. The question now is who pays to fix it, who decides how, and what gets deleted along the way.
The timing matters. Under the Trump administration's ongoing federal restructuring, the DOGE-driven efficiency push has already cut thousands of federal jobs and squeezed contractor budgets across the District. For DC's own agencies—which run parallel but legally distinct digital infrastructure—the pressure is indirect but real. Any increase in cloud storage costs or IT staffing gets harder to justify in a fiscal environment where Mayor Muriel Bowser's office is already managing federal funding uncertainty on multiple fronts.
What the Problem Actually Looks Like on the Ground
The District of Columbia Public Library system, which operates 26 branches including the flagship Martin Luther King Jr. Memorial Library at 901 G Street NW, has been digitizing local history collections for years under its DC Neighborhood Project and related programs. Archivists working with that material routinely flag duplicate scans—the same photograph of, say, a 1960s U Street corridor building, ingested three or four times across different collection workflows. Multiply that across years of scanning drives and the numbers stack up fast.
The Office of the Chief Technology Officer, which oversees DC government's enterprise IT systems from its offices near Judiciary Square, has flagged digital storage consolidation as a medium-term priority in its multi-year technology roadmap. The city spends tens of millions of dollars annually on data storage and cloud infrastructure contracts. Duplicate image files, particularly high-resolution scans from permitting records, public health documentation and Metropolitan Police Department evidence archives, represent a measurable fraction of that overhead, though OCTO has not published a figure breaking out redundancy costs.
Institutions outside the District government face the same crunch. The Smithsonian Institution, whose 21 museums stretch from the National Mall to the Anacostia Community Museum on Fort Place SE, operates one of the largest digital collections in the world. Its digitization program has grappled with cross-collection duplication for more than a decade, and its open-access image portal, which made roughly 4.7 million images freely available as of early 2024, relies on ongoing deduplication work to remain functional and trustworthy.
The Decisions That Will Define What Comes Next
Three choices are converging at once. First, agencies must decide whether to run automated deduplication software—tools that flag visually similar or hash-identical files—or invest in human review. Automated tools are faster and cheaper upfront, but they carry documented risks of deleting images that are technically identical but contextually distinct, a concern raised repeatedly by digital preservation professionals at institutions like the Library of Congress on Independence Avenue SE.
Second, DC's government must settle on a shared-storage framework. Right now, agencies including DC Health and the successors to the former Department of Consumer and Regulatory Affairs each maintain siloed image repositories. A unified approach would reduce redundancy but requires a governance agreement that has proved elusive. OCTO's fiscal year 2026 budget cycle is the natural moment to force that conversation.
Third, and most politically charged, is the question of what gets permanently deleted. Images in government custody—particularly from permitting records in rapidly gentrifying neighborhoods like Anacostia and NoMa—can become legal evidence in property disputes, zoning appeals or civil rights cases. Deleting a duplicate that turns out to be the only surviving copy of a key record is not a recoverable error.
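For readers who want to see what the "hash-identical" flagging in the first choice amounts to, here is a minimal Python sketch. It is an illustration only, not a description of any tool DC agencies or the Smithsonian actually run. The function names (`sha256_of`, `flag_hash_identical`) are hypothetical, and the choice of SHA-256 is an assumption. The sketch only groups files for review; it deletes nothing, which is the point of the third concern above.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def flag_hash_identical(root: Path) -> list[list[Path]]:
    """Group files under `root` whose bytes are identical.

    Hypothetical helper: it returns groups of two or more paths for a human
    reviewer to inspect and never removes a file itself.
    """
    by_hash = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_hash[sha256_of(path)].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

A byte-level hash catches only exact copies. The same photograph rescanned at a different resolution or saved in a different format produces a different digest, so "visually similar" matching needs a separate technique, such as perceptual hashing, that is not shown here. And two byte-identical files can still carry different catalog context, which is why the output is a review list and not a deletion list.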
The practical path forward likely involves a phased approach: automated flagging in the first quarter of fiscal year 2027, human review of flagged batches through mid-year, and deletion only after a 90-day hold period allowing agency counsel to sign off. That is roughly the framework the National Archives and Records Administration has recommended for federal records, and DC's own archivists have pointed to those federal guidelines as a reasonable baseline. Getting city council buy-in before the next budget markup, expected in the fall of 2026, would lock in both funding and accountability before the decisions get made by default.
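The hold-and-sign-off rule in that phased approach is simple enough to state as code. The Python sketch below is one possible reading, not an official procedure: the `FlaggedBatch` record and both function names are hypothetical, and it assumes the 90-day clock starts on the day human review of a batch is completed, a detail the framework as described here does not specify.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

HOLD_DAYS = 90  # hold period described above


@dataclass
class FlaggedBatch:
    """Hypothetical record for one batch of flagged duplicate images."""
    batch_id: str
    review_completed: Optional[date]  # None until human review is finished
    counsel_signed_off: bool


def earliest_deletion_date(batch: FlaggedBatch) -> Optional[date]:
    """Return the first day deletion could occur, or None if review is unfinished.

    Assumption: the hold starts when human review is completed.
    """
    if batch.review_completed is None:
        return None
    return batch.review_completed + timedelta(days=HOLD_DAYS)


def may_delete(batch: FlaggedBatch, today: date) -> bool:
    """Allow deletion only after review, the full hold period and counsel sign-off."""
    release = earliest_deletion_date(batch)
    return release is not None and batch.counsel_signed_off and today >= release
```

Under these assumptions, a batch whose review finished on March 1, 2027 could not be deleted before May 30, 2027, and not even then without counsel's sign-off. Keeping the three conditions in one function makes it harder for a deletion to happen by default.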