Washington's public agencies are sitting on a problem measured in terabytes. Duplicate images — identical or near-identical digital files stored multiple times across government servers — are quietly draining storage budgets, slowing archival retrieval systems, and complicating Freedom of Information Act responses at agencies from the DC Office of the Chief Technology Officer to the National Archives facility on Pennsylvania Avenue. City and federal officials are now weighing what to do about it.
The issue has come into sharper focus at a moment when the Trump administration's DOGE-driven efficiency push has put every line of government IT spending under scrutiny. For the District specifically, which relies on a patchwork of aging data infrastructure maintained partly through federal grants, the cost of redundant digital storage is not an abstraction. A 2024 report from the Government Accountability Office found that federal civilian agencies collectively spend billions annually on duplicative IT systems, a category that includes unmanaged image and document repositories.
What Officials and Technologists Are Saying
Inside the John A. Wilson Building on Pennsylvania Avenue NW, Mayor Muriel Bowser's office has pointed to the DC Digital Services initiative as the administrative vehicle for modernizing how the city stores and retrieves public records. Officials associated with that program have described duplicate file remediation as a priority for fiscal year 2026, though no formal public timeline or cost estimate has been released by the mayor's office.
At the federal level, the conversation runs through the National Archives and Records Administration, headquartered on Constitution Avenue NW. NARA has been working under a mandate established by the 2022 Presidential Records Digitization Directive to migrate paper and legacy digital records into managed electronic systems by December 31, 2026. Technologists working in records management say that deadline is increasingly difficult to meet given the volume of unresolved data quality issues, including duplicates.
Experts at the DC-based nonprofit the Center for Democracy and Technology have argued publicly that deduplication — the automated process of identifying and removing redundant files — should be a standard requirement embedded in any government cloud migration contract. Their position, laid out in policy briefs circulated on Capitol Hill earlier this year, is that agencies purchasing new storage infrastructure without requiring deduplication protocols are essentially paying twice: once to store the original file and again to store its copies.
The practical stakes are visible at the ground level. The DC Public Library's digital collections hub, anchored at the Martin Luther King Jr. Memorial Library on G Street NW, completed a major digitization push last year covering historical photographs of neighborhoods including Anacostia and NoMa. Library staff have acknowledged in public presentations that managing duplicate scans from community-submitted batches required significant manual intervention — work that consumed staff hours that could otherwise go toward expanding the accessible collection.
The Numbers Behind the Clutter
Storage is not free. Commercial cloud pricing from major providers currently runs between $0.02 and $0.023 per gigabyte per month for standard storage tiers, according to published rate cards. At those rates, a government agency holding even 500 terabytes of unaudited image data, a figure well within the range reported for mid-sized federal bureaus, pays roughly $120,000 to $138,000 a year to keep it. If a fifth of those files turned out to be duplicates, a single deduplication pass could theoretically eliminate tens of thousands of dollars in annual costs. Across dozens of agencies, the arithmetic scales quickly.
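The arithmetic behind that estimate can be sketched in a few lines of Python. This is a back-of-the-envelope illustration, not an agency budget model: it assumes decimal terabytes (1,000 gigabytes each), a flat per-gigabyte rate with no retrieval or transfer fees, and a hypothetical 20 percent duplicate share that does not come from any reported audit. The function names are invented for the example.

```python
# Rough storage-cost estimate. All inputs are illustrative assumptions.
GB_PER_TB = 1000  # decimal terabytes (assumption about how storage is billed)


def annual_storage_cost(terabytes: float, rate_per_gb_month: float) -> float:
    """Yearly cost in dollars of holding `terabytes` at a flat monthly per-GB rate."""
    return terabytes * GB_PER_TB * rate_per_gb_month * 12


def annual_dedup_savings(terabytes: float, rate_per_gb_month: float,
                         duplicate_fraction: float) -> float:
    """Yearly savings if `duplicate_fraction` of the stored data is removed."""
    return annual_storage_cost(terabytes, rate_per_gb_month) * duplicate_fraction


if __name__ == "__main__":
    # 500 TB at the low and high ends of the quoted price range.
    for rate in (0.02, 0.023):
        total = annual_storage_cost(500, rate)
        # 0.20 is a hypothetical duplicate share, not a measured figure.
        saved = annual_dedup_savings(500, rate, 0.20)
        print(f"rate ${rate}/GB-month: total ${total:,.0f}/yr, "
              f"savings at 20% duplicates ${saved:,.0f}/yr")
```

The savings scale linearly with the duplicate share, so the result is only as good as that one input, which is why an audit has to come before any budget projection.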
For the District's own IT budget, which the Office of the Chief Financial Officer set at roughly $180 million for fiscal year 2026, storage efficiency has become a talking point as federal funding uncertainty puts pressure on discretionary spending. DOGE-related reviews of grants flowing to DC's technology offices have introduced additional unpredictability.
Technologists advising the city recommend agencies begin with an audit of existing image repositories before purchasing additional infrastructure. Tools capable of identifying duplicate files through hash-matching — a process that compares unique digital fingerprints of files rather than their names — are widely available and can be deployed without replacing existing systems. The DC Office of the Chief Technology Officer has the authority to issue guidance mandating such audits across District agencies, though as of July 4, 2026, no such directive has been published. The next move belongs to whoever decides the quiet problem is worth a loud solution.
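For readers curious what hash-matching looks like in practice, the following is a minimal Python sketch using only the standard library. It is an illustration under stated assumptions, not a description of any tool the District or a federal agency uses: the choice of SHA-256 as the fingerprint and the function names are assumptions made for the example. It catches only byte-for-byte identical files. Near-identical images, such as a rescan or a resized copy of the same photograph, would need a different technique that is not shown here.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(root: Path) -> list[list[Path]]:
    """Group files under `root` whose contents are byte-for-byte identical.

    File names are ignored; only the content fingerprint is compared.
    """
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if path.is_file():
            by_digest[file_digest(path)].append(path)
    # Keep only fingerprints shared by more than one file.
    return [paths for paths in by_digest.values() if len(paths) > 1]
```

Calling `find_duplicates(Path("scans"))` on a hypothetical `scans` folder returns one list per set of identical files. The sketch only reports matches and deletes nothing, which mirrors the audit-first approach the technologists recommend.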