DC's Digital Dead Weight: The Numbers Behind the City's Duplicate Image Problem
Thousands of redundant files are clogging Washington DC government servers and contractor systems — and the cleanup bill is growing.

Washington DC's municipal and federal-adjacent technology infrastructure is carrying tens of thousands of duplicate digital images across its servers, a problem that city IT managers and procurement officers say is quietly draining storage budgets and slowing down public-facing systems at a moment when every dollar is under scrutiny. The scale of the redundancy is larger than most agencies have publicly acknowledged.
The timing matters. With the Trump administration's DOGE restructuring pushing federal contractors and District agencies to justify every line item, duplicate data — images stored two, three, sometimes five times across disconnected systems — has surfaced as a concrete, measurable inefficiency. Mayor Muriel Bowser's Office of the Chief Technology Officer has been under pressure since at least early 2025 to consolidate the District's fragmented data architecture, and duplicate image files represent one of the most straightforward targets for immediate cost reduction.
Industry benchmarks from enterprise storage audits suggest that between 20 and 40 percent of files stored on typical municipal servers are exact or near-exact duplicates. For a mid-sized city government running several petabytes of data, as DC does across agencies including the Department of Motor Vehicles on Penn Avenue, the DC Health systems network, and the Office of Tax and Revenue, that redundancy translates directly into wasted licensing costs, slower retrieval speeds, and inflated cloud-storage invoices. Cloud storage at standard enterprise rates runs roughly $20 to $25 per terabyte per month, so trimming a single petabyte (1,000 terabytes) of duplicates would save between $20,000 and $25,000 a month.
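The per-petabyte savings figure is straightforward multiplication. The minimal Python sketch below makes the arithmetic explicit, assuming decimal units (1 PB = 1,000 TB) and a flat per-terabyte rate with no tiering, replication or retrieval fees. The function name and the 3 PB example agency are hypothetical, not drawn from any District budget.

```python
# Decimal units, as cloud storage is usually billed: 1 PB = 1,000 TB.
TB_PER_PB = 1_000


def monthly_savings_usd(stored_tb: float, duplicate_share: float,
                        usd_per_tb_month: float) -> float:
    """Monthly storage spend avoided by deleting duplicate data.

    Assumes a flat per-terabyte rate; ignores tiering, replication
    and retrieval fees, which real invoices include.
    """
    if not 0.0 <= duplicate_share <= 1.0:
        raise ValueError("duplicate_share must be between 0 and 1")
    removable_tb = stored_tb * duplicate_share
    return removable_tb * usd_per_tb_month


# One petabyte of pure duplicates at the low and high ends of $20-$25/TB.
for rate in (20, 25):
    print(f"1 PB at ${rate}/TB: ${monthly_savings_usd(TB_PER_PB, 1.0, rate):,.0f}/month")

# Hypothetical agency: 3 PB stored, 20% or 40% duplicates, $22.50/TB.
for share in (0.20, 0.40):
    saved = monthly_savings_usd(3 * TB_PER_PB, share, 22.50)
    print(f"{share:.0%} duplicates: ${saved:,.0f}/month")
```

Run as written, the first loop prints $20,000 and $25,000 per month for one petabyte; the second shows how quickly the 20 to 40 percent benchmark range widens the estimate for a larger archive.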
The DC Public Library system, which operates its central branch at the Martin Luther King Jr. Memorial Library on G Street NW, digitised more than 800,000 archival images through its Washingtoniana collection project. Staff there have previously flagged that duplicate scans from overlapping digitisation contracts inflated the working archive by an estimated 15 percent before a 2023 deduplication pass. The library's digital services team declined to provide updated figures for this article, but the structural problem — multiple vendors scanning the same physical materials under different contract scopes — is common across DC government digitisation efforts.
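Two vendors scanning the same physical page rarely produce byte-identical files, so exact file fingerprints alone would miss that kind of duplicate. One widely used technique for near-duplicate images is perceptual hashing. The Python sketch below implements a simple "average hash" and is not a description of how the library ran its 2023 pass. It assumes each scan has already been shrunk to an 8x8 grayscale grid by an imaging library; the function names, the sample grids and the 5-bit threshold are all hypothetical.

```python
from itertools import chain


def average_hash(pixels: list[list[int]]) -> int:
    """Average hash ("aHash") of an already-downscaled grayscale image.

    `pixels` is assumed to be a small fixed grid (8x8 is common) of 0-255
    brightness values; resizing a full scan to that grid is left to an
    imaging library. Each bit is 1 where a pixel is brighter than the mean.
    """
    flat = list(chain.from_iterable(pixels))
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Count the bits that differ between two hashes."""
    return bin(a ^ b).count("1")


def likely_same_page(scan_a: list[list[int]], scan_b: list[list[int]],
                     max_bits: int = 5) -> bool:
    """Flag two scans as near-duplicates when their hashes are close.

    The 5-of-64-bit threshold is an illustrative assumption; real projects
    tune it against scans they have checked by hand.
    """
    return hamming_distance(average_hash(scan_a), average_hash(scan_b)) <= max_bits


# Hypothetical 8x8 thumbnails: vendor B's rescan is slightly brighter.
vendor_a = [[(row * 8 + col) * 4 for col in range(8)] for row in range(8)]
vendor_b = [[min(255, v + 3) for v in line] for line in vendor_a]
other_page = [[252 - v for v in line] for line in vendor_a]

print(likely_same_page(vendor_a, vendor_b))    # True
print(likely_same_page(vendor_a, other_page))  # False
```

The two vendor grids differ in every pixel value, yet hash identically because the brightness shift moves every pixel and the mean together. In practice, near-duplicate matches are usually queued for human review rather than deleted automatically, since a low threshold can also pair genuinely different pages.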
At the DC Courts complex on Indiana Avenue NW, electronic case file systems that ingest scanned court documents have historically generated duplicate image attachments when filers submit the same exhibit through both an online portal and a physical filing. Court technology staff have been working since a 2024 system upgrade to implement automated hash-checking — a process that compares file fingerprints to catch identical images before they are saved a second time. That upgrade was budgeted at $1.2 million as part of the Courts' broader case management modernisation package approved by the Joint Committee on Judicial Administration.
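Hash-checking at ingestion can be illustrated in a few lines. The Python sketch below is a hypothetical in-memory stand-in, not the Courts' actual system. It fingerprints each incoming file with SHA-256 from the standard `hashlib` module and, when the same bytes arrive a second time (say, once through the portal and once from a paper filing), returns the ID of the first copy instead of storing another. The class name, method name and file IDs are invented for illustration.

```python
import hashlib


class ExhibitStore:
    """In-memory sketch of an ingestion-time duplicate check.

    Keys each accepted file by its SHA-256 digest; a resubmission of
    byte-identical content is linked to the first copy instead of saved again.
    """

    def __init__(self) -> None:
        self._first_copy: dict[str, str] = {}  # digest -> ID of stored copy

    def ingest(self, file_id: str, data: bytes) -> tuple[str, bool]:
        """Return (ID of the stored copy, True if `data` was newly stored)."""
        digest = hashlib.sha256(data).hexdigest()
        existing = self._first_copy.get(digest)
        if existing is not None:
            # Same fingerprint: point at the existing copy, skip the write.
            return existing, False
        self._first_copy[digest] = file_id
        # A real system would write `data` to document storage here.
        return file_id, True


store = ExhibitStore()
exhibit = b"Exhibit A, page 1 (scanned)"
print(store.ingest("portal-0001", exhibit))        # ('portal-0001', True)
print(store.ingest("paper-0042", exhibit))         # ('portal-0001', False)
print(store.ingest("paper-0043", exhibit + b" "))  # ('paper-0043', True)
```

The last call shows the limit of this approach: a single changed byte yields a different fingerprint, so only exact copies are caught. A cautious production system might also compare the bytes of a suspected duplicate before discarding it, although accidental SHA-256 collisions are not a practical concern.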
Deduplication software licences from enterprise vendors typically cost between $15,000 and $80,000 a year, depending on the volume of data under management, and a full audit of a large municipal archive can take three to six months of staff time. For agencies already cut to lean staffing under the current federal restructuring, particularly those that share infrastructure with federal programs through DC-federal cost-sharing arrangements around the L'Enfant Plaza and Capitol Hill corridors, finding the internal capacity to run those audits is a challenge in itself.
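Whether a licence pays for itself depends on how much duplicate storage it removes. The rough Python sketch below divides first-year cost by monthly storage savings. The audit-cost parameter and every example figure other than the $15,000 to $80,000 licence range are hypothetical assumptions, and the model ignores retrieval-speed and staff-productivity gains that are harder to price.

```python
import math


def payback_months(annual_licence_usd: float, savings_per_month_usd: float,
                   one_off_audit_usd: float = 0.0) -> float:
    """Months of storage savings needed to cover a year's licence plus audit.

    `one_off_audit_usd` stands in for the staff time of an initial audit;
    any figure passed for it is an assumption, not a quoted cost.
    """
    if savings_per_month_usd <= 0:
        return math.inf  # no savings, so the cost is never recovered
    return (annual_licence_usd + one_off_audit_usd) / savings_per_month_usd


# Large archive: top-end licence against ~1 PB of duplicates removed.
print(payback_months(80_000, 25_000))          # 3.2
# Small agency: cheapest licence, but only ~50 TB of duplicates at $20/TB.
print(payback_months(15_000, 50 * 20))         # 15.0
# Same small agency, with a hypothetical $60,000 in audit staff time.
print(payback_months(15_000, 1_000, 60_000))   # 75.0
```

Under these assumptions, a result above 12 means the first year's costs are not recovered within that year. That helps explain why the staffing question can matter as much as the licence price for smaller agencies.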
Practical guidance from the National Institute of Standards and Technology, headquartered in Gaithersburg, Maryland, recommends that agencies establish automated deduplication policies at the point of data ingestion rather than attempting periodic cleanup after years of accumulation. That means writing deduplication requirements into procurement contracts before a vendor touches a server, not after the invoice is paid.
For DC residents and businesses interacting with District digital services, from permit applications processed through the Department of Buildings (which took over permitting from the former Department of Consumer and Regulatory Affairs in 2022) to online business licence renewals, the downstream effect of bloated image archives is slower system response times and occasional retrieval errors when duplicate files conflict. The fix is technical and largely invisible to the public. The cost of not fixing it shows up on the budget spreadsheet every month.
Published by The Daily Washington DC