Washington's Office of the Chief Technology Officer quietly flagged the problem in its most recent infrastructure audit: thousands of duplicate image files clogging the District's shared document repositories, slowing retrieval times and inflating storage costs at a moment when federal funding to city agencies is already under extraordinary pressure. The issue isn't new. But the bill for ignoring it for so long is finally coming due.
The timing matters. With the Trump administration's Department of Government Efficiency pushing sweeping cuts to federal transfers that historically subsidize District operations, Mayor Muriel Bowser's government is under pressure to demonstrate fiscal discipline on every front, including the unsexy, granular work of digital records management. A city that can't show clean, deduplicated data systems has a harder argument to make when asking Congress to protect its annual federal payment, which runs into the hundreds of millions of dollars.
How the Duplication Problem Built Up Over 20 Years
The roots go back to the early 2000s, when District agencies began digitizing paper records in earnest but without a unified standard. The DC Office of Documents and Administrative Issuance, housed on North Capitol Street, operated on different software protocols than the DC Public Library's digitization program, which maintained its own image archive for the Washingtoniana Collection at the Martin Luther King Jr. Memorial Library on G Street NW. When agencies merged systems or migrated platforms — as happened repeatedly between 2008 and 2019 — files were often duplicated rather than reconciled. Nobody was assigned to clean up the overlap.
DC Health, which digitized patient intake forms and public-health campaign materials through a separate contract, generated its own redundant image archive during the COVID-19 emergency procurement period beginning in March 2020. Emergency purchasing rules that year allowed faster vendor contracts but reduced the oversight that would normally catch duplicate uploads. IT staff at the John A. Wilson Building, the District's seat of government on Pennsylvania Avenue NW, have described the resulting file ecosystem informally as a patchwork, though no named official has used that characterization in a formal public statement.
Industry data from AIIM, the global information management association, suggests that in large municipal digitization projects, duplicate file rates typically run between 20 and 40 percent of total stored images when cross-departmental standards are absent. The District has not published its own duplicate rate, but the scale of the storage contracts it has renewed in each of the past four fiscal years suggests the problem is substantial. The city's fiscal year 2025 technology budget allocated roughly $47 million to enterprise infrastructure, a figure that includes cloud storage costs that deduplication would reduce.
What a Fix Actually Looks Like — and Who Pays
Deduplication is not a single software purchase. For a government the size of the District, which operates dozens of agencies with overlapping document jurisdictions, it requires a coordinated audit, a common metadata standard, and a policy decision about which version of a duplicated record becomes the authoritative copy. The National Archives and Records Administration, headquartered in College Park, Maryland, has published federal guidance on exactly this process — guidance that applies to federal agencies but that District IT managers have used as a practical reference for their own planning.
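For readers who want a sense of the mechanics, the audit and the "authoritative copy" decision can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the District's or NARA's method: the `ImageRecord` fields, the function names, and the tie-breaking rule (keep the oldest copy) are all hypothetical. It also catches only byte-for-byte identical files; a rescanned or re-encoded copy of the same page would need a different technique.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class ImageRecord:
    """Hypothetical common metadata for one stored image."""
    path: Path
    agency: str      # which agency holds this copy (assumed field)
    modified: float  # last-modified time as a POSIX timestamp


def content_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def group_duplicates(records):
    """Audit step: map each digest to the records sharing it (2+ only)."""
    groups = defaultdict(list)
    for record in records:
        groups[content_digest(record.path)].append(record)
    return {d: rs for d, rs in groups.items() if len(rs) > 1}


def pick_authoritative(group):
    """Policy step (assumed rule): keep the oldest copy, ties by path."""
    return min(group, key=lambda r: (r.modified, str(r.path)))
```

The policy function is deliberately the smallest piece. Swapping "oldest copy wins" for "the originating agency's copy wins" is a one-line change in code, but it is the part that requires an actual decision from the people who own the records.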
The DC Council's Committee on Technology and the Environment has jurisdiction over the OCTO budget and has previously scrutinized digital infrastructure spending during annual oversight hearings at the Wilson Building. Advocates for open-government access, including the DC Open Government Coalition, have long argued that records mismanagement, including unresolved duplicates, undermines Freedom of Information Act compliance and public accountability.
For District residents and businesses that depend on fast document retrieval from agencies like the Department of Consumer and Regulatory Affairs on Rhode Island Avenue NE, the practical consequence is slower service. DCRA processes building permits, business licenses, and property records — all image-heavy — and delays in that system ripple through real-estate transactions across neighborhoods from Anacostia to NoMa.
City technology staff have indicated through budget documents that a structured deduplication initiative could be phased across fiscal years 2026 and 2027. The first step, already reportedly underway, is a full inventory of image assets across all major agencies. What that inventory reveals — and what the District decides to fund next — will determine whether this long-deferred cleanup finally happens or gets shelved again when the next budget crisis hits.