Arizona: 10-year Journey Toward Automating Manual Deduplication

Strategy

Arizona’s journey toward automating manual deduplication began in 2015 with the development of a Microsoft Access script. The state’s strategy focused on building an automated framework that could simulate human interaction with the Arizona State Immunization Information System (ASIIS) and so reduce reliance on staff for routine record review. Over the years, the process evolved multiple times, adapting to both system upgrades and changing technologies. The most recent version, implemented in March 2025, uses Python scripts integrated with an evaluation algorithm enhanced by Soundex phonetic matching. Deployed with cloud-based scheduling on AWS, the solution achieves near-continuous deduplication with minimal manual intervention. This latest iteration strengthens data integrity, scalability, and efficiency in managing Arizona’s immunization records.

Challenge

ASIIS faces increasing challenges in managing duplicate patient records submitted by multiple providers and disparate systems. Manual deduplication is labor-intensive, inconsistent, and prone to human error, creating risks to data quality, patient safety, and reporting accuracy. Current tools lack flexibility: they cannot automatically process complex variations such as transposed fields, phonetic similarities, or partial identifiers, so those records land in a manual review queue. For example, differently spelled names may evade detection, while overly strict rules generate false positives that still require staff review.

The number of records that ended up in the manual deduplication queue each year demonstrates the scale of the issue:

  • 2019: 434,029 out of 2,649,565, or 16.38%
  • 2020: 305,158 out of 2,445,594, or 12.48%
  • 2021: 1,397,714 out of 9,143,011, or 15.29% (average 5,824 per day)
  • 2022: 989,204 out of 5,250,935, or 18.84%
  • 2023: 471,279 out of 2,297,414, or 20.51%
  • 2024: 371,670 out of 2,492,524, or 14.91%
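
The percentages above follow directly from the yearly counts. A minimal Python sketch that reproduces them is shown here; the names QUEUE_COUNTS and manual_queue_rate are illustrative and do not come from the ASIIS tooling. The 2021 per-day average is not recomputed because the number of days it was averaged over is not stated.

```python
# Yearly counts copied from the list above: (records in manual queue, total records).
QUEUE_COUNTS = {
    2019: (434_029, 2_649_565),
    2020: (305_158, 2_445_594),
    2021: (1_397_714, 9_143_011),
    2022: (989_204, 5_250_935),
    2023: (471_279, 2_297_414),
    2024: (371_670, 2_492_524),
}


def manual_queue_rate(queued: int, total: int) -> float:
    """Share of records sent to the manual queue, as a percentage with two decimals."""
    return round(queued / total * 100, 2)


for year, (queued, total) in QUEUE_COUNTS.items():
    rate = manual_queue_rate(queued, total)
    print(f"{year}: {queued:,} of {total:,} = {rate:.2f}%")
```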

Manual deduplication requires significant staff time, so whenever staff are reassigned to other tasks, bottlenecks form, leading to backlogs and incomplete patient records. The heavy oversight that existing processes demand also reduces efficiency and delays data sharing with providers and public health stakeholders. ASIIS needs a standardized, scalable system that balances automation with accuracy, reduces manual workload, and ensures data integrity across interconnected healthcare and public health systems. To close this gap, BIZS began developing automation tools.

Solution

To address the high volume of duplicate records in ASIIS, BIZS developed an automated deduplication framework leveraging Python scripts designed to simulate human user interaction with the system’s existing user interface. This approach allowed Arizona to enhance data processing without requiring disruptive backend system changes. The scripts incorporated a sophisticated evaluation algorithm that combined deterministic and probabilistic matching methods with phonetic encoding through Soundex. This ensured that subtle variations—such as misspellings, transpositions, or phonetic similarities—were captured accurately while minimizing false positives. 
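
To illustrate the phonetic encoding step, here is a minimal sketch of the standard American Soundex algorithm in Python. It is not the code used in ASIIS, whose implementation is not published here; the function name soundex and the handling of non-ASCII letters (they are simply ignored) are assumptions made for this example.

```python
# Standard American Soundex digit groups.
_CODES = {}
for digit, letters in (("1", "BFPV"), ("2", "CGJKQSXZ"), ("3", "DT"),
                       ("4", "L"), ("5", "MN"), ("6", "R")):
    for letter in letters:
        _CODES[letter] = digit


def soundex(name: str) -> str:
    """Return the 4-character Soundex code for a name ('' if it has no A-Z letters)."""
    # Assumption: keep only ASCII letters; accented characters are dropped.
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    encoded = [letters[0]]                 # the first letter is kept as-is
    previous = _CODES.get(letters[0], "")  # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                       # H and W do not separate equal codes
        code = _CODES.get(ch, "")          # vowels and Y map to ""
        if code and code != previous:
            encoded.append(code)
        previous = code                    # a vowel resets the repeat check
        if len(encoded) == 4:
            break
    return "".join(encoded).ljust(4, "0")  # pad short codes with zeros


# Differently spelled names that sound alike share a code.
print(soundex("Smith"), soundex("Smyth"))    # S530 S530
print(soundex("Robert"), soundex("Rupert"))  # R163 R163
```

Because Soundex collapses many distinct names into the same code, a matching code is best treated as one signal among several rather than as proof of a duplicate.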

The automation was deployed on a shared AWS virtual machine, enabling scalable and cost-effective processing. Scheduling was configured to run deduplication tasks multiple times a day, ensuring continuous improvement of data integrity without introducing backlogs. Exception handling was carefully built into the workflow: only the most complex or ambiguous cases were flagged for human review, while the vast majority of duplicates were resolved automatically. Weighted scoring across multiple attributes—such as name, date of birth, address, and phone—provided an additional safeguard against incorrect merges. 
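
The weighted scoring and exception handling described above can be sketched as follows. Everything specific in this example is hypothetical: the field names, the weights, and the two thresholds are placeholders chosen for illustration, since the actual values used in ASIIS are not given here. The sketch also uses exact comparison after normalization, which is simpler than the combined deterministic, probabilistic, and phonetic matching the source describes.

```python
# Hypothetical weights in points out of 100; the real ASIIS weights are not published here.
WEIGHTS = {
    "last_name": 30,
    "first_name": 20,
    "date_of_birth": 30,
    "address": 10,
    "phone": 10,
}
AUTO_MERGE_THRESHOLD = 85  # assumed: at or above this, merge automatically
REVIEW_THRESHOLD = 60      # assumed: at or above this, flag for human review


def normalize(value) -> str:
    """Lowercase and keep only letters and digits so formatting differences are ignored."""
    return "".join(ch for ch in str(value or "").lower() if ch.isalnum())


def match_score(record_a: dict, record_b: dict) -> int:
    """Sum the weights of the fields that are present in both records and agree."""
    score = 0
    for field, weight in WEIGHTS.items():
        a = normalize(record_a.get(field))
        b = normalize(record_b.get(field))
        if a and a == b:  # a missing value never counts as agreement
            score += weight
    return score


def classify(score: int) -> str:
    """Route a candidate pair: merge, send to the manual queue, or leave alone."""
    if score >= AUTO_MERGE_THRESHOLD:
        return "auto_merge"
    if score >= REVIEW_THRESHOLD:
        return "manual_review"
    return "no_match"


first = {"last_name": "Garcia", "first_name": "Maria", "date_of_birth": "2019-04-02",
         "address": "12 W Main St.", "phone": "(602) 555-0100"}
second = {"last_name": "GARCIA", "first_name": "maria", "date_of_birth": "2019-04-02",
          "address": "12 W. Main St", "phone": ""}
print(classify(match_score(first, second)))  # auto_merge (score 90)
```

Keeping two thresholds rather than one is what allows only ambiguous pairs to reach staff: pairs that clearly agree are merged, pairs that clearly differ are ignored, and the band in between becomes the manual queue.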

Minimal manual intervention was required, which freed staff to focus on higher-value tasks rather than repetitive record review. By combining automation, algorithmic sophistication, and cloud-based infrastructure, Arizona created a flexible, scalable solution capable of adapting to future increases in record volume and evolving data quality standards. 

Outcome

The automated deduplication solution delivered substantial improvements to ASIIS data integrity, efficiency, and scalability. Manual processing of duplicate records was reduced dramatically, with more than 90% fewer cases requiring manual review compared to previous years. Staff time once consumed by daily manual deduplication was reallocated to higher-value responsibilities, helping reduce backlogs and ensuring that patient records remained more complete and up to date. 

By simulating human interactions in the UI, the solution integrated seamlessly with ASIIS without the need for disruptive system redesigns. The Soundex-enabled algorithm significantly improved match accuracy, capturing subtle phonetic and partial similarities that manual processes or rigid rules had previously missed. Weighted scoring reduced the risk of false positives, while exception handling ensured human oversight for edge cases only. 

The use of a shared AWS virtual machine provided both cost efficiency and reliability. Runs scheduled multiple times a day eliminated gaps in processing, allowing for near real-time deduplication. This supported more timely and accurate data sharing with public health stakeholders, improving the quality of immunization reporting and compliance with federal standards. The solution also established a sustainable framework for future growth, ensuring Arizona is prepared to handle rising data volumes with minimal additional cost.

Importantly, the approach was shared with another jurisdiction, Puerto Rico, extending the benefits of automation and demonstrating the model’s adaptability beyond Arizona. Ultimately, the initiative proved how advanced automation and minimal manual intervention can strengthen both operational efficiency and public health outcomes. 

Supplemental Materials

Years: 2015, 2025

Locations: Arizona

Programmatic Areas: IIS

Key Words: health IT, IIS, interoperability, quality improvement, using immunization data

Evidence Based: No

Evaluations: No
