Improving Tax Office Data-Matching

It may seem like a contradiction, but tax offices are both data rich and data poor.  They hold a significant amount of data, predominantly from business registration forms and tax return filings, which gives them financial data, employment information, corporate data, and some demographic data.  Tax agencies often have extensive data warehouses that consolidate this data and allow for extensive analyses.

However, tax offices often lack additional data that would help them build stronger tax analytics to achieve their tax compliance mission.  This missing data includes demographic and contact data, such as phone numbers, email addresses, aliases and a comprehensive list of physical addresses.  It also includes data that does not have to be reported to tax agencies, such as bank account, property and asset information.

In addition, tax agencies typically struggle to consolidate their data with external data, because the external data often lacks a unique identifier, such as an SSN or EIN, that would allow a one-to-one match.

Tax Office Data-Matching: The Value Of Third-Party Data

Adding data sources gives tax offices a more robust data set, which can lead to better analyses, identification and predictive modeling.  Tax agencies can turn to several places for this data.  One source is other state agencies, including:
  • An unemployment insurance agency that can provide wage and employer data.
  • The Secretary of State, who can provide data on corporation filings and information about officers.
  • The Department of Motor Vehicles, which can provide information on vehicles and driver’s license data.
  • Business license agencies that can provide information on outstanding business licenses.
  • Alcohol control boards that can provide data on liquor licenses and sales.
Tax agencies can also get data from local governments. The difficulty with local government data is the large number of sources and the fact that the data may not be available in a consistent format.

Tax offices can also turn to commercial data providers, who can supply significant data on businesses and individuals.  One benefit of a commercial provider is that a single source can supply many of the data elements that the government agencies above provide.  Another benefit is that the provider will already have cleaned and normalized the data and attributed it to its proper source.  This allows the agency to build stronger tax analytics.

Tax Office Data-Matching: Utilizing an Automated Tool to Consolidate Data

Consolidating internal and external data can pose a significant challenge. Different entities often use different key data fields to organize their data. The tax agency typically uses an SSN or FEIN, whereas other entities may use their own identifier and may or may not hold an SSN or FEIN.  In these situations, fuzzy matching is often needed to consolidate data. For example, is “Bob Smith at 22 Franklin Road” the same person as “Robert Smyth at 22 Franklin Rd, Suite 202”?
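
To make the Bob Smith question concrete, here is a minimal Python sketch of one way a fuzzy match score could be computed. It is an illustration under stated assumptions, not a description of any particular matching product: the `ABBREVIATIONS` and `NICKNAMES` tables, the `match_score` function and its equal weighting of name and address are all hypothetical, and the string comparison uses the standard library's `difflib.SequenceMatcher` rather than a purpose-built record-linkage algorithm.

```python
from difflib import SequenceMatcher

# Hypothetical lookup tables; a real system would need far larger ones.
ABBREVIATIONS = {"rd": "road", "st": "street", "ave": "avenue"}
NICKNAMES = {"bob": "robert", "bill": "william"}


def normalize(text, table):
    """Lowercase, drop commas and periods, and expand known short forms."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(table.get(word, word) for word in words)


def similarity(a, b):
    """Return a ratio between 0 and 1; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b).ratio()


def match_score(name_a, addr_a, name_b, addr_b):
    """Average the name and address similarity (equal weights are an assumption)."""
    name_sim = similarity(normalize(name_a, NICKNAMES), normalize(name_b, NICKNAMES))
    addr_sim = similarity(
        normalize(addr_a, ABBREVIATIONS), normalize(addr_b, ABBREVIATIONS)
    )
    return (name_sim + addr_sim) / 2


score = match_score(
    "Bob Smith", "22 Franklin Road",
    "Robert Smyth", "22 Franklin Rd, Suite 202",
)
print(f"match score: {score:.2f}")
```

Normalizing before comparing matters here: "Bob" and "Robert" share few characters, so without the nickname table the name similarity would be much lower. The extra "Suite 202" keeps the address from scoring as an exact match, so whether the pair is consolidated depends on the threshold the agency chooses.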

Manual consolidation is impractical because (1) the data is changing constantly, and (2) the sheer volume of data makes manual validation almost impossible.  Without automated tools, data consolidation and aggregation can represent 90% of the work, with the selection of cases representing only 10% of the work. Automation is needed to flip those weightings.

Automated systems can score matches and automatically consolidate individuals and businesses that have a sufficiently high match score. Depending on the use case, the tool can be tuned either to minimize false positives or to accept some false positives in exchange for surfacing more potential matches.  For example, if you are trying to identify a fraud ring, you may want to allow for more false positives in hopes of finding hidden connections.  If you are looking for non-filers, on the other hand, you will want to minimize false positives, and therefore minimize the number of incorrect non-filer letters you generate.
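
The trade-off between the two use cases can be sketched as a threshold applied to match scores. The record IDs, scores and cutoff values below are invented for illustration; in practice an agency would tune the thresholds against samples it has reviewed.

```python
# Hypothetical candidate pairs: (internal record, external record, match score).
candidate_pairs = [
    ("A-001", "B-917", 0.97),
    ("A-002", "B-403", 0.84),
    ("A-003", "B-220", 0.71),
    ("A-004", "B-118", 0.52),
]

# Illustrative cutoffs only; real values would be tuned per use case.
THRESHOLDS = {
    "non_filer_letters": 0.95,     # strict: avoid sending incorrect letters
    "fraud_ring_discovery": 0.70,  # loose: tolerate false positives
}


def consolidate(pairs, use_case):
    """Keep only the pairs whose score meets the cutoff for the use case."""
    cutoff = THRESHOLDS[use_case]
    return [(a, b) for a, b, score in pairs if score >= cutoff]


strict = consolidate(candidate_pairs, "non_filer_letters")
loose = consolidate(candidate_pairs, "fraud_ring_discovery")
print(len(strict), "pairs consolidated for non-filer letters")
print(len(loose), "pairs consolidated for fraud ring discovery")
```

With these made-up numbers, the strict setting consolidates one pair and the loose setting consolidates three. The same scores feed both use cases; only the cutoff changes.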

Tax Office Data-Matching: Using Network Analytics to Visualize Data and Find Hidden Insights

Network analytics is an approach that associates people, entities and relationships to determine who an individual or business is related to, at one or multiple degrees of separation. For example, if two people have only a single employer in common, they would not be considered tightly connected. However, if those same two people shared addresses and phone numbers and were also connected to the same six people, they would be considered very tightly connected. A network analytics tool can analyze all sorts of data, including phone numbers, physical addresses, bank accounts, credit cards, device IDs, and email addresses, for overlaps and connections. This analysis can be significantly aided by supplementing tax agency data with third-party data, because many of these connections would not be known using tax data alone.
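
As a rough illustration of "tightly connected", the sketch below counts how many attributes each pair of people shares. The records and the `connection_strength` function are hypothetical, and a simple count is only a stand-in for the weighted scoring a real network analytics tool would likely use.

```python
from itertools import combinations

# Hypothetical records: each person maps to a set of (attribute type, value) pairs.
records = {
    "person_1": {("employer", "Acme"), ("address", "22 Franklin Road"), ("phone", "555-0100")},
    "person_2": {("employer", "Acme"), ("address", "22 Franklin Road"), ("phone", "555-0100")},
    "person_3": {("employer", "Acme"), ("address", "9 Oak Street"), ("phone", "555-0199")},
}


def connection_strength(a, b):
    """Count the attributes two people have in common."""
    return len(records[a] & records[b])


# Score every pair of people once.
strengths = {
    (a, b): connection_strength(a, b)
    for a, b in combinations(sorted(records), 2)
}

for pair, strength in strengths.items():
    print(pair, strength)
```

Here person_1 and person_2 share an employer, an address and a phone number, while person_3 shares only the employer with each of them, which mirrors the single-employer example above.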

Through network analytics, a tax agency can identify significant non-compliance in an automated fashion.  For example, an agency could find:

  • New business registrations where the business is related to another business that still owes money to the tax agency.
  • Refund fraud attempts, where the latest refund filing shares characteristics with a previously identified refund fraud request.
  • Fraud rings, where an agency employee can determine who a taxpayer is related to and find additional individuals and businesses that might be related for investigation.
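
The first case in the list above can be sketched as a breadth-first search over a graph of linked entities. Everything here is hypothetical: the entity names, the `links` structure (where an edge is assumed to mean a shared officer, address or bank account) and the `delinquent` set are invented to show the idea of checking a new registration against known debtors within a few degrees of separation.

```python
from collections import deque

# Hypothetical links: an edge means two entities share an officer,
# address, or bank account.
links = {
    "NewCo LLC": {"J. Doe"},
    "J. Doe": {"NewCo LLC", "OldCo Inc"},
    "OldCo Inc": {"J. Doe"},
    "Unrelated Ltd": set(),
}
delinquent = {"OldCo Inc"}  # entities assumed to owe money to the agency


def degrees_of_separation(start, max_depth):
    """Breadth-first search returning {entity: distance} within max_depth."""
    distances = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if distances[current] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in links.get(current, ()):
            if neighbor not in distances:
                distances[neighbor] = distances[current] + 1
                queue.append(neighbor)
    return distances


def related_delinquents(new_registration, max_depth=2):
    """Return delinquent entities reachable from a new registration."""
    reachable = degrees_of_separation(new_registration, max_depth)
    return {e: d for e, d in reachable.items() if e in delinquent and d > 0}


flags = related_delinquents("NewCo LLC")
print(flags)
```

In this toy graph, NewCo LLC is flagged because it shares an officer with OldCo Inc, two degrees away. Raising `max_depth` widens the search at the cost of weaker, less reliable connections.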
With a robust data repository, tax agencies can use tools like predictive analytics, outlier models and network analytics to strengthen their compliance capabilities and collect more money. A higher rate of successful audits also signals to the public that most failures to meet tax obligations are likely to be identified, which should encourage voluntary compliance as well.
