
Smart Data Quality

Make your Data Quality treatments easier and more efficient thanks to AI!

Smart Data Quality: the rise of an artificial intelligence solution for improved data quality

More than ever, companies seek to extract value from the growing volume of data they collect about their customers, their suppliers and their markets. Nevertheless, most of them face a major obstacle: poor data quality, which costs companies up to 20% of their turnover.

Data quality is indeed crucial for conducting any relevant data processing and data analysis since poor data can hinder the proper conduct of certain business processes: erroneous invoicing, polluted marketing campaigns, unsuccessful field interventions, biased performance indicators or even distorted understanding of the business.

However, improving data quality is generally regarded as a tedious, IT-related task, as it involves time-consuming manual operations on huge databases scattered across many departments of the company.

In light of this finding, our Data Science team has developed an Artificial Intelligence (AI) solution designed to:

  • Make Data Quality treatments automated, fast and efficient thanks to Machine Learning models
  • Place the business experts, key users of the data, at the heart of the approach to ensure its relevance regarding specific business contexts

Each module of the application relies on the users’ annotations and expertise to personalize the main algorithms. Letting the algorithms adapt to each context makes them more efficient and allows them to closely match specific business needs.

The Smart Data Quality application forms part of Sia Partners' augmented consulting strategy and therefore paves the way for the future of consulting, where consultants are empowered by the use of Artificial Intelligence (AI). By combining our consultants' business understanding and functional expertise with AI capabilities, the application is able to automatically create relevant algorithms for better data quality control.

Smart Data Quality scope of Application

Smart Data Quality can tackle many issues in various fields thanks to its agnostic approach. The application helps to handle different Data Quality tasks with the following main functionalities:

  • Data Quality Diagnosis: the solution gathers the users’ feedback to build an exhaustive diagnosis of the Data Quality within your systems. For instance, the users’ feedback could allow the algorithm to learn to efficiently recognize typing mistakes and semantic errors in customers' addresses.
  • Data Deduplication: based on the users’ feedback and state-of-the-art fuzzy matching algorithms, the solution builds an efficient deduplication tool that understands what counts as a duplicate in a specific context.
  • Data Enrichment: still based on the users’ choices, the solution learns how to enrich any dataset with external open data, such as the SIREN database in France, as well as how to reconcile data from different sources within the company.
  • Smart Completion: the solution learns the correlations between features by itself in order to complete your missing data using extrapolation and AI algorithms.

As a result, this is a tool that automates and accelerates Data Quality treatments while making them more relevant regarding your specific business.
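
The Smart Completion idea described above can be illustrated with a minimal sketch. It assumes a single numeric feature that is linearly correlated with the missing one and fits an ordinary least-squares line on the complete rows. The field names (household_size, yearly_m3) and the figures are hypothetical, and the actual application may well rely on different models.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


def complete_missing(rows, known, target):
    """Fill missing `target` values using the correlation with `known`."""
    # Learn the relationship only from rows where both values are present.
    complete = [r for r in rows if r[known] is not None and r[target] is not None]
    slope, intercept = fit_line(
        [r[known] for r in complete], [r[target] for r in complete]
    )
    filled = []
    for row in rows:
        new_row = dict(row)  # leave the input rows untouched
        if new_row[target] is None and new_row[known] is not None:
            new_row[target] = slope * new_row[known] + intercept
        filled.append(new_row)
    return filled


# Hypothetical customer rows: the last one lacks its yearly consumption.
rows = [
    {"household_size": 1, "yearly_m3": 50},
    {"household_size": 2, "yearly_m3": 100},
    {"household_size": 3, "yearly_m3": 150},
    {"household_size": 4, "yearly_m3": None},
]
completed = complete_missing(rows, known="household_size", target="yearly_m3")
print(completed[3])
```

In practice several features would be combined, and business experts would review a sample of the completed values before they are written back to the database.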

A first business use case: CRM for a large water company

Sia Partners has been helping companies to better understand their customers for a long time and therefore has a deep understanding of their needs regarding this strategic task, in particular when it comes to Customer Relationship Management (CRM) and Data Deduplication.

The implementation of increasingly advanced CRM tools is gaining momentum among many businesses in order to capitalize on the great amount of aggregated customer data. Customer knowledge becomes more detailed, reducing the time it takes to process requests and improving consumer satisfaction. Nevertheless, all this implies several levels of processing in order to guarantee a high quality of the data.

Indeed, customer databases are often characterized by a strong presence of duplicates due to typing errors and/or technical errors. For our client, a large water company, this poor data quality has resulted in the following consequences:

  • Increase in maintenance and storage costs
  • Increase in the emailing and letter campaign costs
  • Deterioration of the brand image because consumers may get tired of being targeted several times by the same action
  • Dispersion of data and therefore of customer knowledge

Smart Data Quality, an AI-based tool for optimized duplicate processing

The detection and deletion of duplicates was therefore a key challenge for our client. However, customer databases now contain millions of rows, which makes the task impossible to carry out manually, and conventional automation runs into numerous technical constraints unless it relies on artificial intelligence.

Our Machine Learning based algorithms allowed us to smartly tackle this challenge by bringing together great AI capabilities with functional experts to ensure treatments that are relevant regarding the specific data involved and the business needs of the project owner.

For our client, we were able to tag duplicates with 95% accuracy, according to business experts' feedback, and to investigate how they were created over time in order to take concrete corrective measures.
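
As a rough illustration of duplicate tagging by fuzzy matching, the sketch below compares customer records pairwise with Python's standard difflib module. It is not the algorithm used in the application: the records are invented, and the threshold value is an assumption that would in practice be calibrated on pairs labeled by business experts.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(text.lower().split())


def similarity(a, b):
    """Similarity ratio between 0 and 1 for two normalized records."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def tag_duplicates(records, threshold):
    """Return index pairs whose similarity reaches the threshold."""
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical customer records; the first two differ by a single typo.
customers = [
    "Jean Dupont, 12 rue de la Paix, Paris",
    "Jean Dupond, 12 rue de la paix, Paris",
    "Marie Martin, 5 avenue Victor Hugo, Lyon",
]
# Assumed threshold, to be tuned from expert feedback.
duplicate_pairs = tag_duplicates(customers, threshold=0.9)
print(duplicate_pairs)
```

Comparing every pair grows quadratically with the number of rows, so on databases with millions of rows the candidates would first need to be narrowed down, for example by grouping records on postcode before comparing them.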

A second business use case: Data Reconciliation and Enrichment for two major energy companies

Sia Partners has led many assignments related to operational efficiency. In these cases, a specific field process within the company can be improved and needs to be reorganized in order to be more efficient.

This is frequent for many energy companies, since they deliver electricity and/or gas from many different sources to numerous customers, both individuals and businesses. This multiplication of entry and exit service points makes it necessary to rely on an accurate and frequently updated address database.

Nevertheless, address databases generally contain many errors due to typing mistakes as well as the existence of different standards. As a result, it is difficult and time-consuming to reconcile data from several internal sources scattered across the company's departments, and it is even harder when it comes to reconciliation with external sources such as open data.

Complete your data with the Smart Completion tool of our Smart Data Quality solution

Our tool was then used to match customer addresses with official gas and electricity service points in order to optimise the efficiency of field interventions and energy delivery. To prioritize maintenance on its grid connections, our client wanted to identify social housing. We provided the client with our Smart Data Quality solution to enrich its data with data from social landlords and identify these connections.
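
The address matching described above could look roughly like the following sketch, which normalizes addresses and then picks the closest reference entry with difflib.get_close_matches from the Python standard library. The abbreviation table, the service point identifiers and the cutoff are all assumptions made for illustration, not details of the actual solution.

```python
from difflib import get_close_matches

# Assumed abbreviation table; a real one would follow the postal standard in use.
ABBREVIATIONS = {"r": "rue", "av": "avenue", "bd": "boulevard"}


def normalize_address(address):
    """Lowercase, drop punctuation and expand common street abbreviations."""
    tokens = address.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


def match_to_service_points(customer_addresses, service_points, cutoff=0.85):
    """Map each customer address to the closest service point id, or None."""
    index = {normalize_address(addr): sp_id for sp_id, addr in service_points.items()}
    result = {}
    for address in customer_addresses:
        best = get_close_matches(
            normalize_address(address), list(index), n=1, cutoff=cutoff
        )
        result[address] = index[best[0]] if best else None
    return result


# Hypothetical reference data and customer addresses.
service_points = {
    "SP-001": "12 Rue de la Paix, 75002 Paris",
    "SP-002": "5 Avenue Victor Hugo, 69002 Lyon",
}
customer_addresses = [
    "12 r. de la paix 75002 paris",    # abbreviation and casing differences
    "5 av Victor Hugp 69002 Lyon",     # abbreviation plus a typing mistake
    "99 chemin des Vignes 13100 Aix",  # no corresponding service point
]
matches = match_to_service_points(customer_addresses, service_points)
print(matches)
```

Addresses left unmatched (mapped to None) are the natural candidates to hand over to business experts for manual review.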

Collecting and matching external open data can also lead to a better understanding of consumption behaviors. Another energy provider wanted to use French open data to build an efficient customer segmentation. Smart Data Quality helped this client enrich its customer data with the available open data.


Improving data quality is a key challenge to get the most out of a company's business, and more and more companies are becoming aware of this.

Our customers' challenges are our source of inspiration. Our ready-to-use AI solution therefore enables us to tackle this challenge efficiently by providing tools that address the most frequent use cases and by putting their business experts back at the heart of the processes.

Our conviction is that this approach, combining the latest advances in AI technologies with human business expertise, is the future of smart and ethical Data Science.