April 2020
Over the past decade, tremendous advances have been made in Natural Language Processing. Sia Partners' data science experts show you how these models work and discuss their scope.
NLP (Natural Language Processing, known in French as Traitement Automatique du Langage) has been one of the most active fields of Data Science over the past decade. New models are regularly developed and published by the GAFAM companies and other research laboratories.
These models often outperform their predecessors on most linguistic tasks thanks to a different algorithm, a larger training corpus, or a revised pre-training objective.
In November 2018, Google AI (Google's Artificial Intelligence R&D department) released the BERT model, which revolutionized NLP, showed the way forward for future models, and now serves as the baseline against which new work is compared.
Thanks to Sia Partners' R&D work, concrete projects are taking shape around innovative Data Science topics. These industrial achievements, some of which we detail below, are at the cutting edge of the latest innovations and can take the form of voicebots, comment analysis tools, and fraud detection tools.
Before explaining how it works, it is important to understand that BERT is an embedding model: through the training of a neural network, it represents text (words or sentences) as a numeric vector that depends on context. This representation can then be used more easily by a machine, in particular by the Machine Learning algorithms we implement.
BERT stands for Bidirectional Encoder Representations from Transformers. To see things more clearly, let us explain the key concepts behind this linguistic representation tool.
Bidirectional indicates that the model works in both directions of a sentence. Typical human reading goes only one way (left to right, right to left, or top to bottom), so the vast majority of language models have been built the same way, often by trying to predict the next word from the beginning of the sentence. BERT breaks free from this idea by reading in both directions. Predicting a word from the words that follow it is illogical from a human point of view, but it makes sense for an algorithm that seeks to define the meaning of words within the context of the sentence containing them. To achieve this, Google's model hides a proportion of the words in the sentence and has the algorithm try to predict them. This gives access to a very large amount of training data, because any sentence can be used by BERT for learning: labeled sentences are no longer required to train a "contextualized" model.
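As a minimal sketch of this masking idea (the function name, mask rate, and sentence below are hypothetical illustrations, not BERT's actual preprocessing code), any plain sentence can be turned into a training example by hiding some of its words:

```python
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]", seed=0):
    """Hide a fraction of tokens; the model must predict the originals.

    Returns (masked_tokens, targets), where targets maps each masked
    position back to the hidden word the model must recover.
    """
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked.append(mask_token)
            targets[i] = tok  # the label to predict at position i
        else:
            masked.append(tok)
    return masked, targets

sentence = "any sentence can be used for learning".split()
masked, targets = mask_tokens(sentence, mask_rate=0.3)
```

Because no human labeling is involved, every sentence in a raw corpus can be turned into such an example, which is what gives BERT its enormous training base.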
Encoder Representations defines the basis of NLP models: representing textual data in a way that is simple for the computer and that allows generalized manipulation. This part is also called embedding; it is used to represent a textual object (word, sentence, paragraph, document, etc.) numerically, using a vector. In the case of BERT, this embedding is computed both for each word of a sentence (word embedding) and for the sentence itself (sentence embedding), using a normalized numeric vector of 768 values. Thanks to the bidirectional nature of the model, the vector representing each word captures the meaning of that word within the context of the sentence. For example, in the sentences "Steve Jobs eats an apple" and "Steve Jobs created Apple", the word "apple", although spelled identically, has a different meaning. BERT captures this peculiarity and constructs distinct vectors for the two apples. This contextualized representation of words is a strong departure from the more traditional (and well-known) numeric representations offered by NLP.
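To illustrate why distinct contextual vectors matter, here is a toy comparison using made-up 3-dimensional vectors in place of BERT's 768-value embeddings; all the numbers are purely illustrative:

```python
import math

def cosine(u, v):
    """Cosine similarity: 1.0 for identical directions, 0.0 for unrelated."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical contextual embeddings for the two "apple"s and a control word.
apple_fruit   = [0.9, 0.1, 0.0]   # "Steve Jobs eats an apple"
apple_company = [0.1, 0.9, 0.2]   # "Steve Jobs created Apple"
pear_fruit    = [0.8, 0.2, 0.1]   # another fruit, for comparison

similarity_diff_sense = cosine(apple_fruit, apple_company)
similarity_same_sense = cosine(apple_fruit, pear_fruit)
```

With a static (non-contextual) embedding, both "apple"s would share a single vector; a contextual model keeps them apart, so the fruit "apple" ends up closer to "pear" than to the company "Apple".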
Finally, the Transformer is the heart of the BERT model. It is the purely technical part of the model, which we will try to explain simply. A Transformer is a succession of neural networks called attention layers. These layers (which exist in Encoder and Decoder form) retain the points of attention of the sentence being digitized. For example, in the sentence "The dog gnaws on the bone because it is tasty", the pronoun "it" refers to the bone and not to the dog. The attention layers capture this link and build word vectors that contain it. At the output of all its layers, the Transformer thus produces the embedding of each word of the sentence as well as the embedding of the sentence itself.
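The mechanism inside each attention layer can be sketched as scaled dot-product attention. This is a simplified single-head version with toy vectors, not BERT's actual implementation:

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Each position mixes the value vectors V according to how strongly
    its query matches every key: softmax(Q . K^T / sqrt(d))."""
    d = len(Q[0])
    outputs, all_weights = [], []
    for q in Q:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)          # how much this token attends to each token
        mixed = [sum(wi * v[j] for wi, v in zip(w, V)) for j in range(len(V[0]))]
        outputs.append(mixed)
        all_weights.append(w)
    return outputs, all_weights

# Toy self-attention over 3 token vectors of dimension 2 (hypothetical numbers).
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, weights = attention(X, X, X)   # queries, keys and values all come from X
```

It is these attention weights that let the model link "it" back to "the bone": the row of weights for a pronoun concentrates on the token it refers to.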
In addition, BERT pursues a second objective, which is not reflected in the acronym: Next Sentence Prediction (NSP), whose purpose is to qualify the link that may exist between two sentences. During training, two sentences are given to BERT, which must determine whether the second is the continuation of the first (a question and its answer, for example). The downside of this task is that it requires a dataset that is more complex to obtain (samples of two consecutive sentences and samples of two independent sentences).
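Building such an NSP dataset can be sketched as follows; the sentences and the function name are hypothetical:

```python
import random

def make_nsp_pairs(sentences, seed=0):
    """Build (sentence_a, sentence_b, is_next) examples: one true
    consecutive pair and one random (independent) pair per position."""
    rng = random.Random(seed)
    pairs = []
    for i in range(len(sentences) - 1):
        pairs.append((sentences[i], sentences[i + 1], True))   # real continuation
        other = rng.choice(sentences)
        while other == sentences[i + 1]:   # avoid mislabeling a true pair
            other = rng.choice(sentences)
        pairs.append((sentences[i], other, False))             # independent pair
    return pairs

docs = ["BERT encodes text.", "It reads in both directions.",
        "Transformers use attention layers.", "The output is a vector."]
pairs = make_nsp_pairs(docs)
```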
To recap, BERT has the advantage of representing language (words or sentences) numerically while preserving context. Its second advantage is easier training: it only needs ordinary sentences in which certain words are masked and then predicted, plus pairs of consecutive or independent sentences.
In the summer of 2019, Facebook AI developed the RoBERTa model (Ro for Robustly optimized), which broadly reproduces the BERT model with some modifications.
These modifications mainly concern the training hyperparameters (batch size, learning rate, dataset size), but the main change is the removal of the NSP objective. RoBERTa therefore specializes only in contextually encoding a linguistic object. This slight variation allows the model to outperform its predecessor BERT in many language-related areas. On the other hand, removing the NSP objective decreases performance on key tasks such as intent detection.
BERT was designed for use on the English language only, but its construction and learning principle are fully transferable to other languages, provided sufficient data is available.
In October 2019, Facebook, in partnership with Inria, developed CamemBERT. This model is none other than RoBERTa transposed to French, trained on the French portion of the OSCAR dataset (Open Super-large Crawled ALMAnaCH coRpus), representing 138 GB of text, or more than 23 billion words. The result is a very powerful tool for linguistic tasks compared with the multilingual models frequently used for French.
For Sia Partners, armed with strong expertise in artificial intelligence, CamemBERT is one of the tools bringing real novelty to French-language subjects. This linguistic model (or very similar ones) is therefore used in many use cases to extract value from textual data.
The Deep Review solution developed by Sia Partners is based on a simple idea: evaluating users' sentiment through comments posted online or through internal satisfaction barometers. Manual analysis of this textual feedback, whose volume is growing exponentially, is often time consuming and does not provide "intelligent" decision-support indicators for marketing and operational teams.
This is where BERT, and NLP in general, bring the added value that Sia Partners offers. Our Data Science teams have built algorithms that perform different language comprehension tasks in order to restore and synthesize the content of the comments analyzed.
The Deep Review product includes a sentiment analysis tool; that is, it can determine the positivity (or negativity) of a sentence. Here, BERT is used to digitize the comments; the resulting vectors then feed a classification model (neural network, random forest, or other) that labels each comment as negative, neutral, or positive.
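As an illustration of that classification step, here is a deliberately tiny nearest-centroid classifier over made-up 2-dimensional "embeddings" (a stand-in for the neural network or random forest actually used; all vectors are hypothetical):

```python
# Toy labeled "embeddings" of comments, standing in for BERT's 768-value vectors.
train = {
    "negative": [[-0.9, 0.1], [-0.7, -0.2]],
    "neutral":  [[0.0, 0.0], [0.1, -0.1]],
    "positive": [[0.8, 0.3], [0.9, 0.1]],
}

def centroid(vectors):
    """Component-wise mean of a list of vectors."""
    return [sum(c) / len(vectors) for c in zip(*vectors)]

centroids = {label: centroid(vs) for label, vs in train.items()}

def classify(vec):
    """Assign a comment vector to the class with the nearest centroid."""
    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    return min(centroids, key=lambda label: sq_dist(vec, centroids[label]))

label = classify([0.85, 0.2])   # embedding of a new, positive-sounding comment
```

A real classifier is of course trained on far more data, but the pipeline shape is the same: BERT turns text into vectors, and a separate model maps vectors to the three classes.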
A lot of data is needed to obtain a sufficiently powerful tool. These data must also be labeled across the three final classes (negative, neutral, positive), in equivalent proportions to avoid bias.
In the end, after training our model on several hundred thousand comments, the classifier categorizes the data correctly 95% of the time. Such a powerful tool enables in-depth sentiment analysis.
This sentiment analysis is then coupled with a thematic analysis, making it possible to understand, for each comment, the themes mentioned and the associated satisfaction or dissatisfaction.
These results provide the user with key indicators to better understand their network of points of sale or their products.
A second use case frequently encountered at Sia Partners is intent detection. Unlike sentiment analysis, the goal of intent detection is more complex: we do not seek to classify a linguistic object (word, sentence) into three categories but into a very large number (ten, a hundred, sometimes more), sometimes without having defined them in advance. The means to achieve this objective are accordingly harder to build.
For example, one possible approach to intent detection is to build a labeled base of example linguistic objects for each intent. The target sentence is then matched against the closest intent.
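This nearest-example matching can be sketched with cosine similarity over a hypothetical labeled base; the intent names, vectors, and threshold below are all illustrative:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Hypothetical labeled base: intent -> embeddings of example sentences.
intent_base = {
    "opening_hours": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
    "card_blocked":  [[0.0, 0.9, 0.3], [0.1, 0.8, 0.4]],
}

def detect_intent(query_vec, threshold=0.5):
    """Return the intent whose example is closest to the query vector,
    or None when nothing in the base is similar enough."""
    best_intent, best_score = None, threshold
    for intent, examples in intent_base.items():
        for example in examples:
            score = cosine(query_vec, example)
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent

intent = detect_intent([0.85, 0.15, 0.05])   # a query about opening hours
```

The `None` case matters in practice: when no intent scores above the threshold, a conversational agent can fall back to a clarification question instead of guessing.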
As said above, the BERT model is built with the objective of predicting the missing words of a sentence. Despite its many qualities, directly comparing two sentences with BERT quite often leads to inconsistencies and poor results compared with other models.
To overcome this difficulty, the BERT model must be specialized. Siamese neural networks are then used to specialize it in NLI (Natural Language Inference).
NLI is the comparison of two sentences and of the relation between them. Sentence A and sentence B can be neutral with respect to each other, discuss the same subject with a similar opinion, or discuss the same subject with an opposite opinion. From these three possible links between sentences, it is possible to specialize a BERT model so that the numeric vectorization preserves the original knowledge (definition and context) while allowing one representation to be compared with another, yielding a quantified relationship between them. A positive value indicates an existing, positive link; a negative value an existing but opposed link; and a null value no link.
Thus, intent detection can be done by matching the target sentence to an intent, either using positive link values (selecting the intent that deals with the same subject with the same polarity) or using the absolute values of the links (selecting the intent that deals with the same subject, whatever the polarity).
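The two matching strategies can be illustrated with hypothetical NLI-style scores between a target sentence and each candidate intent (the intent names and values are made up; the sign encodes polarity, the magnitude the strength of the topical link):

```python
# Hypothetical signed NLI scores for one target sentence against each intent.
scores = {"refund_request": 0.82, "refund_refusal": -0.90, "opening_hours": 0.04}

# Strategy 1: same subject AND same polarity -> highest positive score.
same_polarity = max(scores, key=lambda k: scores[k])

# Strategy 2: same subject regardless of polarity -> highest absolute value.
same_subject = max(scores, key=lambda k: abs(scores[k]))
```

Here the two strategies disagree: the strongest link in absolute value is an opposed one, so strategy 2 picks the opposed intent while strategy 1 keeps the weaker but same-polarity match.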
One use case Sia Partners has encountered for this problem was the construction of a conversational agent. The goal of the project was to build an algorithm capable of responding to calls from users and providing adequate answers to their questions. Intent detection came into play when a caller asked a question: the tool had to match the question with an intent previously recorded in a reference base. If an intent is detected, the conversational agent can then provide the appropriate response.
Using this same specialization of BERT models, another possible approach to intent detection is clustering, that is, grouping sentences according to their meaning. The advantage is that there is no longer any need to build an intent base by hand. The downside is that the resulting groups of sentences can be difficult to interpret.
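A minimal sketch of such clustering, here a greedy grouping by cosine similarity over hypothetical sentence vectors (a real project would more likely use k-means or a similar algorithm):

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cluster(vectors, threshold=0.8):
    """Greedy clustering: each vector joins the first cluster whose
    representative (its first vector) it resembles enough, otherwise
    it starts a new cluster. No hand-built intent base is needed."""
    representatives, labels = [], []
    for vec in vectors:
        for idx, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                labels.append(idx)
                break
        else:
            representatives.append(vec)       # vec opens a new cluster
            labels.append(len(representatives) - 1)
    return labels

# Hypothetical sentence embeddings forming two clear "topics" in 2 dimensions.
vecs = [[1.0, 0.0], [0.95, 0.1], [0.0, 1.0], [0.05, 0.9]]
labels = cluster(vecs)
```

The clusters come out unnamed, which is exactly the interpretability downside mentioned above: a human still has to look at each group to decide what intent it represents.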
BERT thus popularized the use of Transformers in linguistic models. It is today a benchmark model used in many areas of NLP. Thanks to its data expertise, Sia Partners has been able to develop innovative solutions based on this technology, such as the sentiment analysis and intent detection presented above. Today, with the advancement of these linguistic models, and of other models such as OpenAI's GPT-2, capable of producing realistic linguistic content (answers, descriptions, biographies, etc.) from structured data, new perspectives are opening up and many projects will be able to take shape in Machine Learning.