Emerton Data — Deep learning in insurance

The definition of deep learning and a few historical elements

Deep learning is a family of machine learning techniques built on neural networks with multiple layers. The term is partly a buzzword (a rebranding of neural networks, now with more layers) and covers a wide range of architectures: Convolutional Neural Networks, Recurrent Neural Networks, and Long Short-Term Memory networks are among the main ones.

The roots of deep learning go back to the mid-20th century. The first algorithmic neural network, inspired by biology, is Rosenblatt's 1957 Perceptron.

The first big step in the field came in the 1980s, when a meaningful framework for optimizing parameters, namely gradient methods, was finally applied to neural networks. At that stage, the effectiveness of these new methods was far from proven.

It was not until the 2010s that a double revolution took place, driven both by the volume of data (release of the ImageNet dataset) and by the increase in computational power (Nvidia GPUs with CUDA, on the order of one trillion operations per second). Deep learning then started to outperform other methods in image processing and, progressively, in Natural Language Processing.

Adoption then spread remarkably fast, over a very short period of roughly a year and a half.

How does it work?

To make a long story short, understanding deep learning algorithms amounts to understanding two aspects: architecture and back-propagation.

The architecture of a neural network involves multiple neurons, interacting with one another, arranged in successive layers. The number of layers and the type of interactions define the design of the network; the weights on each cell are the parameters that the machine will “learn” by confronting this architecture with real-world data.

Here comes the second element: back-propagation. The idea is simple: every time a new input enters the network, the output is compared with the expected result and a loss function measures the error. This error is then back-propagated so that each neuron is attributed its contribution to the error. These contributions are used to adjust the weights by gradient descent, in order to ultimately minimize the loss function.
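The back-propagation loop described above can be sketched on a deliberately tiny network. This is a minimal illustration, not a production implementation: the network (one hidden tanh neuron feeding one linear output neuron), the squared-error loss, the starting weights, the learning rate and the toy data are all assumptions chosen for readability.

```python
import math

def forward(x, w1, w2):
    """Forward pass: one hidden tanh neuron feeding one linear output neuron."""
    h = math.tanh(w1 * x)
    y_hat = w2 * h
    return h, y_hat

def backward(x, y, w1, w2):
    """Back-propagate the squared error; return (loss, dL/dw1, dL/dw2)."""
    h, y_hat = forward(x, w1, w2)
    error = y_hat - y
    loss = 0.5 * error ** 2
    grad_w2 = error * h                     # contribution of the output weight
    grad_h = error * w2                     # error sent back to the hidden neuron
    grad_w1 = grad_h * (1.0 - h ** 2) * x   # chain rule through tanh
    return loss, grad_w1, grad_w2

def train(data, w1=0.5, w2=0.5, learning_rate=0.1, epochs=200):
    """Gradient descent: both weights are adjusted together after every input."""
    history = []
    for _ in range(epochs):
        total_loss = 0.0
        for x, y in data:
            loss, g1, g2 = backward(x, y, w1, w2)
            w1 -= learning_rate * g1
            w2 -= learning_rate * g2
            total_loss += loss
        history.append(total_loss)
    return w1, w2, history

# Hypothetical toy observations (input, expected output).
toy_data = [(-1.0, -0.8), (-0.5, -0.5), (0.5, 0.5), (1.0, 0.8)]
w1, w2, history = train(toy_data)
print(f"loss in first epoch: {history[0]:.4f}, in last epoch: {history[-1]:.4f}")
```

Real networks apply the same chain rule across thousands of neurons, and in practice the gradients are computed by automatic differentiation rather than written by hand.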

The real magic is that in deep learning, all the weights, across all layers, are optimized simultaneously.

These are the high-level principles of the method; the purpose here is not to dig into the technical details. Instead, one can focus on the key elements driving success in deep learning.

Practically speaking, what is needed to make it work?

A large volume of data: because there are hundreds of thousands of parameters to calibrate, the number of observations must be much larger still.
A certain type of input data, where the added value is proven. This is typically unstructured data: images, text, and audio. The intuition is that complex, deep neural networks bring the most value when the relevant variables are hard to extract from the raw input.
On structured data, deep learning has so far shown less significant breakthroughs compared with baseline machine learning algorithms.
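To give a sense of how quickly the parameter count grows, here is a small sketch that counts the weights and biases of a fully connected network. The layer sizes are an assumption made for illustration (a 28x28 pixel image flattened to 784 inputs, two hidden layers, 10 output classes); they do not come from any specific insurance model.

```python
def count_dense_parameters(layer_sizes):
    """Count weights and biases of a fully connected network."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out   # one weight per connection, one bias per neuron
    return total

# Hypothetical architecture: 784 inputs, hidden layers of 256 and 128 neurons, 10 outputs.
example_sizes = [784, 256, 128, 10]
n_params = count_dense_parameters(example_sizes)
print(n_params)  # 235146
```

Even this modest network has a few hundred thousand parameters to calibrate, which is why the number of observations has to be large.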

Let’s try to illustrate with two standard examples: image and text.

One of the major classes of deep learning architectures used for image processing is Convolutional Neural Networks. They typically identify and focus on various parts of an image, and then assemble them in order to classify the image. These methods learn contours, shapes, their position (on the left, on the right), the contrast level, etc. This is the high added value part. The final decision made by the machine then relies on relatively simple rules.
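The basic operation behind this can be sketched as a small filter sliding over the image. In the sketch below, the 3x3 kernel is set by hand to detect a vertical contour; in a real Convolutional Neural Network the kernel values are learned from data. The image and the function name `convolve2d` are illustrative assumptions.

```python
def convolve2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1).

    As in most deep learning libraries, the kernel is not flipped,
    so this is strictly speaking a cross-correlation.
    """
    k_rows, k_cols = len(kernel), len(kernel[0])
    out_rows = len(image) - k_rows + 1
    out_cols = len(image[0]) - k_cols + 1
    output = []
    for i in range(out_rows):
        row = []
        for j in range(out_cols):
            value = sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(k_rows)
                for b in range(k_cols)
            )
            row.append(value)
        output.append(row)
    return output

# Toy 5x6 image: dark on the left (0), bright on the right (1).
image = [[0, 0, 0, 1, 1, 1] for _ in range(5)]

# Hand-set kernel that responds to a dark-to-bright vertical contour.
vertical_edge = [[-1, 0, 1],
                 [-1, 0, 1],
                 [-1, 0, 1]]

feature_map = convolve2d(image, vertical_edge)
for row in feature_map:
    print(row)  # each row is [0, 3, 3, 0]: the response peaks at the contour
```

The output is zero in the uniform regions and non-zero only where the contour sits, which is the kind of local feature later layers assemble into shapes.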

Whenever the data has a historical or temporal dimension, the deep learning architectures typically used are Recurrent Neural Networks; among them, one of the most popular is the LSTM, which stands for Long Short-Term Memory.

They contain loops, so that they can take the past into account (both short term and long term): information becomes persistent. The most famous example is predicting a word in a sentence. A memory effect is needed to infer the very last word of the following sentence: “I have been living in Germany for 5 years … I am fluent in German”. Here, a standard model could learn that the word following “fluent in” is the name of a language. To predict that this language is German, the model needs a memory effect: “Germany” appears earlier in the sentence.
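The loop and the resulting memory effect can be sketched with a single recurrent neuron. Note that this is a plain recurrent cell, not an LSTM (an LSTM adds gates that control what is kept and what is forgotten); the weights and the two input sequences are assumptions chosen for illustration.

```python
import math

def run_recurrent_cell(inputs, w_input=1.0, w_state=0.9):
    """Feed a sequence through one recurrent neuron and return its final state."""
    state = 0.0
    for x in inputs:
        # The loop: the new state mixes the current input with the previous state.
        state = math.tanh(w_input * x + w_state * state)
    return state

# Two hypothetical sequences that differ only in their first element.
seq_a = [1.0, 0.0, 0.0, 0.5]
seq_b = [-1.0, 0.0, 0.0, 0.5]

with_memory = (run_recurrent_cell(seq_a), run_recurrent_cell(seq_b))
without_memory = (
    run_recurrent_cell(seq_a, w_state=0.0),   # loop switched off
    run_recurrent_cell(seq_b, w_state=0.0),
)

print("with loop:   ", with_memory)
print("without loop:", without_memory)
```

With the loop active, the final state still depends on the first element of the sequence; with the loop switched off, both sequences end in the same state because only the last input matters.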

LSTMs can also be seen as the deep learning counterpart of econometric time series models. There are arguably still large innovation opportunities in this kind of approach, both in finance (where many time series are at stake, obviously) and in the insurance industry, where time series and spatio-temporal series are becoming omnipresent (we come back to this point in the next section).

The Eldorado in Insurance?

Deep learning has given, and is still giving, a new boost to Artificial Intelligence. However, it will not be the solution to every problem insurers face! Let’s dig into a few details in order to understand where one can expect low or high added value from deep learning in the insurance sector.

As of today, no one can fully explain why deep learning performs so well on numerous use cases; deep learning is therefore not natively conducive to interpretation.

On the other hand, one of the main tasks of the insurer is to understand the risk and its drivers, in order to best prevent and manage it.

Interpretability is required from several perspectives: regulation, audit, decision making and stability. It is in fact an active field of investigation in the insurance sector, both in large groups and in InsurTech start-ups. The idea is to take the outcome of a deep learning model and then explain it with an interpretable model. But this is just the beginning of the story.
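One common way to read “explaining the outcome with an interpretable model” is to fit a simple surrogate model on the predictions of the opaque one. The sketch below is an assumption-laden illustration of that idea, not a description of any insurer's method: `black_box` is a hypothetical stand-in for a trained deep model, and the surrogate is a plain linear model fitted by least squares.

```python
import math

def black_box(x1, x2):
    """Hypothetical stand-in for an opaque trained model (not a real one)."""
    return math.tanh(2.0 * x1 - x2)

def fit_linear_surrogate(model, grid):
    """Fit y ~ intercept + b1*x1 + b2*x2 on the model's own predictions.

    The points form a full symmetric grid, so the two features are centered
    and uncorrelated; under that assumption each least-squares slope
    reduces to sum(x * (y - mean_y)) / sum(x * x).
    """
    points = [(a, b) for a in grid for b in grid]
    preds = [model(a, b) for a, b in points]
    mean_y = sum(preds) / len(preds)
    sum_sq = sum(a * a for a, _ in points)   # same value for both features
    b1 = sum(a * (y - mean_y) for (a, _), y in zip(points, preds)) / sum_sq
    b2 = sum(b * (y - mean_y) for (_, b), y in zip(points, preds)) / sum_sq
    return mean_y, b1, b2

grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
intercept, slope_x1, slope_x2 = fit_linear_surrogate(black_box, grid)
print(f"surrogate: y = {intercept:.3f} + {slope_x1:.3f} * x1 + {slope_x2:.3f} * x2")
```

The sign and size of each slope give a readable, if approximate, summary of how the opaque model reacts to each variable; how faithful such a surrogate is to the original model remains an open question in each use case.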

This explains, in brief, why deep learning has no straightforward use in the pure “risk” applications that insurers face.

However, there are many other applications in insurance, and this is a very active field. Why?

Insurance is an information business, and therefore one built on flows of information and data. Today, a large part of this data consists of images, text, and audio. As mentioned above, the added value of deep learning is clear for these types of unstructured data. The main areas where deep learning is impacting the insurance business are probably:

… already in the short term:

Claims: e.g. assess claim cost based on images
Fraud: assess fraud in images, speech, texts
Customer experience and new services: satisfaction, quick quotation
Augmented intelligence in Robotic Process Automation (which may lead to a shift of operational risk)
Parametric insurance for agriculture, farming, or any weather sensitive activities

… and even more in the longer term:

All IoT-generated data: spatio-temporal series, for instance in connected health or telematics businesses.

These applications are numerous; they are designed to improve service, and therefore customer satisfaction, as well as efficiency.

To conclude, it is quite clear that deep learning technologies will not, in the short term, serve all the data analytics needs of insurance carriers, whose job is mainly to understand risks and to interpret the related models and algorithms. As of today, deep learning is not well understood and therefore does not lead to such models.

However, in both the short and the long term, deep learning technologies will be used massively in insurance, wherever there are images, text, audio, and IoT-generated data.

There is currently a big underlying debate on the future and the role of human employees in this context. What will be the real impact on insurance jobs? Will employees be massively replaced? Will there be a shift of roles played by humans?
