Why understanding the LSTM model is still important today

This article is written by Kees, the founder of Smart AI Solutions. Kees holds a Master of Science in Econometrics from Erasmus University and will regularly share his opinions and views for those who are interested in the field of AI and machine learning.

Last year I heard many data science professionals state that recurrent neural networks, such as the LSTM model, are being outcompeted by convolutional neural networks for a range of tasks. Some of them say it is no longer necessary to focus much on these models. I agree that for some tasks the LSTM model is indeed being outcompeted by convolutional neural networks, but it remains a very powerful model. There are plenty of examples where the LSTM model does a better job than convolutional neural networks. As always, the model architecture you use is just one of the many pieces of the puzzle. LSTM layers are still being used in Amazon's Alexa engine for speech recognition, for example. They probably would not use these layers if they did not possess powerful properties.

But I want to stay out of this discussion and give you one very good argument to still learn about the workings of recurrent neural networks, and especially the LSTM model: by learning the LSTM model you will grasp essential fundamentals of training deep learning networks that also pop up in other model architectures, such as convolutional neural networks. Examples of these fundamentals are the vanishing and exploding gradient problem, the concept of memory in neural networks, and the inherent structure of temporal data.

This blog is especially directed at data scientists who are learning about the LSTM model for the first time. It is about my journey, and along the way I ask you some questions to answer for yourself. So I'm not giving you any answers, but by thinking for yourself you will definitely learn a lot.

Years ago I first learned about the LSTM model because I wanted to use it to predict financial time series data for my graduate thesis. Before that, I had never actually experimented or worked with neural networks.

So when I first dived into the long short term memory model, I was hit by a bus: the seeming complexity of the model. Yes, it really hurt when I dived into this model for the first time. I decided to take a step back and start out by learning about feed forward neural networks. This was a pretty good starting point, and I quickly grasped the theory and the way you train neural network models. After getting how neural networks work, I decided not to move directly to the long short term memory model again; instead, I started learning about simple recurrent neural networks. Simple recurrent neural networks are also not really hard to understand, but they are a little bit harder than feed forward neural networks because you have to learn about the vanishing and exploding gradient problem. In fact, I would recommend that everybody trying to understand the long short term memory model carefully study simple recurrent neural networks first and really understand the effect of the vanishing gradient problem.
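To make the vanishing and exploding gradient problem a bit more concrete, here is a minimal sketch in plain Python. It is an illustration under simplifying assumptions, not a real training setup: the "network" is a single scalar recurrence h_t = f(w * h_{t-1}) without inputs or bias, and the function name `gradient_through_time` and its parameters are made up for this example. It multiplies the local derivatives along the chain, which is what backpropagation through time does when it computes how much the last hidden state depends on the first one.

```python
import math


def gradient_through_time(w, steps, h0=0.5, use_tanh=True):
    """Return d h_T / d h_0 for the scalar recurrence h_t = f(w * h_{t-1}).

    f is tanh when use_tanh is True, otherwise the identity.
    (Hypothetical helper, for illustration only.)
    """
    h = h0
    grad = 1.0
    for _ in range(steps):
        pre = w * h
        if use_tanh:
            h = math.tanh(pre)
            # Chain rule: d tanh(w * h_prev) / d h_prev = w * (1 - tanh(pre)^2)
            local = w * (1.0 - h * h)
        else:
            h = pre
            # Identity activation: the local derivative is just the weight
            local = w
        # The gradient over many time steps is a product of local derivatives
        grad *= local
    return grad


if __name__ == "__main__":
    for steps in (1, 10, 50):
        shrinking = gradient_through_time(0.9, steps)
        growing = gradient_through_time(1.5, steps, use_tanh=False)
        print(steps, shrinking, growing)
```

With a recurrent weight below 1 in absolute value, every factor in the product is smaller than 1, so the gradient shrinks roughly geometrically with the number of time steps. With the identity activation and a weight above 1, the same product grows geometrically instead. In a real recurrent network the scalar weight becomes a matrix and the picture is richer, but the repeated multiplication is the same mechanism.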

Once I was confident that I understood feed forward and simple recurrent neural networks, I decided to go back to the long short term memory model. I have to admit that learning about the LSTM model for the first time can be pretty tough. Many data science professionals talk about this model and implement it, but I wonder how many people really understand the basics of this algorithm. To test your knowledge of this model, try answering the following questions:

How does the long short term memory model solve the vanishing/exploding gradient problem?

And maybe another one:

What is the number of weights estimated in a long short term memory model?

I ask these two questions because there seems to be a lot of discussion about them in online fora. I have to admit that it took me quite some time to wrap my head around these questions, and maybe twice that time to understand the LSTM model, but once you feel that you understand the model it gives you a really satisfying feeling. The point I want to stress is that the route I described above worked really well for me in understanding the LSTM model. I didn't get it from one day to the next; I think it took me something like two weeks to go from a simple feed forward neural network to grasping the LSTM model.

It is already years ago that I learned about the LSTM model, but with this blog post I want to motivate young, aspiring data scientists to learn about it. By learning I mean really trying to understand the inner workings of this model. I have used the LSTM model many times over the last couple of years in different client projects, and I still enjoy the benefits of having learned the fundamentals of this model years ago. Lastly, I just want to add that after all these years I am still amazed by this model and its applications. Hochreiter and Schmidhuber, the inventors of this model, are just brilliant. I hope this post helps you understand the LSTM model. My advice is to start simple and end with the more complex details. Then you will probably avoid being hit by the bus ;)
