Reinforcement Learning: Frozen Lake
#day1 of #100daysofcode
It took me a little while to take up the #100daysofcode challenge. But now I have publicly announced that I will code for at least one hour every day, and I have also decided to document my work. In this series of articles, I am going to share what I learn along with the resources I used. Stay tuned to learn with me. I will devote one article to each day, so if anybody wants to reproduce my work, they can do so by following the same resources. I encourage you to join this challenge with me and learn amazing things.
There is a lot to learn, and great things are happening every day. I have modified the challenge by adding this documentation step: I will be sharing each day's learning and approach here. I hope it helps you in some way.
Let’s get it started!! yeaaaaa
This is my first day, and I have done some setup for the rest of the days. I want to commit my daily code to GitHub, so to keep my work structured I created a folder named 100-days-of-code and, inside it, 100 folders named day1 to day100. Of course, I did not do it manually. I am working on an Ubuntu machine and my friend Mr. Terminal was there to help me, so it was simple: I typed a command like the following and all was done in microseconds.
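One way to do it, using bash brace expansion:

```bash
# create the parent folder and day1..day100 in one go
mkdir -p 100-days-of-code/day{1..100}
```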
Today I worked on Reinforcement Learning and implemented the Frozen Lake game. The problem description is:
Winter is here. You and your friends were tossing around a frisbee at the park when you made a wild throw that left the frisbee out in the middle of the lake. The water is mostly frozen, but there are a few holes where the ice has melted. If you step into one of those holes, you’ll fall into the freezing water. At this time, there’s an international frisbee shortage, so it’s absolutely imperative that you navigate across the lake and retrieve the disc. However, the ice is slippery, so you won’t always move in the direction you intend.
The surface is described using a grid like the following:
SFFF (S: starting point, safe)
FHFH (F: frozen surface, safe)
FFFH (H: hole, fall to your doom)
HFFG (G: goal, where the frisbee is located)
The episode ends when you reach the goal or fall in a hole. You receive a reward of 1 if you reach the goal, and zero otherwise.
I watched this YouTube series on Reinforcement Learning by deeplizard and worked along with it. You must also check out the OpenAI website for amazing stuff on Reinforcement Learning.
I have worked on supervised and unsupervised machine learning problems before and have quite a good amount of experience with them. I wanted to work on a Reinforcement Learning problem, and that is why I chose it for the first day of the challenge.
Reinforcement Learning is a very powerful tool when we don't have any data for a particular problem. For supervised learning, we need well-labeled data; for unsupervised learning, we still need a good amount of data. But there are some problems that cannot be tackled by these approaches, and that is where Reinforcement Learning comes in.
I worked on a very simple problem, but there are far more impressive problems that we can tackle using Reinforcement Learning. My first day was great, working on this amazing problem. I have coded it and committed it to GitHub. Here is the link if you want to check it out.
Let me briefly explain the implementation. There is so much to explain in detail that it deserves its own post. Maybe later on :)
The brief concept is: There are four things — Agent, Environment, Action, Reward.
Don't be afraid of these. They are just terms, and once you know them they are super easy.
Action — any activity like moving left or right
Environment — the surrounding in which action has to be taken
Agent — the one who takes action
Reward — the appreciation given when the Agent takes the right Action.
The agent's aim is to maximize its reward. That is the concept in a nutshell. There are various methods to do all of this, so let's get into the code. I am using a code-first approach here: I explain each step and show you the code as well, so that you get an idea of how it is done.
Code
The first step, as always, is to import the required libraries. I am using the “gym” library provided by OpenAI to experiment with different environments. The rest of the libraries are very basic ones; you will know them if you have worked with Python before. “clear_output” is used in a Jupyter Notebook to clear the output of a cell; we will see its use at the end, when we print the actions of the agent.
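Something like the following covers everything used below (clear_output assumes you are working inside a Jupyter Notebook):

```python
import random
import time

import gym
import numpy as np
from IPython.display import clear_output
```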
Here we create an instance of the “Frozen Lake” environment.
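This is a single gym.make call. The environment id was “FrozenLake-v0” in the gym version I used; newer releases rename it to “FrozenLake-v1”:

```python
env = gym.make("FrozenLake-v0")
```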
These are the training parameters. For example, “num_episodes” is how many times we want to repeat the process of trying to reach the goal state. The rest of the variables need some background knowledge to understand; I will explain them in a separate article dedicated to Reinforcement Learning. In this post, my aim is to give you a basic overview of the concept, so that you know what Reinforcement Learning is and how to approach it.
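For reference, here is the kind of setup used in the deeplizard series; the exact numbers are illustrative:

```python
num_episodes = 10000            # episodes to train for
max_steps_per_episode = 100     # cap so one episode cannot run forever

learning_rate = 0.1             # alpha: how strongly new information overwrites old
discount_rate = 0.99            # gamma: how much future rewards are worth today

exploration_rate = 1            # epsilon: start by exploring every move
max_exploration_rate = 1
min_exploration_rate = 0.01
exploration_decay_rate = 0.001  # how fast we shift from exploring to exploiting
```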
Next, we create a Q-table, which holds a value for each action in every state. Basically, you can imagine it as a 2D matrix, as shown below.
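Using the state and action space sizes that gym reports for this environment:

```python
# 16 states (one per grid cell) x 4 actions (left, down, right, up), all zeros
action_space_size = env.action_space.n
state_space_size = env.observation_space.n

q_table = np.zeros((state_space_size, action_space_size))
print(q_table)
```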
This is our Q-table at the initial stage: all zeros, since the agent has not learned anything yet.
We also need a list to store the total reward of each episode, so that later on we can analyze the results:
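```python
rewards_all_episodes = []  # total reward collected in each episode
```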
Below is the rule used to update the Q-table.
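Written out, the standard Q-learning update is:

```latex
q_{\text{new}}(s, a) = (1 - \alpha)\, q(s, a) + \alpha \left( r + \gamma \max_{a'} q(s', a') \right)
```

Here alpha is the learning rate, gamma is the discount rate, r is the reward we just received, and s' is the state we land in after taking action a in state s.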
Below is the implementation that updates the Q-table until we get the optimal one.
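Here is a sketch of that training loop, assuming the classic gym step API (four return values) and the names defined above:

```python
for episode in range(num_episodes):
    state = env.reset()
    done = False
    rewards_current_episode = 0

    for step in range(max_steps_per_episode):
        # Epsilon-greedy: exploit the best known action, or explore a random one
        if random.uniform(0, 1) > exploration_rate:
            action = np.argmax(q_table[state, :])
        else:
            action = env.action_space.sample()

        new_state, reward, done, info = env.step(action)

        # Q-learning update (the rule shown above)
        q_table[state, action] = (1 - learning_rate) * q_table[state, action] + \
            learning_rate * (reward + discount_rate * np.max(q_table[new_state, :]))

        state = new_state
        rewards_current_episode += reward
        if done:
            break

    # Decay exploration so the agent exploits more as it learns
    exploration_rate = min_exploration_rate + \
        (max_exploration_rate - min_exploration_rate) * np.exp(-exploration_decay_rate * episode)

    rewards_all_episodes.append(rewards_current_episode)
```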
Next, let's check how many times our agent reached the goal state. The values should be read like this: 0.654 means our agent reached the goal 65.4% of the time in that block of episodes.
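One way to print that, assuming num_episodes is a multiple of 1000 (since the reward is 1 only on success, the average per block is exactly the success rate):

```python
rewards_per_thousand = np.split(np.array(rewards_all_episodes), num_episodes // 1000)

print("Average reward per thousand episodes")
for i, block in enumerate(rewards_per_thousand, start=1):
    print(i * 1000, ":", np.mean(block))
```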
And this is how our Q-table looks after it has been updated:
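```python
print("\n\nFinal Q-table\n")
print(q_table)
```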
So far our agent has learned, but we need to see how it performs. What are the results? We want to watch the agent actually play the game.
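Here is a sketch of the playback loop; the agent now always exploits its Q-table, and this is where clear_output earns its keep, wiping the cell between frames so the grid animates in place:

```python
# Watch the trained agent play three episodes, rendering each step
for episode in range(3):
    state = env.reset()
    done = False
    print("EPISODE", episode + 1, "\n")
    time.sleep(1)

    for step in range(max_steps_per_episode):
        clear_output(wait=True)  # redraw the grid in place
        env.render()
        time.sleep(0.3)

        # Always take the best known action for the current state
        action = np.argmax(q_table[state, :])
        new_state, reward, done, info = env.step(action)

        if done:
            clear_output(wait=True)
            env.render()
            if reward == 1:
                print("You reached the goal!")
            else:
                print("You fell through a hole!")
            time.sleep(2)
            break

        state = new_state

env.close()
```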
So that’s how I finished my day 1 of #100daysofcode
My main aim with this post is to give you an overview of Reinforcement Learning and to motivate you to work with it. I know this post is not a detailed explanation of the solution; I will cover Reinforcement Learning properly in future articles. Slowly and gradually we will learn it. It is amazing and fun to learn these kinds of tools. I enjoyed it. Stay tuned for the next one.