
Implementation of Q-Learning with Python

Q-learning is a popular reinforcement learning algorithm for learning how to act in an environment. It enables an agent to learn optimal actions by iteratively updating its Q-values, which estimate the expected cumulative reward for taking a given action in a given state. Here is a step-by-step implementation of Q-learning using Python:
1. Import the necessary libraries:
import numpy as np
import random
2. Define the environment:
# Define the environment: a 10x10 grid world
env = np.zeros((10, 10))

# Define the rewards: every cell is 0 except the goal cell at (9, 9)
rewards = np.zeros((10, 10))
rewards[9, 9] = 1  # assumed: +1 at the goal so the agent has something to learn

# Define the actions
actions = ['up', 'down', 'left', 'right']

3. Define the Q-table:
# Define the Q-table
q_table = np.zeros([env.shape[0], env.shape[1], len(actions)])
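The resulting table has shape (10, 10, 4): one Q-value for every action in every grid cell. As a quick illustration of how it is indexed (the state (3, 5) below is just an arbitrary example):

# All four Q-values for the state in row 3, column 5
print(q_table[3, 5])

# Q-value of moving 'right' from that state
print(q_table[3, 5, actions.index('right')])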

4. Define the hyperparameters:

# Define the hyperparameters
alpha = 0.1    # learning rate
gamma = 0.6    # discount factor
epsilon = 0.1  # exploration rate
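These values plug into the Q-learning update that the training loop applies at every step: alpha controls how strongly each new estimate overwrites the old one, gamma discounts future rewards, and epsilon is the probability of taking a random exploratory action. As a sketch, the update can be written as a small helper (hypothetical, shown only to make the formula explicit):

def q_update(q_sa, reward, max_next_q, alpha=0.1, gamma=0.6):
    # Blend the old estimate with the reward plus the discounted best next-state value
    return (1 - alpha) * q_sa + alpha * (reward + gamma * max_next_q)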
5. Define the training loop:

# Define the training loop
for episode in range(1, 1001):
    state = [0, 0]
    while state != [9, 9]:
        if random.uniform(0, 1) < epsilon:
            action = random.choice(actions)
        else:
            action = actions[np.argmax(q_table[state[0], state[1]])]

        if action == 'up':
            next_state = [max(state[0] - 1, 0), state[1]]
        elif action == 'down':
            next_state = [min(state[0] + 1, 9), state[1]]
        elif action == 'left':
            next_state = [state[0], max(state[1] - 1, 0)]
        else:
            next_state = [state[0], min(state[1] + 1, 9)]

        # Look up the reward for the move and apply the Q-learning (Bellman) update
        reward = rewards[next_state[0], next_state[1]]
        action_index = actions.index(action)
        old_value = q_table[state[0], state[1], action_index]
        next_max = np.max(q_table[next_state[0], next_state[1]])
        q_table[state[0], state[1], action_index] = (1 - alpha) * old_value + alpha * (reward + gamma * next_max)

        state = next_state
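After training, the learned behavior can be read back out of the Q-table by always taking the highest-valued action. A minimal sketch (not part of the original steps) that rolls out this greedy policy from the start state:

# Follow the greedy policy from (0, 0) towards the goal at (9, 9)
state = [0, 0]
path = [tuple(state)]
while state != [9, 9] and len(path) < 100:  # step cap in case the policy is imperfect
    action = actions[np.argmax(q_table[state[0], state[1]])]
    if action == 'up':
        state = [max(state[0] - 1, 0), state[1]]
    elif action == 'down':
        state = [min(state[0] + 1, 9), state[1]]
    elif action == 'left':
        state = [state[0], max(state[1] - 1, 0)]
    else:
        state = [state[0], min(state[1] + 1, 9)]
    path.append(tuple(state))
print(path)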

Neural Network with Q-learning

Neural networks can be used in Q-learning to approximate the Q-value of each state-action pair. This is known as Deep Q-Learning. Here's how to use a neural network with Q-learning:
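First, a minimal environment setup. The training loop below uses the classic Gym API (env.reset() returning a state vector and env.step() returning a 4-tuple); CartPole-v1 is used here purely as a placeholder assumption, and any environment with a flat observation vector and discrete actions works the same way:

import gym

env = gym.make('CartPole-v1')  # placeholder environment (assumption)
state_size = env.observation_space.shape[0]
action_size = env.action_space.n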
1. Define the neural network architecture:
import tensorflow as tf

model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(64, input_shape=(state_size,), activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(action_size, activation='linear')
])

# Compile the model so it can be trained with model.fit() in the loop below
model.compile(optimizer='adam', loss='mse')

2. Define the hyperparameters:
# Define the hyperparameters
alpha = 0.1    # learning rate used to blend old and new Q-value targets
gamma = 0.6    # discount factor
epsilon = 0.1  # exploration rate

3. Define the training loop:
# Define the training loop
for episode in range(1, 1001):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    done = False
    while not done:
        if np.random.rand() <= epsilon:
            action = random.randrange(action_size)
        else:
            q_values = model.predict(state)
            action = np.argmax(q_values[0])

        next_state, reward, done, _ = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])

        # Bellman target; at a terminal state the target is just the immediate reward
        if done:
            target = reward
        else:
            target = reward + gamma * np.amax(model.predict(next_state)[0])
        q_values = model.predict(state)
        q_values[0][action] = (1 - alpha) * q_values[0][action] + alpha * target

        model.fit(state, q_values, verbose=0)

        state = next_state
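A common refinement, not shown above, is to decay epsilon between episodes so the agent explores heavily at first and then exploits what it has learned; a simple sketch (the decay values are illustrative):

epsilon_min = 0.01
epsilon_decay = 0.995

# At the end of each episode:
if epsilon > epsilon_min:
    epsilon *= epsilon_decay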
4. Evaluate the model:
# Evaluate the model
scores = []
for episode in range(100):
    state = env.reset()
    state = np.reshape(state, [1, state_size])
    done = False
    score = 0
    while not done:
        q_values = model.predict(state)
        action = np.argmax(q_values[0])
        next_state, reward, done, _ = env.step(action)
        next_state = np.reshape(next_state, [1, state_size])
        score += reward
        state = next_state
    scores.append(score)
print(np.mean(scores))

In this implementation, we use a neural network with two hidden layers of 64 neurons each and a linear output layer with the same number of neurons as the number of actions. The neural network is trained using the Q-learning algorithm, which updates the Q-values of each state-action pair based on the Bellman equation. The hyperparameters alpha, gamma, and epsilon control the learning rate, discount factor, and exploration rate, respectively.
