
A Beginner’s Guide to Encoder-Decoder (Seq2Seq) Models


🚀 Introduction

Encoder-decoder models (also known as sequence-to-sequence or Seq2Seq models) have transformed Natural Language Processing (NLP) tasks such as machine translation, text summarization, and question-answering systems. They allow computers to map input sequences (e.g., sentences in one language) to output sequences (translations in another language).

This guide provides a clear, straightforward introduction to how these models work, supported by an easy-to-understand Python example.


🤖 What is an Encoder-Decoder Model?

A Seq2Seq model consists of two main parts:

1. Encoder: reads the input sequence one token at a time and compresses it into a fixed-size context vector (its final hidden state).
2. Decoder: takes that context vector and generates the output sequence one token at a time.

The key advantage of this architecture is its flexibility in handling input and output sequences of varying lengths and complexity.
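
To make the flow concrete before introducing any neural networks, here is a purely illustrative toy sketch (plain Python functions, not part of the tutorial code below) of how an encoder compresses an input sequence into a context and a decoder expands that context into an output sequence of a different length:

def toy_encoder(tokens):
    # "compress" the whole input sequence into a single context value
    return sum(tokens)

def toy_decoder(context, length):
    # generate an output sequence of any requested length from the context
    return [context + i for i in range(length)]

context = toy_encoder([1, 2, 3])   # encoder: input sequence -> context
print(toy_decoder(context, 4))     # decoder: context -> output of a different length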


🌐 Example Application: Language Translation

Imagine translating an English sentence into German:

English input: "how are you"
German output: "wie geht es dir"

Here, the encoder reads the English sentence, converts it to a context vector, and the decoder generates the German translation from this context.


🛠️ Step-by-Step Tutorial with Python

Let’s create a simplified encoder-decoder model using PyTorch to translate short English phrases to German.

📌 Installing Dependencies

Install the required packages:

pip install torch torchtext numpy

📌 Preparing Data

We’ll create a tiny dataset manually for simplicity:

data = [
    ("hello", "hallo"),
    ("how are you", "wie geht es dir"),
    ("good morning", "guten morgen"),
    ("thank you", "danke"),
    ("good night", "gute nacht")
]
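
Real models work on integer indices rather than strings, so in practice you would also build word-to-index vocabularies. The sketch below is an assumed helper (the names build_vocab and to_tensor are illustrative, and the simplified training loop later in this guide uses dummy index tensors instead of these mappings):

import torch

SOS, EOS = 0, 1  # reserved start- and end-of-sequence indices

def build_vocab(sentences):
    vocab = {"<sos>": SOS, "<eos>": EOS}
    for sentence in sentences:
        for word in sentence.split():
            vocab.setdefault(word, len(vocab))
    return vocab

src_vocab = build_vocab(en for en, _ in data)
tgt_vocab = build_vocab(de for _, de in data)

def to_tensor(sentence, vocab):
    # map each word to its index and append the end-of-sequence marker
    return torch.tensor([vocab[w] for w in sentence.split()] + [EOS], dtype=torch.long)

print(to_tensor("how are you", src_vocab))  # tensor([3, 4, 5, 1])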

📌 Building a Simple Seq2Seq Model

Here’s a simple encoder-decoder model in PyTorch:

import torch
import torch.nn as nn
import torch.optim as optim

# Encoder definition
class Encoder(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(Encoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(input_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)

    def forward(self, input, hidden):
        embedded = self.embedding(input).view(1, 1, -1)
        _, hidden = self.gru(embedded, hidden)
        return hidden

# Decoder definition
class Decoder(nn.Module):
    def __init__(self, hidden_size, output_size):
        super(Decoder, self).__init__()
        self.hidden_size = hidden_size
        self.embedding = nn.Embedding(output_size, hidden_size)
        self.gru = nn.GRU(hidden_size, hidden_size)
        self.out = nn.Linear(hidden_size, output_size)
        self.softmax = nn.LogSoftmax(dim=1)

    def forward(self, input, hidden):
        output = self.embedding(input).view(1, 1, -1)
        output = torch.relu(output)
        output, hidden = self.gru(output, hidden)
        output = self.softmax(self.out(output[0]))
        return output, hidden
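
Before wiring these into a training loop, a quick shape check helps confirm the pieces fit together. This is purely illustrative: the instances below are throwaway objects with dummy vocabulary sizes.

enc = Encoder(input_size=10, hidden_size=16)
dec = Decoder(hidden_size=16, output_size=10)

hidden = torch.zeros(1, 1, 16)                 # initial GRU hidden state
hidden = enc(torch.tensor(3), hidden)          # one token index -> hidden of shape (1, 1, 16)
out, hidden = dec(torch.tensor([0]), hidden)   # start token -> (1, 10) log-probabilities
print(out.shape, hidden.shape)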

📌 Training the Model

Now instantiate the encoder and decoder, set up optimizers, and run a simplified training loop on dummy index tensors:

encoder = Encoder(input_size=10, hidden_size=16)   # input_size = source vocabulary size (dummy value)
decoder = Decoder(hidden_size=16, output_size=10)  # output_size = target vocabulary size (dummy value)

encoder_optimizer = optim.Adam(encoder.parameters(), lr=0.01)
decoder_optimizer = optim.Adam(decoder.parameters(), lr=0.01)
criterion = nn.NLLLoss()

# Simplified training loop example
for epoch in range(100):
    input_tensor = torch.tensor([1,2,3])  # dummy inputs
    target_tensor = torch.tensor([1,2,3]) # dummy targets
    encoder_hidden = torch.zeros(1, 1, 16)

    encoder_optimizer.zero_grad()
    decoder_optimizer.zero_grad()

    # Encode the whole input sequence, one token at a time
    for ei in range(input_tensor.size(0)):
        encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

    decoder_input = torch.tensor([0])  # start token
    decoder_hidden = encoder_hidden

    loss = 0
    for di in range(target_tensor.size(0)):
        decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
        loss += criterion(decoder_output, target_tensor[di].unsqueeze(0))
        decoder_input = target_tensor[di]  # teacher forcing: feed the true target token next

    loss.backward()
    encoder_optimizer.step()
    decoder_optimizer.step()
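
    # Illustrative addition (not in the original loop): print the loss now and then to watch it decrease
    if epoch % 20 == 0:
        print(f"epoch {epoch:3d}  loss {loss.item():.4f}")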

📌 Using the Model for Prediction

After training, you can generate translations:

encoder_hidden = torch.zeros(1, 1, 16)
input_tensor = torch.tensor([1,2,3])
# Encode the whole input sequence before decoding
for ei in range(input_tensor.size(0)):
    encoder_hidden = encoder(input_tensor[ei], encoder_hidden)

decoder_input = torch.tensor([0])  # start token
decoder_hidden = encoder_hidden
predicted_sentence = []

for di in range(3):
    decoder_output, decoder_hidden = decoder(decoder_input, decoder_hidden)
    topv, topi = decoder_output.topk(1)
    predicted_sentence.append(topi.item())
    decoder_input = topi.squeeze().detach()

print("Predicted sentence indices:", predicted_sentence)

🎯 Summary

Encoder-decoder (Seq2Seq) models are powerful tools for NLP tasks such as machine translation. Understanding their two-part structure, an encoder that compresses the input into a context vector and a decoder that generates the output from it, helps in building effective and versatile NLP solutions.
