How to Fine-Tune DeiT: Data-efficient Image Transformer

If you follow recent advances in deep learning for computer vision, you may have heard of DeiT, the Data-efficient Image Transformer. DeiT is an image classification model that reaches accuracy competitive with convolutional networks on ImageNet while being trained on ImageNet alone, without the very large extra pretraining datasets that the original Vision Transformer (ViT) relied on. In this blog post, we'll take a closer look at how DeiT works and how you can fine-tune it in TensorFlow.

What is DeiT?

DeiT was developed by researchers at Facebook AI Research (now Meta AI). It builds on the Transformer architecture, originally developed for natural language processing, by way of the Vision Transformer (ViT): the image is split into fixed-size patches (16 x 16 pixels), each patch is linearly projected into a token, and a stack of self-attention layers processes the resulting sequence, which lets the model capture relationships between distant image regions. DeiT keeps the ViT architecture largely unchanged. Its data efficiency comes from a carefully tuned training recipe (strong data augmentation and regularization) combined with a new distillation-based training method, which together allow it to be trained on ImageNet alone.
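Like ViT, DeiT feeds its Transformer a sequence of image patches rather than individual pixels. The sketch below shows only that first step, assuming DeiT-Base style settings (224 x 224 RGB input, 16 x 16 patches): it cuts a batch of images into non-overlapping patches with `tf.image.extract_patches` and flattens each patch into a vector. In the real model, each vector is then linearly projected to the embedding size (768 for DeiT-Base) before entering the Transformer layers. The helper name `image_to_patch_tokens` is hypothetical, chosen for illustration.

```python
import tensorflow as tf

# Assumed DeiT-Base style settings: 224 x 224 RGB input, 16 x 16 patches.
IMAGE_SIZE = 224
PATCH_SIZE = 16


def image_to_patch_tokens(images: tf.Tensor, patch_size: int = PATCH_SIZE) -> tf.Tensor:
    """Split images into flattened, non-overlapping patches (hypothetical helper).

    images: float tensor of shape (batch, height, width, channels).
    Returns shape (batch, num_patches, patch_size * patch_size * channels).
    """
    patches = tf.image.extract_patches(
        images=images,
        sizes=[1, patch_size, patch_size, 1],
        strides=[1, patch_size, patch_size, 1],  # stride == patch size, so no overlap
        rates=[1, 1, 1, 1],
        padding="VALID",
    )
    batch = tf.shape(patches)[0]
    patch_dim = patches.shape[-1]
    # Collapse the (rows, cols) grid of patches into one sequence axis, row by row.
    return tf.reshape(patches, [batch, -1, patch_dim])


images = tf.random.uniform((2, IMAGE_SIZE, IMAGE_SIZE, 3))
tokens = image_to_patch_tokens(images)
print(tokens.shape)  # (2, 196, 768): a 14 x 14 grid of patches, each 16 * 16 * 3 values
```

The sequence length (196 here) grows quadratically as the patch size shrinks, which is why patch size is a key cost knob for ViT-style models.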

The key innovation behind DeiT is distillation through attention. In knowledge distillation, a "student" model is trained to reproduce the predictions of a "teacher" model. DeiT adds a dedicated distillation token to the input sequence alongside the usual class token. Both tokens interact with the image patch tokens through the self-attention layers: the output of the class token is trained to predict the true label, while the output of the distillation token is trained to match the teacher's prediction. In the paper, the teacher is a convolutional network (RegNetY-16GF) trained on ImageNet, and the best results come from hard-label distillation, where the distillation token's target is the teacher's top-1 class. At test time, the predictions of the two heads are combined.
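To make the training objective concrete, here is a minimal TensorFlow sketch of DeiT-style hard-label distillation. It assumes you already have three sets of logits: from the student's class-token head, from its distillation-token head, and from a frozen teacher (in practice, `teacher(images, training=False)`). The loss is an equally weighted sum of cross-entropy against the true labels and cross-entropy against the teacher's argmax labels; at inference, the softmax outputs of the two heads are added. The function names and example logits are hypothetical.

```python
import tensorflow as tf

# Cross-entropy on raw logits; averaged over the batch by default.
_ce = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)


def deit_hard_distillation_loss(labels, cls_logits, dist_logits, teacher_logits):
    """Hard-label distillation loss in the style of the DeiT paper (sketch).

    labels:         int tensor (batch,) of ground-truth class ids
    cls_logits:     student logits from the class-token head, (batch, classes)
    dist_logits:    student logits from the distillation-token head, (batch, classes)
    teacher_logits: logits from the frozen teacher, (batch, classes)
    """
    # The teacher's top-1 class serves as the "label" for the distillation head.
    teacher_labels = tf.argmax(teacher_logits, axis=-1)
    # Equal weights: the class head learns the true labels,
    # the distillation head learns the teacher's hard decisions.
    return 0.5 * _ce(labels, cls_logits) + 0.5 * _ce(teacher_labels, dist_logits)


def deit_predict(cls_logits, dist_logits):
    """Combine both heads at inference by adding their softmax outputs."""
    probs = tf.nn.softmax(cls_logits, axis=-1) + tf.nn.softmax(dist_logits, axis=-1)
    return tf.argmax(probs, axis=-1)


# Usage with made-up logits for a batch of 2 images and 3 classes.
labels = tf.constant([0, 2])
cls_logits = tf.constant([[2.0, 0.5, 0.1], [0.2, 0.3, 1.5]])
dist_logits = tf.constant([[0.1, 1.8, 0.2], [0.0, 0.4, 2.2]])
teacher_logits = tf.constant([[0.3, 2.5, 0.1], [0.1, 0.2, 3.0]])  # stand-in for a real teacher

loss = deit_hard_distillation_loss(labels, cls_logits, dist_logits, teacher_logits)
print("loss:", float(loss))
print("predictions:", deit_predict(cls_logits, dist_logits).numpy())
```

Using the teacher's argmax instead of its softened probabilities is what the paper calls hard distillation; unlike classic soft distillation, it needs no temperature or weighting hyperparameter to tune.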

How to Implement DeiT in TensorFlow

The official DeiT code and checkpoints from Facebook AI Research are written in PyTorch, but community ports make it possible to work with DeiT in TensorFlow. The general workflow for fine-tuning it in TensorFlow looks like this:

  • Install the necessary packages: TensorFlow, plus whichever library provides the DeiT implementation or checkpoint you plan to use (for example, the TensorFlow Model Garden). PyTorch is only needed if you intend to convert the official PyTorch checkpoints yourself.
  • Download pretrained DeiT weights. The official GitHub repository (facebookresearch/deit) publishes PyTorch checkpoints, which must be converted before TensorFlow can load them; alternatively, use a TensorFlow port that ships converted weights.
  • Load the model with the API that matches its format, for example tf.keras.models.load_model for a Keras .keras or .h5 file, or tf.saved_model.load for a SavedModel directory.
  • Fine-tune the model on your own dataset with transfer learning: replace the classification head with one sized for your classes, freeze the early layers, and train only the later layers.
  • Tune hyperparameters against a validation set, and evaluate on the test set only once at the end, so that the reported test accuracy remains an unbiased estimate.

By following these steps, you can adapt a pretrained DeiT to your own image classification tasks in TensorFlow.
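The steps above start from an existing checkpoint. To show what such a checkpoint contains, here is a minimal DeiT-style model written from scratch in Keras. It is a sketch, not the official implementation: it follows the ViT/DeiT layout (patch embedding, class and distillation tokens, learned position embeddings, pre-norm Transformer blocks, two linear heads) with DeiT-Tiny sizes as defaults (embedding size 192, 12 blocks, 3 heads, MLP ratio 4), and it omits dropout, stochastic depth, and any way of loading official weights. The names `TokensAndPositions`, `transformer_block`, and `build_deit_like` are hypothetical.

```python
import tensorflow as tf
from tensorflow.keras import layers


class TokensAndPositions(layers.Layer):
    """Prepends learnable class and distillation tokens, then adds position embeddings."""

    def __init__(self, num_patches, embed_dim, **kwargs):
        super().__init__(**kwargs)
        self.num_patches = num_patches
        self.embed_dim = embed_dim

    def build(self, input_shape):
        init = tf.keras.initializers.TruncatedNormal(stddev=0.02)
        self.cls_token = self.add_weight(name="cls_token", shape=(1, 1, self.embed_dim), initializer=init)
        self.dist_token = self.add_weight(name="dist_token", shape=(1, 1, self.embed_dim), initializer=init)
        # Two extra positions: one for the class token, one for the distillation token.
        self.pos_embed = self.add_weight(
            name="pos_embed", shape=(1, self.num_patches + 2, self.embed_dim), initializer=init)

    def call(self, patch_embeddings):
        batch = tf.shape(patch_embeddings)[0]
        cls = tf.repeat(self.cls_token, batch, axis=0)
        dist = tf.repeat(self.dist_token, batch, axis=0)
        tokens = tf.concat([cls, dist, patch_embeddings], axis=1)
        return tokens + self.pos_embed


def transformer_block(x, embed_dim, num_heads, mlp_dim):
    # Pre-norm self-attention with a residual connection.
    h = layers.LayerNormalization(epsilon=1e-6)(x)
    h = layers.MultiHeadAttention(num_heads=num_heads, key_dim=embed_dim // num_heads)(h, h)
    x = layers.Add()([x, h])
    # Pre-norm two-layer MLP with GELU and a residual connection.
    h = layers.LayerNormalization(epsilon=1e-6)(x)
    h = layers.Dense(mlp_dim, activation="gelu")(h)
    h = layers.Dense(embed_dim)(h)
    return layers.Add()([x, h])


def build_deit_like(image_size=224, patch_size=16, embed_dim=192, depth=12,
                    num_heads=3, mlp_ratio=4, num_classes=1000):
    num_patches = (image_size // patch_size) ** 2
    inputs = layers.Input(shape=(image_size, image_size, 3))
    # A strided convolution is a linear projection of each non-overlapping patch.
    x = layers.Conv2D(embed_dim, kernel_size=patch_size, strides=patch_size)(inputs)
    x = layers.Reshape((num_patches, embed_dim))(x)
    x = TokensAndPositions(num_patches, embed_dim)(x)
    for _ in range(depth):
        x = transformer_block(x, embed_dim, num_heads, mlp_ratio * embed_dim)
    x = layers.LayerNormalization(epsilon=1e-6)(x)
    # Token 0 feeds the class head; token 1 feeds the distillation head.
    cls_logits = layers.Dense(num_classes, name="cls_head")(x[:, 0])
    dist_logits = layers.Dense(num_classes, name="dist_head")(x[:, 1])
    return tf.keras.Model(inputs, [cls_logits, dist_logits])


# Usage: a tiny configuration so the example runs quickly on CPU.
tiny = build_deit_like(image_size=32, patch_size=8, embed_dim=32, depth=2, num_heads=2, num_classes=5)
cls_out, dist_out = tiny(tf.random.uniform((3, 32, 32, 3)))
print(cls_out.shape, dist_out.shape)  # (3, 5) (3, 5)
```

Calling `build_deit_like()` with its defaults gives a DeiT-Tiny-sized network. Training it well from scratch still requires the paper's augmentation recipe and a teacher, which is why fine-tuning a pretrained checkpoint is usually the practical route.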

Fine-Tuning DeiT in Keras

In this example, we first load a saved DeiT model (architecture and weights) with TensorFlow's load_model function. We then freeze all but the last 10 layers with a for loop, replace the original classifier with a new dense softmax layer sized for our classes, and compile the model with an optimizer, a loss function, and an accuracy metric.
Next, we prepare the data for training and validation with ImageDataGenerator, specifying the training and validation directories, the target image size, the batch size, and the class mode.
We define two callbacks: one that saves the best model and one that stops training early when validation accuracy stops improving. We then train and validate the model with the fit method, passing in the training and validation generators, the number of steps per epoch, the number of epochs, and the callbacks.
Finally, we load the best model weights, evaluate the model on a held-out test set, print the test accuracy, and plot the training and validation loss and accuracy.
Here's a TensorFlow example for fine-tuning and testing DeiT on your own image dataset:
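import tensorflow as tf
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
from tensorflow.keras.models import Model
import matplotlib.pyplot as plt

NUM_CLASSES = 3    # number of classes in your dataset (3 in this example)
BATCH_SIZE = 32    # assumed value; adjust to fit your GPU memory
EPOCHS = 50        # assumed upper bound; early stopping usually ends training sooner

# Load the DeiT model architecture and weights.
# Assumes 'deit_model.h5' is a Keras functional model saved with model.save();
# models with custom layers may also need custom_objects=... here.
deit = tf.keras.models.load_model('deit_model.h5')

# Freeze every layer except the last 10
for layer in deit.layers[:-10]:
    layer.trainable = False

# Replace the original classifier: take the output of the layer before it
# and add a new softmax layer sized for our classes
x = deit.layers[-2].output
predictions = Dense(NUM_CLASSES, activation='softmax')(x)
model = Model(inputs=deit.input, outputs=predictions)

# Compile the model with an appropriate optimizer and loss function
model.compile(optimizer=Adam(learning_rate=1e-4),
              loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])

# Prepare the data for training and validation.
# Assumes the model expects pixels scaled to [0, 1]; if it was trained with
# ImageNet mean/std normalization, apply that instead.
train_data_dir = 'train_data'
validation_data_dir = 'validation_data'

train_datagen = ImageDataGenerator(rescale=1. / 255)
validation_datagen = ImageDataGenerator(rescale=1. / 255)

train_generator = train_datagen.flow_from_directory(
    train_data_dir,
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    class_mode='categorical')

validation_generator = validation_datagen.flow_from_directory(
    validation_data_dir,
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    class_mode='categorical')

# Define callbacks for saving the best model and early stopping
filepath = "best_model.h5"
checkpoint = ModelCheckpoint(filepath, monitor='val_categorical_accuracy', verbose=1,
                             save_best_only=True, mode='max')
early_stop = EarlyStopping(monitor='val_categorical_accuracy', patience=5, mode='max')

# Train the model on the data and validate
history = model.fit(
    train_generator,
    steps_per_epoch=train_generator.samples // train_generator.batch_size,
    epochs=EPOCHS,
    validation_data=validation_generator,
    validation_steps=validation_generator.samples // validation_generator.batch_size,
    callbacks=[checkpoint, early_stop])

# Restore the best checkpoint, then evaluate once on the held-out test set
model.load_weights(filepath)

test_data_dir = 'test_data'
test_datagen = ImageDataGenerator(rescale=1. / 255)
test_generator = test_datagen.flow_from_directory(
    test_data_dir,
    target_size=(224, 224),
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=False)

# evaluate_generator is deprecated; model.evaluate accepts the generator directly
# and, without a steps argument, covers every test image
test_loss, test_acc = model.evaluate(test_generator)
print('Test accuracy:', test_acc)

# Plot the training and validation loss and accuracy side by side
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model loss')
plt.xlabel('Epoch')
plt.ylabel('Loss')
plt.legend(['Train', 'Validation'], loc='upper left')

plt.subplot(1, 2, 2)
plt.plot(history.history['categorical_accuracy'])
plt.plot(history.history['val_categorical_accuracy'])
plt.title('Model accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(['Train', 'Validation'], loc='upper left')
plt.show()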


DeiT is an exciting development in deep learning for computer vision: it shows that image Transformers can be trained to competitive accuracy on ImageNet alone, which makes them far more practical when you cannot pretrain on hundreds of millions of images. By combining a strong training recipe with distillation through attention, DeiT learns from a teacher model (a convolutional network in the original paper) and reaches accuracy competitive with convolutional networks. The official implementation is in PyTorch, and community ports make it possible to experiment with DeiT in TensorFlow and other frameworks.

