
Recognizing AI-generated Faces Using Deep Learning

 


Artificial intelligence is reshaping the world and changing the way we handle our daily tasks. Deep learning is one path toward artificial intelligence, and it has shown great value in areas ranging from medicine to computer vision. However, deep learning has also shown that it can harm or help in fraud, depending on how it is applied. One example of this is generative adversarial learning. Generative adversarial networks (GANs) were introduced to the world in 2014, and many variants of these networks have been developed since. A GAN is a machine learning model consisting of two neural networks competing with each other. These two networks are called the generator and the discriminator, and each has a different role: the generator creates images from random noise given as input, while the discriminator tries to detect whether a given image is real or generated. This process is repeated over a huge number of iterations until the generator becomes good enough to produce images similar to those in the training dataset.
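To make the adversarial setup concrete, here is a minimal sketch of a single GAN training step in PyTorch (the library we use later in this article). The generator G, discriminator D, their optimizers, and z_dim are hypothetical placeholders, not code from a specific paper; any compatible pair of nn.Module networks would do:

import torch
import torch.nn as nn

# Hypothetical G and D: G maps noise of size z_dim to an image,
# D maps an image to a single real/fake logit.
def gan_training_step(G, D, real_images, opt_G, opt_D, z_dim=100, device="cpu"):
    bce = nn.BCEWithLogitsLoss()
    batch = real_images.size(0)
    real_labels = torch.ones(batch, 1, device=device)
    fake_labels = torch.zeros(batch, 1, device=device)

    # 1) Train the discriminator: real images should score 1, fakes 0
    z = torch.randn(batch, z_dim, device=device)
    fake_images = G(z).detach()          # detach: do not update G in this step
    d_loss = bce(D(real_images), real_labels) + bce(D(fake_images), fake_labels)
    opt_D.zero_grad()
    d_loss.backward()
    opt_D.step()

    # 2) Train the generator: it "wins" when D scores its fakes as real
    z = torch.randn(batch, z_dim, device=device)
    g_loss = bce(D(G(z)), real_labels)
    opt_G.zero_grad()
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()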

In 2017, CycleGAN was proposed: a modified GAN that consists of two generators and two discriminators and performs image-to-image translation, i.e., transforming images from domain A to domain B and vice versa. A year later, NVIDIA presented StyleGAN, which showed remarkable capability in generating realistic fake human faces, making it feel as if there were new humans on this earth. Moreover, DeepFake technology and other methods have shown that human-like fake faces can be generated and are hard to distinguish from real human faces.

Thus, there is an urgent need to use deep learning itself to fight the deep learning methods that create humans or faces that do not exist, since such faces can be used for malicious ends such as fraud. Hence, it is valuable to have a deep network that can recognize whether a face is real or AI-generated.

In this article, we will build a deep learning model that is trained to classify faces into two categories:

  • AI-generated faces
  • Non-AI-generated (real) faces



Before we start building a deep learning model for recognizing fake faces, take a look at the figure above and try to find the fake face. Note that only one of the three faces is fake. The aim of this test is to highlight how challenging it is for humans to recognize fake faces generated by powerful deep networks such as GANs.

 

Deep Learning for Detecting Fake Faces

Fake face detection is a classification task: a face image is given to a trained network, which classifies it as a real or AI-generated (fake) face. When it comes to classification, we have many options. Numerous deep networks have been trained on huge datasets such as ImageNet, giving them powerful feature extraction capabilities that, through transfer learning, support accurate classification on new targets.

In this work, we will apply transfer learning, as it makes training faster and does not require a huge number of training images. We will use the ResNet18 deep network and fine-tune it on two datasets of real and AI-generated face images. ResNet18 is a pre-trained model that is 18 layers deep and uses the skip connections proposed in residual learning networks. For fine-tuning and testing this network, we will use the 1 Million Fake Faces dataset and the UTKFace dataset for fake and real faces, respectively. Figure 2 shows the transfer learning process for the ResNet18 that will be trained to recognize fake faces. As seen, the process consists of three main stages: first, load and preprocess the dataset images; then load the pre-trained ResNet18 model; and finally reshape it to fit our new classification task by changing the number of nodes in its final fully connected (FC) layer to 2, representing real and fake faces.


 

Loading Dataset

The first stage of our work is to load and preprocess the dataset images. We will combine two datasets to collect as many images as possible of both real and fake faces: UTKFace for real faces and 1 Million Fake Faces for fake ones. The combined dataset will be split into training, validation, and testing sets, as sketched below.
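The article does not show how the images are organized on disk, so here is a hedged sketch of one way to do it. Assuming the raw images sit in 'raw/real' and 'raw/fake' folders (hypothetical names), the script below copies them into the train/val/test class sub-folders that torchvision's ImageFolder expects:

import os
import random
import shutil

# Hedged sketch: copy images from raw class folders into the layout that
# torchvision's ImageFolder expects, e.g. data/train/real, data/val/fake, ...
def split_dataset(src_root, dst_root, fractions=(0.8, 0.1, 0.1)):
    random.seed(0)                                    # reproducible split
    names = ('train', 'val', 'test')
    for cls in os.listdir(src_root):                  # e.g. 'real', 'fake'
        files = os.listdir(os.path.join(src_root, cls))
        random.shuffle(files)
        start = 0
        for name, frac in zip(names, fractions):
            # give the last split whatever remains, to avoid rounding gaps
            end = len(files) if name == names[-1] else start + int(len(files) * frac)
            out_dir = os.path.join(dst_root, name, cls)
            os.makedirs(out_dir, exist_ok=True)
            for f in files[start:end]:
                shutil.copy(os.path.join(src_root, cls, f), out_dir)
            start = end

# split_dataset('raw', 'data')   # 'raw/real' and 'raw/fake' are hypothetical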

We will use the PyTorch deep learning library to build and fine-tune our model. Hence, let's first import some dependencies:

 

 

from __future__ import print_function, division
import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy


Next, we will load our training, validation, and test data using DataLoaders, and we will use transforms to resize, normalize, and augment our images. Note that we will resize the images to 128x128x3 pixels and select a small batch size to lower the computational cost of our system.

 

# Data augmentation and normalization for training;
# only resizing and normalization for validation and testing
data_transforms = {
    'train': transforms.Compose([
        transforms.Resize((128, 128)),   # force a fixed 128x128 input size
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'val': transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize((128, 128)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}

# Root folder containing the train/val/test sub-folders
data_dir = r'C:\Users\abdul\Desktop\ContentLab\awwal\data'

image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'val', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=4,
                                              shuffle=True, num_workers=0)
               for x in ['train', 'val', 'test']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'val', 'test']}
class_names = image_datasets['train'].classes

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

plt.ion()   # interactive mode
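Before moving on, a quick sanity check (using the names just defined above) confirms that the folders were found and that the tensors have the expected shape:

print('Classes:', class_names)          # e.g. ['fake', 'real'] (the folder names)
print('Split sizes:', dataset_sizes)
print('Device:', device)

inputs, labels = next(iter(dataloaders['train']))
print('Batch shape:', tuple(inputs.shape))   # expected: (4, 3, 128, 128)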

 

 

All right, our data pipeline is now set. Let's show a sample of our dataset using an imshow function:

 

def imshow(inp, title=None):
    """Display a normalized image tensor."""
    inp = inp.numpy().transpose((1, 2, 0))   # CxHxW -> HxWxC
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    inp = std * inp + mean                   # undo the normalization
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)  # pause a bit so that plots are updated

# Get a batch of training data
inputs, classes = next(iter(dataloaders['train']))

# Make a grid from the batch and display it with its class labels
out = torchvision.utils.make_grid(inputs)
imshow(out, title=[class_names[x] for x in classes])

  

Running the code above will display a sample of four images of real and fake faces, as follows:

Note that each time you run this function, different images will appear, since the DataLoaders shuffle the images randomly.
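If you would rather see the same batch on every run, one option is to seed the random number generators before the DataLoaders are created and iterated. This is an optional sketch, not part of the original code:

import random
import numpy as np
import torch

# Fix the seeds so shuffling and augmentation repeat across runs
torch.manual_seed(42)
np.random.seed(42)
random.seed(42)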

 


Loading ResNet18 Model and Weights

The next step after loading the images is to load the pre-trained model, since we are using transfer learning. We will load ResNet18 with its corresponding weights and then customize its last fully connected layer to fit our classification task: fake face detection. Hence, we will change the last fully connected layer of this network to have two outputs: real and fake. In this part we will also set the loss function to cross-entropy and configure the optimizer as SGD (stochastic gradient descent) with a learning rate of 0.001.

 

model_ft = models.resnet18(pretrained=True)
num_ftrs = model_ft.fc.in_features

# Here the size of each output sample is set to 2.
# Alternatively, it can be generalized to nn.Linear(num_ftrs, len(class_names)).
model_ft.fc = nn.Linear(num_ftrs, 2)
model_ft = model_ft.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized
optimizer_ft = optim.SGD(model_ft.parameters(), lr=0.001, momentum=0.9)

# Decay the learning rate by a factor of 0.1 every epoch
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_ft, step_size=1, gamma=0.1)
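As the comment above notes, we fine-tune all parameters. A common lighter-weight alternative is feature extraction: freeze the pre-trained backbone and train only the new two-output head. This is sketched below under the same setup (model_conv and optimizer_conv are illustrative names, not from the original article):

# Alternative (not used in this article): feature extraction instead of
# full fine-tuning; cheaper and often sufficient on small datasets.
model_conv = models.resnet18(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False              # freeze the whole backbone
model_conv.fc = nn.Linear(model_conv.fc.in_features, 2)  # only the new head trains
model_conv = model_conv.to(device)
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)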

 

Training ResNet18 To Recognize Fake Faces

Now that ResNet18 is loaded and the parameters are set, we can start fine-tuning the network to classify real and fake faces. For this purpose, we will create a training function that monitors the validation accuracy and keeps a copy of the best-performing weights; when training finishes, the best model is restored and returned.

 

def train_model(model, criterion, optimizer, scheduler, num_epochs=2):
    """Fine-tune the model, keeping the weights with the best validation accuracy."""
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and a validation phase
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward; track history only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            if phase == 'train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model if the validation accuracy improved
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))

    # load the best model weights
    model.load_state_dict(best_model_wts)
    return model
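Note that this function always runs for the full num_epochs; it keeps the best weights rather than stopping early. If true early stopping is desired, a small patience counter can be bolted on. The EarlyStopper class below is a hypothetical sketch, not part of the original code:

class EarlyStopper:
    """Stop training when validation accuracy hasn't improved for `patience` epochs."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = 0.0
        self.bad_epochs = 0

    def step(self, val_acc):
        """Return True when training should stop."""
        if val_acc > self.best:
            self.best = val_acc
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

# Usage sketch, inside train_model's epoch loop after the 'val' phase:
#     if stopper.step(float(epoch_acc)):
#         break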

 

Now we can train the model by calling the function above with the number of epochs we need. To start, we will set the number of epochs to 3 and check the network's performance.

 

from PIL import ImageFile
ImageFile.LOAD_TRUNCATED_IMAGES = True   # tolerate truncated image files in the dataset

model_ft = train_model(model_ft, criterion, optimizer_ft, exp_lr_scheduler,
                       num_epochs=3)
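Once fine-tuning finishes, it is worth persisting the weights so the model can be reused without retraining. A minimal sketch follows; the file name is our choice, not from the original article:

# Save only the learned parameters (the usual PyTorch practice)
torch.save(model_ft.state_dict(), 'resnet18_fake_faces.pth')

# Later: rebuild the same architecture and load the weights back
model_ft = models.resnet18(pretrained=False)
model_ft.fc = nn.Linear(model_ft.fc.in_features, 2)
model_ft.load_state_dict(torch.load('resnet18_fake_faces.pth', map_location=device))
model_ft = model_ft.to(device).eval()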

 

 

Model Evaluation

A model is evaluated by testing it on images that were not part of the training and validation sets. Hence, we will evaluate our model on the test set of real and fake images. We will create a function that takes a given number of images, passes them through the trained model, and shows their predicted labels. Note that this function can also be used to visualize the predicted labels of training or validation images; in our case, we will use it to read images from the testing set and check the results.

 

def visualize_model(model, num_images=6):
    """Show test images together with the model's predicted labels."""
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['test']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images // 2, 2, images_so_far)
                ax.axis('off')
                ax.set_title('predicted: {}'.format(class_names[preds[j]]))
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)

 

 



For a closer look, we can pass only fake faces through the model using this function: simply point the test folder at fake face images only. Running the function on fake faces alone reveals how well the model handles the class we care most about. For a quantitative measure across the whole test set, see the sketch below.
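For a single overall number, a small helper can sweep the entire test loader and report accuracy. This evaluate function is a hypothetical addition, not part of the original article:

# Hypothetical helper: overall accuracy over a DataLoader
def evaluate(model, loader):
    model.eval()
    correct = total = 0
    with torch.no_grad():
        for inputs, labels in loader:
            inputs, labels = inputs.to(device), labels.to(device)
            preds = model(inputs).argmax(dim=1)
            correct += (preds == labels).sum().item()
            total += labels.size(0)
    return correct / total

print('Test accuracy: {:.4f}'.format(evaluate(model_ft, dataloaders['test'])))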

 

Wrap up

As we’ve discussed throughout this article, the power of algorithms like StyleGAN to create fake human-like faces raises a lot of questions. On the one hand, these faces can help in various creative applications and technologies, such as design and illustration. On the other hand, such artificial intelligence technology also has drawbacks: fake human faces can be used for harmful purposes such as fraud and abuse, as we see nowadays when people use GANs to paste faces onto target videos. Hence, the motivation for this work lies in countering such abusive and disturbing actions. We used a simple approach for detecting fake faces; this work could be improved by building a deeper network trained on millions of real and fake faces, giving it the power to recognize fake faces instantly. Such networks could be useful for countering political propaganda and influence campaigns that benefit from fake faces generated with GANs.


P.S. You can find the dataset and source code of this work on my GitHub.

Stay tuned for more content ✌✌✌

 
