Emotion Detection in TV Shows: Enhancing Viewer Engagement for Banijay Benelux

Together with my peers, I published a package called emotion_detective to detect emotions in video and audio files. The project was made in collaboration with Banijay Benelux, who provided us with TV show data for the training and evaluation process.

Natural Language Processing,
Transformer Models,
Azure DevOps
Release Date: June 2024
University Project

I’m excited to share Emotion Detective, a project I recently completed with my peers.

It is our first published Python package, which analyzes emotions from video and audio content.

This was a collaborative effort with Amy Suneeth, Martin Vladimirov, Andrea Tosheva, and Kacper Janczyk. It challenged us to develop and deploy a fully functional machine learning pipeline, combining natural language processing (NLP) and audio processing.

Our journey didn’t stop there; we took things further by deploying the entire pipeline on Azure to create an automated solution for emotion analysis. We collaborated with Banijay, a global media company, to ingest their multimedia data, train models, and analyze the emotional content of their video files.

Here’s an overview of what we achieved and the new skills we gained during this journey!


Project Overview

The Emotion Detective package is designed to extract, process, and analyze emotions in multimedia files (such as videos or audio files). Whether it's analyzing emotions in movie dialogues, podcasts, or any other content, our package offers a comprehensive solution. It can perform emotion classification at the sentence level using NLP models such as RoBERTa and an RNN.
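
To give a concrete feel for what sentence-level emotion classification looks like, here is a minimal sketch using a publicly available RoBERTa-based emotion model from the Hugging Face Hub. This is purely illustrative; it is not one of the models we trained for Banijay.

# Minimal sketch of sentence-level emotion classification.
# Uses a public pretrained model from the Hugging Face Hub (assumed available),
# not the models trained inside emotion_detective.
from transformers import pipeline

classifier = pipeline(
    "text-classification",
    model="j-hartmann/emotion-english-distilroberta-base",
)

sentences = [
    "I can't believe we finally finished this season!",
    "That scene made me really uncomfortable.",
]
for sentence in sentences:
    prediction = classifier(sentence)[0]  # {'label': ..., 'score': ...}
    print(f"{sentence} -> {prediction['label']} ({prediction['score']:.2f})")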


Deploying on Azure: A Fully Automated Pipeline

One of the most exciting achievements was deploying the project on Azure, creating a fully automated pipeline to handle data ingestion, model training, and inference. We worked closely with Banijay, using their video data as input to automatically train emotion detection models.

Here's how the Azure integration works:

  • Data Ingestion: We set up a pipeline that takes Banijay's video files as input, converts them to audio, and transcribes the dialogue into text (a rough sketch of this step follows the list).
  • Automated Model Training: The system automatically trains RoBERTa and RNN models using the transcribed data. Azure’s cloud infrastructure allowed us to scale the training, optimizing time and computing resources.
  • Inference and Emotion Detection: Once the models are trained, they are used to detect emotions in the videos provided by Banijay. The Emotion Detective package’s inference pipeline is integrated into this Azure-based workflow, which predicts the emotions in each sentence and provides insights into the emotional content of the videos.
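
To make the ingestion step more concrete, here is a rough sketch of converting a video to audio and transcribing it. The library choices (moviepy and openai-whisper) are illustrative assumptions, not necessarily what our pipeline uses internally.

# Rough sketch of the ingestion step: video -> audio -> transcript.
# moviepy and openai-whisper are illustrative choices, not necessarily
# the libraries used inside the emotion_detective pipeline.
from moviepy.editor import VideoFileClip
import whisper

def video_to_transcript(video_path: str, audio_path: str = "episode.mp3") -> str:
    # Extract the audio track from the video file.
    clip = VideoFileClip(video_path)
    clip.audio.write_audiofile(audio_path)
    clip.close()

    # Transcribe the audio with a small speech-to-text model.
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    return result["text"]

transcript = video_to_transcript("path/to/episode.mp4")
print(transcript[:200])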

By deploying the pipeline in Azure, we were able to automate the entire workflow, from ingesting raw video data to producing emotion analysis reports, making it a valuable tool for Banijay and other content creators.
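
As a rough illustration, a training step like ours can be submitted as a job with the Azure Machine Learning Python SDK (v2). The subscription, workspace, compute, and environment names below are placeholders, not our actual configuration.

# Sketch of submitting a training job with the Azure ML SDK v2.
# All identifiers in angle brackets are placeholders.
from azure.ai.ml import MLClient, command
from azure.identity import DefaultAzureCredential

ml_client = MLClient(
    credential=DefaultAzureCredential(),
    subscription_id="<subscription-id>",
    resource_group_name="<resource-group>",
    workspace_name="<workspace-name>",
)

train_job = command(
    code="./src",  # folder containing the training script
    command="python train.py --model_type roberta --num_epochs 5",
    environment="<registered-environment>:<version>",
    compute="<compute-cluster-name>",
    display_name="emotion-detective-training",
)

ml_client.create_or_update(train_job)  # submit the job to the workspace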

Our Major Milestone:
Publishing Our First Python Package

One of the biggest achievements for our team was successfully publishing our first Python package. It is available for installation via pip install emotion_detective.

The package includes everything from data ingestion and preprocessing functions to the final emotion detection pipelines for training and inference.

Publishing a package involved not only coding but also documentation, version control, dependency management, and debugging—all skills that I’m glad we had the chance to develop.

We also built Sphinx documentation to ensure that others can easily understand and use the package.
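
For anyone curious what that setup involves, a minimal Sphinx conf.py for generating API documentation from docstrings looks roughly like this; the theme and paths are assumptions, not our exact configuration.

# Minimal Sphinx conf.py sketch for auto-generated API documentation.
import os
import sys

sys.path.insert(0, os.path.abspath(".."))  # make the package importable for autodoc

project = "emotion_detective"
extensions = [
    "sphinx.ext.autodoc",   # pull documentation from docstrings
    "sphinx.ext.napoleon",  # support Google/NumPy-style docstrings
    "sphinx.ext.viewcode",  # link to highlighted source code
]
html_theme = "sphinx_rtd_theme"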


Skills Gained

Throughout the project, I gained several valuable skills:

  • Natural Language Processing (NLP): Learned how to preprocess text data, balance emotion classes, and implement NLP models ranging from RNNs to transformer-based models like RoBERTa.
  • Cloud Deployment on Azure: Developed skills in deploying machine learning pipelines to Azure, allowing us to automate data ingestion, training, and inference tasks at scale.
  • Audio Processing: Developed functions for converting video files to audio and transcribing speech into text for emotion analysis.
  • Model Training & Evaluation: I learned how to implement and fine-tune different machine learning models to improve performance, including creating pipelines for both training and inference.
  • Collaborative Development: We collaborated effectively using Git, ensuring smooth integration of our individual contributions across multiple modules.

How It Works: The Pipelines

The core functionality of the package lies in its two key pipelines: the Training Pipeline and the Inference Pipeline.

  1. Training Pipeline

Our training pipeline allows users to train their own NLP models using custom datasets. Here’s a quick breakdown of its functionality:

  • Data Loading and Preprocessing: Loads CSV/JSON datasets, balances classes, and preprocesses text (including tokenization, lemmatization, and spell checking); a rough sketch of the preprocessing step follows this list.
  • Model Training: Users can choose between the RoBERTa or RNN model for emotion classification, setting parameters like learning rate, number of epochs, batch size, and more.
  • Model Saving: The trained model is saved for future use in the inference pipeline.
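
To give a feel for the kind of text preprocessing described above, here is a small sketch using NLTK for tokenization and lemmatization; the package's internal implementation may differ, and spell checking is omitted here.

# Illustrative text preprocessing with NLTK (tokenization + lemmatization).
# emotion_detective's internal implementation may use different tooling.
import nltk
from nltk.stem import WordNetLemmatizer
from nltk.tokenize import word_tokenize

nltk.download("punkt", quiet=True)
nltk.download("punkt_tab", quiet=True)  # required by newer NLTK versions
nltk.download("wordnet", quiet=True)

lemmatizer = WordNetLemmatizer()

def preprocess(sentence: str) -> list[str]:
    tokens = word_tokenize(sentence.lower())  # tokenization
    # Keep alphabetic tokens only and lemmatize each one.
    return [lemmatizer.lemmatize(token) for token in tokens if token.isalpha()]

print(preprocess("The contestants were laughing and crying at the same time!"))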

Example Usage:

from emotion_detective.training import training_pipeline

training_pipeline(
    train_data_path='path/to/train.csv',
    test_data_path='path/to/test.csv',
    text_column='text',
    emotion_column='emotion',
    num_epochs=5,
    model_type='roberta',
    model_dir='./models/',
    model_name='emotion_model'
)

  2. Inference Pipeline

Once a model is trained, our inference pipeline processes audio/video files, transcribes the speech, and analyzes the emotion in each sentence. This pipeline can handle both MP4 video files and MP3 audio files.

Example Usage:

from emotion_detective.inference import main

results = main(
    input_media_path='path/to/video.mp4',
    model_path='path/to/model.pth',
    model_type='roberta',
    emotion_mapping_path='path/to/emotion_mapping.csv'
)
print(results)

What’s Inside the Package?

The package is organized into several modules:

  • Data Handling: Modules for data ingestion, preprocessing (e.g., data_ingestion.py, data_preprocessing.py).
  • Modeling: Functions for defining, training, and saving machine learning models (e.g., model_definitions.py, model_training.py).
  • Logging: Robust logging for tracking training and inference processes (e.g., logger.py).
  • Inference: Functions to handle video/audio processing and prediction (e.g., main.py, model_predict.py).

Each file has clearly defined functions to make the package easy to extend or modify for different use cases.


Key Challenges and Lessons Learned

One of the biggest challenges was implementing the emotion classification model in a way that could handle real-world noisy data, especially during audio transcription. Additionally, deploying the entire system on Azure required careful orchestration of the pipelines to ensure smooth operation. Learning how to manage cloud resources, configure virtual machines, and utilize Azure’s machine learning services was crucial to our success.


I’m really proud of the work we accomplished as a team and the collaboration with Banijay, which provided real-world data to test and validate our system. With the Azure deployment, Emotion Detective has the potential to be a valuable tool for researchers, content creators, and media companies alike.

Feel free to check out our documentation or install the package to try it out yourself!

Interested in Learning More?

Let’s Get in Touch!

Rebecca Borski

All content copyright © 2024 Rebecca Borski