Mehrab Mashrafi


Story Genre Classification with Hugging Face Transformers and Fastai

Topics Covered: Data Scraping, Data Preprocessing, Data Cleaning, Deep Learning, NLP (Natural Language Processing), Machine Learning, Transfer Learning, Language Models

How much do you know about fan-made stories? Did you know about the immense fan community where non-professional and semi-professional writers alike showcase their writing ability? I discovered it and thought: why not use this data to build an NLP classification model that can learn patterns from fan-made stories?

This project showcases multi-label text classification using Hugging Face Transformers and Fastai: a deep learning model is trained to predict multiple genre labels for story descriptions. Key features include web scraping with Selenium WebDriver, data extraction from Royal Road (a fan-made-stories website), pre-trained transformer models, Fastai's text processing capabilities, performance metrics such as F1 score and ROC AUC, and detailed code explanations.

Project Presentation Video

Data Collection: Exploring the World of Fan-Made Stories

The Cool World of Fan-Made Stories

In the expansive and captivating world of fan-made stories, a realm where the creativity of non-professional and semi-professional writers intertwines, a vast tapestry of narratives unfolds. The fan community serves as a dynamic platform, allowing writers to showcase their talents and readers to immerse themselves in unique and imaginative tales.

Choice of Data Source: Royal Road

For this project, our journey into the cool world of fan-made stories led us to the doorstep of Royal Road, a prominent website known for hosting a diverse collection of fan-made stories. Here, writers from various backgrounds share their creations, weaving narratives that span across genres and captivate the hearts of readers.

Data Collection Process

  1. Web Scraping with Selenium WebDriver:

    Leveraging Selenium WebDriver, we embarked on a web-scraping journey to extract valuable information from Royal Road. Our focus was on gathering story summaries and their corresponding genres, the crucial elements for training a multi-label text classification model.

  2. Cover Images Data:

    In addition to the story summaries and genres, we extracted data not used by the classification model: cover images. During the scraping process, we also opened the cover image for each story and, using real-time K-means clustering, isolated its 5 dominant color codes, adding a unique visual dimension to our dataset.

  3. Ethical Considerations:

    Throughout the data collection process, we maintained a commitment to ethical practices, respecting the terms of use of the Royal Road website and ensuring responsible scraping.

Outcome

Our endeavor to explore the cool world of fan-made stories on Royal Road resulted in the creation of a robust dataset. Each story summary, intertwined with its associated genres, and complemented by unique cover image color data, became a comprehensive foundation for our project on multi-label text classification. This dataset, cultivated from the diverse narratives of Royal Road, forms the basis for training a deep learning model capable of predicting multiple genre labels for a given text description.
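
The cover-color extraction mentioned above could look like the following sketch, using Pillow and scikit-learn. The downsampling size and the ordering of clusters by pixel count are illustrative choices, not necessarily what the project used.

```python
# Sketch: extract the 5 dominant colours of a cover image with K-means.
# Pillow + scikit-learn; parameters are illustrative, not the project's.
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

def dominant_colors(image, k: int = 5) -> list:
    """Return k hex colour codes, largest pixel cluster first."""
    img = image if isinstance(image, Image.Image) else Image.open(image)
    img = img.convert("RGB").resize((64, 64))  # downsample for speed
    pixels = np.asarray(img).reshape(-1, 3)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(pixels)
    counts = np.bincount(km.labels_, minlength=k)  # pixels per cluster
    order = np.argsort(counts)[::-1]               # biggest cluster first
    return ["#%02x%02x%02x" % tuple(km.cluster_centers_[i].round().astype(int))
            for i in order]
```

Downsampling before clustering keeps the step fast enough to run "in real time" during scraping while barely changing which colors dominate.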

Model Training

Leveraging Pre-trained Transformer Models

In this project, we harnessed the power of pre-trained transformer models from Hugging Face Transformers to elevate our text classification capabilities. By tapping into these models, we gained access to their extensive knowledge of language and context, allowing us to achieve impressive results in genre classification.

Our Choice: bert-base-uncased Architecture

For this project, we selected the bert-base-uncased architecture, a well-established BERT variant trained on lowercased text. This choice let our model comprehend the subtleties of language usage regardless of capitalization, leading to enhanced accuracy and performance in genre classification.
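
A minimal sketch of wiring bert-base-uncased up for multi-label classification with the Transformers library is shown below. The genre list, threshold, and helper names are illustrative placeholders, not the project's actual code.

```python
# Hedged sketch: bert-base-uncased configured for multi-label genre
# classification. GENRES and the helpers are illustrative placeholders.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

GENRES = ["Fantasy", "Sci-fi", "Romance", "Action", "Horror"]  # example labels

def pick_labels(probs, labels, threshold: float = 0.5) -> list:
    """Keep every label whose probability clears the threshold."""
    return [lab for lab, p in zip(labels, probs) if p > threshold]

def build_model(num_labels: int):
    """Load the tokenizer and a BERT head with one sigmoid per label."""
    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased",
        num_labels=num_labels,
        problem_type="multi_label_classification",  # trains with BCEWithLogitsLoss
    )
    return tokenizer, model
```

At inference time, one would tokenize a description, apply a sigmoid to the logits, and keep every genre whose probability clears the threshold via `pick_labels`; because the sigmoids are independent, a story can receive any number of genres at once.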

Model Training in model_training.ipynb

The model_training.ipynb notebook documents each step of the training process: preprocessing the data, configuring the model, defining the loss function, selecting optimization strategies, and monitoring training progress. It serves as a comprehensive guide to our approach.
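
The core multi-label training step the notebook walks through can be reduced to a plain-PyTorch sketch. The tiny linear model, optimizer, and learning rate below are stand-ins for the real BERT learner, chosen only to make the loop self-contained.

```python
# Minimal multi-label training step: BCE-with-logits loss gives one
# independent sigmoid per genre. Model and optimizer are toy stand-ins.
import torch
import torch.nn as nn

def train_step(model, optimizer, loss_fn, x, y) -> float:
    """One forward/backward/update pass; returns the batch loss."""
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    optimizer.step()
    return loss.item()

torch.manual_seed(0)
toy = nn.Linear(16, 5)                   # stands in for the BERT classifier
opt = torch.optim.AdamW(toy.parameters(), lr=1e-2)
loss_fn = nn.BCEWithLogitsLoss()
x = torch.randn(8, 16)                   # fake batch of 8 "descriptions"
y = torch.randint(0, 2, (8, 5)).float()  # multi-hot genre targets
losses = [train_step(toy, opt, loss_fn, x, y) for _ in range(20)]
```

The key point is the multi-hot target vector and `BCEWithLogitsLoss`: unlike softmax cross-entropy, each genre is scored independently, which is what makes the problem multi-label rather than multi-class.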

Model Evaluation and Deployment

Model Evaluation

Here is how our model performed after training for 7 epochs:

Metric             Score
Accuracy (Multi)   0.87
F1 (Micro)         0.65
F1 (Macro)         0.52
ROC AUC (Micro)    0.77
ROC AUC (Macro)    0.71

Refer to the model_evaluation.ipynb notebook for detailed instructions.
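
For reference, the metrics in the table map onto scikit-learn calls along these lines. The arrays below are toy multi-label placeholders, not the project's actual predictions.

```python
# How the evaluation metrics map to scikit-learn. y_true/y_prob are toy
# multi-label arrays (4 samples, 3 labels), not the project's outputs.
import numpy as np
from sklearn.metrics import f1_score, roc_auc_score

y_true = np.array([[1, 0, 1], [0, 1, 1], [1, 1, 0], [0, 0, 1]])
y_prob = np.array([[0.9, 0.2, 0.8], [0.1, 0.7, 0.6],
                   [0.8, 0.6, 0.3], [0.2, 0.4, 0.9]])
y_pred = (y_prob > 0.5).astype(int)      # threshold each sigmoid output

acc_multi = (y_pred == y_true).mean()    # fastai-style accuracy_multi
f1_micro = f1_score(y_true, y_pred, average="micro")
f1_macro = f1_score(y_true, y_pred, average="macro")
auc_micro = roc_auc_score(y_true, y_prob, average="micro")
auc_macro = roc_auc_score(y_true, y_prob, average="macro")
```

Micro averaging pools all label decisions before scoring, while macro averaging scores each genre separately and averages, which is why rare genres pull the macro numbers below the micro ones in the table.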

Convert Model to ONNX

Convert your trained model to the ONNX format for deployment using the convert_to_onnx.ipynb notebook.

Model Deployment

Hugging Face: The model is deployed and accessible on Hugging Face's Model Hub.

Render: Additionally, the model is deployed and hosted on Render, allowing you to interact with it via a web interface.

Future Work

Cover Data Collection for Future Insights

Our data collection process went beyond capturing story summaries and genres. In addition to these components, we meticulously gathered cover images and extracted dominant color codes using real-time K-means clustering. This rich source of visual information presents a unique opportunity for future exploration.

Next NLP Model: Predicting Color Codes

Building upon our comprehensive dataset, our future endeavors will delve into the creation of yet another NLP model. This model will be trained to predict appropriate color codes as output, given the text description as input. By leveraging the insights gained from the cover data collection, we aim to develop a tool that not only classifies genres but also enhances the visual representation of fan-made stories.

Ongoing Innovation and Exploration

As technology evolves and new possibilities emerge, our commitment to innovation remains steadfast. We envision continuous exploration and refinement of models that not only decode the intricate world of text but also contribute to the fusion of language and visuals in storytelling. The journey doesn't end here; it unfolds into a realm of endless possibilities and creative synthesis.

Conclusion

Impact and Legacy

Our foray into the world of fan-made stories, powered by advanced natural language processing and machine learning, has culminated in a project with profound implications. The model's performance, with a multi-label accuracy of 87%, showcases its efficacy in predicting multiple genre labels for text descriptions. This not only streamlines genre classification but also augments the exploration of fan creativity.

Deployment Avenues and Accessibility

The deployment of our model on Hugging Face's Model Hub and Render opens avenues for accessibility and interaction. Whether through seamless inference on Hugging Face or an engaging web interface on Render, users can now leverage our model to navigate the diverse landscape of fan-made stories.

Crossroads of Technology and Creativity

This project stands at the crossroads of technology and creativity, with the fusion of text classification and visual exploration. As we celebrate the success of our endeavors, the impact ripples beyond accurate predictions; it extends into the intricate nuances of fan storytelling and the vibrant hues of cover images.

Gratitude and Future Endeavors

We express gratitude to the fan community and the contributors on Royal Road who made this project possible. As we reflect on this accomplishment, our eyes are set on the horizon of future endeavors. The journey continues, marked by ongoing innovation, exploration, and the unwavering pursuit of pushing the boundaries of what technology can achieve in the realms of storytelling.