Mammal World - AI for Wildlife Conservation in the Indian Subcontinent

Topics Covered: Data Scraping, Data Preprocessing, Data Cleaning, Deep Learning, Computer Vision, Machine Learning

The Indian subcontinent is home to a diverse range of mammal species, each contributing to the region's biodiversity. Wildlife conservation there faces significant challenges, including efficient species identification, understanding animal behavior, and preserving habitat. Our project addresses these challenges by using computer vision to identify mammal species and their behavior from images.

Project Presentation Video

Data Collection

Data Sources

Data collection began with automated Selenium scripts that scraped records of mammal species across the Indian subcontinent from indiabiodiversity.org. The result was a preliminary dataset that serves as the foundation of the project.
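A minimal sketch of such a scraping script is shown below. The listing URL and CSS selector are hypothetical arguments supplied by the caller, not the ones used in the project; the page structure of indiabiodiversity.org would need to be inspected first.

```python
import re


def clean_species_name(raw: str) -> str:
    """Collapse runs of whitespace and trim the edges of a scraped name."""
    return re.sub(r"\s+", " ", raw).strip()


def scrape_species_names(url: str, css_selector: str) -> list[str]:
    """Return the cleaned, de-duplicated text of every element matching css_selector.

    Both arguments are placeholders chosen by the caller (assumptions).
    """
    # Imported lazily so clean_species_name works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        elements = driver.find_elements(By.CSS_SELECTOR, css_selector)
        names = [clean_species_name(el.text) for el in elements]
    finally:
        driver.quit()  # always release the browser

    # dict.fromkeys drops duplicates while keeping first-seen order.
    return [name for name in dict.fromkeys(names) if name]
```

Keeping the text cleanup in its own function makes it easy to check without launching a browser.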

To improve accuracy and reliability, we cleaned and curated the dataset, cross-verifying each record against several trusted web sources, including animaldiversity.org. This validation step is intended to keep the dataset accurate and complete.
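One simple way to cross-verify records is to compare normalized scientific names against a reference list and flag anything that does not match for manual review. The sketch below assumes both sources have already been reduced to lists of names; the function names and sample data are illustrative only.

```python
def normalize(name: str) -> str:
    """Lower-case a scientific name and collapse whitespace for comparison."""
    return " ".join(name.lower().split())


def cross_verify(primary: list[str], reference: list[str]) -> tuple[list[str], list[str]]:
    """Split primary names into (confirmed, needs_review) against a reference list."""
    reference_keys = {normalize(name) for name in reference}
    confirmed, needs_review = [], []
    for name in primary:
        if normalize(name) in reference_keys:
            confirmed.append(name)
        else:
            needs_review.append(name)
    return confirmed, needs_review


# Illustrative data, not taken from the project's dataset.
scraped = ["Panthera tigris", "panthera  pardus", "Bos gaurus"]
trusted = ["Panthera Tigris", "Panthera pardus"]
confirmed, needs_review = cross_verify(scraped, trusted)
```

Here `confirmed` holds the two Panthera entries and `needs_review` holds "Bos gaurus", which is absent from the reference list.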

In the final stage of data collection, we gathered images of 103 mammal species native to the Indian subcontinent. The images were collected by web scraping, using the Google Search Engine API for broad queries and the 'search_images_ddg()' function from Fastai for more focused image retrieval. These images are the training data for our AI-driven identification model.
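The per-species collection loop could be sketched as follows. The search backend is passed in as a callable that takes a query string and returns image URLs, so it could be a thin wrapper around 'search_images_ddg()' or around a Google search client. The query template and the `per_species` limit are assumptions for illustration.

```python
from typing import Callable, Iterable


def collect_image_urls(
    species: Iterable[str],
    search_fn: Callable[[str], Iterable[str]],
    per_species: int = 50,
) -> dict[str, list[str]]:
    """Map each species name to at most per_species unique image URLs."""
    results: dict[str, list[str]] = {}
    for name in species:
        query = f"{name} animal photo"  # hypothetical query template
        unique = dict.fromkeys(search_fn(query))  # drop repeats, keep order
        results[name] = list(unique)[:per_species]
    return results
```

Injecting the search function keeps the loop independent of any one search provider and lets it be exercised with a stub before spending real API calls.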

View Image Collection Process

Data Cleaning & Augmentation

Raw scraped data has to be cleaned and preprocessed before it is suitable for modeling. Our data cleaning and preprocessing phase consisted of the following steps:

Organization: We first arranged the images into a hierarchical directory structure, which simplified processing, access, and management.
Scraping Descriptive Data: Using web scraping, we extracted text descriptions, taxonomy, and habitat information for each species, adding this metadata to the dataset.
Augmentation with Nighttime Imagery: To help the models work under varied conditions, we expanded the dataset with augmented nighttime images that simulate the low-light environments in which many of these mammals are active.
Behavioral Annotation: We annotated each image with the behaviors and actions exhibited by the animal, so that the model can learn to recognize behavior as well as species.
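The organization step could be sketched as below. The layout of one folder per species is an assumption; the exact hierarchy used in the project is not described here.

```python
import shutil
from pathlib import Path


def organize_images(labelled_files: dict[str, str], dest_root: Path) -> None:
    """Copy each image into dest_root/<species_slug>/<filename>.

    labelled_files maps a source file path to its species name.
    """
    for src, species in labelled_files.items():
        # "Bengal Tiger" -> "bengal_tiger" (hypothetical naming scheme)
        slug = species.strip().lower().replace(" ", "_")
        target_dir = dest_root / slug
        target_dir.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, target_dir / Path(src).name)
```

A folder-per-class layout is convenient because many image-loading utilities can infer labels directly from the parent directory name.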

This process of data cleaning and augmentation matters because it directly affects the accuracy and reliability of the model's predictions, which are what make it useful for conservation of the Indian subcontinent's wildlife.
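As an illustration of the nighttime augmentation idea, the sketch below darkens an image and shifts it slightly toward blue. The project's actual augmentation method is not specified, so the technique and the default parameters here are assumptions. Images are represented as plain nested lists of RGB tuples to keep the example dependency-free.

```python
Pixel = tuple[int, int, int]


def simulate_night_pixel(pixel: Pixel, brightness: float = 0.35, blue_shift: float = 1.15) -> Pixel:
    """Darken one RGB pixel and boost blue relative to red and green."""
    r, g, b = pixel
    scaled = (r * brightness, g * brightness, b * brightness * blue_shift)
    # Clamp to the valid 0-255 range after rounding.
    return tuple(max(0, min(255, round(c))) for c in scaled)


def simulate_night(image: list[list[Pixel]], **kwargs) -> list[list[Pixel]]:
    """Apply simulate_night_pixel to every pixel of a row-major image."""
    return [[simulate_night_pixel(p, **kwargs) for p in row] for row in image]
```

In practice the same per-channel scaling would normally be applied with an image library rather than pixel by pixel; the loop form is used here only to make the arithmetic explicit.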

Behavioral Model Development

View Behavior Model Training in Colab

To build a system capable of behavior prediction as well as species identification, we developed a dedicated behavioral model that combines domain knowledge with machine learning.

We chose the ResNet-34 architecture as our foundation and applied transfer learning. Starting from pre-trained weights allowed us to adapt the network to recognize a range of animal behaviors with little additional training time and data.
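A transfer-learning setup of this kind might look like the sketch below. It assumes the fastai library and a hypothetical folder of images with one sub-folder per behavior; the configuration values are placeholders, not the project's actual settings.

```python
from dataclasses import dataclass


@dataclass
class BehaviorTrainingConfig:
    data_dir: str = "behavior_images"  # hypothetical path, one sub-folder per behavior
    image_size: int = 224
    valid_pct: float = 0.2
    epochs: int = 5
    seed: int = 42


def train_behavior_model(cfg: BehaviorTrainingConfig):
    """Fine-tune a pre-trained ResNet-34 on behavior-labelled image folders."""
    # Imported lazily: fastai is only needed when training actually runs.
    from fastai.vision.all import (
        ImageDataLoaders,
        Resize,
        accuracy,
        resnet34,
        vision_learner,
    )

    dls = ImageDataLoaders.from_folder(
        cfg.data_dir,
        valid_pct=cfg.valid_pct,
        seed=cfg.seed,
        item_tfms=Resize(cfg.image_size),
    )
    learn = vision_learner(dls, resnet34, metrics=accuracy)
    # fine_tune trains the new head with the pre-trained body frozen first,
    # then unfreezes the whole network for cfg.epochs further epochs.
    learn.fine_tune(cfg.epochs)
    return learn
```

Because the convolutional body already encodes general visual features, only a small number of epochs is typically needed to adapt it to a new label set.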

Once the behavioral model was trained, we applied it to our primary dataset. The model predicted the behaviors and actions shown in each image, and these predictions were integrated back into the dataset as behavior labels.
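The step of merging predictions back into the dataset could be sketched as follows. The record fields, the `predict_behavior` callable, and the confidence threshold are all hypothetical; the threshold in particular is an added safeguard that the project may not use.

```python
from typing import Callable


def add_behavior_labels(
    records: list[dict],
    predict_behavior: Callable[[str], tuple[str, float]],
    min_confidence: float = 0.6,
) -> list[dict]:
    """Return copies of records with a 'behavior' field added.

    records: dicts with at least an 'image' path and a 'species' label.
    predict_behavior: returns (behavior, confidence) for an image path.
    Predictions below min_confidence are stored as None for manual review.
    """
    enriched = []
    for record in records:
        behavior, confidence = predict_behavior(record["image"])
        label = behavior if confidence >= min_confidence else None
        enriched.append({**record, "behavior": label})
    return enriched
```

Returning new dictionaries rather than mutating the input keeps the original species-only dataset intact for comparison.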

This enriched dataset is the training data for our multi-target computer vision model. By combining species identification with behavior prediction, the model supports both monitoring mammal species and understanding what they are doing in their natural habitats.

Main Model Training

View Final Model Training in Colab

Our main model is built on the ResNet101 architecture and uses transfer learning to recognize and classify 103 distinct mammal species native to the Indian subcontinent. It is designed as a multi-target computer vision system that predicts both species and behavior, intended as a tool for biodiversity research and conservation efforts across diverse conditions and scenarios.
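One common way to frame a multi-target problem like this is as multi-label classification, where species and behavior names share a single label vocabulary and each image is encoded as a multi-hot vector. The label encoding used in the project is not stated, so the sketch below is an assumption that illustrates the idea with plain Python.

```python
import math


def build_vocab(label_sets: list[set[str]]) -> list[str]:
    """Collect every label seen in the dataset into a sorted vocabulary."""
    return sorted(set().union(*label_sets))


def encode(labels: set[str], vocab: list[str]) -> list[int]:
    """Multi-hot encode one image's labels against the vocabulary."""
    return [1 if name in labels else 0 for name in vocab]


def decode(logits: list[float], vocab: list[str], threshold: float = 0.5) -> list[str]:
    """Apply a sigmoid to each logit and keep labels at or above threshold."""
    probs = [1 / (1 + math.exp(-z)) for z in logits]
    return [name for name, p in zip(vocab, probs) if p >= threshold]


# Illustrative labels only.
vocab = build_vocab([{"tiger", "resting"}, {"gaur", "grazing"}])
target = encode({"tiger", "resting"}, vocab)
predicted = decode([-3.0, -1.0, 2.0, 4.0], vocab)
```

Each output is scored independently with a sigmoid rather than a softmax, which is what allows one image to carry both a species label and a behavior label.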

Final Model Scores

The last epoch of model training produced the following scores for identifying mammal species and their behavior:

Train Loss	Valid Loss	Accuracy Multi	F1 Score	Precision Score
0.020471	0.024824	0.990407	0.735815	0.843947
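The gap between the multi-label accuracy and the F1 score is typical of multi-label problems with many possible labels. If "Accuracy Multi" is computed per label slot, as fastai's metric of that name is, then the many correctly predicted absent labels raise it, while F1 and precision depend only on the labels that are present or predicted. The toy example below illustrates this effect; the data is invented, and micro-averaging is an assumption, since the averaging used for the scores above is not stated.

```python
def _pairs(y_true, y_pred):
    """Flatten two multi-hot matrices into (true, predicted) slot pairs."""
    return [(t, p) for row_t, row_p in zip(y_true, y_pred) for t, p in zip(row_t, row_p)]


def accuracy_multi(y_true, y_pred) -> float:
    """Fraction of individual label slots predicted correctly."""
    pairs = _pairs(y_true, y_pred)
    return sum(t == p for t, p in pairs) / len(pairs)


def micro_precision_f1(y_true, y_pred) -> tuple[float, float]:
    """Micro-averaged precision and F1 over all label slots."""
    pairs = _pairs(y_true, y_pred)
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, f1


# Two images, ten possible labels each (invented data).
y_true = [[1, 1, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 0, 0, 0, 0, 0, 0]]
y_pred = [[1, 0, 0, 0, 0, 0, 0, 0, 0, 0], [0, 0, 1, 1, 1, 0, 0, 0, 0, 0]]
acc = accuracy_multi(y_true, y_pred)
precision, f1 = micro_precision_f1(y_true, y_pred)
```

With one missed label and one spurious label out of twenty slots, `acc` is 0.9 while `precision` and `f1` are both 0.75, so the F1 and precision columns are the more informative ones for judging how well present labels are detected.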

Deployment

Deploying the model makes it accessible to the people doing conservation work. The model is currently available on the following platforms:

Hugging Face: Our model is hosted on Hugging Face, allowing researchers and conservationists to directly interact with the model through a simple interface. View on Hugging Face
Streamlit App: For a more customized and interactive experience, we've also developed a Streamlit app that demonstrates the model's capabilities in real time. Launch Streamlit App
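A Streamlit front end for a model like this can be quite small. The sketch below is not the project's app: it assumes a fastai learner exported to a hypothetical file named "export.pkl" and shows only the upload-and-predict flow.

```python
def format_top_predictions(labels, probabilities, top_k: int = 3) -> list[str]:
    """Return the top_k labels by probability as 'label: 12.3%' strings."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return [f"{label}: {prob:.1%}" for label, prob in ranked[:top_k]]


def run_app(model_path: str = "export.pkl") -> None:
    """Render the upload widget and show predictions for the uploaded image."""
    # Imported lazily so the formatting helper works without these packages.
    import streamlit as st
    from fastai.vision.all import PILImage, load_learner

    learn = load_learner(model_path)  # model_path is a placeholder
    st.title("Mammal World")
    uploaded = st.file_uploader("Upload a mammal photo", type=["jpg", "jpeg", "png"])
    if uploaded is not None:
        st.image(uploaded)
        image = PILImage.create(uploaded.getvalue())
        _, _, probs = learn.predict(image)
        for line in format_top_predictions(learn.dls.vocab, probs.tolist()):
            st.write(line)
```

To use it, call `run_app()` at the bottom of a script and launch that script with `streamlit run`.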

Future Work

Our work on AI for biodiversity conservation is ongoing. We are developing version two of the model, which aims to predict a broader array of behaviors and identify up to 200 species with improved accuracy.

This upgrade is designed to offer more granular insights and a wider scope for conservationists, researchers, and enthusiasts around the globe. We plan to get there by expanding our dataset and refining our algorithms for wildlife identification and behavioral analysis.

Collaboration is at the heart of our progress. If you share our passion for wildlife conservation and have ideas or skills to contribute, we warmly invite you to join our efforts. Please visit our GitHub repository and consider contributing via a pull request.

Contribute on GitHub