Movie recommendation system Machine Learning

Time to play around with some movie datasets and build an exciting web app. In this brief article, we build an Anime/Movie Recommendation System from scratch in Python.

If you are new to Python programming, consider checking out the fun Python tutorial series, which has plenty of examples and short code snippets.

Build an Anime/Movie Recommendation System From Scratch In Python

With the rapid growth of data, every company is trying to build a recommendation system to reach a larger audience. We see and use recommendation systems in our daily lives. Consider YouTube, for example: it has a massive recommendation algorithm built to keep the audience watching longer. We rarely watch just one YouTube video; we switch between multiple similar videos while we are at it, because YouTube recommends videos based on our likes and interests. The next example is Netflix. Before you create your account, Netflix recommends a watchlist, and once you become its customer, the recommended watchlist is updated based on your interests. For instance, if you love Anime, it will recommend Anime-related content.

Other examples include Google Search, Amazon, Flipkart, Spotify, and so on. Recommendation systems like these fall into three types:

  • Content-based
  • Collaborative-based
  • Hybrid-based

Types of Movie Recommendation System

Content-based Movie Recommendation System

Consider a user, LUFFY. Luffy loves One Piece and is now looking for a similar Anime to watch, say in the adventure genre. So he starts watching Naruto. This is an example of content-based filtering: the recommendation system recommends a product/movie that is similar to what the user already likes.

E.g., Google News, YouTube
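
The idea can be sketched with a toy catalogue. The genre tags below are made-up illustration data, and Jaccard overlap is just one simple similarity measure picked for this sketch; the app built later in this article uses TF-IDF and cosine similarity instead.

```python
# Toy content-based filtering: recommend titles whose genre tags overlap
# most with a title the user already likes.
# NOTE: the catalogue and its tags are hypothetical illustration data.
catalogue = {
    "One Piece": {"adventure", "action", "fantasy"},
    "Naruto": {"adventure", "action", "martial arts"},
    "Death Note": {"mystery", "thriller", "supernatural"},
    "Your Name": {"romance", "drama", "supernatural"},
}


def jaccard(a, b):
    """Share of tags two titles have in common (0.0 to 1.0)."""
    return len(a & b) / len(a | b)


def similar_titles(liked, top_n=2):
    """Rank every other title by tag overlap with the liked title."""
    scores = [
        (title, jaccard(catalogue[liked], tags))
        for title, tags in catalogue.items()
        if title != liked
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return [title for title, _ in scores[:top_n]]


print(similar_titles("One Piece", top_n=1))  # ['Naruto']
```

Only the items' own attributes are compared here; no other user's behaviour is involved.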

Collaborative-based Movie Recommendation System

Now consider two users: LUFFY and NARUTO. Luffy loves One Piece and Attack on Titan, whereas Naruto loves only Attack on Titan. If Naruto is looking for a new recommendation, the system would recommend One Piece to him, because Luffy shares his taste for Attack on Titan. This is an example of collaborative filtering: the system finds similar users and recommends products/movies to each of them based on what the others like.

E.g., Netflix, Amazon, Flipkart
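
The Luffy and Naruto scenario can be written as a few lines of Python. This is a minimal sketch under simplifying assumptions: two users count as "similar" as soon as they share one liked title, and the third user (Sanji) and the function name recommend_for are hypothetical additions for illustration.

```python
# Toy user-based collaborative filtering built on the Luffy/Naruto example.
# NOTE: the likes table is illustration data; "Sanji" is a hypothetical
# extra user whose taste overlaps with nobody.
likes = {
    "Luffy": {"One Piece", "Attack on Titan"},
    "Naruto": {"Attack on Titan"},
    "Sanji": {"Food Wars"},
}


def recommend_for(user):
    """Suggest titles liked by users who share at least one like with `user`."""
    seen = likes[user]
    suggestions = set()
    for other, other_likes in likes.items():
        if other != user and seen & other_likes:
            # Similar user found: add what they like that `user` has not seen.
            suggestions |= other_likes - seen
    return sorted(suggestions)


print(recommend_for("Naruto"))  # ['One Piece']
```

Note that nothing about the titles themselves is used; the suggestion comes purely from overlapping user behaviour.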

Hybrid-based Movie Recommendation System

A hybrid system combines both content-based and collaborative recommendation systems.

E.g., Google Search, YouTube
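
One simple way to picture the combination is a weighted average of the two scores. This is offered only as an illustrative assumption, not as a description of how any of the services above work; the function name hybrid_score and the weight alpha are hypothetical.

```python
# Hypothetical sketch: blend a content-based score with a collaborative score.
def hybrid_score(content_score, collaborative_score, alpha=0.5):
    """Weighted average of two scores that are both on a 0-1 scale.

    alpha=1.0 trusts only the content score,
    alpha=0.0 trusts only the collaborative score.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return alpha * content_score + (1.0 - alpha) * collaborative_score


print(hybrid_score(0.8, 0.4))             # halfway between the two scores
print(hybrid_score(0.8, 0.4, alpha=1.0))  # content score only
```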

It’s finally time. Let’s build an Anime/Movie Recommendation System in Python from scratch.

Step-by-Step Guide for Anime/Movie Recommendation System in Python

We shall create an Anime/Movie Recommendation System web app with the help of the Streamlit library.

Step 1: Import necessary modules

Throughout the code, we shall use the pandas and sklearn (scikit-learn) modules, along with Streamlit, Matplotlib, and wordcloud.

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud, STOPWORDS

Step 2: Read the dataset and add a title to the Streamlit Web App

Download the dataset from Kaggle: AnimeWorld Dataset

st.title("Anime Recommendation System")

st.markdown("[Read Article for Step-by-step guide](https://animevyuh.org/movie-recommendation-system)")
st.markdown("Download Dataset: [Anime World: Kaggle Dataset](https://www.kaggle.com/datasets/tarundalal/anime-dataset)")

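data = pd.read_csv("AnimeWorld.csv")
# .copy() gives an independent frame, so later column assignments
# do not trigger pandas' SettingWithCopyWarning
anime_data = data[['Anime','Genre','Description','Studio']].copy()
# Forward-fill missing values; fillna(method="ffill") is deprecated in recent pandas
anime_data = anime_data.ffill()
st.header("Anime Dataset")
st.dataframe(anime_data)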

You can learn more about how to create a Streamlit web app in the starter tutorial on Streamlit and Machine Learning.

st.title is similar to the h1 tag in HTML whereas st.header is similar to the h2 tag. st.dataframe is used to display the dataset in rows and columns on the Web App.

Step 3: Cleaning of the Data

As you saw in Step 2, we will base the recommendations on the Studio, Genre, and Description features. The Genre column contains a lot of noise, so let’s clean it up.

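def filter_genre(data):
    # Genre values are lists written inside a string.
    # The isinstance check skips any value that is still missing (NaN).
    if isinstance(data, str) and data.startswith('['):
        return data.strip('[]').replace(' ','').replace("'",'')
    return data

anime_data['Genre'] = anime_data['Genre'].apply(filter_genre)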

In the dataset, each Genre value is a list stored inside a string. The code above strips the brackets, quotes, and spaces so that each value becomes a plain comma-separated string.
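
To see what the cleaning does, here is a small standalone check. The sample value is made up to resemble the column's format. Because the cleaner removes every space, a multi-word genre is squashed into one word; the second function, parse_genre, is a hypothetical variant (not part of the app) that parses the list with ast.literal_eval and keeps those spaces. Whether that matters depends on the genres in the dataset.

```python
import ast


def filter_genre(data):
    """Same cleaning steps as in the article: drop brackets, quotes, and spaces."""
    if isinstance(data, str) and data.startswith('['):
        return data.strip('[]').replace(' ', '').replace("'", '')
    return data


def parse_genre(data):
    """Hypothetical variant: parse the list literal so multi-word genres survive."""
    try:
        parsed = ast.literal_eval(data)
    except (ValueError, SyntaxError):
        return data  # not a Python literal, e.g. a plain genre string
    if isinstance(parsed, (list, tuple)):
        return ", ".join(str(item) for item in parsed)
    return data


sample = "['Action', 'Slice of Life']"  # made-up value in the column's format
print(filter_genre(sample))   # Action,SliceofLife
print(parse_genre(sample))    # Action, Slice of Life
print(parse_genre("Comedy"))  # Comedy
```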

Step 4: Generate Word Cloud

A word cloud is a collage of words drawn from a text. The words are rendered graphically in random colors or specific shapes, and the more often a word appears, the larger it is drawn.
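
Underneath the graphics, a word cloud is roughly a word-frequency count. The sketch below shows only that counting step, using the standard library; the studio names are made-up stand-ins for a dataset column, and the wordcloud library does additional processing (such as stop-word removal) that is not reproduced here.

```python
from collections import Counter

# Made-up studio names standing in for a dataset column.
studios = [
    "Toei Animation", "Madhouse", "Toei Animation",
    "MAPPA", "Madhouse", "Toei Animation",
]

# Join everything into one text and count each lowercase word.
words = " ".join(studios).lower().split()
frequencies = Counter(words)

# The most frequent words are the ones a word cloud would draw largest.
print(frequencies.most_common(3))
# [('toei', 3), ('animation', 3), ('madhouse', 2)]
```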

Whenever the button to display the Word Cloud is clicked in the Web App, the program generates three different Word Clouds, one each for Anime, Studio, and Genre. To understand the syntax, make sure you check the Wordcloud in Python article.

def WC_generate(col,size,words):
    plt.figure(figsize=(15,15))
    wordcloud = WordCloud(stopwords=STOPWORDS,background_color = 'black', width = size,  height = size, max_words = words)
    wordcloud.generate(' '.join(anime_data[col].astype(str)))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.savefig(f"{col}.jpg")
    plt.close()  # release the figure so repeated clicks do not pile up open figures
    st.image(f"{col}.jpg")

if st.button("Show Word Cloud"):
    st.header("Anime Name")
    WC_generate('Anime',1050,150)

    st.header("Anime Studio")
    WC_generate('Studio',1000,100)

    st.header("Anime Genre")
    WC_generate('Genre',500,50)

Step 5: Vectorize the Data and Find Similar vectors using cosine_similarity

Now we shall prompt the user to pick a feature from the selectbox and then recommend similar Anime based on that choice. The available choices in the selectbox are Genre, Studio, and Description. Once a choice is selected, we vectorize that specific column with the help of TfidfVectorizer. TfidfVectorizer calculates TF-IDF (term frequency-inverse document frequency) values for the terms in each string in a set of documents. The stop_words argument filters out common English words while vectorizing each string.

>>> from sklearn.feature_extraction.text import TfidfVectorizer
>>> sample = ['One Piece','is','best','Anime'] #make sure you pass a string within list
>>> tfidf = TfidfVectorizer(stop_words='english')
>>> tfidf_matrix = tfidf.fit_transform(sample)
>>> for i in tfidf_matrix:
...  print(i)
... 
  (0, 2)	1.0

  (0, 1)	1.0
  (0, 0)	1.0
>>> tfidf_matrix.shape
(4, 3)

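The shape is (4, 3) because the sample has four strings and three terms are left in the vocabulary: ‘anime’, ‘best’, and ‘piece’. The words ‘is’ and ‘one’ are both filtered out by stop_words, which is also why the row for ‘is’ prints nothing. TfidfVectorizer also lowercases the text and ignores punctuation and white space when splitting a string into terms.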
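
To make the 1.0 values less opaque, the sketch below recomputes TF-IDF by hand. It assumes scikit-learn's default settings (raw term counts, smoothed idf = ln((1 + n) / (1 + df)) + 1, then L2 normalisation of each row), and the tokenised input is typed out manually with the stop words already removed. Treat it as an illustration of the formula, not a replacement for TfidfVectorizer; the function name tfidf_rows is hypothetical.

```python
import math
from collections import Counter


def tfidf_rows(docs):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in df}

    rows = []
    for doc in docs:
        counts = Counter(doc)
        weights = {t: counts[t] * idf[t] for t in counts}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        # L2-normalise so every non-empty row has length 1.
        rows.append({t: w / norm for t, w in weights.items()} if norm else {})
    return rows


# The article's sample after lowercasing and stop-word removal:
# 'One Piece' -> ['piece'], 'is' -> [], 'best' -> ['best'], 'Anime' -> ['anime'].
docs = [["piece"], [], ["best"], ["anime"]]
for row in tfidf_rows(docs):
    print(row)
```

Each row that holds a single term normalises to a weight of 1.0, and the row for 'is' stays empty, which matches the printed matrix above.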

choice = st.selectbox('On what basis would you like to be Recommended?',('Genre','Studio', 'Description'))

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(anime_data[choice])

cosine_sim = cosine_similarity(tfidf_matrix,tfidf_matrix)
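# Map each title to its row number; keep the first row if a title repeats
index_sim = pd.Series(anime_data.index, index=anime_data['Anime'])
index_sim = index_sim[~index_sim.index.duplicated(keep='first')]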

After vectorizing, we pass tfidf_matrix to cosine_similarity, which scores how similar every pair of Anime is on the chosen feature. index_sim maps each Anime title to its row in that similarity matrix.
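
Cosine similarity is the dot product of two vectors divided by the product of their lengths. A minimal pure-Python sketch of that formula follows; the vectors are made-up examples, and returning 0.0 for an all-zero vector is a choice made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # sketch's choice: an all-zero vector is similar to nothing
    return dot / (norm_u * norm_v)


print(cosine([3, 4], [3, 4]))        # same direction -> 1.0
print(cosine([1, 0, 0], [0, 1, 0]))  # no shared terms -> 0.0
print(cosine([1, 1, 0], [1, 0, 0]))  # partial overlap, between 0 and 1
```

Because TF-IDF weights are never negative, the scores in this app fall between 0 (nothing in common) and 1 (same direction).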

Step 6: Sort the Top 10 Similar Anime to Recommend to the User based on his/her choice

In the final step, we take the selected Anime's row of similarity scores from the cosine similarity matrix and sort it. We sort in descending order so that the 10 most similar Anime come first, and the selected Anime itself is left out of the list. We again use a selectbox, this time filled with the Anime titles available in the dataset, so that users can pick their favorite Anime and look for similar Anime to watch.

def get_recommendations(title):
    idx = index_sim[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the selected title itself; with tied scores it is not always ranked first
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:10]
    movie_indices = [i[0] for i in sim_scores]
    return list(anime_data['Anime'].iloc[movie_indices].values)

choice2 = st.selectbox(f'Pick Anime for similar recommendation based on {choice}',sorted(anime_data['Anime']))

if st.button("Recommend Similar Anime"):
    st.header("Here are your 10 Recommendations")
    for index,anime_name in enumerate(get_recommendations(choice2),start=1):
        st.write(f"{index}: {anime_name}")
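
A ranking like this has to leave out the selected title itself. One pitfall worth illustrating: counting on the selected title to be the first entry after sorting fails when scores tie, which happens whenever several titles have identical text in the chosen column (the same studio, for example). The standalone sketch below uses made-up scores and a hypothetical helper, top_similar, that excludes the selected row by index.

```python
def top_similar(scores, query_index, n=10):
    """Indices of the n highest scores, never including query_index."""
    ranked = sorted(
        (i for i in range(len(scores)) if i != query_index),
        key=lambda i: scores[i],
        reverse=True,
    )
    return ranked[:n]


# Made-up similarity row for the title at index 1.
# Index 0 ties with it at 1.0, e.g. both come from the same studio.
row = [1.0, 1.0, 0.2, 0.9, 0.0]

print(top_similar(row, query_index=1, n=3))  # [0, 3, 2]
```

With this row, sorting all five scores and skipping the first entry would discard index 0 and keep index 1, so the selected title would end up recommending itself.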

Putting It All Together: Anime/Movie Recommendation System Streamlit App

The complete code for the content-based approach:

import streamlit as st
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.metrics.pairwise import cosine_similarity
from sklearn.feature_extraction.text import TfidfVectorizer
from wordcloud import WordCloud, STOPWORDS

import warnings
warnings.filterwarnings("ignore")

st.title("Anime Recommendation System")

st.markdown("[Read Article for Step-by-step guide](https://animevyuh.org/movie-recommendation-system)")
st.markdown("Download Dataset: [Anime World: Kaggle Dataset](https://www.kaggle.com/datasets/tarundalal/anime-dataset)")

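data = pd.read_csv("AnimeWorld.csv")
# .copy() avoids SettingWithCopyWarning on later column assignments
anime_data = data[['Anime','Genre','Description','Studio']].copy()
# Forward-fill missing values; fillna(method="ffill") is deprecated in recent pandas
anime_data = anime_data.ffill()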
st.header("Anime Dataset")
st.dataframe(anime_data)

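def filter_genre(data):
    # Genre values are lists written inside a string; skip values still missing (NaN)
    if isinstance(data, str) and data.startswith('['):
        return data.strip('[]').replace(' ','').replace("'",'')
    return data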

anime_data['Genre'] = anime_data['Genre'].apply(filter_genre)

def WC_generate(col,size,words):
    plt.figure(figsize=(15,15))
    wordcloud = WordCloud(stopwords=STOPWORDS,background_color = 'black', width = size,  height = size, max_words = words)
    wordcloud.generate(' '.join(anime_data[col].astype(str)))
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.savefig(f"{col}.jpg")
    plt.close()  # release the figure so repeated clicks do not pile up open figures
    st.image(f"{col}.jpg")

if st.button("Show Word Cloud"):
    st.header("Anime Name")
    WC_generate('Anime',1050,150)

    st.header("Anime Studio")
    WC_generate('Studio',1000,100)

    st.header("Anime Genre")
    WC_generate('Genre',500,50)

choice = st.selectbox('On what basis would you like to be Recommended?',('Genre','Studio', 'Description'))

tfidf = TfidfVectorizer(stop_words='english')
tfidf_matrix = tfidf.fit_transform(anime_data[choice])

cosine_sim = cosine_similarity(tfidf_matrix,tfidf_matrix)
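# Map each title to its row number; keep the first row if a title repeats
index_sim = pd.Series(anime_data.index, index=anime_data['Anime'])
index_sim = index_sim[~index_sim.index.duplicated(keep='first')]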

def get_recommendations(title):
    idx = index_sim[title]
    sim_scores = list(enumerate(cosine_sim[idx]))
    # Drop the selected title itself; with tied scores it is not always ranked first
    sim_scores = [s for s in sim_scores if s[0] != idx]
    sim_scores = sorted(sim_scores, key=lambda x: x[1], reverse=True)[:10]
    movie_indices = [i[0] for i in sim_scores]
    return list(anime_data['Anime'].iloc[movie_indices].values)

choice2 = st.selectbox(f'Pick Anime for similar recommendation based on {choice}',sorted(anime_data['Anime']))

if st.button("Recommend Similar Anime"):
    st.header("Here are your 10 Recommendations")
    for index,anime_name in enumerate(get_recommendations(choice2),start=1):
        st.write(f"{index}: {anime_name}")


st.markdown("[Support Anime Vyuh](https://www.buymeacoffee.com/trjtarun)")

Conclusion

This is a decent Movie Recommendation System. In the future, we shall implement a Recommendation System based on Rating and Timestamp. We have now completed content-based filtering; up next, we shall implement collaborative filtering.

Join our Discord Server and become a part of the Anime Vyuh Community. We share content on 100DaysOfCode, Anime Review, One Piece Theory, and Character Analysis articles.

Subscribe to our Newsletter to never miss out on the content: https://animevyuh.org/newsletter. Join our Newsletter now for fantastic Anime recommendations, Python, and Machine Learning content.

Support Us: https://www.buymeacoffee.com/trjtarun, https://ko-fi.com/tarunrjain751.
GitHub: https://github.com/lucifertrj.
Twitter: https://twitter.com/TRJ_0751.

Want to learn Machine Learning with a proper roadmap and resources? Then check out this repository: https://github.com/lucifertrj/100DaysOfML
