Web Scraping using Python

Web scraping simply means retrieving information, images, or documents from a website with code instead of copying them by hand. Is it possible to pull data from someone else's website programmatically? Of course it is.

Here is how you can scrape data from a website and analyze it with a bar graph. To follow along, you need to know Python.

Best Anime Characters

CBR is an awesome website that shares amazing stuff related to comics, anime, manga, and more. One of its posts ranks the best anime characters based on votes, which makes it a perfect project for web scraping. Information credits go to CBR; we will use only the names and vote counts of the best anime characters.

This list features god-tier anime characters such as White Demon Sakata Gintoki from Gintama, Humanity’s Strongest Soldier Levi Ackerman from Shingeki no Kyojin, Future Pirate King Monkey D Luffy from One Piece, Loved Hokage Naruto Uzumaki from Naruto, Emperor Lelouch Vi Britannia from Code Geass, and many more. Check out this article for detailed information about the Most Popular Anime Characters.

When it comes to the most popular anime characters, the list doesn’t just include main protagonists but also female leads and side characters. It covers only 15 characters, so it’s obvious that many characters who deserved a spot are missing.

BeautifulSoup Python

You can now retrieve this information about the most popular anime characters using BeautifulSoup. BeautifulSoup (bs4) is a Python library that makes web scraping easy. You need to know the basics of HTML to inspect the website. So what is inspecting a website all about? Inspect is a browser feature that lets you browse the contents of a page. It shows you the HTML tags used to build the website, and you can even check the styles applied to them.

To understand web scraping you need to know how to inspect a website. Open the website you want to scrape, right-click on the page, and select Inspect; a panel appears on the right side. You need basic HTML knowledge to read what it shows. Note the tags that wrap the content you want, because you will use those tags in your Python code.

Once you are familiar with inspecting, the next step is to find the data you want to scrape. For example, if you need a name from the website, right-click on the name, select Inspect, and check which HTML tags surround it. You can download the page with the requests library in Python. Once you request the URL of the site using the get() function, you can parse the content of the response with BeautifulSoup. Then use the find_all function from BeautifulSoup to collect the popular anime characters’ names and their votes into separate lists.
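The fetch-and-parse steps described above might look roughly like the sketch below. The h2 tag, the "Name (N Votes)" text pattern, the helper names fetch_html and parse_characters, and the sample HTML with placeholder characters are all assumptions made for illustration, not the real markup of the CBR page. Replace them with whatever tags you find when you inspect the site.

```python
import re

import requests
from bs4 import BeautifulSoup

# Hypothetical pattern: element text such as "Character A (1,200 Votes)".
VOTE_PATTERN = re.compile(
    r"^(?P<name>.+?)\s*\((?P<votes>[\d,]+)\s+votes\)$", re.IGNORECASE
)


def fetch_html(url, timeout=10):
    """Download a page with requests and return its HTML as text."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an error on 4xx/5xx responses
    return response.text


def parse_characters(html, tag="h2"):
    """Return (names, votes) lists from elements that match VOTE_PATTERN.

    The tag name is an assumption; use the one you see when inspecting.
    """
    soup = BeautifulSoup(html, "html.parser")
    names, votes = [], []
    for element in soup.find_all(tag):
        match = VOTE_PATTERN.match(element.get_text(strip=True))
        if match is None:
            continue  # skip elements that are not character entries
        names.append(match.group("name"))
        votes.append(int(match.group("votes").replace(",", "")))
    return names, votes


# Placeholder HTML standing in for a downloaded page (not real data).
SAMPLE_HTML = """
<html><body>
  <h2>Character A (1,200 Votes)</h2>
  <h2>Character B (950 Votes)</h2>
  <h2>Related articles</h2>
</body></html>
"""

names, votes = parse_characters(SAMPLE_HTML)
print(names)  # ['Character A', 'Character B']
print(votes)  # [1200, 950]
```

For a live page you would call parse_characters(fetch_html(url)) with the URL of the article you inspected. Keeping the download and the parsing in separate functions lets you try the parsing on saved HTML without hitting the website repeatedly.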

Matplotlib Python

Now that you have lists of names and votes for the popular anime characters, you can plot a bar graph to present the result clearly. You can create the graph with the help of the matplotlib Python library.

How to create a graph using Matplotlib

Import the matplotlib.pyplot library and create any graph suitable for the project. With the help of matplotlib you can easily create bar, pie, scatter, histogram, and line graphs. Before plotting, you can hold the data in NumPy arrays or load it from a CSV file with pandas. Matplotlib also provides various attributes to make the graph look attractive. You can change the width, height, and color of the graph, and by using labels you can name the x-axis and y-axis. Once your graph is ready you can save it locally. Here is an example to generate a graph and save it.
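A minimal sketch of that plotting workflow follows. The function name plot_votes, the output file name, the colors and sizes, and the placeholder names and votes are assumptions for illustration; in the project you would pass in the lists scraped earlier.

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt


def plot_votes(names, votes, output_path="popular_characters.png"):
    """Draw a bar graph of votes per character and save it as an image."""
    fig, ax = plt.subplots(figsize=(10, 6))  # width and height in inches
    ax.bar(names, votes, color="steelblue", width=0.6)
    ax.set_xlabel("Character")
    ax.set_ylabel("Votes")
    ax.set_title("Most popular anime characters")
    ax.tick_params(axis="x", labelrotation=45)  # keep long names readable
    fig.tight_layout()
    fig.savefig(output_path, dpi=150)
    plt.close(fig)  # release the figure's memory
    return Path(output_path)


# Placeholder data, not the real vote counts.
sample_names = ["Character A", "Character B", "Character C"]
sample_votes = [1200, 950, 700]

saved = plot_votes(sample_names, sample_votes)
print(f"Graph saved to {saved}")
```

The figure is saved with savefig rather than shown with plt.show(), so the script also works on a machine without a display. In Jupyter Notebook you can drop the matplotlib.use("Agg") line to see the graph inline as well.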

Here is the bar graph for the most popular anime characters, the final result of the Python code.

Conclusion

When you scrape a website, it’s better to give credit to the original owners of the content. Some websites don’t allow you to scrape their data. You are free to inspect any website, but misusing the data is not recommended. I recommend installing Jupyter Notebook to work on data science projects that include plotting graphs.

Download Jupyter Notebook here.

Get the Jupyter Notebook with the source code for this project on GitHub.

If you run into any errors, reach out to me on GitHub.

Project info Credits: CBR.

Do check out CBR for amazing stuff related to comics, anime, manga, and more.