Web Scraping
Automating Web Data Gathering using Python
Introduction
This project automates web data gathering using Python in a Jupyter Notebook hosted on Google Colab. It entails obtaining structured data from a live website. Three libraries were used: Requests for handling HTTP requests, BeautifulSoup for parsing HTML, and pandas for storing and manipulating data. The data was collected and organized into a DataFrame, and the results were exported to a .csv file.
Project Execution
The required libraries (requests, BeautifulSoup, and pandas) were imported
The link to the live website was assigned to a variable url, and requests was used to send an HTTP request to the website
Data was extracted from the provided web page using the BeautifulSoup library
The Hockey scores data table was extracted from the entire dataset
The table headers were extracted from the table as part of data wrangling
The Hockey table headers were then saved in a Pandas DataFrame
The entire dataset was looped through, and each row was saved to the Pandas DataFrame
The shape was checked: the Hockey DataFrame consists of 25 rows and 9 columns
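The steps above can be sketched roughly as follows. This is a minimal illustration rather than the notebook's actual code: the function names fetch_html and table_to_dataframe are hypothetical, the page is assumed to hold its data in a plain HTML table with th header cells and td data cells, and SAMPLE_HTML contains invented placeholder rows so the parsing step can run without network access. For a real run, the output of fetch_html(url) would be passed in place of the sample.

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup


def fetch_html(url: str) -> str:
    """Download a page and return its HTML (hypothetical helper)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def table_to_dataframe(html: str) -> pd.DataFrame:
    """Parse the first <table> in the HTML into a DataFrame (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    if table is None:
        raise ValueError("no <table> element found in the page")

    # Header cells become the column names.
    headers = [th.get_text(strip=True) for th in table.find_all("th")]

    # Each row with <td> cells becomes one record; the header row has none
    # and is skipped.
    rows = []
    for tr in table.find_all("tr"):
        cells = [td.get_text(strip=True) for td in tr.find_all("td")]
        if cells:
            rows.append(cells)

    return pd.DataFrame(rows, columns=headers)


# Invented placeholder markup standing in for the real page (assumption).
SAMPLE_HTML = """
<table>
  <tr><th>Team Name</th><th>Year</th><th>Wins</th></tr>
  <tr><td>Team A</td><td>2000</td><td>10</td></tr>
  <tr><td>Team B</td><td>2000</td><td>12</td></tr>
</table>
"""

df = table_to_dataframe(SAMPLE_HTML)
print(df.shape)  # (2, 3) for the placeholder table
df.to_csv("Hockey.csv", index=False)  # export without the index column
```

Note that every scraped value arrives as a string, so numeric columns such as Wins would still need converting (for example with pd.to_numeric) before any calculation.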
Conclusion
The structured data obtained from the website consisted of 25 rows and 9 columns. The columns are [Team Name, Year, Wins, Losses, OT Losses, Win %, Goals For (GF), Goals Against (GA), + / -]. The data was stored in a Pandas DataFrame and later saved as a .csv file (Hockey.csv). The link to the code can be obtained below: