Python: How to scrape data from the internet into your project without APIs (e.g. crypto prices)
Web scraping is the process of extracting, copying, storing, and reusing third-party content on the web.
In this tutorial, we will use web scraping in Python to get data from the CoinMarketCap website so that we can use it later in our projects.
Everything we use here can be reused on any website by just modifying a few lines of code.
We will write the code step by step. You will find the whole code at the end of the tutorial.
What do you need for this project?
- Python installed on your PC
- Installing some libraries (lxml, requests, beautifulsoup4)
1. Installing the libraries we need
You need to have Python and pip installed on your computer to be able to use the libraries.
To install the libraries, open the terminal on your computer (CMD on Windows).
Type the following command:
pip install lxml requests beautifulsoup4
Try the command with pip3 if the previous command didn’t work:
pip3 install lxml requests beautifulsoup4
2. Import the libraries we installed
You need to import the libraries we downloaded.
Make a new file (e.g. price.py) and import the libraries by typing:
import requests
from bs4 import BeautifulSoup
3. Get the HTML code of the website we want to get data from
Now we want to get the price of some cryptocurrencies from the CoinMarketCap website. For this, we need to copy the link to the page of the currency we need. For example, if I want to get the price of Ethereum, I find the link to the Ethereum price page on CoinMarketCap, make an HTTP request to that link, and get the whole HTML code of the page.
It should look something like this:
eth_url = requests.get("https://coinmarketcap.com/de/currencies/ethereum/")
eth_src = eth_url.content
If you want to get the price of multiple currencies, you have to repeat these steps as many times as the number of currencies you want to get. In my example, I want to get the price of BNB, Anime Token, Monero, and Ethereum. This means that I should repeat the step 4 times and store the HTML code containing the price of each currency in 4 variables (bnb_src, ani_src, xmr_src, and eth_src).
bnb_url = requests.get("https://coinmarketcap.com/de/currencies/binance-coin/")
bnb_src = bnb_url.content
ani_url = requests.get("https://coinmarketcap.com/de/currencies/anime-token/")
ani_src = ani_url.content
xmr_url = requests.get("https://coinmarketcap.com/de/currencies/monero/")
xmr_src = xmr_url.content
eth_url = requests.get("https://coinmarketcap.com/de/currencies/ethereum/")
eth_src = eth_url.content
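Repeating the request by hand works for four currencies, but the same steps can also be sketched as a loop. The following is a minimal sketch, not the code used in the rest of this tutorial: the names BASE_URL, SLUGS, build_url, fetch_html, and fetch_all are hypothetical helpers, and the timeout and the raise_for_status() check are additions so that a failed request raises an error instead of quietly returning an error page.

```python
import requests

# URL pattern and slugs taken from the links used in this tutorial.
BASE_URL = "https://coinmarketcap.com/de/currencies/{slug}/"

SLUGS = {
    "bnb": "binance-coin",
    "ani": "anime-token",
    "xmr": "monero",
    "eth": "ethereum",
}


def build_url(slug):
    """Return the currency page URL for one slug."""
    return BASE_URL.format(slug=slug)


def fetch_html(url, timeout=10):
    """Download one page and return its raw HTML bytes."""
    response = requests.get(url, timeout=timeout)
    response.raise_for_status()  # raise an exception on 4xx/5xx answers
    return response.content


def fetch_all(slugs, fetch=fetch_html):
    """Return a dict that maps each short name to the HTML of its page."""
    return {name: fetch(build_url(slug)) for name, slug in slugs.items()}
```

Calling fetch_all(SLUGS) would return a dictionary whose "eth" entry holds the same bytes as eth_src above. The fetch parameter is only there so that the download step can be swapped out, for example when trying the function without a network connection.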
4. Beautifulsoup settings
Now that we have the content of the page, we need to extract the data we want from it. For this, we use the beautifulsoup library. We create a new variable (eth_soup) and call the BeautifulSoup constructor. The constructor takes two parameters (the source code and the name of the parser). We already installed the lxml parser in the first step, so we will use it to parse the HTML of our page.
eth_soup = BeautifulSoup(eth_src, "lxml")
Since we need the price of 4 cryptocurrencies in our project, we have to repeat the step 4 times:
bnb_soup = BeautifulSoup(bnb_src, "lxml")
ani_soup = BeautifulSoup(ani_src, "lxml")
xmr_soup = BeautifulSoup(xmr_src, "lxml")
eth_soup = BeautifulSoup(eth_src, "lxml")
5. Find out where the data you need is located in the HTML code
Select the data you need on the page, right-click it, and open it with Inspect in Google Chrome, as shown in the picture.
Now find out which div contains the data in the HTML code and copy the class name of that div. In our example, it is called “priceValue”.
6. Find the data with beautifulsoup
Now we know in which div our data is located. We want to extract that information with Python using beautifulsoup. The following code searches the parsed HTML and returns only the div with that class:
eth_ = eth_soup.find("div", {"class":"priceValue"})
Output of eth_:
<div class="priceValue"><span>€3,026.64</span></div>
We want to remove the HTML tags and keep only the text, so we end up with:
eth_list = []
eth_ = eth_soup.find("div", {"class":"priceValue"})
eth_list.append(eth_.text) #get just the text
Output of eth_list:
['€3,026.64']
Do this step 4 times for each currency we worked with in the steps before.
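One thing to watch for in this step: find returns None when no matching div exists (for example, if the site renames the class), and calling .text on None raises an AttributeError. Below is a minimal sketch of a guarded version. The function name extract_price_text and the SAMPLE_HTML string are hypothetical, the sample markup is copied from the output shown above, and the built-in "html.parser" is used here only so the sketch runs without lxml; the tutorial itself uses "lxml".

```python
from bs4 import BeautifulSoup

# Sample markup copied from the output of eth_ shown above.
SAMPLE_HTML = (
    '<html><body><div class="priceValue">'
    "<span>€3,026.64</span></div></body></html>"
)


def extract_price_text(html, css_class="priceValue"):
    """Return the text inside the first div with css_class, or None."""
    soup = BeautifulSoup(html, "html.parser")
    tag = soup.find("div", {"class": css_class})
    if tag is None:
        # The class was not found, e.g. because the page layout changed.
        return None
    return tag.get_text(strip=True)


print(extract_price_text(SAMPLE_HTML))
```

The same function could be called once per currency with bnb_src, ani_src, xmr_src, and eth_src instead of the sample string.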
7. Make the necessary edits to the data
Since we received ['€3,026.64'] as output, we now need to make a few changes. We need the price as a floating-point number, so we have to remove a few characters and convert the type. We will remove the “€” and “,” characters and convert the resulting string to a float.
x = eth_list[0][0] #this is the "€" symbol (if USD it should be the "$" symbol)
eth_list[0] = eth_list[0].replace(x, '').replace(",", '') #removing the "€" and the ","
ETH = float(eth_list[0]) #convert the string number to float
This step also has to be repeated 4 times, once for each currency we worked with. You will find the whole code in the next step.
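Because the same cleanup is needed for every currency, it can be wrapped in one small function. This is a minimal sketch under the assumption that the price text looks like the example above (a currency symbol, "," as thousands separator, and "." as decimal point); the name parse_price is hypothetical. Instead of removing only the first character, it drops every character that is not a digit or a dot.

```python
import re


def parse_price(text):
    """Convert a price string such as '€3,026.64' to a float."""
    # Keep only digits and the decimal point; this removes the currency
    # symbol, the thousands separators, and any surrounding whitespace.
    cleaned = re.sub(r"[^0-9.]", "", text)
    return float(cleaned)  # raises ValueError if no number is left


print(parse_price("€3,026.64"))  # 3026.64
```

With this helper, ETH = parse_price(eth_.text) would replace the three lines above, and the same call works for the other currencies.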
8. The whole Code
This script tells you how much your holdings are worth once you enter the amount of each cryptocurrency you own in the variables my_BNB, my_ANI, my_XMR, and my_ETH. It prints the value of each holding and the total in euros.
##you need to install the following libraries
##pip install lxml (this is the parser we need to use)
##pip install requests
##pip install beautifulsoup4
import requests
from bs4 import BeautifulSoup
##We need these lists to store the value of each currency
bnb_list = []
ani_list = []
xmr_list = []
eth_list = []
##get the html code of the page you want to get the information from
bnb_url = requests.get("https://coinmarketcap.com/de/currencies/binance-coin/")
bnb_src = bnb_url.content
ani_url = requests.get("https://coinmarketcap.com/de/currencies/anime-token/")
ani_src = ani_url.content
xmr_url = requests.get("https://coinmarketcap.com/de/currencies/monero/")
xmr_src = xmr_url.content
eth_url = requests.get("https://coinmarketcap.com/de/currencies/ethereum/")
eth_src = eth_url.content
##Using the lxml parser
bnb_soup = BeautifulSoup(bnb_src, "lxml")
ani_soup = BeautifulSoup(ani_src, "lxml")
xmr_soup = BeautifulSoup(xmr_src, "lxml")
eth_soup = BeautifulSoup(eth_src, "lxml")
##Search on each CoinMarketCap html for the value we need
bnb_ = bnb_soup.find("div", {"class":"priceValue"})
bnb_list.append(bnb_.text) #get just the text and append it in bnb_list
x = bnb_list[0][0] # this is the "€" symbol we want to remove from the string
bnb_list[0] = bnb_list[0].replace(x, '').replace(",", '') #removing the "€" and the ","
BNB = float(bnb_list[0]) #typecast the string to float
##same steps as before
ani_ = ani_soup.find("div", {"class":"priceValue"})
ani_list.append(ani_.text)
x = ani_list[0][0] #this is the "€" symbol (if USD it should be the "$" symbol)
ani_list[0] = ani_list[0].replace(x, '').replace(",", '') #removing the "€" and the ","
ANI = float(ani_list[0])
##same steps as before
xmr_ = xmr_soup.find("div", {"class":"priceValue"})
xmr_list.append(xmr_.text)
x = xmr_list[0][0] #this is the "€" symbol (if USD it should be the "$" symbol)
xmr_list[0] = xmr_list[0].replace(x, '').replace(",", '') #removing the "€" and the ","
XMR = float(xmr_list[0])
##same steps as before
eth_ = eth_soup.find("div", {"class":"priceValue"})
eth_list.append(eth_.text) #get just the text
x = eth_list[0][0] #this is the "€" symbol (if USD it should be the "$" symbol)
eth_list[0] = eth_list[0].replace(x, '').replace(",", '') #removing the "€" and the ","
ETH = float(eth_list[0])
##how much you own of each currency (you have to put values that suit you)
my_BNB = 0.5
my_ANI = 1000000
my_XMR = 5.234
my_ETH = 12
##value in euro
BNB_EUR = my_BNB * BNB
ANI_EUR = my_ANI * ANI
XMR_EUR = my_XMR * XMR
ETH_EUR = my_ETH * ETH
##how much you have in total
summary = BNB_EUR + ANI_EUR + XMR_EUR + ETH_EUR
##print how much you own of each currency, its value in euro, and the total
print("\n\n==================================\n\n - Binance: %f BNB\n ==== %f € ====\n\n - AnimeToken: %f ANI\n ==== %f € ====\n\n - Monero: %f XMR \n ==== %f € ====\n\n - Ethereum: %f ETH \n ==== %f € ====\n\n\n======== %f € ========\n\n==================================" %(my_BNB, BNB_EUR, my_ANI, ANI_EUR, my_XMR, XMR_EUR, my_ETH, ETH_EUR, summary))
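The value calculation at the end of the script can also be written with dictionaries, so that adding a fifth currency does not need new variables. The following is a minimal sketch, not part of the original script: the function names portfolio_value and format_report are hypothetical, and the prices in the example are made-up numbers for illustration, not scraped values.

```python
def portfolio_value(holdings, prices):
    """Return (value per coin, total value) for the given holdings."""
    values = {coin: amount * prices[coin] for coin, amount in holdings.items()}
    return values, sum(values.values())


def format_report(holdings, prices):
    """Build a short text report with one line per coin and a total."""
    values, total = portfolio_value(holdings, prices)
    lines = []
    for coin, amount in holdings.items():
        lines.append(
            f" - {coin}: {amount} x {prices[coin]:.2f} € = {values[coin]:.2f} €"
        )
    lines.append(f"Total: {total:.2f} €")
    return "\n".join(lines)


# Made-up example numbers; in the script the prices come from the scraper.
example_holdings = {"ETH": 2, "XMR": 0.5}
example_prices = {"ETH": 3000.0, "XMR": 200.0}
print(format_report(example_holdings, example_prices))
```

In the full script, the prices dictionary would be filled with the scraped floats (BNB, ANI, XMR, ETH) and the holdings dictionary with the my_BNB, my_ANI, my_XMR, and my_ETH amounts.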