Summarized conclusions about the process of crawling a complete Blockchain network in order to download and collect all existing DApps
In this article I’m going to show you how automated data extraction can be done for large-scale data analysis. The goal is to download all the interesting Blockchain information as a processable file such as a CSV. I’ll guide you through the process of making it possible.
Requirements
To be able to crawl the Blockchain data, we must first have a ledger peer synchronized and running. If you don't have one, you can check this guide about how to set one up.
Once installed, we need to figure out the exact location of the virtualenv application. We can find the exact path with the which virtualenv command:
which virtualenv
/usr/local/bin/virtualenv
Create the virtualenv
virtualenv -p /usr/bin/python3.6 venv
Activate the virtualenv
To activate the new virtual environment, run the following:
source venv/bin/activate
The name of the current virtual environment appears to the left of the prompt. For example: (venv)
Version check
To verify the correct version of Python, run the following:
python -V
Any package you install using pip is now located in the virtual environment project folder, isolated from the global Python installation.
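As a quick sanity check, the following sketch reports the interpreter version and whether it is running inside a virtual environment. It uses only the standard library; the helper name in_virtualenv is hypothetical and not part of any tool used in this article.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Older virtualenv releases set sys.real_prefix; venv and newer
    # virtualenv releases make sys.prefix differ from sys.base_prefix.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    print("Inside a virtual environment:", in_virtualenv())
```

Run it with the python executable of the activated environment; run outside of it, the second line should report False.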
Deactivate
When you finish your work in your virtual environment, you can deactivate it by running the following:
deactivate
Delete your virtual environment
To delete your virtual environment, simply delete the project folder. Using the above example, run the following command:
rm -rf venv
Installing Ethereum ETL
To install the tool that downloads the Blockchain data in CSV format, we ask pip to install it with the pip install command.
pip3 install ethereum-etl
A successful installation also installs the packages that ethereum-etl depends on.
Checking that the Ethereum ETL tool is successfully installed
To check that the ethereumetl tool is successfully installed, we just print the tool version to stdout.
ethereumetl --version
This reports the tool version number, as expected:
ethereumetl, version 1.10.1
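If the check needs to run from a script, for example before a long extraction job, the version line can be parsed. This is a minimal sketch that assumes the output keeps the "ethereumetl, version X.Y.Z" shape shown above; parse_version is a hypothetical helper.

```python
import re
import shutil
import subprocess


def parse_version(output: str):
    """Extract (major, minor, patch) from 'ethereumetl, version X.Y.Z'."""
    match = re.search(r"version\s+(\d+)\.(\d+)\.(\d+)", output)
    if match is None:
        raise ValueError("unexpected version output: {!r}".format(output))
    return tuple(int(part) for part in match.groups())


if __name__ == "__main__":
    if shutil.which("ethereumetl") is None:
        print("ethereumetl is not on PATH; is the virtualenv active?")
    else:
        result = subprocess.run(
            ["ethereumetl", "--version"],
            stdout=subprocess.PIPE,
            universal_newlines=True,
            check=True,
        )
        print(parse_version(result.stdout))
```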
Data extraction process
Now that the crawling tool is working, we can start our data extraction process. In order to extract the contract (DApp) information, there are some steps that need to be done:
Fetch all existing Blocks.
For each block, extract transaction information.
For each transaction, check whether it contains a contract installation (deployment) instruction.
Store to CSV all found contract data for further analysis.
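The four steps above can be sketched in plain Python. This is only an illustration of the logic, not how the extraction tool works internally: the blocks are hypothetical in-memory dictionaries, the field names are assumptions, and a deployment is detected by the fact that a contract-creation transaction has no recipient. Contracts created by other contracts are not covered by this check.

```python
import csv
import io

# Hypothetical in-memory blocks; real data would come from the node.
SAMPLE_BLOCKS = [
    {
        "number": 1,
        "transactions": [
            {"hash": "0xaa", "from": "0x01", "to": "0x02", "input": "0x"},
            # A missing recipient marks a contract-creation transaction.
            {"hash": "0xbb", "from": "0x01", "to": None, "input": "0x6080"},
        ],
    },
    {
        "number": 2,
        "transactions": [
            {"hash": "0xcc", "from": "0x03", "to": None, "input": "0x6001"},
        ],
    },
]


def is_contract_creation(tx):
    """A transaction deploys a contract when it has no recipient."""
    return not tx.get("to")


def find_contract_creations(blocks):
    """Yield one row per contract-creation transaction, block by block."""
    for block in blocks:                      # step 1: every block
        for tx in block["transactions"]:      # step 2: every transaction
            if is_contract_creation(tx):      # step 3: deployment check
                yield {
                    "block_number": block["number"],
                    "transaction_hash": tx["hash"],
                    "creator": tx["from"],
                    "init_code": tx["input"],
                }


def write_csv(rows, handle):
    """Step 4: store the rows as CSV and return how many were written."""
    fields = ["block_number", "transaction_hash", "creator", "init_code"]
    writer = csv.DictWriter(handle, fieldnames=fields)
    writer.writeheader()
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


if __name__ == "__main__":
    buffer = io.StringIO()
    written = write_csv(find_contract_creations(SAMPLE_BLOCKS), buffer)
    print(written, "contract deployments found")
    print(buffer.getvalue())
```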
1. Crawling Ethereum blocks and transaction information
I configured my node to connect via the IPC file located at file://$HOME/.ethereum/rinkeby/geth.ipc. If your *.ipc file is located in another path, update it accordingly.
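As a hedged sketch, the block and transaction export could be launched from Python as shown here. It builds the argument list for the export_blocks_and_transactions command of ethereumetl; the block range 0 to 500000 and the output file names blocks.csv and transactions.csv are assumptions chosen for illustration, and export_blocks_command is a hypothetical helper.

```python
import os


def export_blocks_command(start_block, end_block, ipc_path,
                          blocks_csv="blocks.csv",
                          transactions_csv="transactions.csv"):
    """Build the argv list for `ethereumetl export_blocks_and_transactions`."""
    return [
        "ethereumetl", "export_blocks_and_transactions",
        "--start-block", str(start_block),
        "--end-block", str(end_block),
        # The provider URI is the IPC path with a file:// prefix.
        "--provider-uri", "file://" + ipc_path,
        "--blocks-output", blocks_csv,
        "--transactions-output", transactions_csv,
    ]


if __name__ == "__main__":
    ipc = os.path.expanduser("~/.ethereum/rinkeby/geth.ipc")
    command = export_blocks_command(0, 500000, ipc)
    # Print the command; pass the list to subprocess.run(command, check=True)
    # to execute it against a synchronized node.
    print(" ".join(command))
```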
The next step is to extract the transactions from the blocks that were read.
Note: if you encounter issues when requesting transaction information, make sure your node is running with the --txlookuplimit=0 flag. This makes the node keep the transaction index for every block, so any transaction can be looked up by its hash.
Remember that you can tune --batch-size and --max-workers for performance.
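The remaining steps, from transactions to contract data, can be sketched as a list of ethereumetl commands with the two tuning flags applied. The intermediate file names (transaction_hashes.txt, receipts.csv, logs.csv, contract_addresses.txt) and the default values 100 and 5 are assumptions for illustration, and follow_up_commands is a hypothetical helper; check the options of your installed version with --help before relying on them.

```python
def follow_up_commands(provider_uri, batch_size=100, max_workers=5):
    """Return the argv lists for the steps that follow the block export."""
    tuning = ["--batch-size", str(batch_size),
              "--max-workers", str(max_workers)]
    return [
        # Collect the transaction hashes from the exported transactions.
        ["ethereumetl", "extract_csv_column", "--input", "transactions.csv",
         "--column", "hash", "--output", "transaction_hashes.txt"],
        # Fetch receipts; a receipt carries the address of a created contract.
        ["ethereumetl", "export_receipts_and_logs",
         "--transaction-hashes", "transaction_hashes.txt",
         "--provider-uri", provider_uri,
         "--receipts-output", "receipts.csv",
         "--logs-output", "logs.csv"] + tuning,
        # Collect the created contract addresses from the receipts.
        ["ethereumetl", "extract_csv_column", "--input", "receipts.csv",
         "--column", "contract_address",
         "--output", "contract_addresses.txt"],
        # Download the contract data into contracts.csv.
        ["ethereumetl", "export_contracts",
         "--contract-addresses", "contract_addresses.txt",
         "--provider-uri", provider_uri,
         "--output", "contracts.csv"] + tuning,
    ]


if __name__ == "__main__":
    uri = "file:///path/to/geth.ipc"  # placeholder, replace with your IPC path
    for argv in follow_up_commands(uri, batch_size=50, max_workers=8):
        print(" ".join(argv))
```

Larger batches and more workers put more load on the node, so it is worth raising them gradually.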
4. Store all contract data as CSV
At this point, you should have all existing contracts downloaded and stored into contracts.csv.
Conclusion
We learned a new way to fetch data from the Ethereum ledger (Rinkeby network) that is also applicable to other testnets and networks. Consider it another option when looking for data for your projects, instead of using a third-party provider API like Etherscan. You can get faster and cheaper results if you know how to handle them.
Drawbacks
After running this entire process, these are the disadvantages I have seen:
The process of synchronizing a node requires time and a lot of SSD space.
blockchain-etl is a very slow tool that also requires a lot of time for data extraction.
You must take into account that duplicate contracts may exist if they are found in different transactions.
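One way to deal with such duplicates is to keep only the first row seen for each contract. This is a minimal sketch that assumes the CSV has an address column and that two rows with the same address describe the same contract; the sample data and the unique_rows helper are hypothetical.

```python
import csv
import io

# Hypothetical sample with the same address listed twice.
SAMPLE_CSV = """address,bytecode
0xabc,0x6080
0xdef,0x6001
0xabc,0x6080
"""


def unique_rows(rows, key="address"):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    result = []
    for row in rows:
        value = row[key].lower()  # hex addresses may differ only in case
        if value not in seen:
            seen.add(value)
            result.append(row)
    return result


if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    deduplicated = unique_rows(rows)
    print(len(rows), "rows,", len(deduplicated), "unique contracts")
```

To treat identical code deployed at different addresses as one contract, pass key="bytecode" instead.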
Advantages
Installing blockchain-etl and using it is very easy.
blockchain-etl seems to generate reasonably good results.
Thanks for checking this out and I hope you found the info useful! If you have any questions, don't hesitate to write me a comment below. And remember that if you'd like to see more content like this, just let me know and share this post with your colleagues, co-workers, FFF, etc.