What is the decompilation process?
Decompilation is the process of recovering source-like code from compiled code, so that security engineers can reason about a program without working directly on machine code. In the context of the EVM, the goal is to convert EVM bytecode into Solidity-like code.
The challenge
Decompiling back to the original source code is impossible because all variable names, type names and even function names are removed during compilation. It might be technically possible to arrive at source code that is similar to the original, but doing so is very complicated, especially when the optimizer was used during compilation. I don’t know of any tools that do more than convert bytecode to opcodes.
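To make the point about lost names concrete, here is a minimal Python sketch. In Solidity-compiled contracts, a public function usually survives only as a 4-byte selector that the dispatcher compares against the call data. The helper name find_selectors, the sample bytecode fragment and the "PUSH4 followed by EQ" heuristic are all assumptions made for this illustration; real compilers also emit other dispatcher layouts that this scan would miss.

```python
from typing import List


def find_selectors(hex_code: str) -> List[str]:
    """Collect the immediates of PUSH4 instructions directly followed by EQ."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)
    found = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        # PUSH1..PUSH32 (0x60..0x7F) carry 1..32 immediate bytes that must be skipped.
        size = op - 0x5F if 0x60 <= op <= 0x7F else 0
        # 0x63 is PUSH4, 0x14 is EQ.
        if op == 0x63 and pc + 5 < len(code) and code[pc + 5] == 0x14:
            found.append("0x" + code[pc + 1 : pc + 5].hex())
        pc += 1 + size
    return found


# Hand-written fragment (assumption): two dispatcher entries of the form
# DUP1 PUSH4 <selector> EQ PUSH2 <dest> JUMPI
sample = "8063a9059cbb1461001057806370a082311461002057"
print(find_selectors(sample))  # ['0xa9059cbb', '0x70a08231']
```

The two values recovered here are the well-known selectors of transfer(address,uint256) and balanceOf(address), but nothing in the bytecode says so: mapping a selector back to a name requires an external signature database.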
Decompilers in Ethereum
Ethereum has gained significant popularity in the blockchain community, mainly because it is designed to let developers write decentralized applications (Dapps) and smart contracts using blockchain technology.
The Ethereum blockchain is a consensus-based, globally executed virtual machine, also referred to as the Ethereum Virtual Machine (EVM). It implements its own micro-kernel supporting a handful of instructions, along with its own stack, memory and storage. This enables the radical new concept of distributed applications.
Contracts live on the blockchain in an Ethereum-specific binary format (EVM bytecode). However, contracts are typically written in a high-level language such as Solidity and then compiled into bytecode to be uploaded to the blockchain. Solidity is a contract-oriented, high-level language whose syntax is similar to that of JavaScript.
This new paradigm of applications opens the door to many possibilities and opportunities. Blockchain is often referred to as secure by design, but now that blockchains can embed applications, this raises multiple questions regarding architecture, design, attack vectors and patch deployments.
As we reverse engineers know, having access to source code is often a luxury. Hence, decompilers that turn EVM bytecode into readable code are needed.
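To give an idea of what working with raw bytecode looks like, the Python sketch below decodes a hex string into opcodes. It is only an illustration: the opcode table is a tiny hand-picked subset (a real disassembler needs the full instruction set), the function name disassemble is hypothetical, and the sample string is a hand-written fragment that mimics the kind of prologue the Solidity compiler commonly emits.

```python
from typing import List, Optional, Tuple

# Tiny hand-picked opcode table (assumption: just enough for the sample below).
OPCODES = {
    0x00: "STOP", 0x01: "ADD", 0x14: "EQ", 0x15: "ISZERO", 0x34: "CALLVALUE",
    0x50: "POP", 0x51: "MLOAD", 0x52: "MSTORE", 0x54: "SLOAD", 0x55: "SSTORE",
    0x56: "JUMP", 0x57: "JUMPI", 0x5B: "JUMPDEST", 0xF3: "RETURN", 0xFD: "REVERT",
}


def disassemble(hex_code: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (offset, mnemonic, immediate or None) for every instruction."""
    if hex_code.startswith("0x"):
        hex_code = hex_code[2:]
    code = bytes.fromhex(hex_code)
    out = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if 0x60 <= op <= 0x7F:
            # PUSH1..PUSH32: the next 1..32 bytes are data, not instructions.
            size = op - 0x5F
            imm = code[pc + 1 : pc + 1 + size]
            out.append((pc, f"PUSH{size}", "0x" + imm.hex()))
            pc += 1 + size
            continue
        if 0x80 <= op <= 0x8F:
            name = f"DUP{op - 0x7F}"
        elif 0x90 <= op <= 0x9F:
            name = f"SWAP{op - 0x8F}"
        else:
            name = OPCODES.get(op, f"UNKNOWN_0x{op:02x}")
        out.append((pc, name, None))
        pc += 1
    return out


# Hand-written sample (assumption), 17 bytes long.
for offset, name, imm in disassemble("0x6080604052348015600f57600080fd5b00"):
    print(f"{offset:04x}  {name} {imm or ''}".rstrip())
```

Even for these 17 bytes the listing is a flat sequence of stack operations with one conditional jump, and neither a function nor a variable is visible in it.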
Ethereum Virtual Machine (EVM) decompilers today
Currently, we have the following alternatives for bytecode decompilation:
- Porosity [UNMAINTAINED]: Claimed to be a Decompiler and Security Analysis tool for Blockchain-based Ethereum Smart-Contracts
- Another attempt to write a decompiler for Ethereum Virtual Machine: https://github.com/krzys-h/Ethereum-EVM-decompiler
- EVM Decompiler: A fork of the original Panoramix repo, which is no longer actively maintained by its author. Available at https://github.com/palkeo/panoramix
- Etherscan Online Decompiler: Based on Palkeoramix. Available at: https://etherscan.io/bytecode-decompiler
- EVM: https://github.com/MrLuit/evm
- Soldec: https://github.com/mkurzmann/soldec
- EVM decompiler: built by jubnzv. Available at https://github.com/jubnzv/evm-decompiler
- Gigahorse: A binary lifter and analysis framework for Ethereum smart contracts. Available at https://github.com/nevillegrech/gigahorse-toolchain.
- Ethervm.io: Free online decompiler available at https://ethervm.io/decompile.
Design Approach
As you have seen, there are many decompilers out there, but none of them produces code of the quality we would like. This is mostly because reversing bytecode back to Solidity code is a hard task.
To make our own EVM bytecode decompiler, we will follow these steps:
- Extract the .runtime section code from the bytecode.
- From the opcode sequence, extract the EVM instruction information and arguments when available. See the previous post Converting EVM bytecode to OPCODES in microseconds to know more about it.
- Find the entrypoint at 0x0.
- Convert the sequence of .runtime opcodes to an EVM CFG.
- From the EVM CFG, remove stack-related operations and convert the code to register-based instructions.
- Remove compiler optimizations when possible. For example: $var & 0xFFFFF
- Reconstruct the dispatcher section.
- Resolve public method names when possible.
- Resolve internal functions when possible.
- Look up and map storage and memory variable usage, applying SSA methodology.
- Iterate over the CFG tree, transpiling code to a higher-level Solidity and Yul representation.
- Detect common code similarities with known Solidity code and known libraries.
- Generate the final output.
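As a rough sketch of the step that removes stack operations, the Python snippet below symbolically executes one straight-line block and emits register-style statements. It is an illustration under simplifying assumptions, not the implementation used in this series: the ARITY table covers only a few opcodes, the names lift_block and v0, v1, ... are made up for the example, and the block is assumed not to consume stack values left behind by earlier blocks.

```python
from typing import List, Optional, Tuple

# opcode -> (values popped, values pushed); illustrative subset only.
ARITY = {
    "ADD": (2, 1),
    "AND": (2, 1),
    "CALLVALUE": (0, 1),
    "ISZERO": (1, 1),
    "MLOAD": (1, 1),
    "MSTORE": (2, 0),
    "SLOAD": (1, 1),
    "SSTORE": (2, 0),
}


def lift_block(instructions: List[Tuple[str, Optional[str]]]) -> List[str]:
    """Turn stack code into three-address statements using a symbolic stack."""
    stack: List[str] = []   # holds constants and register names, not real values
    out: List[str] = []
    counter = 0
    for name, imm in instructions:
        if name.startswith("PUSH"):
            stack.append(imm or "0x0")  # constants are propagated as-is
        elif name.startswith("DUP"):
            stack.append(stack[-int(name[3:])])
        elif name.startswith("SWAP"):
            n = int(name[4:])
            stack[-1], stack[-1 - n] = stack[-1 - n], stack[-1]
        elif name == "POP":
            stack.pop()
        else:
            pops, pushes = ARITY[name]
            # The top of the stack is the first operand of the instruction.
            args = [stack.pop() for _ in range(pops)]
            call = f"{name.lower()}({', '.join(args)})"
            if pushes:
                reg = f"v{counter}"
                counter += 1
                out.append(f"{reg} = {call}")
                stack.append(reg)
            else:
                out.append(call)
    return out


block = [
    ("PUSH1", "0x80"), ("PUSH1", "0x40"), ("MSTORE", None),
    ("CALLVALUE", None), ("DUP1", None), ("ISZERO", None),
]
for line in lift_block(block):
    print(line)
# mstore(0x40, 0x80)
# v0 = callvalue()
# v1 = iszero(v0)
```

PUSH, DUP, SWAP and POP disappear from the output because they only move values around. Each produced value gets a fresh name exactly once, which is also a convenient starting point for the SSA-based steps in the list above.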
In the next post, I will show you how to generate an EVM CFG diagram to get a better idea of the bytecode design and instruction execution flow. Bye :)
Next reading
If you liked this content, continue reading and find out how to process EVM bytecode in the next steps in Part 2: Building Ethereum EVM decompiler from scratch. Getting Code Blocks
References
- https://ethereum.org/en/developers/tutorials/reverse-engineering-a-contract/
- https://github.com/msuiche/porosity
- https://github.com/crytic/pyevmasm
- https://docs.qiling.io/en/latest/evm/
- https://yanniss.github.io/elipmoc-oopsla22.pdf
- https://blog.trustlook.com/smart-contract-guardian-an-online-evm-decompiler/