Understanding how EVM bytecode can be quickly converted to its OPCODE representation in Go
Decompiling EVM Bytecode
Decompiling an Ethereum application can be very expensive. When you start an analysis, you can take as a basis either the bytecode deployed on the network or the Solidity source code. A white-box analysis of the source is generally less expensive and requires only Solidity programming knowledge. A bytecode analysis, however, requires low-level knowledge: understanding how opcodes and EVM instructions work, as well as the stack, memory and storage, etc.
Either way, one of the first tasks when trying to understand what a dApp does is to process its bytecode.
And by the way, low-level stuff is more fun!
Bytecode modelling in Go
The first thing we need to do is to model the bytecode representation in Go. For this purpose, we assume that all input data arrives as a hexadecimal string, either in the format 0x... or without the 0x prefix.
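Because the input may or may not carry the 0x prefix, a small normalization step is useful before any parsing. The sketch below uses a hypothetical helper named decodeBytecode, which is not part of the original code. It strips an optional prefix and hex-decodes the rest with the standard encoding/hex package. This is an alternative to reading two hex characters per opcode:

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
	"strings"
)

// decodeBytecode is a hypothetical helper: it accepts bytecode as a hex
// string, with or without a leading 0x/0X, and returns the raw bytes.
func decodeBytecode(input string) ([]byte, error) {
	s := strings.TrimSpace(input)
	// drop an optional 0x or 0X prefix
	if len(s) >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X') {
		s = s[2:]
	}
	if s == "" {
		return nil, errors.New("empty bytecode")
	}
	// hex.DecodeString rejects odd-length strings and non-hex characters
	return hex.DecodeString(s)
}

func main() {
	withPrefix, _ := decodeBytecode("0x6080604052")
	withoutPrefix, _ := decodeBytecode("6080604052")
	fmt.Println(withPrefix)    // [96 128 96 64 82]
	fmt.Println(withoutPrefix) // [96 128 96 64 82]
}
```

Decoding once up front means the parser can index opcodes directly as bytes. It also means malformed input is rejected before parsing starts.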
Since an opcode can range from 0x00 to 0xFF, a single byte is a natural data type for it:
```go
// OpCode is an EVM opcode
type OpCode byte
```
In the same way, we also define an opcode category type (for future use):
```go
// OpcodeCategory is a predefined category of the opcode
type OpcodeCategory uint8
```
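To make the byte-sized OpCode type useful, it helps to attach a few methods to it. The following is a sketch under assumptions, not the author's code. IsPush, PushDataSize and the tiny opNames table are hypothetical names, and the table lists only a handful of real opcodes for illustration. The sketch also shows the PUSH data-size arithmetic used later: opcode value - 0x60 + 1.

```go
package main

import "fmt"

// OpCode is an EVM opcode (redeclared so this block compiles on its own).
type OpCode byte

// PUSH1 is 0x60 and PUSH32 is 0x7f. PUSH0 (0x5f, added in the Shanghai
// upgrade) carries no immediate data and is not treated as a push here.
const (
	PUSH1  OpCode = 0x60
	PUSH32 OpCode = 0x7f
)

// opNames is a deliberately tiny, illustrative name table; a real
// disassembler needs every defined opcode.
var opNames = map[OpCode]string{
	0x00: "STOP",
	0x01: "ADD",
	0x34: "CALLVALUE",
	0x52: "MSTORE",
	0x56: "JUMP",
	0x5b: "JUMPDEST",
	0xfd: "REVERT",
}

// IsPush reports whether op is one of PUSH1..PUSH32.
func (op OpCode) IsPush() bool {
	return op >= PUSH1 && op <= PUSH32
}

// PushDataSize returns how many immediate bytes follow a PUSH opcode:
// opcode value - 0x60 + 1, so PUSH1 -> 1 and PUSH32 -> 32; 0 otherwise.
func (op OpCode) PushDataSize() int {
	if !op.IsPush() {
		return 0
	}
	return int(op-PUSH1) + 1
}

// String returns the mnemonic, or a placeholder for bytes missing from opNames.
func (op OpCode) String() string {
	if op.IsPush() {
		return fmt.Sprintf("PUSH%d", op.PushDataSize())
	}
	if name, ok := opNames[op]; ok {
		return name
	}
	// convert to byte so %02x does not recurse into String()
	return fmt.Sprintf("UNKNOWN_0x%02x", byte(op))
}

func main() {
	for _, op := range []OpCode{0x60, 0x7f, 0x52, 0x0c} {
		fmt.Println(op, op.PushDataSize())
	}
}
```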
Bytecode processor in Go
To define what happens when we find a valid opcode (EVM instruction), we create a custom function type that acts as the data processor. In this case, this is enough:
```go
type opcodeProcessor func(inst *datatype.OpcodeData)
```
With this, we decouple the parsing logic from the output processing logic.
Note that OpcodeData is just a wrapper struct that holds all the information required to print the EVM instruction to stdout in a human-friendly way.
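The decoupling is easiest to see when two different processors are plugged into the same producer. In this sketch, OpcodeData is a hypothetical stand-in for datatype.OpcodeData, and its Offset, Name and Input fields are assumptions. The emit function is also hypothetical and plays the parser's role:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// OpcodeData is a hypothetical stand-in for datatype.OpcodeData.
type OpcodeData struct {
	Offset int    // position (PC) of the opcode in the bytecode
	Name   string // mnemonic, e.g. "PUSH1"
	Input  []byte // immediate data for PUSH instructions, empty otherwise
}

// opcodeProcessor mirrors the callback type described in the text.
type opcodeProcessor func(inst *OpcodeData)

// emit stands in for the parser: it hands each instruction to the
// processor without knowing what the processor does with it.
func emit(insts []OpcodeData, onInstruction opcodeProcessor) {
	for i := range insts {
		onInstruction(&insts[i])
	}
}

func main() {
	insts := []OpcodeData{
		{Offset: 0, Name: "PUSH1", Input: []byte{0x80}},
		{Offset: 2, Name: "PUSH1", Input: []byte{0x40}},
		{Offset: 4, Name: "MSTORE"},
	}

	// Processor 1: build a human-readable listing.
	var sb strings.Builder
	emit(insts, func(inst *OpcodeData) {
		if len(inst.Input) > 0 {
			fmt.Fprintf(&sb, "%d %s 0x%s\n", inst.Offset, inst.Name, hex.EncodeToString(inst.Input))
		} else {
			fmt.Fprintf(&sb, "%d %s\n", inst.Offset, inst.Name)
		}
	})
	fmt.Print(sb.String())

	// Processor 2: count mnemonics; same producer, different output.
	counts := map[string]int{}
	emit(insts, func(inst *OpcodeData) { counts[inst.Name]++ })
	fmt.Println(counts)
}
```

Neither processor required any change to emit, which is the point of passing the behavior in as a function value.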
Main EVM bytecode processing loop
The logic of EVM bytecode processing is simple. We read each opcode and convert it to its human-readable name. Additionally, if the opcode is a PUSH instruction, we also need to read its parameters (the bytes to be pushed). Once an instruction has been read, we move on to the next one. This process repeats until no opcodes are left.
Warning
The following code is pseudocode and might not compile if you copy and paste it as is.
For completeness, the description above translates into the following Go pseudocode:
```go
func parseBytecode(code []byte, onInstruction opcodeProcessor) error {
	// check that no empty bytecode was provided
	if len(code) == 0 {
		return errors.New("empty bytecode")
	}
	codesize := uint64(len(code))
	// notes for reader:
	// * remember to handle edge cases properly (e.g. odd-length input)
	// * add and implement proper error types
	i := uint64(0)
	for i < codesize {
		// read opcode bytes
		// since the code is hex-encoded, each opcode takes 2 bytes (2 chars)
		op := code[i : i+2]
		// get the opcode value from the extracted bytes
		instruction := hex2opcode(op)
		var dataSize byte = 0
		if instruction.IsPush() {
			// the size of the data to push is derived from the opcode itself:
			// data size = opcode value - 0x60 + 1 (PUSH1 = 0x60 ... PUSH32 = 0x7f)
			dataSize = byte(instruction) - 0x60 + 1
		}
		// assign with = (not :=) so that the loop index i is actually advanced
		var inst *datatype.OpcodeData
		var err error
		i, inst, err = buildOpcodeParameters(code, instruction, i, dataSize)
		if err != nil {
			return err
		}
		// onInstruction calls the processor we defined previously
		onInstruction(inst)
	}
	return nil
}
```
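For readers who want something that compiles, here is a minimal, self-contained sketch of the same loop. It rests on a few assumptions. It works on raw bytes that were already hex-decoded, instead of reading two hex characters per opcode. The Instruction type is hypothetical. Only PUSH1 to PUSH32 are treated as instructions with immediate data:

```go
package main

import (
	"encoding/hex"
	"errors"
	"fmt"
)

// Instruction is a hypothetical, minimal result type for this sketch.
type Instruction struct {
	PC   int    // byte offset of the opcode
	Op   byte   // raw opcode value
	Data []byte // PUSH immediate data (nil for other opcodes)
}

// parseBytecode walks raw (already hex-decoded) bytecode and calls
// onInstruction once per instruction, in order.
func parseBytecode(code []byte, onInstruction func(Instruction)) error {
	if len(code) == 0 {
		return errors.New("empty bytecode")
	}
	for pc := 0; pc < len(code); {
		op := code[pc]
		size := 0
		if op >= 0x60 && op <= 0x7f { // PUSH1..PUSH32
			size = int(op) - 0x60 + 1
		}
		end := pc + 1 + size
		if end > len(code) {
			// The EVM reads missing PUSH bytes as zero; this sketch simply
			// reports whatever bytes are actually present.
			end = len(code)
		}
		var data []byte
		if size > 0 {
			data = code[pc+1 : end] // sub-slice of the input, no copy
		}
		onInstruction(Instruction{PC: pc, Op: op, Data: data})
		pc = end // skip the opcode and its immediate data
	}
	return nil
}

func main() {
	code, err := hex.DecodeString("6080604052")
	if err != nil {
		panic(err)
	}
	_ = parseBytecode(code, func(in Instruction) {
		fmt.Printf("%d 0x%02x %x\n", in.PC, in.Op, in.Data)
	})
}
```

A truncated PUSH can appear when the metadata that Solidity appends to the runtime code is parsed as if it were instructions. For that reason, the sketch clamps to the end of the input rather than returning an error.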
Final EVM bytecode reader
In the end, the final EVM bytecode reader is split into two separate functions:
OpcodeViewer triggers the process of converting EVM bytecode to opcodes.
parseBytecode contains all the common logic to parse any EVM bytecode efficiently.
Inside OpcodeViewer, processor := func(inst *datatype.OpcodeData) is the processor function. This is a developer-defined function containing the logic to execute whenever an EVM bytecode instruction is parsed. By default, it just appends the instruction's string representation to a strings.Builder so that it can be printed to stdout later.
This design decouples the parsing logic from the result processing logic, allowing more modular, clean and reusable code.
```go
// OpcodeViewer reads input bytecode to extract opcode list
func OpcodeViewer(runtimeCode string) (strings.Builder, error) {
	var sb strings.Builder
	processor := func(inst *datatype.OpcodeData) {
		if len(inst.Input) > 0 {
			// the current OPCODE represents an instruction with parameters (PUSH)
			fmt.Fprintf(&sb, "%s %s 0x%s\n", inst.PC(0), inst.Name, inst.Input)
		} else {
			fmt.Fprintf(&sb, "%s %s\n", inst.PC(0), inst.Name)
		}
	}
	// Run the reader before returning sb: in `return sb, f()` Go does not
	// specify whether sb is read before or after f() fills it.
	err := bytecodereader.ReadBytecode(runtimeCode, processor)
	return sb, err
}
```
I also defined a test case and measured its execution time with a timeTrack helper function:
```go
import (
	"fmt"
	"log"
	"testing"
	"time"

	"github.com/stretchr/testify/assert"
)

// timeTrack logs the time elapsed since start; call it with defer
func timeTrack(start time.Time, name string) {
	elapsed := time.Since(start)
	log.Printf("%s took %s", name, elapsed)
}

func TestOpcodeViewer(t *testing.T) {
	t.Run("from-bytecode", func(t *testing.T) {
		defer timeTrack(time.Now(), "convert-to-opcodes")
		// OpcodeViewer takes only the bytecode string and returns its own builder
		list, err := OpcodeViewer("608060405234801561001057600080fd5b50600436106100365760003560e01c806370a082311461003b578063a9059cbb1461006d575b600080fd5b61005b61004936600461010f565b60006020819052908152604090205481565b60405190815260200160405180910390f35b61008061007b366004610131565b610082565b005b3360009081526020819052604090205481111561009e57600080fd5b33600090815260208190526040812080548392906100bd908490610171565b90915550506001600160a01b038216600090815260208190526040812080548392906100ea908490610188565b90915550505050565b80356001600160a01b038116811461010a57600080fd5b919050565b60006020828403121561012157600080fd5b61012a826100f3565b9392505050565b6000806040838503121561014457600080fd5b61014d836100f3565b946020939093013593505050565b634e487b7160e01b600052601160045260246000fd5b6000828210156101835761018361015b565b500390565b6000821982111561019b5761019b61015b565b50019056fea26469706673582212205f26ba2996f6408ce05b13250d4ef2b5afa54847ef63bc2a113693b22dcf6f5764736f6c634300080b0033")
		assert.NoError(t, err)
		assert.NotEmpty(t, list)
		fmt.Println(list.String())
	})
}
```
Finally, running the test prints to stdout the ordered list of EVM opcodes, together with each instruction's PC and arguments. My output is shown below:
As you can see, the whole process of parsing, processing and printing takes only 96.993µs.
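A single timed run, as measured with timeTrack, includes warm-up effects and varies from run to run. As an alternative sketch, Go's testing.Benchmark repeats a function with a growing iteration count until the timing stabilizes, and it also reports allocations. Here, disassemble is a hypothetical stand-in for OpcodeViewer, and the input is the first 18 bytes of the test bytecode:

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
	"testing"
)

// disassemble is a hypothetical stand-in for OpcodeViewer: it writes one
// "pc opcode [data]" line per instruction into a strings.Builder.
func disassemble(code []byte) string {
	var sb strings.Builder
	for pc := 0; pc < len(code); {
		op := code[pc]
		size := 0
		if op >= 0x60 && op <= 0x7f { // PUSH1..PUSH32 carry 1..32 data bytes
			size = int(op) - 0x60 + 1
		}
		end := pc + 1 + size
		if end > len(code) { // truncated PUSH at the end of the code
			end = len(code)
		}
		if end > pc+1 {
			fmt.Fprintf(&sb, "%d 0x%02x 0x%x\n", pc, op, code[pc+1:end])
		} else {
			fmt.Fprintf(&sb, "%d 0x%02x\n", pc, op)
		}
		pc = end
	}
	return sb.String()
}

func main() {
	code, err := hex.DecodeString("608060405234801561001057600080fd5b50")
	if err != nil {
		panic(err)
	}
	// testing.Benchmark runs the closure with increasing b.N until timing is stable
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			_ = disassemble(code)
		}
	})
	fmt.Println(res.String(), res.MemString())
}
```

Inside a _test.go file, the same loop would normally live in a benchmark function (for example a hypothetical BenchmarkOpcodeViewer(b *testing.B)) and be run with go test -bench . -benchmem.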
Bonus point: memory profiling
In addition, we can profile the CPU and memory usage of the application to detect bottlenecks and points of improvement. The memory profile of the current implementation, shown below, flags the parts of the program where most memory is allocated. As can be seen, allocations happen mainly in:
some []byte to string conversions
the fmt.Printf-style formatting used to turn each instruction into its human-readable form
Figure: memory profile view of the OPCODE converter in Go pprof
Some minor tweaks are still possible to gain a bit more speed. Future work, then!
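One possible tweak, sketched here as an assumption rather than the author's implementation, targets the fmt formatting cost flagged by the profile. It appends each line into a reused []byte with strconv.AppendUint and a manual hex loop instead of calling fmt.Sprintf. The appendInstruction name and its parameters are hypothetical:

```go
package main

import (
	"fmt"
	"strconv"
)

const hexDigits = "0123456789abcdef"

// appendInstruction is a hypothetical formatter: it appends
// "pc NAME [0xdata]\n" to dst without going through fmt, so a reused
// buffer avoids most per-instruction allocations.
func appendInstruction(dst []byte, pc uint64, name string, data []byte) []byte {
	dst = strconv.AppendUint(dst, pc, 10)
	dst = append(dst, ' ')
	dst = append(dst, name...)
	if len(data) > 0 {
		dst = append(dst, " 0x"...)
		for _, b := range data { // two lowercase hex digits per byte
			dst = append(dst, hexDigits[b>>4], hexDigits[b&0x0f])
		}
	}
	return append(dst, '\n')
}

func main() {
	buf := make([]byte, 0, 64) // one buffer reused for every instruction
	buf = appendInstruction(buf, 0, "PUSH1", []byte{0x80})
	buf = appendInstruction(buf, 2, "PUSH1", []byte{0x40})
	buf = appendInstruction(buf, 4, "MSTORE", nil)
	fmt.Print(string(buf))
}
```

The trade-off is readability. A format string is easier to change, so this is only worth doing if a profile confirms that formatting dominates.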
Thanks for checking this out, and I hope you found the info useful! If you have any questions, don't hesitate to leave a comment below. And if you'd like to see more content like this, just let me know and share this post with your colleagues, co-workers, FFF, etc.