
Converting EVM bytecode to OPCODES in microseconds

Understanding how EVM bytecode can be converted to its OPCODE representation fast in Go

Decompiling EVM Bytecode

Decompiling an Ethereum application can be very expensive. An analysis can start from either the on-chain bytecode or the Solidity source code. White-box analysis of the source is generally cheaper and requires only Solidity programming knowledge. Bytecode analysis, however, requires low-level knowledge: how opcodes and EVM instructions work, and how the stack, memory, and storage behave.

Regardless, one of the first tasks when trying to understand what a dApp does is to process its bytecode. And by the way, low-level stuff is more fun!

Bytecode modelling in Go

The first thing we need to do is model the bytecode representation in Go. For this purpose, we assume that all input data arrives as hexadecimal strings, either with the 0x prefix (0x...) or without it.
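As a minimal sketch of that input handling, the optional prefix can be stripped before decoding with the standard encoding/hex package. The helper name decodeHex is hypothetical and not part of the original code, and decoding everything up front is just one option; the pseudocode later in this post reads two hex characters at a time instead.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
)

// decodeHex is a hypothetical helper: it accepts bytecode with or without
// the 0x prefix and returns the raw bytes.
func decodeHex(s string) ([]byte, error) {
	s = strings.TrimSpace(s)
	// Drop a single leading "0x" or "0X" if present.
	if len(s) >= 2 && s[0] == '0' && (s[1] == 'x' || s[1] == 'X') {
		s = s[2:]
	}
	// DecodeString reports an error for odd-length or non-hex input.
	return hex.DecodeString(s)
}

func main() {
	for _, in := range []string{"0x6080", "6080", "0x608"} {
		b, err := decodeHex(in)
		fmt.Printf("%q -> % x (err: %v)\n", in, b, err)
	}
}
```

Rejecting malformed input at this stage keeps the parsing loop free of hex-validation concerns.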

Since an opcode value ranges from 0x00 to 0xFF, a single byte is a sufficient data type for an opcode.

// OpCode is an EVM opcode
type OpCode byte
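The parsing loop later in this post calls an IsPush method on this type. Its body is not shown, so the following is only a sketch of how it could look. The PushSize method and the PUSH1/PUSH32 constants are names introduced here for illustration. PUSH0 (0x5f) is deliberately excluded because it carries no immediate data.

```go
package main

import "fmt"

// OpCode is an EVM opcode.
type OpCode byte

// Bounds of the PUSH family that carries immediate data.
const (
	PUSH1  OpCode = 0x60
	PUSH32 OpCode = 0x7f
)

// IsPush reports whether op is one of PUSH1..PUSH32.
func (op OpCode) IsPush() bool {
	return op >= PUSH1 && op <= PUSH32
}

// PushSize returns how many immediate bytes follow a PUSH opcode,
// or 0 for any other opcode (hypothetical helper).
func (op OpCode) PushSize() int {
	if !op.IsPush() {
		return 0
	}
	return int(op-PUSH1) + 1
}

func main() {
	for _, op := range []OpCode{0x5f, 0x60, 0x63, 0x7f, 0x80} {
		fmt.Printf("0x%02x push=%t size=%d\n", byte(op), op.IsPush(), op.PushSize())
	}
}
```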

In the same way, we also define an opcode category as below (for future use):

// OpcodeCategory is a predefined category of the opcode
type OpcodeCategory uint8

Bytecode processor in Go

To define what happens when we find a valid opcode or EVM instruction, we create a custom data type that acts as the data processor. In this case, a function type is enough:

type opcodeProcessor func(inst *datatype.OpcodeData)

With this, we decouple the parsing logic from output processing logic.

Note that OpcodeData is just a wrapper struct that holds all the information required to print the EVM instruction to stdout in a human-friendly form.
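The real datatype.OpcodeData is not shown in this post. To illustrate the decoupling, the sketch below uses a hypothetical stand-in struct with only the fields needed for printing (Offset, Name and Input are assumed names), and plugs two different processors into the same function type: one that formats each instruction and one that only counts them.

```go
package main

import (
	"fmt"
	"strings"
)

// OpcodeData is a hypothetical stand-in for datatype.OpcodeData.
type OpcodeData struct {
	Offset uint64 // program counter of the instruction
	Name   string // mnemonic, e.g. "PUSH1"
	Input  string // hex-encoded immediate data, empty for non-PUSH opcodes
}

type opcodeProcessor func(inst *OpcodeData)

// formatInstruction renders one instruction in the "PC NAME [0xDATA]" layout.
func formatInstruction(inst *OpcodeData) string {
	if inst.Input != "" {
		return fmt.Sprintf("0x%04X %s 0x%s", inst.Offset, inst.Name, inst.Input)
	}
	return fmt.Sprintf("0x%04X %s", inst.Offset, inst.Name)
}

func main() {
	var sb strings.Builder
	count := 0
	// Two processors sharing one type: the parser does not care which it gets.
	printer := opcodeProcessor(func(inst *OpcodeData) {
		sb.WriteString(formatInstruction(inst))
		sb.WriteByte('\n')
	})
	counter := opcodeProcessor(func(*OpcodeData) { count++ })

	program := []*OpcodeData{
		{Offset: 0, Name: "PUSH1", Input: "80"},
		{Offset: 2, Name: "PUSH1", Input: "40"},
		{Offset: 4, Name: "MSTORE"},
	}
	for _, p := range []opcodeProcessor{printer, counter} {
		for _, inst := range program {
			p(inst)
		}
	}
	fmt.Print(sb.String())
	fmt.Println("instructions:", count)
}
```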

Main EVM bytecode processing loop

The logic of EVM bytecode processing is simple. We read each opcode and convert it to its human-readable name. Additionally, if the opcode just read is a PUSH instruction, we also read its parameters: the immediate data bytes that follow it. Once the instruction has been read, we move on to the next one. This process repeats until no opcodes are left.

Warning

The following code is a kind of pseudocode and might not compile if you copy and paste it as is.

For completeness, the previous description translates to the following Go pseudocode:

func parseBytecode(code []byte, onInstruction opcodeProcessor) error {
    // check that no empty bytecode was provided
    codesize := uint64(len(code))
    if codesize == 0 {
        return errors.New("empty bytecode")
    }
    // the input is hex encoded, so every byte takes 2 characters
    if codesize%2 != 0 {
        return errors.New("odd-length hex bytecode")
    }
    // notes for reader:
    // * remember to handle the remaining edge cases properly
    // * hex2opcode and buildOpcodeParameters are helpers not shown here
    i := uint64(0)

    for i < codesize {
        // read opcode bytes
        // since it is encoded in hex, we need to read 2 bytes (2 chars)
        op := code[i : i+2]
        // get opcode value from extracted bytes
        instruction := hex2opcode(op)
        var dataSize byte
        if instruction.IsPush() {
            // size of the input data to be pushed is calculated based on current opcode
            // data size = opcode value - 0x60 + 1
            dataSize = byte(instruction) - 0x60 + 1
        }
        // store the new position in a separate variable: using := with i here
        // would shadow the loop index and the loop would never advance
        next, inst, err := buildOpcodeParameters(code, instruction, i, dataSize)
        if err != nil {
            return err
        }
        i = next
        // onInstruction calls the processor we previously defined
        onInstruction(inst)
    }
    return nil
}
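For readers who want something that compiles as is, here is a self-contained sketch of the same loop. It is not the author's implementation: it decodes the hex up front and walks raw bytes, the Instruction struct is a simplified stand-in for datatype.OpcodeData, and the names table is deliberately tiny and hypothetical, so any opcode outside it prints as UNKNOWN.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// Instruction is a simplified, hypothetical stand-in for datatype.OpcodeData.
type Instruction struct {
	PC    int
	Name  string
	Input []byte // immediate data of a PUSH, empty otherwise
}

// names covers only the opcodes used in the example below.
var names = map[byte]string{
	0x00: "STOP", 0x15: "ISZERO", 0x34: "CALLVALUE",
	0x52: "MSTORE", 0x57: "JUMPI", 0x80: "DUP1", 0xfd: "REVERT",
}

func isPush(op byte) bool { return op >= 0x60 && op <= 0x7f }

func name(op byte) string {
	if isPush(op) {
		return fmt.Sprintf("PUSH%d", op-0x60+1)
	}
	if n, ok := names[op]; ok {
		return n
	}
	return "UNKNOWN"
}

// disassemble walks raw bytecode and calls onInstruction once per instruction.
func disassemble(code []byte, onInstruction func(Instruction)) {
	for pc := 0; pc < len(code); {
		op := code[pc]
		size := 0
		if isPush(op) {
			size = int(op-0x60) + 1
		}
		end := pc + 1 + size
		if end > len(code) {
			end = len(code) // PUSH data truncated at the end of the code
		}
		onInstruction(Instruction{PC: pc, Name: name(op), Input: code[pc+1 : end]})
		pc += 1 + size // skip the opcode and its immediate data
	}
}

func main() {
	code, err := hex.DecodeString("6080604052348015")
	if err != nil {
		panic(err)
	}
	disassemble(code, func(in Instruction) {
		if len(in.Input) > 0 {
			fmt.Printf("0x%04X %s 0x%x\n", in.PC, in.Name, in.Input)
			return
		}
		fmt.Printf("0x%04X %s\n", in.PC, in.Name)
	})
}
```

The important detail is that the program counter advances by one byte plus the PUSH data size. Without that skip, the pushed data would be decoded as if it were instructions.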

Final EVM bytecode reader

In the end, the final EVM bytecode reader is built from three pieces:

  1. The OpcodeViewer function triggers the process of converting EVM bytecode to opcodes.
  2. The parseBytecode function contains all the common logic to parse any EVM bytecode efficiently.
  3. processor := func(inst *datatype.OpcodeData) is the processor function. This is a developer-defined function that contains the logic to execute when an EVM bytecode instruction is parsed. By default, it just appends the instruction's string representation to a strings.Builder so that it can be printed to stdout later.

This design decouples the parsing logic from the result processing logic, allowing more modular, clean and reusable code.

// OpcodeViewer reads input bytecode to extract opcode list
func OpcodeViewer(runtimeCode string) (strings.Builder, error) {
    var sb strings.Builder
    processor := func(inst *datatype.OpcodeData) {
        if len(inst.Input) > 0 {
            // the current opcode is an instruction with parameters (PUSH)
            sb.WriteString(fmt.Sprintf("%s %s 0x%s\n", inst.PC(0), inst.Name, inst.Input))
        } else {
            sb.WriteString(fmt.Sprintf("%s %s\n", inst.PC(0), inst.Name))
        }
    }
    // run the parser before returning sb: in "return sb, f()" the order in which
    // sb is read and f is called is not specified, so sb could be copied while empty
    err := bytecodereader.ReadBytecode(runtimeCode, processor)
    return sb, err
}

I also defined a test case and measured the execution time with a timeTrack function.


func timeTrack(start time.Time, name string) {
    elapsed := time.Since(start)
    log.Printf("%s took %s", name, elapsed)
}

func TestOpcodeViewer(t *testing.T) {
    t.Run("from-bytecode", func(t *testing.T) {
        defer timeTrack(time.Now(), "convert-to-opcodes")
        list, err := OpcodeViewer("608060405234801561001057600080fd5b50600436106100365760003560e01c806370a082311461003b578063a9059cbb1461006d575b600080fd5b61005b61004936600461010f565b60006020819052908152604090205481565b60405190815260200160405180910390f35b61008061007b366004610131565b610082565b005b3360009081526020819052604090205481111561009e57600080fd5b33600090815260208190526040812080548392906100bd908490610171565b90915550506001600160a01b038216600090815260208190526040812080548392906100ea908490610188565b90915550505050565b80356001600160a01b038116811461010a57600080fd5b919050565b60006020828403121561012157600080fd5b61012a826100f3565b9392505050565b6000806040838503121561014457600080fd5b61014d836100f3565b946020939093013593505050565b634e487b7160e01b600052601160045260246000fd5b6000828210156101835761018361015b565b500390565b6000821982111561019b5761019b61015b565b50019056fea26469706673582212205f26ba2996f6408ce05b13250d4ef2b5afa54847ef63bc2a113693b22dcf6f5764736f6c634300080b0033")
        assert.NoError(t, err)
        assert.NotEmpty(t, list)
        fmt.Println(list.String())
    })
}

Finally, running the test prints the ordered list of EVM opcodes to stdout, each with its program counter (PC) and arguments. This is my output:

=== RUN   TestOpcodeViewer
=== RUN   TestOpcodeViewer/from-bytecode
0x0000 PUSH1 0x80
0x0002 PUSH1 0x40
0x0004 MSTORE
0x0005 CALLVALUE
0x0006 DUP1
0x0007 ISZERO
0x0008 PUSH2 0x0010
0x000B JUMPI
0x000C PUSH1 0x00
0x000E DUP1
0x000F REVERT
0x0010 JUMPDEST
0x0011 POP
0x0012 PUSH1 0x04
0x0014 CALLDATASIZE
0x0015 LT
0x0016 PUSH2 0x0036
0x0019 JUMPI
0x001A PUSH1 0x00
0x001C CALLDATALOAD
0x001D PUSH1 0xe0
0x001F SHR
0x0020 DUP1
0x0021 PUSH4 0x70a08231
0x0026 EQ
0x0027 PUSH2 0x003b
0x002A JUMPI
0x002B DUP1
0x002C PUSH4 0xa9059cbb
0x0031 EQ
0x0032 PUSH2 0x006d
0x0035 JUMPI
0x0036 JUMPDEST
0x0037 PUSH1 0x00
0x0039 DUP1
0x003A REVERT
0x003B JUMPDEST
0x003C PUSH2 0x005b
0x003F PUSH2 0x0049
0x0042 CALLDATASIZE
0x0043 PUSH1 0x04
0x0045 PUSH2 0x010f
0x0048 JUMP
0x0049 JUMPDEST
0x004A PUSH1 0x00
0x004C PUSH1 0x20
0x004E DUP2
0x004F SWAP1
0x0050 MSTORE
0x0051 SWAP1
0x0052 DUP2
0x0053 MSTORE
0x0054 PUSH1 0x40
0x0056 SWAP1
0x0057 SHA3
0x0058 SLOAD
0x0059 DUP2
0x005A JUMP
0x005B JUMPDEST
0x005C PUSH1 0x40
0x005E MLOAD
0x005F SWAP1
0x0060 DUP2
0x0061 MSTORE
0x0062 PUSH1 0x20
0x0064 ADD
0x0065 PUSH1 0x40
0x0067 MLOAD
0x0068 DUP1
0x0069 SWAP2
0x006A SUB
0x006B SWAP1
0x006C RETURN
0x006D JUMPDEST
0x006E PUSH2 0x0080
0x0071 PUSH2 0x007b
0x0074 CALLDATASIZE
0x0075 PUSH1 0x04
0x0077 PUSH2 0x0131
0x007A JUMP
0x007B JUMPDEST
0x007C PUSH2 0x0082
0x007F JUMP
0x0080 JUMPDEST
0x0081 STOP
0x0082 JUMPDEST
0x0083 CALLER
0x0084 PUSH1 0x00
0x0086 SWAP1
0x0087 DUP2
0x0088 MSTORE
0x0089 PUSH1 0x20
0x008B DUP2
0x008C SWAP1
0x008D MSTORE
0x008E PUSH1 0x40
0x0090 SWAP1
0x0091 SHA3
0x0092 SLOAD
0x0093 DUP2
0x0094 GT
0x0095 ISZERO
0x0096 PUSH2 0x009e
0x0099 JUMPI
0x009A PUSH1 0x00
0x009C DUP1
0x009D REVERT
0x009E JUMPDEST
0x009F CALLER
0x00A0 PUSH1 0x00
0x00A2 SWAP1
0x00A3 DUP2
0x00A4 MSTORE
0x00A5 PUSH1 0x20
0x00A7 DUP2
0x00A8 SWAP1
0x00A9 MSTORE
0x00AA PUSH1 0x40
0x00AC DUP2
0x00AD SHA3
0x00AE DUP1
0x00AF SLOAD
0x00B0 DUP4
0x00B1 SWAP3
0x00B2 SWAP1
0x00B3 PUSH2 0x00bd
0x00B6 SWAP1
0x00B7 DUP5
0x00B8 SWAP1
0x00B9 PUSH2 0x0171
0x00BC JUMP
0x00BD JUMPDEST
0x00BE SWAP1
0x00BF SWAP2
0x00C0 SSTORE
0x00C1 POP
0x00C2 POP
0x00C3 PUSH1 0x01
0x00C5 PUSH1 0x01
0x00C7 PUSH1 0xa0
0x00C9 SHL
0x00CA SUB
0x00CB DUP3
0x00CC AND
0x00CD PUSH1 0x00
0x00CF SWAP1
0x00D0 DUP2
0x00D1 MSTORE
0x00D2 PUSH1 0x20
0x00D4 DUP2
0x00D5 SWAP1
0x00D6 MSTORE
0x00D7 PUSH1 0x40
0x00D9 DUP2
0x00DA SHA3
0x00DB DUP1
0x00DC SLOAD
0x00DD DUP4
0x00DE SWAP3
0x00DF SWAP1
0x00E0 PUSH2 0x00ea
0x00E3 SWAP1
0x00E4 DUP5
0x00E5 SWAP1
0x00E6 PUSH2 0x0188
0x00E9 JUMP
0x00EA JUMPDEST
0x00EB SWAP1
0x00EC SWAP2
0x00ED SSTORE
0x00EE POP
0x00EF POP
0x00F0 POP
0x00F1 POP
0x00F2 JUMP
0x00F3 JUMPDEST
0x00F4 DUP1
0x00F5 CALLDATALOAD
0x00F6 PUSH1 0x01
0x00F8 PUSH1 0x01
0x00FA PUSH1 0xa0
0x00FC SHL
0x00FD SUB
0x00FE DUP2
0x00FF AND
0x0100 DUP2
0x0101 EQ
0x0102 PUSH2 0x010a
0x0105 JUMPI
0x0106 PUSH1 0x00
0x0108 DUP1
0x0109 REVERT
0x010A JUMPDEST
0x010B SWAP2
0x010C SWAP1
0x010D POP
0x010E JUMP
0x010F JUMPDEST
0x0110 PUSH1 0x00
0x0112 PUSH1 0x20
0x0114 DUP3
0x0115 DUP5
0x0116 SUB
0x0117 SLT
0x0118 ISZERO
0x0119 PUSH2 0x0121
0x011C JUMPI
0x011D PUSH1 0x00
0x011F DUP1
0x0120 REVERT
0x0121 JUMPDEST
0x0122 PUSH2 0x012a
0x0125 DUP3
0x0126 PUSH2 0x00f3
0x0129 JUMP
0x012A JUMPDEST
0x012B SWAP4
0x012C SWAP3
0x012D POP
0x012E POP
0x012F POP
0x0130 JUMP
0x0131 JUMPDEST
0x0132 PUSH1 0x00
0x0134 DUP1
0x0135 PUSH1 0x40
0x0137 DUP4
0x0138 DUP6
0x0139 SUB
0x013A SLT
0x013B ISZERO
0x013C PUSH2 0x0144
0x013F JUMPI
0x0140 PUSH1 0x00
0x0142 DUP1
0x0143 REVERT
0x0144 JUMPDEST
0x0145 PUSH2 0x014d
0x0148 DUP4
0x0149 PUSH2 0x00f3
0x014C JUMP
0x014D JUMPDEST
0x014E SWAP5
0x014F PUSH1 0x20
0x0151 SWAP4
0x0152 SWAP1
0x0153 SWAP4
0x0154 ADD
0x0155 CALLDATALOAD
0x0156 SWAP4
0x0157 POP
0x0158 POP
0x0159 POP
0x015A JUMP
0x015B JUMPDEST
0x015C PUSH4 0x4e487b71
0x0161 PUSH1 0xe0
0x0163 SHL
0x0164 PUSH1 0x00
0x0166 MSTORE
0x0167 PUSH1 0x11
0x0169 PUSH1 0x04
0x016B MSTORE
0x016C PUSH1 0x24
0x016E PUSH1 0x00
0x0170 REVERT
0x0171 JUMPDEST
0x0172 PUSH1 0x00
0x0174 DUP3
0x0175 DUP3
0x0176 LT
0x0177 ISZERO
0x0178 PUSH2 0x0183
0x017B JUMPI
0x017C PUSH2 0x0183
0x017F PUSH2 0x015b
0x0182 JUMP
0x0183 JUMPDEST
0x0184 POP
0x0185 SUB
0x0186 SWAP1
0x0187 JUMP
0x0188 JUMPDEST
0x0189 PUSH1 0x00
0x018B DUP3
0x018C NOT
0x018D DUP3
0x018E GT
0x018F ISZERO
0x0190 PUSH2 0x019b
0x0193 JUMPI
0x0194 PUSH2 0x019b
0x0197 PUSH2 0x015b
0x019A JUMP
0x019B JUMPDEST
0x019C POP
0x019D ADD
0x019E SWAP1
0x019F JUMP
0x01A0 INVALID
0x01A1 LOG2
0x01A2 PUSH5 0x6970667358
0x01A8 UNKNOWN 0x22
0x01A9 SLT
0x01AA SHA3
0x01AB UNKNOWN 0x5f
0x01AC UNKNOWN 0x26
0x01AD GETLOCAL
0x01AE UNKNOWN 0x29
0x01AF SWAP7
0x01B0 RNGSEED
0x01B1 BLOCKHASH
0x01B2 DUP13
0x01B3 UNKNOWN 0xe0
0x01B4 JUMPDEST
0x01B5 SGT
0x01B6 UNKNOWN 0x25
0x01B7 UNKNOWN 0x0d
0x01B8 UNKNOWN 0x4e
0x01B9 CALLCODE
0x01BA BEGINSUB
0x01BB UNKNOWN 0xaf
0x01BC UNKNOWN 0xa5
0x01BD BASEFEE
0x01BE SELFBALANCE
0x01BF UNKNOWN 0xef
0x01C0 PUSH4 0xbc2a1136
0x01C5 SWAP4
0x01C6 JUMPSUB
0x01C7 UNKNOWN 0x2d
0x01C8 UNKNOWN 0xcf
0x01C9 PUSH16 0x5764736f6c634300080b0033

convert-to-opcodes took 96.993µs

--- PASS: TestOpcodeViewer (0.00s)

Process finished with the exit code 0

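As you can see, the whole process of parsing, processing, and output takes only 96.993µs. Note that the instructions listed after the INVALID opcode at 0x01A0 are not executable code: those bytes are the contract metadata that the Solidity compiler appends to the bytecode, which is why so many of them decode as UNKNOWN.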
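A single timed run like this one can be noisy, since it is affected by warm-up, scheduling, and the cost of printing. As a hedged sketch of a more repeatable measurement, the standard testing package can average over many iterations. The countInstructions function below is a hypothetical stand-in for OpcodeViewer, used only so that the example is self-contained.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the result alive so the compiler cannot drop the call.
var sink int

// countInstructions is a hypothetical stand-in for OpcodeViewer: it walks
// raw bytecode and counts instructions, skipping PUSH immediate data.
func countInstructions(code []byte) int {
	n := 0
	for pc := 0; pc < len(code); n++ {
		op := code[pc]
		pc++
		if op >= 0x60 && op <= 0x7f {
			pc += int(op-0x60) + 1
		}
	}
	return n
}

func main() {
	code := []byte{0x60, 0x80, 0x60, 0x40, 0x52, 0x34, 0x80, 0x15}

	// Init registers the testing flags; it is only needed when calling
	// testing.Benchmark outside of "go test".
	testing.Init()
	res := testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = countInstructions(code)
		}
	})
	// Prints iterations, ns/op, and allocation statistics.
	fmt.Println(res.String(), res.MemString())
}
```

Inside a regular test file, the same loop would live in a func BenchmarkXxx(b *testing.B) and run with go test -bench.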

Bonus point: memory profiling

In addition, we can profile the CPU and memory usage of the application to detect bottlenecks and opportunities for improvement. The memory profile of the current implementation, shown below, highlights the areas of the program where most memory is allocated. As seen, most allocations come from:

  • some []byte to string conversions
  • the fmt.Sprintf calls used to format each instruction into its human-readable representation

[Figure: Memory profile view of the OPCODE converter in Go pprof]

Some minor tweaks are still possible to gain a bit more speed. Future work then!
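One possible tweak, sketched here under assumptions rather than taken from the actual project, is to write each line straight into the strings.Builder instead of going through fmt.Sprintf. The function names below are hypothetical, and writeDirect assumes the program counter fits in four hex digits. testing.AllocsPerRun is used to compare allocations per formatted line.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"strings"
	"testing"
)

const (
	upperHex = "0123456789ABCDEF"
	lowerHex = "0123456789abcdef"
)

// writeWithSprintf formats a line through fmt.Sprintf, as OpcodeViewer does.
func writeWithSprintf(sb *strings.Builder, pc int, name string, input []byte) {
	if len(input) > 0 {
		sb.WriteString(fmt.Sprintf("0x%04X %s 0x%s\n", pc, name, hex.EncodeToString(input)))
		return
	}
	sb.WriteString(fmt.Sprintf("0x%04X %s\n", pc, name))
}

// writeDirect writes the same line into the builder without intermediate
// strings. It assumes pc fits in 16 bits (four hex digits).
func writeDirect(sb *strings.Builder, pc int, name string, input []byte) {
	sb.WriteString("0x")
	for shift := 12; shift >= 0; shift -= 4 {
		sb.WriteByte(upperHex[(pc>>shift)&0xF])
	}
	sb.WriteByte(' ')
	sb.WriteString(name)
	if len(input) > 0 {
		sb.WriteString(" 0x")
		for _, b := range input {
			sb.WriteByte(lowerHex[b>>4])
			sb.WriteByte(lowerHex[b&0x0F])
		}
	}
	sb.WriteByte('\n')
}

// render runs one writer against a fresh builder and returns the text.
func render(write func(*strings.Builder, int, string, []byte), pc int, name string, input []byte) string {
	var sb strings.Builder
	write(&sb, pc, name, input)
	return sb.String()
}

func main() {
	input := []byte{0x00, 0x10}
	fmt.Print(render(writeWithSprintf, 8, "PUSH2", input))
	fmt.Print(render(writeDirect, 8, "PUSH2", input))

	var sb strings.Builder
	sb.Grow(1 << 16) // pre-size so buffer growth is not counted below
	viaSprintf := testing.AllocsPerRun(100, func() { writeWithSprintf(&sb, 8, "PUSH2", input) })
	direct := testing.AllocsPerRun(100, func() { writeDirect(&sb, 8, "PUSH2", input) })
	fmt.Printf("allocs per line: Sprintf=%.0f direct=%.0f\n", viaSprintf, direct)
}
```

Whether the extra code is worth it depends on how large the analyzed contracts are. A real memory profile should confirm the gain before adopting it.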


Thanks for checking this out, and I hope you found the info useful! If you have any questions, don't hesitate to leave me a comment below. And remember that if you would like to see more content like this, just let me know and share this post with your colleagues, co-workers, FFF, etc.
