A pure Go Blazinly fast PRNG 64 bit ID generator

Table of Content

Blazinly fast PRNG 64 bit are here

Sometimes, we require to have an identifier in order to store things, track behaviours etc. In this scenario, I was required to rapidly generate ID strings in order to be used as Request ID for an HTTP service. Those IDs would be included in HTTP service response X-Request-ID header and tracked in Opentracing. I did some initial implementation and saw a performance degradation that I did not like it. That’s the main reason to dig deeper in Go PRNG generator, to see how it works, and make some tweaks.

Warning

The IDs created by this algorithm are not pure random nor based on a secure entropy origin. Do not use them for any crypto related stuff. If you want secure random IDs or low collision probability IDs, see crypto/rand package at https://pkg.go.dev/crypto/rand#Read

Creating IDs: the common way

I say it’s the common way because is the method used by many developers, as show in StackOverflow questions and other Internet resources:

So, Let’s follow this idea and assume we have a function gen() that creates a hexadecimal encoded UUID of $8$ bytes. The function can be described as follows:

1
2
3
4
5
6
7
8
// gen creates a hex encoded string of 16 chars
func gen() string {
    dst := make([]byte, 8)
    for i := 0; i < 8; i++ {
        dst[i] = byte(rand.Intn(256))
    }
    return hex.EncodeToString(dst[:])
}

And that’s all, the easiest way to create a random ID in Go, that creates outputs like

1
336159b54b9e2839

Run it many times, and you will get different results each time.

`gen()` function description

There are several things gen() functions does under the hood:

At the beginning, gen() allocates a new byte slice to hold 8 bytes.
Next, it fills the byte slice positions with pseudorandom values calling the function rand.Intn in where returned values will be in range $[0, 255]$, the range of the byte data type.
Finally, it encodes the []byte slice result as hexadecimal and converts it to string

`gen()` function known allocations

As we know from other optimization post series such us Lightning fast stock market data parsing, we know that

[]byte slice conversion to string requires allocating memory.

`gen()` function known optimization points

As we know from other optimization post series, we know that

EncodeToString method of package hex makes an additional loop over input data. In this specific case, this extra loop could be avoided processing all in same initial loop for i:=0; i<8;i++.

However, this time, we need to focus on rand.Intn implementation to see how it works, and if we have room for improvements.

After, analyzing internal code of rand.Intn the conclusions are:

It has an internal data access lock to support concurrent code. We can remove this lock, and delegate this responsability on callers. This change will create an speedup on prng generation code.
rand.Intn calculates the maximum size of the random number range, and applies different logic if number max value is $2^n$ or not. Since our max range is $256$, we fullfil the condition of $256 \equiv 2^n \equiv 2^8$

Finally, there are other minor tweaks we can do, on original function, such as:

use custom data type to tell Go compiler the expected size of slices instead of using []byte
expand loops

Optimization steps

1. Replace `[]byte` with a custom data type

In order to avoid data allocations as much as possible it is highly recommended to use custom Go data types when possible. In this scenario we need to replace our original byte slice with a custom data type. For this reason, we create a custom struct called RandUUID

1
type RandUUID [16]byte

As you can see, this struct doubles our initial size of $8$ bytes. The reason for this, is that its ready for holding the result already encoded as hexacimal, and hex format requires $2n$ bytes, being our original size $n=8$ bytes.

2. Generate a non-negative pseudo random number

In our original example, in where we call dst[i] = byte(rand.Intn(256)) we are generating on each loop round, a pseudo random number. However, internally, the pseudo random number generator, will generate a int64 number for each round.

Luckily for us, with a single int64 number we can fill our original []byte slice of size $8$ and we can avoid the additional calls to rand.Intn, so instead of making $8$ calls, we make just $1$. We make this single call using the modified function Int63() defined below.

1
2
3
4
// Int63 returns a non-negative pseudo-random 63-bit integer as an int64.
func (rng *rngSource) Int63() int64 {
	return int64(rng.Uint64() & rngMask)
}

Original implementation of rgnSource can be found online at https://golang.org/src/math/rand/rng.go and https://github.com/golang/go/blob/master/src/math/rand/rng.go

The trick here, is that we need to process the returned int64 number to fill the []byte slice correctly.

3. Processing the 63-bit integer as an `int64`

In order to process the generated int64 by function Int63(), we need to make some byte level processing. In this case, we do some right shifting for each othe bytes we need to get. Since our []byte slice is of $n = 8$, we need to make $8$ shifting as follows:

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
var uid RandUUID

// generate a single int64 pseudo random number
u := rng.Int63()

// just in case, tell go compiler max byte size
// to avoid continuous bound checking
_ = uid[15]

// process the original int64, shifting bytes
uid[0] = byte(u)
uid[2] = byte(u >> 8)
uid[4] = byte(u >> 16)
uid[6] = byte(u >> 24)
uid[8] = byte(u >> 32)
uid[10] = byte(u >> 40)
uid[12] = byte(u >> 48)
uid[14] = byte(u >> 56)

At this point, our result RandUUID contains the $8$ bytes filled in their expected positions.

Did you notice we did not fill the slice in sequentical positions?

This is because, we are using the same slice to hold final result encoded as hexadecimal format.

4. Convert our result to hexadecimal format

Our hexadecimal conversion of the []byte slice to string will be based on hex/encoding Go package function EncodeToString which is described at https://pkg.go.dev/encoding/hex#EncodeToString. The function indeed, has the following signature

1
func EncodeToString(src []byte) string

It receives a []byte slices, do some computation and converts the result to string

However, we need to tweak the function to:

Do not allocate a new []byte slice for the result
Avoid data conversion betweeen []byte to string

Taking a brief look to EncodeToString we discover the simple logic used to convert an arbitrary byte slice to hex format.

1
2
3
4
5
6
7
8
9
func Encode(dst, src []byte) int {
	j := 0
	for _, v := range src {
		dst[j] = hextable[v>>4]
		dst[j+1] = hextable[v&0x0f]
		j += 2
	}
	return len(src) * 2
}

So the final tweak we need to do, is to expand the loop. We can do this, because we have a fixed size input slice, that will be always constant with a value of $n=8$

4.1 Expanding the loop

To expand the loop, we just need to execute manually the $8$ iterations as follows in order to convert the data to hex format.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
// expanded for loop
var v byte
_ = uid[15]
v = uid[0]
uid[0] = hextable[v>>4]
uid[1] = hextable[v&0x0f]

v = uid[2]
uid[2] = hextable[v>>4]
uid[3] = hextable[v&0x0f]

v = uid[4]
uid[4] = hextable[v>>4]
uid[5] = hextable[v&0x0f]

v = uid[6]
uid[6] = hextable[v>>4]
uid[7] = hextable[v&0x0f]

v = uid[8]
uid[8] = hextable[v>>4]
uid[9] = hextable[v&0x0f]

v = uid[10]
uid[10] = hextable[v>>4]
uid[11] = hextable[v&0x0f]

v = uid[12]
uid[12] = hextable[v>>4]
uid[13] = hextable[v&0x0f]

v = uid[14]
uid[14] = hextable[v>>4]
uid[15] = hextable[v&0x0f]

Measuring performance

After all our efforts of having a fast prng, we need to measure in order to be sure of our results. To do so, we use Go built in benchmarking tool as follows:

Compute a benchmark on our base gen() function and save it as old.txt
Apply all our changes to gen() function
Compute a benchmark on our modified gen() function and save it as new.txt
Check results between both algorithms using benchstat

1. Compute a benchmark on our base gen() function and save it as old.txt

1
go test -v -run=^$ -bench=^BenchmarkGen/\$ -benchtime=2s -count=10 > old.txt

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
BenchmarkGen-0  4127385  596.6 ns/op  1.68 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  3917702  621.9 ns/op  1.61 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  4035345  583.7 ns/op  1.71 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  4172944  588.9 ns/op  1.70 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  4136286  586.1 ns/op  1.71 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  3956214  597.3 ns/op  1.67 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  4073704  589.3 ns/op  1.70 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  3812052  603.9 ns/op  1.66 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  4140816  599.4 ns/op  1.67 MB/s  32 B/op  2 allocs/op
BenchmarkGen-0  3886472  594.8 ns/op  1.68 MB/s  32 B/op  2 allocs/op

2. Apply all our changes to gen() function

Apply our changes to the code.

3. Compute a benchmark on our modified gen() function and save it as new.txt

1
go test -v -run=^$ -bench=^BenchmarkGen/\$ -benchtime=2s -count=10 > new.txt

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
BenchmarkGen-0  41574139  56.28 ns/op  17.77 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  43105999  55.83 ns/op  17.91 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  42470292  55.33 ns/op  18.07 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  44294082  55.39 ns/op  18.05 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  43065757  55.16 ns/op  18.13 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  43489963  54.92 ns/op  18.21 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  40353843  55.29 ns/op  18.09 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  44173886  55.38 ns/op  18.06 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  44519041  54.78 ns/op  18.26 MB/s  0 B/op  0 allocs/op
BenchmarkGen-0  43383703  55.69 ns/op  17.96 MB/s  0 B/op  0 allocs/op

4. Check results between both algorithms using benchstat

In order to find if we have some performance gain or decrease, we compute the difference using benchstat tool comparing our both old.txt and new.txt files.

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
benstat old.txt new.txt

name      old time/op    new time/op     delta
Gen-0     593ns ± 2%     55ns ± 2%       -90.66%   (p=0.000 n=9+10)

name      old speed      new speed       delta
Gen-0     1.69MB/s ± 2%  18.05MB/s ± 2%  +970.22%  (p=0.000 n=9+10)

name      old alloc/op   new alloc/op    delta
Gen-0     32.0B ± 0%     0.0B            -100.00%  (p=0.000 n=10+10)

name      old allocs/op  new allocs/op   delta
Gen-0     2.00 ± 0%      0.00            -100.00%  (p=0.000 n=10+10)

Putting results of time/op in a graph, we get a clear view of the performance improvements of that +90%

Final results

I show you how you can optimize programs and algorithms using a real example. As a gift, I assembled a library you can use as additional tool to create IDs, and published the library as opensource at Github. It’s up to you if it fits in your needs or not.

Example Go script

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
package main

import (
	"fmt"
	"github.com/zerjioang/prng"
)

func main(){
	uuid := prng.New()
	fmt.Println(uuid)
}

Creates the output

1
dca90c93b8848229

being +90% CPU and +970.22% memory faster with zero allocations.

Get the code at Github

You can download the code of the prng at Github visiting https://github.com/zerjioang/prng or you can import it in your Go project as module/library as

1
2
3
import (
    "github.com/zerjioang/prng"
)

References

https://yourbasic.org/golang/bitwise-operator-cheat-sheet/

Subscribe, donate or become premium

Thanks for checking this out and I hope you found the info useful! If you have any questions, don't hesitate to write me a comment below. And remember that if you like to see more content on, just let me know it and share this post with your colleges, co-workers, FFF, etc.