Chess Devlog: Huge Performance Improvements
I just released v2.0.3 of https://github.com/brighamskarda/chess, and it has some huge improvements. It went from taking over 20 seconds to calculate perft 6 from the starting position to just 6.5 seconds with profile-guided optimization enabled. Here are some of the changes I made.
Performance Improvements
Remove Square Validation
There were three bitboard functions that took squares as parameters. Bitboard operations are meant to be low level and are called quite often. Removing the code that checked whether the square was valid in these functions was technically a breaking change, but it was a reasonable tradeoff of correctness checks for speed.
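The library's actual function names and signatures aren't shown here, so the following is only a hypothetical sketch of the shape of that change: a checked setter that validates its square on every call, versus an unchecked one that trusts the caller and compiles down to a single OR.

```go
package main

import "fmt"

// Bitboard has one bit per square, with a1 = bit 0 and h8 = bit 63.
type Bitboard uint64

// Square indexes squares a1 = 0 through h8 = 63.
type Square uint8

// setSquareChecked is the old pattern: validate the square on every
// call, even though move generation always passes valid squares.
func setSquareChecked(b Bitboard, s Square) (Bitboard, error) {
	if s > 63 {
		return b, fmt.Errorf("invalid square: %d", s)
	}
	return b | 1<<s, nil
}

// setSquare is the new pattern: the caller guarantees validity, so
// the hot path has no branch and no error value to thread through.
func setSquare(b Bitboard, s Square) Bitboard {
	return b | 1<<s
}

func main() {
	var b Bitboard
	b = setSquare(b, 0)  // a1
	b = setSquare(b, 63) // h8
	fmt.Printf("%#016x\n", b)
}
```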
Reducing Slice Growths
This was definitely one of the biggest performance gains. By pre-allocating all the memory a slice might need at creation, I was able to eliminate a ton of expensive slice growths in the move-generation sections of my code.
Before Optimization
moves := make([]Move, 0)
moves = append(moves, pawnMoves(pos)...)
moves = append(moves, rookMoves(pos)...)
moves = append(moves, knightMoves(pos)...)
moves = append(moves, bishopMoves(pos)...)
moves = append(moves, queenMoves(pos)...)
moves = append(moves, kingMoves(pos)...)
After Optimization
pawnMoves := pawnMoves(pos)
rookMoves := rookMoves(pos)
knightMoves := knightMoves(pos)
bishopMoves := bishopMoves(pos)
queenMoves := queenMoves(pos)
kingMoves := kingMoves(pos)
moves := make([]Move, 0, len(pawnMoves)+len(rookMoves)+len(knightMoves)+len(bishopMoves)+len(queenMoves)+len(kingMoves))
moves = append(moves, pawnMoves...)
moves = append(moves, rookMoves...)
moves = append(moves, knightMoves...)
moves = append(moves, bishopMoves...)
moves = append(moves, queenMoves...)
moves = append(moves, kingMoves...)
Magic Bitboards
I would be lying if I said I truly understood how magic bitboards work, but with a little help from the internet I was able to throw together an implementation that was crucial in bringing my perft time down from 11 seconds to below 8.
Many thanks to the Chess Programming YouTube channel for providing an example implementation I could use: https://youtu.be/4ohJQ9pCkHI?si=OYqejChUKLDzd_-V
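For readers curious what the "magic" actually is, the sketch below is my own minimal illustration, not the library's code. The core idea: mask out the squares whose occupancy matters, multiply by a magic constant, and shift, so every possible blocker arrangement hashes to a unique precomputed attack set. Real implementations precompute magics for all squares offline; here one is brute-force searched at runtime for a rook on a1.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
)

// rookMask returns the "relevant occupancy" mask for a rook on sq
// (0 = a1 .. 63 = h8): the squares whose contents can change the
// rook's attacks. Board edges are excluded, as is standard.
func rookMask(sq int) uint64 {
	var m uint64
	r, f := sq/8, sq%8
	for rr := r + 1; rr <= 6; rr++ {
		m |= 1 << (rr*8 + f)
	}
	for rr := r - 1; rr >= 1; rr-- {
		m |= 1 << (rr*8 + f)
	}
	for ff := f + 1; ff <= 6; ff++ {
		m |= 1 << (r*8 + ff)
	}
	for ff := f - 1; ff >= 1; ff-- {
		m |= 1 << (r*8 + ff)
	}
	return m
}

// rookAttacksSlow walks each ray until it hits a blocker. This slow
// reference generator is only used to fill the lookup table.
func rookAttacksSlow(sq int, occ uint64) uint64 {
	var a uint64
	r, f := sq/8, sq%8
	dirs := [4][2]int{{1, 0}, {-1, 0}, {0, 1}, {0, -1}}
	for _, d := range dirs {
		for rr, ff := r+d[0], f+d[1]; rr >= 0 && rr <= 7 && ff >= 0 && ff <= 7; rr, ff = rr+d[0], ff+d[1] {
			b := uint64(1) << (rr*8 + ff)
			a |= b
			if occ&b != 0 {
				break
			}
		}
	}
	return a
}

// findRookMagic tries random sparse 64-bit numbers until one hashes
// every relevant occupancy of sq to a collision-free table index.
func findRookMagic(sq int) (magic uint64, shift int, table []uint64) {
	mask := rookMask(sq)
	n := bits.OnesCount64(mask)
	shift = 64 - n

	// Enumerate every subset of mask (the Carry-Rippler trick)
	// along with its reference attack set.
	var occs, refs []uint64
	for s := uint64(0); ; {
		occs = append(occs, s)
		refs = append(refs, rookAttacksSlow(sq, s))
		s = (s - mask) & mask
		if s == 0 {
			break
		}
	}

	rng := rand.New(rand.NewSource(1))
	for {
		magic = rng.Uint64() & rng.Uint64() & rng.Uint64() // sparse candidate
		table = make([]uint64, 1<<n)
		used := make([]bool, 1<<n)
		ok := true
		for i, occ := range occs {
			idx := (occ * magic) >> shift
			if used[idx] && table[idx] != refs[i] {
				ok = false // destructive collision, reject candidate
				break
			}
			used[idx] = true
			table[idx] = refs[i]
		}
		if ok {
			return magic, shift, table
		}
	}
}

func main() {
	const sq = 0 // a1
	magic, shift, table := findRookMagic(sq)
	mask := rookMask(sq)

	// At runtime a single multiply + shift + lookup replaces the ray walk.
	occ := uint64(1<<16 | 1<<3) // blockers on a3 and d1
	attacks := table[((occ&mask)*magic)>>shift]
	fmt.Printf("magic for a1: %#x, lookup matches slow generator: %v\n",
		magic, attacks == rookAttacksSlow(sq, occ))
}
```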
Other Changes
Of course, there were many smaller optimizations applied throughout the codebase, but these were the major ones. I think these two images tell the story well: the first is the Go profile output at the start, and the second is the profile output at the end. Just look at how much the total runtime of each function has decreased. (You may want to open them in a separate tab to view them better.)
Results
I did a bunch of testing, and I am quite surprised at how competitive my library seems to be. It is faster than all but one of the other chess libraries written in Go. Here are the results of my testing, in seconds.
Repository | Starting Position (Perft-6) | KiwiPete Position (Perft-5) | End Position (Perft-6) |
---|---|---|---|
brighamskarda/chess | 7.58 | 12.04 | 0.69 |
CorentinGS/chess | 32.49 | 52.94 | 4.95 |
dylhunn/dragontoothmg | 1.44 | 1.46 | 0.22 |
keelus/chess | 435 | * | 37.86 |
malbrecht/chess | 77.39 | 147.97 | 5.44 |
eightsquared/chess | 7.075 | 22.35 | 1.01 |
Of course, there are many chess engines written in Go that are much faster than these, but in terms of chess libraries that are ready and easy to use, this is pretty much what there is. Even dylhunn/dragontoothmg, which is faster than mine, lacks PGN support.
In the end, I am satisfied with these results for now. Calculating 15 million nodes per second, while not mind-boggling, is still an achievement, and it seems to be more than sufficient for building 2000+ Elo engines.