Jim McDonald, 11 Sep 2020

Evaluating Beacon Nodes

Multiple implementations of the Ethereum 2 node software exist. This article compares some of them to see if one is significantly better than the others.


Introduction

Beacon nodes are a fundamental part of the Ethereum 2 network to the extent that it can be said that they are the network. A beacon node communicates with other beacon nodes to share data, collates that data to provide an up-to-date state of the Ethereum 2 chain, tells validator clients of the duties they should perform, and provides those validator clients with information to sign in the form of attestations and proposals.

There are multiple implementations of the beacon node at various stages of development. These often have a specific focus (low memory usage, enterprise features, etc.) but all carry out roughly the same work. However, this does not mean that they are equally effective. As part of its ongoing consideration of which beacon nodes to use in its staking service, Attestant has taken a brief look at one aspect of such nodes and presents the results here.


Methodology

Three beacon nodes from different providers were set up with approximately the same configuration. Each beacon node was allowed to sync fully with the Medalla testnet before testing started.

Testing was undertaken for a single feature: requesting beacon block proposals. This was chosen because it involves a number of features within the beacon node, including:

  • network connectivity, to receive both individual and aggregate attestations
  • algorithmic efficiency, to aggregate attestations effectively
  • data access and caching, to propose a beacon block quickly

At the beginning of each slot (a slot being a 12-second period of time as defined by the Ethereum 2 chain parameters), a beacon block proposal for that slot was requested from each beacon node via their API, and the block evaluated by Attestant to provide a score. The score was related to the usefulness of the data in the block to the network, and was calculated as follows:

  • $0.75 + \frac{0.25}{n}$ for each validator in each attestation included with an inclusion distance of $n$ (for more about inclusion distance, see this article)
  • $700$ for each slashing event included
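To make the scoring concrete, here is a minimal sketch of how such a score might be computed. The attestation and proposal types are simplified stand-ins for the much richer beacon block structures, and the code is illustrative rather than Attestant's actual scoring implementation.

```go
package main

import "fmt"

// Simplified stand-ins for the parts of a beacon block relevant to scoring.
// The real block structures contain many more fields; these types are
// illustrative only.
type attestation struct {
	validators        int // number of validators covered by the aggregate
	inclusionDistance int // slots between the attested slot and the proposal slot
}

type proposal struct {
	attestations []attestation
	slashings    int // slashing events included in the block
}

// score applies the methodology above: each validator in each attestation
// contributes 0.75 + 0.25/n, where n is the inclusion distance, and each
// slashing event contributes 700.
func score(p proposal) float64 {
	total := 0.0
	for _, att := range p.attestations {
		total += float64(att.validators) * (0.75 + 0.25/float64(att.inclusionDistance))
	}
	return total + 700*float64(p.slashings)
}

func main() {
	// A proposal with 128 validators at distance 1 and 10 at distance 2.
	p := proposal{attestations: []attestation{
		{validators: 128, inclusionDistance: 1},
		{validators: 10, inclusionDistance: 2},
	}}
	fmt.Printf("score: %.2f\n", score(p)) // 128×1.0 + 10×0.875 = 136.75
}
```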

It should be noted that the testing is somewhat rough and ready, with a number of areas in which there was not full control or measurement of the environment:

  • the beacon nodes had different peers, some of which may have been better connected than others
  • attestations with an inclusion distance greater than 1 could have been duplicate attestations, and so scored incorrectly
  • attestations could have contained incorrect data, but were scored as if all data was correct
  • all beacon nodes ran on the same server, so were competing for resources
  • the data was gathered over a single 24-hour period, with no repeat testing or testing across different periods
  • no consideration was taken of CPU or memory usage, or other operating system metrics

Because of this, the names of the providers are not supplied with the data or results. Instead, the nodes are labelled 'node A', 'node B', and 'node C'.


Results and discussion

The raw data for these results is available for viewing and analysis.

Looking at the data, the obvious question to ask is: "which node provides the best block proposals?" To answer this, the scores for each of the block proposals for each slot were compared across the three nodes. If one node had the highest score of the three it was marked as providing the best block proposal for the slot, with honours shared if two or three nodes returned equally high scores. On this basis the following results were obtained:

Figure 1: % of slots for which each node's proposal had the highest score

This appears to show nodes A and C roughly equal, with node B far behind. However, it is always important to consider the data in context. The method above provides no indication of how much better one score is than another. To provide a different comparison, the scores for each node were summed across all slots as follows:

Figure 2: Total score for each node

We see a very different picture here, with nodes A and B providing proposals with roughly the same scores, and node C slightly ahead. So although node B rarely provided the best block, it was very close to doing so. And lower scores can in fact be a feature: as mentioned above, attestations with higher inclusion distances could be duplicates. A beacon node that takes the time and care to remove duplicate attestations from its proposals, resulting in less data stored on-chain and fewer cycles required to process the block, would be scored lower using the above methodology even though it is objectively doing a better job for the network. A more nuanced methodology would provide a clearer view, but is outside of the scope of this article due to the additional data gathering and analysis required.
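Both comparisons are straightforward to compute once per-slot scores are in hand. The sketch below shows one way to do so, with ties splitting the credit equally between the tying nodes, which is one possible reading of "honours shared"; the numbers are illustrative and are not the gathered data.

```go
package main

import "fmt"

// scores[slot] holds the proposal scores for nodes A, B and C at that slot.
// These values are illustrative only.
var scores = [][3]float64{
	{136.75, 120.00, 136.75}, // A and C share the honours
	{140.00, 139.50, 141.25}, // C wins outright
	{150.00, 150.00, 149.00}, // A and B share the honours
}

// bestProposalShares returns, for each node, the percentage of slots for
// which it provided (or shared) the highest-scoring proposal.
func bestProposalShares(scores [][3]float64) [3]float64 {
	var shares [3]float64
	for _, slot := range scores {
		best := slot[0]
		for _, s := range slot[1:] {
			if s > best {
				best = s
			}
		}
		// Credit every node that achieved the best score, splitting the
		// honours equally when two or three nodes tie.
		var winners []int
		for node, s := range slot {
			if s == best {
				winners = append(winners, node)
			}
		}
		for _, node := range winners {
			shares[node] += 1.0 / float64(len(winners))
		}
	}
	for node := range shares {
		shares[node] = 100 * shares[node] / float64(len(scores))
	}
	return shares
}

// totalScores sums each node's proposal scores across all slots.
func totalScores(scores [][3]float64) [3]float64 {
	var totals [3]float64
	for _, slot := range scores {
		for node, s := range slot {
			totals[node] += s
		}
	}
	return totals
}

func main() {
	fmt.Println("share of best proposals (%):", bestProposalShares(scores))
	fmt.Println("total scores:", totalScores(scores))
}
```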

The contents of the block proposal are not the only thing that matters. The time taken to propose a block is important: if the block takes too long to propose then it could end up not reaching validator clients before they attest, in which case the block could be excluded from the chain. As part of the data gathering, the time taken for each block proposal to be supplied was recorded. The distribution of proposal times is shown below:

Figure 3: Block proposal time distribution (milliseconds)

The vast majority of node A's proposals are returned around 200ms, with a few reaching up to around 300ms, and node B's proposals are very similar. Node C is interesting: around 90% of its proposals are returned very quickly, at around 100ms or less, but a significant minority take much longer, in the 900ms-1,200ms range.
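For those wishing to reproduce this kind of measurement, timing a proposal request can be as simple as the sketch below. The endpoint and query parameters are placeholders: beacon node APIs differ between implementations, so the URL would need to be adjusted for the node under test.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

// timeProposal requests a block proposal from a beacon node's HTTP API and
// returns how long the request took.  The URL passed in is a placeholder;
// the endpoint and required query parameters differ between beacon node
// implementations.
func timeProposal(url string) (time.Duration, error) {
	start := time.Now()
	resp, err := http.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	// Drain the body so the measurement covers the complete proposal being
	// built and returned, not just the response headers.
	if _, err := io.Copy(io.Discard, resp.Body); err != nil {
		return 0, err
	}
	return time.Since(start), nil
}

func main() {
	// Hypothetical endpoint; replace with the beacon node's actual block
	// proposal URL and parameters.
	elapsed, err := timeProposal("http://localhost:5052/validator/block?slot=12345")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Printf("proposal returned in %v\n", elapsed)
}
```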

This raises the question of whether the occasional longer duration of node C's proposal generation has an impact on block inclusion; are those blocks more likely to be excluded, and if so would this alter the long-term performance of the beacon node? This is another question for further study.


Conclusion

From this very brief and specific investigation of beacon node performance, Attestant discovered that whilst each node had strengths and weaknesses, all three nodes performed well.

So what does this tell you if you are considering which beacon node software to run? The most important point is that there is no best choice, and there is no wrong choice. No single beacon node excelled in all aspects, and none stood out as underperforming. Indeed, as mentioned above, some of the node scores may be artificially high, and users should be aware of the possible benefits of lower scores.

This is great news for client diversity: users can select whichever of the beacon nodes suits their own infrastructure, requirements and desires without fear that their choice will have an impact on their validators' performance (and hence earnings). And with all client teams continuing to optimize their software, the future is looking bright regardless of the beacon node you choose.

  • Beacon nodes
  • Ethereum consensus layer