
Evaluating beacon nodes 2022 edition

By Jim McDonald on Mar 03, 2022

Beacon nodes are a fundamental part of the Ethereum consensus network: in fact, it could be said that they are the network. A beacon node communicates with other beacon nodes to share data, collates that data to provide an up-to-date state of the Ethereum 2 chain, tells validator clients of the duties they should perform, and provides those validator clients with information to sign in the form of attestations and proposals.

There are multiple implementations of the beacon node at various stages of maturity. These often have a specific focus (low memory usage, enterprise features, etc.) but all carry out roughly the same work. However, this does not mean that they are equally good in all situations. As part of its ongoing evaluation of their effectiveness, Attestant has taken a look at one aspect of such nodes and presents the results here.

Much has changed since our original evaluation in 2020. The beacon chain mainnet has launched and gone through a hard fork, block value calculations have changed, and the number of validators has increased massively. What may have worked well for a network with 16,000 validators may not perform so well when faced with 280,000[1]. Equally, every beacon node has gone through multiple releases which could have improved performance. Such evolution of the network means that a re-evaluation is now more than worthwhile.

Methodology

Four beacon nodes from different providers were configured with the same parameters where possible. Each beacon node was allowed to sync up fully with the Prater testnet prior to the test beginning.

Testing was undertaken for a single feature: the generation of beacon block proposals. This was chosen because it involves a number of attributes within the beacon node, including:

  • network connectivity, to receive data to place in the block
  • algorithmic efficiency, to aggregate attestations effectively prior to placing in the block
  • completeness/time trade-offs, when packing attestations into a block

For 300 slots[2] over a 24-hour period a beacon block proposal for that slot was requested from each beacon node via their API, and the block evaluated by Attestant to provide a score. The score was related to the usefulness of the data in the block to the network, and was calculated as follows:

  • \(\frac{14}{64}\) for each validator in each attestation included with a correct and timely head vote
  • \(\frac{14}{64}\) for each validator in each attestation included with a timely source vote[3]
  • \(\frac{26}{64}\) for each validator in each attestation included with a correct and timely target vote
  • \(\frac{2}{64}\) for each validator included in the sync committee aggregate

The above values are based on the specification’s rewards for the items’ inclusion, which in turn are based on the usefulness of the information in helping to progress the chain. This means that a block’s score reflects its value to both the proposer and the chain.
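The scoring above can be sketched in a few lines. The weights come from the consensus specification's reward weights (14, 14, 26 and 2 out of a total of 64); the function and parameter names are illustrative, not part of Attestant's actual scoring code, and the sketch assumes duplicate votes have already been filtered out as the article describes.

```python
from fractions import Fraction

# Reward weights from the consensus specification, as used in the scoring above.
HEAD_WEIGHT = Fraction(14, 64)    # correct and timely head vote
SOURCE_WEIGHT = Fraction(14, 64)  # timely source vote
TARGET_WEIGHT = Fraction(26, 64)  # correct and timely target vote
SYNC_WEIGHT = Fraction(2, 64)     # per validator in the sync committee aggregate

def block_score(head_votes, source_votes, target_votes, sync_participants):
    """Score a block from per-validator vote counts (new, non-duplicate votes only)."""
    return (head_votes * HEAD_WEIGHT
            + source_votes * SOURCE_WEIGHT
            + target_votes * TARGET_WEIGHT
            + sync_participants * SYNC_WEIGHT)

# Example: 100 validators with all three votes timely, 300 sync committee participants.
score = block_score(100, 100, 100, 300)
```

Using exact fractions rather than floats keeps the score term-by-term faithful to the \(\frac{n}{64}\) weights listed above.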

Note that the above process is history-aware: it gives no score for repeated votes, either repeated in the same block or in an earlier block. This is an improvement over the previous evaluation, and provides a much more accurate score. That said, it should be noted that the testing is somewhat rough and ready, with a number of areas in which there was not full control or measurement of the environment:

  • the beacon nodes had different peers, some of which may have been more connected than others
  • all beacon nodes ran on the same server, so were competing for resources
  • the data was gathered over a single 24-hour period, with no repeat testing or testing across different periods
  • no consideration was taken of CPU or memory usage, or other operating system metrics that could affect results when running on different hardware

In addition, the testing process itself is relatively immature and could have bugs or other issues that resulted in inaccurate scores. Because of this, the names of the beacon node software are not supplied with the data or results; instead, each node is labelled ‘node A’, ‘node B’, ‘node C’ and ‘node D’ to differentiate them.

Results and discussion

The raw data for these results is available for viewing and analysis.

The first score to look at is that for immediate attestations. Immediate attestations are those generated in the slot immediately preceding the block in which they are included, for example an attestation for slot 1234 being included in block 1235. These attestations are the only ones that can provide a timely head vote, and as such could provide the most useful data for the chain. The average scores for the immediate attestations across all 300 proposed blocks are shown below:

Average scores for immediate attestations

Figure 1: Average scores for immediate attestations

This shows nodes A, B and C all returning approximately the same score, with node D notably lower. It appears that we are already removing one node from the picture due to its poor performance, but immediate attestations are only one factor to consider. Attestations included more than one slot after they are generated cannot provide a timely head vote, so their maximum value to the chain is lower, but their achieved value can be higher. Attestations included up to 5 slots after they are generated can still provide a timely source vote, and those included up to 32 slots after they are generated can still provide a timely target vote. The average scores for the later attestations across all 300 proposed blocks are shown below:

Average scores for later attestations

Figure 2: Average scores for later attestations
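The timeliness windows described above (head within 1 slot, source within 5 slots, target within 32 slots) determine the maximum weight an attestation can still earn at a given inclusion delay. A minimal sketch, using the specification's weights; the function name is illustrative:

```python
from fractions import Fraction

# Specification reward weights for the three vote types.
HEAD_WEIGHT = Fraction(14, 64)
SOURCE_WEIGHT = Fraction(14, 64)
TARGET_WEIGHT = Fraction(26, 64)

def max_weight_per_validator(inclusion_delay):
    """Maximum per-validator weight an attestation can still earn when
    included inclusion_delay slots after the slot it attests to."""
    weight = Fraction(0)
    if inclusion_delay <= 1:   # only immediate attestations can have a timely head vote
        weight += HEAD_WEIGHT
    if inclusion_delay <= 5:   # timely source vote window
        weight += SOURCE_WEIGHT
    if inclusion_delay <= 32:  # timely target vote window
        weight += TARGET_WEIGHT
    return weight
```

So an immediate attestation can be worth up to \(\frac{54}{64}\) per validator, one included 5 slots late at most \(\frac{40}{64}\), and one included 32 slots late at most \(\frac{26}{64}\).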

Here nodes B and D are returning significantly higher scores than nodes A and C. Node D, in particular, makes up for its poorer performance on immediate attestations with these later attestations. But why are nodes returning such different scores for immediate and later attestations, when they presumably see much the same information?

To understand what is going on, a bit more information about attestations in general and aggregate attestations in particular is required. The data on which a validator votes, known as the attestation data, is shown below:

Attestation data

Figure 3: Attestation data

Each validator obtains this data from its beacon node, and votes on it[4]. What happens if two validators vote for the same data? A naive approach would be to include both attestations in the block, but with hundreds of thousands of validators voting every few minutes, this would quickly create a massive amount of data to store, process and transmit. Instead, the common information is provided once and a tally of the validators that agree with that data is provided in a simple array.

Aggregate attestation

Figure 4: Aggregate attestation

Aggregate attestations can themselves be aggregated, which should result in a highly efficient way of storing large numbers of attestations where there is a single set of data and a list of all of the validators that voted for that data. There is, however, an issue with this aggregation process. It can be seen above that the aggregate contains not only the array of agreeing validators, but also a composite signature. The specific details are not relevant for the purposes of this article, but it creates a requirement that a single validator can only be included once in an aggregate. To understand what this means, the figure below shows an attempt to aggregate two aggregate attestations with different attesting validators:

A successful aggregation

Figure 5: A successful aggregation

Here, an aggregate attestation containing data from the validators in the first, third and fourth elements of the array is aggregated with an aggregate attestation containing data from the second, fifth and seventh elements of the array. This aggregation can take place, and results in a single aggregate that has most elements filled. However, the following highlights a problem:

An unsuccessful aggregation

Figure 6: An unsuccessful aggregation

In this situation the third element is already present in both aggregate attestations, so they cannot be combined in the way that was shown in figure 5.
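The rule illustrated in figures 5 and 6 can be sketched as an operation on two aggregation bitfields. Real aggregates also combine BLS signatures, which is omitted here; the function name and list-of-flags representation are illustrative only.

```python
def try_aggregate(bits_a, bits_b):
    """Attempt to merge two aggregation bitfields (lists of 0/1 flags over
    the same committee). Returns the merged bitfield, or None if any
    validator is present in both, since a validator's signature may only
    be included once in an aggregate."""
    if any(a and b for a, b in zip(bits_a, bits_b)):
        return None  # overlapping validator: cannot aggregate (the figure 6 case)
    return [a | b for a, b in zip(bits_a, bits_b)]

# Figure 5: validators in positions 1, 3 and 4 merged with those in 2, 5 and 7.
merged = try_aggregate([1, 0, 1, 1, 0, 0, 0], [0, 1, 0, 0, 1, 0, 1])
# Figure 6: both aggregates contain the validator in position 3, so merging fails.
failed = try_aggregate([1, 0, 1, 1, 0, 0, 0], [0, 1, 1, 0, 1, 0, 1])
```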

The aim of the beacon node, then, is to find the smallest number of aggregates that between them result in the largest number of agreeing validators. This is far from a simple problem to solve; the example below gives a much simplified version of the task the beacon node has to undertake.

Find the optimal set of aggregations

Figure 7: Find the optimal set of aggregations

There are various factors that complicate matters further. At the time of writing, there are 64 separate groups of validators[5] that need to be aggregated independently, each group consisting of around 140 validators rather than the seven shown in the examples above. Beacon nodes receive a mix of partially (but not fully) aggregated attestations from other nodes on the network from which to start their own aggregation process, and there will often be multiple conflicting votes from different validators. All of these factors make aggregation a difficult task, which is itself a precursor to the packing problem.

There is a maximum of 128 attestations in a block. Because of this restriction, different beacon nodes take different approaches to packing attestations into a block. Some may build and rebuild aggregates each time they receive updates from the network, while others may wait for a suitable number of aggregates before attempting any additional work. Some may put together a quick but low-scoring set of aggregations for a block so they can respond to block requests immediately and then improve it incrementally, while others may wait until the last moment to build a block. Some may consider that including as many immediate attestations as possible is more important than including some, potentially higher-value, later attestations. This combination of factors results in a lot of flexibility in how beacon nodes generate their block proposals, and explains the difference in the scores between the nodes.
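One of the simpler strategies a node could take is a greedy one: repeatedly pick the candidate aggregate that adds the most not-yet-counted validator votes, until the 128-attestation limit is reached. This is an illustrative sketch of the packing problem, not any particular client's algorithm, and the data representation is assumed for the example.

```python
MAX_ATTESTATIONS = 128  # maximum attestations per block

def pack_block(candidates, limit=MAX_ATTESTATIONS):
    """Greedily pack aggregates by how many new validator votes each adds.
    candidates: list of (attestation_data_id, set_of_validator_indices)."""
    included = []
    seen = {}  # attestation data id -> validator indices already counted
    for _ in range(min(limit, len(candidates))):
        # Pick the candidate contributing the most votes not yet in the block.
        best = max(candidates,
                   key=lambda c: len(c[1] - seen.get(c[0], set())))
        data_id, validators = best
        new_votes = validators - seen.get(data_id, set())
        if not new_votes:
            break  # nothing left adds value
        included.append(best)
        seen[data_id] = seen.get(data_id, set()) | new_votes
    return included
```

A greedy pass like this is fast but not optimal, which is exactly the completeness/time trade-off described above: finding the truly best packing is far more expensive.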

Returning to the block, the final piece of the overall block score is obtained from the sync committee aggregate. Each block contains a single sync committee aggregate, consisting of confirmations of the head of the chain by a subset of the validators.

Average scores for sync committee aggregate

Figure 8: Average scores for sync committee aggregate

The scores here are very close across all nodes. Given that including the optimal sync committee aggregate is a much simpler task than packing attestations, it is no surprise that all nodes give much the same result.

Putting these three scores for immediate attestations, later attestations and the sync committee aggregate together provides a clearer picture of the overall value of blocks produced.

Average combined score

Figure 9: Average combined score

Another way to look at this is to rate the highest-scoring node as 100% and see how the other nodes compare.

Average combined score (%)

Figure 10: Average combined score (%)

Here we can see that nodes B and D are both producing very good scores, with node C around 10% lower and node A more than 15% lower. Can we use this information to decide which beacon node is best? Perhaps not, as these figures are averages. In reality each slot has a single validator proposing a block, and that validator has to pick one from the proposals given to it by the beacon nodes. Across the 300 slots in which the test validators proposed, how often did each node provide the best block?

Best block selection (%)

Figure 11: Best block selection (%)

Although the combined score chart in figure 10 showed that node B provided the best average returns, it produces the highest-scoring block less than 20% of the time. By contrast, node D produces the highest-scoring block nearly 80% of the time. Most validating environments with a small number of validators, where the chance of proposing a block is low, would in fact be better off using node D than node B.

Returning to the earlier discussion on aggregation and block packing strategies, it appears that there is no one node that is best in all situations. In a dynamic environment like the beacon chain this is no surprise, as optimizing for one situation often results in sub-optimal results for another. A beacon node that generates the best block under normal network conditions may struggle when there is a significant backlog of attestations, or a lack of finality in the network, or a block that arrives late, or various other conditions that are known to occur.

Selecting the best block

Given all of the above, ideally a validator client would be free to request a beacon block from multiple beacon nodes as required, select the best block from the candidates returned, and use that as the basis of the block proposal. This would allow each beacon node to produce the best block when network conditions favored it, and the validator client to always propose the best possible block. This is one of the features of Vouch, a multi-node validating client written by Attestant.
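The selection step described above reduces to requesting a proposal from each node, scoring each candidate, and proposing the highest scorer. A simplified sketch of that strategy, not Vouch's actual implementation; the data shapes and scores here are assumed for illustration:

```python
def select_best_block(proposals):
    """Pick the highest-scoring proposal from several beacon nodes.
    proposals: dict mapping node name -> (block, score)."""
    best_node = max(proposals, key=lambda node: proposals[node][1])
    block, score = proposals[best_node]
    return best_node, block, score

# Illustrative scores in the shape of figure 10: B and D lead, A trails.
proposals = {
    "A": ("block_a", 84.0),
    "B": ("block_b", 100.0),
    "C": ("block_c", 90.0),
    "D": ("block_d", 99.0),
}
choice = select_best_block(proposals)
```

Because the selection happens per proposal, whichever node happens to produce the best block for this particular slot wins, regardless of which node is best on average.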

Average combined score (%) including Vouch

Figure 12: Average combined score (%) including Vouch

It should be clear that using Vouch in this way guarantees results at least as good as using any single beacon node exclusively, which is borne out by the above graph.

Because Vouch treats each block proposal as an independent process it can adapt to upgrades in beacon nodes. For example, if a new release of beacon node C starts to generate more highest-scoring blocks Vouch will automatically select them when appropriate. Changes in the protocol itself, for example those in the merge, will likely result in more divergence between nodes as the amount of work they need to carry out to propose a block increases and more decisions with trade-offs are taken.

Summing up

Beacon nodes all carry out the same tasks, but differences in how they approach those tasks can have a significant impact on the help they give the network and the rewards they generate for validators. There is no single “best” beacon node, and it is important to benchmark your node to decide if it gives the best results for your particular situation.

The merge and subsequent changes to the Ethereum protocol will introduce new areas where beacon nodes will vary in terms of their performance. As such, any measurements should be revisited periodically to ensure that performance is maintained.

Vouch is capable of using multiple beacon nodes and selecting the best block proposal, ensuring that regardless of the trade-offs made by each individual beacon node the optimal block can always be proposed.


  1. The approximate number of active validators on the beacon chain at time of writing.

  2. A slot being a 12-second period of time as defined by the Ethereum beacon chain parameters.

  3. Source votes are always correct, as attestations with an incorrect source vote are invalid and not included in blocks.

  4. Technically, the vote is a signature of a hash of the data.

  5. Known as committees.
