Jim McDonald, 25 Aug 2020

Introducing Dirk

Attestant uses their own distributed remote keymanager, providing high levels of security and availability whilst fitting neatly into production environments. This article introduces Dirk and explains its features.

In a previous article, we discussed the protection of Ethereum 2 validator keys and features of a system that would provide both security and availability to validators. The software that provides this security and availability is called a keymanager. A keymanager holds the keys required by validators to attest for the Ethereum 2 network, signing information when requested and protecting against malicious actors attempting to steal or use the keys in ways that would be to the detriment of the validator.

This article introduces Dirk¹, a distributed remote keymanager built by Attestant.

Why use a dedicated keymanager?

Most Ethereum 2 validator clients have key management built in, providing mechanisms to create keys and use them to attest. So what is to be gained by using a dedicated keymanager over the one supplied with a validator client?

Validating and managing keys are two separate tasks. If a single piece of software carries out both tasks it is, by definition, more complex than a piece of software that carries out just one of the tasks. By dedicating Dirk to carrying out key management alone it is easier to manage, has lower code complexity, and is focused on doing its only job.

This separation also forces a cleaner architecture, as all communication is carried out over well-defined interfaces between Dirk and the validator clients. This benefits the validator client, as well: if the validator client does less there is less that can go wrong.

There are also security benefits: any fault in the validator client cannot expose the validator keys if it does not hold them. Another significant benefit of separating validating from key management is that access to the physical server containing the keys can be more strictly controlled. Operators who require access to the validator client no longer have access to keys as well, and the security of the key management server can be increased as a result.

Furthermore, if the validator client and keymanager are separate pieces of software, the validator client becomes easier to use in a production infrastructure. A validator client without keys has very similar configuration across multiple instances, and can more easily be used with modern software packaging, deployment and management systems such as Docker and Kubernetes.

Consider a simple active/passive architecture, where two servers each run a validator client. If both validator clients contain keys, then to prevent slashing events only one should be active at any given time. This can be achieved through a variety of monitoring and scripts, but inevitably results in a complicated and fragile infrastructure. Such an architecture can be simplified if the validator client does not contain any keys: there is no longer a strict requirement for only one validator client to be active at a time, as the keymanager can decide which of the validator clients is allowed to sign a given request. Failover from one validator client to another can be much simplified, as can normal operational procedures such as upgrading software or rebooting hardware.

Some users may consider that these benefits are not worth the additional work to deploy and manage a separate key management process, but for those that want these benefits, for example those with large stakes or who are running a service for others, software such as Dirk will allow them to achieve the highest levels of security and availability.

Principles of operation

When designing software it is important to understand its focus: what is this software trying to achieve? In Dirk's case the following principles of operation apply:

security: providing high levels of security for users' keys
availability: providing high levels of availability for signing operations
visibility: providing high levels of visibility into the result and performance of operations undertaken

Dirk focuses on these three principles above all else. This not only informs what is available, but what is not available in the product. Avoiding "feature creep" is critical, as each feature adds complexity and potentially weakens other parts. With Dirk, a feature will only be added if it benefits one of the principles; if not it is discarded as out of scope.

What is Dirk?

Dirk is Attestant's distributed remote keymanager, acting as a gateway between the internet and wallets.

Figure 1: Dirk as a gateway

As discussed earlier, such a gateway is not strictly necessary, but Dirk provides a number of features that simply will not be available if the wallet is on a local filesystem, leading to higher security and availability.

Network access

Network access allows Dirk to provide access to the same wallet, or wallets, for multiple validator clients, without the user needing to know details of the wallet beyond its name.

Figure 2: Network-enabled access

This layer of abstraction is very useful for a system such as Ethereum 2, where current support for hardware wallets is limited and it is currently unclear how support will be implemented: as Dirk adds hardware wallet support it becomes available to users without them having to alter their requesting process.

Permissioned operations

Network access is good, but must be carefully controlled to ensure that only authorized users have the ability to sign attestations with the keys Dirk holds. Dirk controls access to operations based on the identity of the client making the request, the type of request, and the account. A client could be a validator client, a separate administration process (in the case of signing deposits for new validators), or a manual human-initiated process (in the case of requesting the validator to stop attesting).

Clients that want to talk to Dirk over the network require a proof of their identity, known as a certificate, when sending requests. The Dirk administrator² issues these certificates to clients that want to talk to Dirk, and the clients must provide the certificate every time they make a request. This means that whenever Dirk receives a request it has a high degree of assurance that it knows the identity of the client³.

Figure 3: Client request

(In this and future examples the diagram shows a single Dirk instance with three wallets called "Validators", "Test", and "Withdrawal". Each wallet holds a number of accounts, with each account containing a single key.)

Here in figure 3, a client identified as "Client 1" by its certificate is requesting Dirk to sign some data using the account "Validators/00001". Even though Dirk knows the identity of the client that doesn't mean it will carry out any request that is made of it. Instead, Dirk uses an access control list to decide which actions it should allow:

Figure 4: Access control list

The access control list describes which clients can carry out which actions with which wallets⁴. Every request is checked against the access control list before Dirk proceeds. If the action is on the list it is allowed:

Figure 5: Access allowed

In the above case the access control list states that "Client 1" is allowed to request signing on any account in the "Validators" wallet, so the request is accepted. Conversely, if the action is not on the list it is denied:

Figure 6: Access denied

In the above case the access control list does not state that "Client 1" is allowed to request signing on an account in the "Withdrawal" wallet, so the request is denied.

Access control in Dirk requires explicit permission: in its default configuration it will refuse all requests, ensuring that an unconfigured server will not provide accidental access to accounts.
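The checks described above can be sketched in a few lines. This is a minimal illustration rather than Dirk's actual implementation or configuration format: the `Rule` and `AccessControlList` names, the `"sign"` action string, and the account `Withdrawal/00001` are assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rule:
    """One hypothetical access control entry."""
    client: str          # identity taken from the client's certificate
    action: str          # type of request, e.g. "sign"
    wallet: str          # wallet the rule applies to
    account: str = "*"   # "*" means any account in the wallet


@dataclass
class AccessControlList:
    rules: List[Rule] = field(default_factory=list)

    def is_allowed(self, client: str, action: str, account_path: str) -> bool:
        """Return True only if some rule explicitly permits the request."""
        wallet, _, account = account_path.partition("/")
        for rule in self.rules:
            if (rule.client == client
                    and rule.action == action
                    and rule.wallet == wallet
                    and rule.account in ("*", account)):
                return True
        # No matching rule: deny. An empty list therefore refuses everything.
        return False


acl = AccessControlList([Rule("Client 1", "sign", "Validators")])
print(acl.is_allowed("Client 1", "sign", "Validators/00001"))   # allowed by the rule
print(acl.is_allowed("Client 1", "sign", "Withdrawal/00001"))   # no rule, so denied
```

Because the function only ever returns true on an explicit match, an unconfigured list behaves like the default configuration described above and refuses every request.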

Slashing protection

Because Dirk is a full server it can carry out computation against signing requests, and store information about signing requests it has carried out in the past. This allows it to provide slashing protection, securing validator accounts against accidental or malicious slashing events.

Every signing request is passed through a slashing protection process that decides if Dirk should proceed or not:

Figure 7: Slashing protection process

If the slashing protection process decides the request could not cause a slashing event it will inform Dirk that the request can proceed:

Figure 8: Allowing a non-slashable signing request

If, however, the slashing protection process decides the request carries a risk of slashing, it will inform Dirk that the request should not proceed:

Figure 9: Denying a slashable signing request

In a properly-functioning attesting environment it is not expected that Dirk will receive any requests that could result in slashable events; however, defence in depth is an important part of any secure setup, and Dirk's slashing protection increases security for end users. It also increases operational flexibility, for example allowing a "warm standby" configuration without having to worry about two validators running at the same time and generating slashable attestations.
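As an illustration of the kind of decision such a process makes, the sketch below applies a common conservative rule: a proposal must be for a later slot than any previously signed, and an attestation must have a source epoch that does not go backwards and a target epoch that strictly increases. This rules out double votes and surround votes, at the cost of refusing some requests that would in fact be safe. It is an assumption-laden sketch, not Dirk's actual logic: the class and method names are hypothetical, and the history is kept in memory, whereas a real keymanager must persist it to disk before releasing a signature.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AccountHistory:
    """Highest values signed so far for one account (-1 means nothing signed)."""
    last_proposal_slot: int = -1
    last_source_epoch: int = -1
    last_target_epoch: int = -1


@dataclass
class SlashingProtection:
    history: Dict[str, AccountHistory] = field(default_factory=dict)

    def may_propose(self, account: str, slot: int) -> bool:
        record = self.history.setdefault(account, AccountHistory())
        if slot <= record.last_proposal_slot:
            return False  # could be a second proposal for the same slot
        record.last_proposal_slot = slot
        return True

    def may_attest(self, account: str, source_epoch: int, target_epoch: int) -> bool:
        record = self.history.setdefault(account, AccountHistory())
        if source_epoch > target_epoch:
            return False  # malformed request
        if target_epoch <= record.last_target_epoch:
            return False  # could be a double vote, or surrounded by an earlier vote
        if source_epoch < record.last_source_epoch:
            return False  # could surround an earlier vote
        record.last_source_epoch = source_epoch
        record.last_target_epoch = target_epoch
        return True


protection = SlashingProtection()
print(protection.may_attest("Validators/00001", 10, 11))  # first request: proceed
print(protection.may_attest("Validators/00001", 10, 11))  # repeat target: refuse
print(protection.may_attest("Validators/00001", 9, 12))   # surrounds earlier vote: refuse
```

Keeping only the highest values per account keeps the check cheap; the write that makes those values durable is the expensive part.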

Because slashing protection requires information to be written to disk after every signature it can take significantly longer to run. To give an idea of what this means, below are some real-world figures running Dirk against the Medalla testnet:

| Action | Unprotected | Protected |
| --- | --- | --- |
| Block proposal | 2 ms | 17 ms |
| Block attestation | 2 ms | 28 ms |

Whilst simply signing a proposal or attestation without slashing protection takes a couple of milliseconds, adding in slashing protection significantly increases these numbers. However, most of this additional time is spent in ensuring that the protection information is written safely to disk. It is important, therefore, to use fast storage when running Dirk⁵.

Threshold signing

A single instance of Dirk is susceptible to failure, and also presents a single target for attackers attempting to gain access to the private keys it holds. This is mitigated by creating multiple instances of Dirk and using threshold signing. Threshold signing was covered in a previous article; here it is sufficient to note that it allows a strict subset of Dirk instances to create a valid signature. For example, in a $\frac{2}{3}$ threshold signing configuration any 2 of the 3 instances can provide a response for the client.

In normal operation, a client will contact all of the Dirk instances for signatures. As soon as two instances have responded, the client will be able to construct a valid signature.

Figure 10: Threshold signing
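Why any two of three instances suffice can be illustrated with Shamir secret sharing, the scheme on which threshold signatures are commonly built. The following is a toy sketch under stated assumptions: the field modulus and function names are chosen for the example and are not taken from Dirk. It also recombines the secret itself to keep the arithmetic visible, whereas a threshold signing scheme combines signature shares so that the full key never needs to be assembled in one place.

```python
import itertools
import secrets

PRIME = 2**127 - 1  # a Mersenne prime, used as the field modulus for this toy example


def split_secret(secret: int, threshold: int, instances: int) -> dict:
    """Split `secret` so that any `threshold` of `instances` shares recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the secret.
    coefficients = [secret] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]

    def evaluate(x: int) -> int:
        result = 0
        for coefficient in reversed(coefficients):  # Horner's rule
            result = (result * x + coefficient) % PRIME
        return result

    # Instance i holds the point (i, f(i)).
    return {x: evaluate(x) for x in range(1, instances + 1)}


def recover_secret(shares: dict) -> int:
    """Lagrange interpolation at x = 0 over the supplied shares."""
    secret = 0
    for x_i, y_i in shares.items():
        numerator, denominator = 1, 1
        for x_j in shares:
            if x_j != x_i:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        # Division in the field is multiplication by the modular inverse.
        weight = numerator * pow(denominator, PRIME - 2, PRIME)
        secret = (secret + y_i * weight) % PRIME
    return secret


shares = split_secret(123456789, threshold=2, instances=3)
for pair in itertools.combinations(shares, 2):
    subset = {x: shares[x] for x in pair}
    assert recover_secret(subset) == 123456789  # every pair of instances works
```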

If one of the Dirk instances is unavailable, for example if there is a network issue that stops the client from talking to an instance, it is still possible to obtain responses from the other instances and carry on as usual.

Figure 11: Threshold signing with one Dirk instance unavailable

The above example allows for 1 instance to be unavailable without problems; if a higher level of availability is desired then more instances can be used to provide thresholds of $\frac{3}{5}$, $\frac{4}{7}$, or even higher. In general it is possible for half of the instances (rounded down) to be unavailable and still provide full functionality without losing security.
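The arithmetic behind these figures is simple enough to check directly. The helper names below are illustrative only; the sketch assumes a majority threshold over an odd number of instances, as in the configurations mentioned above.

```python
def majority_threshold(instances: int) -> int:
    """Smallest threshold that is more than half of the instances."""
    return instances // 2 + 1


def tolerated_failures(threshold: int, instances: int) -> int:
    """How many instances can be unavailable while signing still succeeds."""
    if not 1 <= threshold <= instances:
        raise ValueError("threshold must be between 1 and the number of instances")
    return instances - threshold


for instances in (3, 5, 7):
    threshold = majority_threshold(instances)
    print(f"{threshold}/{instances}: tolerates {tolerated_failures(threshold, instances)} unavailable")
```

For 3, 5 and 7 instances this gives thresholds of 2, 3 and 4, tolerating 1, 2 and 3 unavailable instances respectively, which matches half of the instances rounded down.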

Distributed key generation

Threshold signing commonly starts with a known secret key, which means that the process is susceptible to being hijacked. Distributed key generation avoids this issue by allowing keys to be created in a collaborative manner, where each Dirk instance retains its own secret key throughout the process. This removes the single point of failure with the setup process for threshold signing, significantly increasing the security of the overall system.
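The collaborative idea can be sketched as follows: every instance picks its own random polynomial, sends one evaluation to each other instance, and sums what it receives to form its share. This is a deliberately simplified illustration, not Dirk's protocol. It runs all participants in one process, omits the commitments and verification steps that practical distributed key generation schemes rely on, and uses a toy field modulus and hypothetical function names.

```python
import itertools
import secrets

PRIME = 2**127 - 1  # toy field modulus (a Mersenne prime), chosen for the example


def evaluate(coefficients: list, x: int) -> int:
    """Evaluate a polynomial at x using Horner's rule."""
    result = 0
    for coefficient in reversed(coefficients):
        result = (result * x + coefficient) % PRIME
    return result


def distributed_key_generation(threshold: int, instances: int):
    ids = range(1, instances + 1)
    # Each instance privately chooses its own polynomial of degree threshold-1.
    polynomials = {i: [secrets.randbelow(PRIME) for _ in range(threshold)] for i in ids}
    # Instance i sends evaluate(polynomials[i], j) to instance j;
    # instance j's share is the sum of everything it receives.
    shares = {j: sum(evaluate(polynomials[i], j) for i in ids) % PRIME for j in ids}
    # The group key is the sum of the constant terms. It is computed here only
    # so the example can be checked; in a real run no single party computes it.
    group_secret = sum(p[0] for p in polynomials.values()) % PRIME
    return shares, group_secret


def combine_at_zero(shares: dict) -> int:
    """Lagrange interpolation at x = 0, used here only to verify the shares."""
    total = 0
    for x_i, y_i in shares.items():
        numerator, denominator = 1, 1
        for x_j in shares:
            if x_j != x_i:
                numerator = (numerator * -x_j) % PRIME
                denominator = (denominator * (x_i - x_j)) % PRIME
        total = (total + y_i * numerator * pow(denominator, PRIME - 2, PRIME)) % PRIME
    return total


shares, group_secret = distributed_key_generation(threshold=2, instances=3)
for pair in itertools.combinations(shares, 2):
    assert combine_at_zero({x: shares[x] for x in pair}) == group_secret
```

The resulting shares behave like shares of a single key, yet that key was never chosen or held by any one participant.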

Designed for production

Dirk is designed to run in a production environment, and as such it aims to make life as easy as possible for the people operating it.

Configuration

All configuration options are available through environment variables, command-line flags or a dedicated configuration file.

Secrets management

Dirk uses Majordomo for management of secrets. Majordomo allows secure data such as keystore passphrases to be held on a remote system, for example Google Secret Manager or AWS Secrets Manager. Central control of these secrets brings additional security and allows them to be revoked or altered easily and remotely.

Metrics

In operational terms, a product is only as good as the ability to monitor it. Dirk's metrics focus on the information that an operations department needs: activity and performance. Each signing activity is tracked, and the aggregate metrics are presented in the standard Prometheus format. The metrics have been designed to allow easy aggregation across Dirk instances, as well as across different signing request types and results. This makes it possible to build operations dashboards and alerting systems that zero in on the information you need to ensure your signing infrastructure is operating correctly and in a timely fashion.

Figure 12: Sample Grafana dashboard monitoring 5 Dirk instances
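To give a feel for what labelled metrics in the Prometheus text format look like, here is a small sketch that counts signing requests by type and result. The metric name `example_signing_requests_total`, the label names and the label values are invented for the illustration and are not Dirk's actual metrics.

```python
from collections import Counter


class SigningMetrics:
    """Counts signing requests by (request type, result)."""

    METRIC = "example_signing_requests_total"  # hypothetical metric name

    def __init__(self) -> None:
        self.requests = Counter()

    def record(self, request_type: str, result: str) -> None:
        self.requests[(request_type, result)] += 1

    def render(self) -> str:
        """Render the counters in the Prometheus text exposition format."""
        lines = [f"# TYPE {self.METRIC} counter"]
        for (request_type, result), count in sorted(self.requests.items()):
            lines.append(
                f'{self.METRIC}{{request="{request_type}",result="{result}"}} {count}'
            )
        return "\n".join(lines) + "\n"


metrics = SigningMetrics()
metrics.record("attestation", "succeeded")
metrics.record("attestation", "succeeded")
metrics.record("proposal", "denied")
print(metrics.render())
```

Keeping request type and result as labels on a single counter is what makes aggregation straightforward: a dashboard can sum over either label, or over the instances being scraped, without the server needing to know how it will be sliced.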

Release methodology

A strict release methodology is a great help when attempting to run a production infrastructure. Dirk will have a clear separation between patch releases and feature releases, to avoid situations where upgrading to a new release to patch a bug also involves configuration changes, API alterations, and other items that make what should be a quick fix turn into a major operational headache.

Figure 13: Release methodology

This methodology follows semantic versioning, and ensures that users can separate out bug fixes from functional upgrades.
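Under semantic versioning, the kind of upgrade can be read straight from the version numbers, which is what lets an operator treat a patch release differently from a feature release. The sketch below uses hypothetical version strings and function names; it handles plain MAJOR.MINOR.PATCH versions only, without pre-release or build suffixes.

```python
def parse_version(version: str) -> tuple:
    """Parse 'MAJOR.MINOR.PATCH', with an optional leading 'v'."""
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    return major, minor, patch


def upgrade_kind(current: str, candidate: str) -> str:
    """Classify moving from `current` to `candidate`."""
    old, new = parse_version(current), parse_version(candidate)
    if new <= old:
        return "not an upgrade"
    if new[0] != old[0]:
        return "major"    # may include breaking changes
    if new[1] != old[1]:
        return "feature"  # new functionality, backwards compatible
    return "patch"        # bug fixes only


print(upgrade_kind("1.0.0", "1.0.1"))  # patch
print(upgrade_kind("1.0.1", "1.1.0"))  # feature
```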

Summary

Dirk is available now on the Attestant GitHub site, where further information is available about its release status as well as instructions on how to install and use it in various configurations. It is available under a permissive open source license, allowing use in a wide variety of environments. The community for Dirk is in the #dirk channel of the Attestant Discord.

Dirk provides distributed remote key management for Ethereum 2 keys, and brings an operational focus to signing operations. It is an important part of Attestant's infrastructure: we are excited to share it with you, and look forward to making more elements of our infrastructure available in the near future.

¹ Distributed remote keymanager.
² Here, administrator means whoever is running the Dirk instance.
³ It is possible for certificates to be stolen; however, additional layers of security allow revocation of stolen certificates.
⁴ Access controls can also be defined for specific accounts within a wallet if required.
⁵ Note, however, that the signing process is not usually the slowest component in proposing or attesting blocks, so speeding up the servers for the beacon node and validator client usually provides better results.
