• Per-connection authentication, not per-request
    • User ID and password based authentication is implemented for the gRPC API
    • Authentication must be refreshed after auth policy changes
  • Its functionality should be as simple and useful as v2
    • v3 provides a flat key space, unlike the directory structure of v2. Permission checking will be provided as interval matching.
  • It should have stronger consistency guarantees than v2 auth

Main required changes

  • Add permission information (user ID and authorized revision) to the Raft commands (etcdserverpb.InternalRaftRequest)
  • Every request is permission checked in the state machine layer, rather than API layer

The metadata for auth should also be stored and managed in the storage controlled by etcd’s Raft protocol, like other data stored in etcd. This is required to avoid sacrificing the availability and consistency of the entire etcd cluster. If reading or writing the metadata (e.g. permission information) needed the agreement of every node (more than a quorum), a single node failure could stop the entire cluster. Requiring all nodes to agree at once means that permission checks for ordinary read/write requests cannot be completed if any cluster member is down, even if the cluster has an available quorum. This unanimous scheme ultimately degrades cluster availability; quorum-based consensus from Raft should suffice, since agreement follows from consistent ordering.
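The availability argument can be made concrete with a little arithmetic. The following is a minimal sketch, not etcd code; the function names are hypothetical and chosen only for illustration. It compares majority agreement with a scheme that requires every member to agree.

```go
package main

// quorum returns the smallest majority of an n-member cluster.
func quorum(n int) int { return n/2 + 1 }

// availableWithQuorum reports whether a cluster of n members can still
// reach agreement when `down` members have failed, under majority rule.
func availableWithQuorum(n, down int) bool { return n-down >= quorum(n) }

// availableWithUnanimity reports the same for a hypothetical scheme in
// which every member must agree: any single failure blocks progress.
func availableWithUnanimity(n, down int) bool { return down == 0 }
```

For a five-member cluster, `availableWithQuorum(5, 2)` is true while `availableWithUnanimity(5, 1)` is already false, which is the degradation described above.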

The authentication mechanism in the etcd v2 protocol has a tricky part: metadata consistency should work as described above, but it does not. Each permission check is processed by the etcd member that receives the client request (etcdserver/api/v2http/client.go), including follower members. Therefore, the check may be based on stale metadata.

This staleness means that auth configuration changes are not guaranteed to take effect as soon as operators execute etcdctl, and there is no way to know how long the stale metadata remains active. In practice, a configuration change is usually reflected immediately after the command executes. However, under heavy load, the inconsistent state can be prolonged and may result in counter-intuitive situations for users and developers. It requires a workaround like this:

Inconsistent permissions are unsafe for linearized requests

Inconsistent authentication state is most serious for writes. Even if an operator revokes a user’s write permission, if the write is only ordered with respect to the key-value store but not the authentication system, it’s possible the write will complete successfully. Without ordering on both the auth store and the key-value store, the system will be susceptible to stale permission attacks.

First, a client must create a gRPC connection solely to authenticate with its user ID and password. An etcd server will respond with an authentication reply. The response will be an authentication token on success or an error on failure. The client can use its authentication token to present its credentials to etcd when making API requests.

The client connection used to request the authentication token is typically thrown away; it cannot carry the new token’s credentials. This is because gRPC doesn’t provide a way to add per-RPC credentials after the connection has been created (by calling grpc.Dial()). Therefore, a client cannot attach a token to the same connection through which the token was obtained; it needs a new connection to use the token.

Notes on the implementation of Authenticate() RPC

Authenticate() RPC generates an authentication token based on a given user name and password. etcd stores the configured password and checks a given password against it using Go’s bcrypt package. By design, bcrypt’s password checking mechanism is computationally expensive, taking nearly 100ms on an ordinary x64 server. Therefore, performing this check in the state machine apply phase would cause performance trouble: the entire etcd cluster could serve only about 10 Authenticate() requests per second.

For good performance, the v3 auth mechanism checks passwords in etcd’s API layer, where the work can be parallelized outside of Raft. However, this can lead to potential time-of-check/time-of-use (TOCTOU) permission lapses:

  1. client A sends a request Authenticate()
  2. the API layer processes the password checking part of Authenticate()
  3. another client B sends a request of ChangePassword() and the server completes it
  4. the state machine layer processes the part of getting a revision number for the Authenticate() from A
  5. the server returns a success to A
  6. now A is authenticated on an obsolete password
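The interleaving above can be replayed deterministically. The sketch below is an illustration only: `authStore` and its methods are hypothetical stand-ins, not etcd types, and passwords are kept in plain text purely to keep the example short. It also records the auth store revision seen at check time and at token issue time; the mismatch between the two is the kind of signal a server could use to notice the lapse, which is an assumption of this sketch rather than something stated above.

```go
package main

// authStore is a hypothetical, simplified stand-in for an auth store.
type authStore struct {
	revision  uint64            // bumped on every auth metadata change
	passwords map[string]string // plain text only to keep the sketch short
}

// checkPassword models the API-layer check and returns the revision
// of the auth store that the check ran against.
func (s *authStore) checkPassword(user, password string) (bool, uint64) {
	return s.passwords[user] == password, s.revision
}

// changePassword models ChangePassword() and bumps the revision.
func (s *authStore) changePassword(user, password string) {
	s.passwords[user] = password
	s.revision++
}

// replayLapse walks through steps 1 to 6 in order and returns whether
// client A was authenticated, plus the revisions seen in steps 2 and 4.
func replayLapse() (authenticated bool, checkedAt, issuedAt uint64) {
	s := &authStore{revision: 1, passwords: map[string]string{"alice": "old"}}
	authenticated, checkedAt = s.checkPassword("alice", "old") // steps 1-2
	s.changePassword("alice", "new")                           // step 3
	issuedAt = s.revision                                      // step 4
	return authenticated, checkedAt, issuedAt                  // steps 5-6
}
```

In this replay A is authenticated with the obsolete password, and the revision observed during the password check (1) differs from the one observed when the token would be issued (2).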

Resolving a token in the API layer

After authenticating with Authenticate(), a client can create a gRPC connection as it would without auth. In addition to the existing initialization process, the client must associate the token with the newly created connection. grpc.WithPerRPCCredentials() provides the functionality for this purpose.
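A minimal sketch of the client side follows. It uses only the standard library: the type `tokenCredentials` is hypothetical, but its two methods have the shape of gRPC-Go's `credentials.PerRPCCredentials` interface, so a value of this type could be passed to grpc.WithPerRPCCredentials() when dialing the second connection. The metadata key `"token"` is an assumption made for illustration.

```go
package main

import "context"

// tokenCredentials carries an auth token as per-RPC metadata.
// Hypothetical type; its methods match credentials.PerRPCCredentials.
type tokenCredentials struct {
	token string
}

// GetRequestMetadata is invoked by gRPC before each RPC and returns the
// key/value pairs to attach to that request.
func (c tokenCredentials) GetRequestMetadata(ctx context.Context, uri ...string) (map[string]string, error) {
	// The "token" key is an assumption of this sketch.
	return map[string]string{"token": c.token}, nil
}

// RequireTransportSecurity returns false only so the sketch can be used
// without TLS; a real deployment should not send tokens in the clear.
func (c tokenCredentials) RequireTransportSecurity() bool { return false }
```

With the gRPC module available, the dial option would look like `grpc.WithPerRPCCredentials(tokenCredentials{token: tok})`, where `tok` is the token returned by the first connection.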

Every authenticated request from the client has a token. The token can be obtained with grpc.metadata.FromIncomingContext() on the server side. From the token, the server can determine who is issuing the request and when the user was authorized. The API layer fills this information into the header (etcdserverpb.RequestHeader.Username and etcdserverpb.RequestHeader.AuthRevision) of a Raft log entry (etcdserverpb.InternalRaftRequest).

The auth info in etcdserverpb.RequestHeader is checked in the apply phase of the state machine. This step checks that the user has been granted permission to the requested keys at the latest revision of the auth store.
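The apply-phase check can be sketched as follows. All types here are hypothetical simplifications: `requestHeader` mirrors only the Username and AuthRevision fields named above, and `authState` stands in for the auth store. Rejecting a request whose AuthRevision is older than the store's revision is an assumption of this sketch, motivated by the requirement that authentication be refreshed after auth policy changes.

```go
package main

import "errors"

// requestHeader mirrors the two header fields named in the text.
type requestHeader struct {
	Username     string
	AuthRevision uint64
}

// authState is a hypothetical stand-in for the auth store.
type authState struct {
	revision uint64                     // latest auth store revision
	canWrite map[string]map[string]bool // user -> key -> write granted
}

var (
	errOldRevision      = errors.New("auth revision in header is old")
	errPermissionDenied = errors.New("permission denied")
)

// checkPut runs in the apply phase, so every member evaluates it at the
// same point in the log and against the same auth metadata.
func (a *authState) checkPut(h requestHeader, key string) error {
	if h.AuthRevision < a.revision {
		return errOldRevision // the client must authenticate again
	}
	if !a.canWrite[h.Username][key] {
		return errPermissionDenied
	}
	return nil
}
```

Because the check is part of applying the log entry, a permission change ordered before a write is always visible when that write is checked.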

Two types of tokens: simple and JWT

There are two token types: simple and JWT. The simple token isn’t designed for production use cases. Its tokens aren’t cryptographically signed, and servers must statefully track the correspondence between tokens and users; it is meant for development and testing. JWT tokens should be used for production deployments since they are cryptographically signed and verified. From the implementation perspective, JWT is stateless: a token can carry metadata, including the username and revision, so servers don’t need to remember the correspondence between tokens and that metadata.
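To illustrate why a signed token lets the server stay stateless, here is a minimal sketch using only the standard library. It builds an HS256-signed token in the JWT layout (header, payload, signature). The signing method, the claim names `username` and `revision`, and the helper names are assumptions made for illustration; the sketch also skips header and expiry validation that a real JWT library performs.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"encoding/json"
	"errors"
	"strings"
)

// claims is the metadata carried inside the token (names are assumed).
type claims struct {
	Username string `json:"username"`
	Revision uint64 `json:"revision"`
}

var enc = base64.RawURLEncoding

// mac computes the HMAC-SHA256 of msg under key.
func mac(key []byte, msg string) []byte {
	h := hmac.New(sha256.New, key)
	h.Write([]byte(msg))
	return h.Sum(nil)
}

// sign encodes the claims and appends a signature over header.payload.
func sign(key []byte, c claims) (string, error) {
	payload, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	head := enc.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	msg := head + "." + enc.EncodeToString(payload)
	return msg + "." + enc.EncodeToString(mac(key, msg)), nil
}

// verify checks the signature and recovers the claims without consulting
// any server-side table of issued tokens.
func verify(key []byte, token string) (claims, error) {
	var c claims
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return c, errors.New("malformed token")
	}
	sig, err := enc.DecodeString(parts[2])
	if err != nil || !hmac.Equal(sig, mac(key, parts[0]+"."+parts[1])) {
		return c, errors.New("bad signature")
	}
	payload, err := enc.DecodeString(parts[1])
	if err != nil {
		return c, err
	}
	err = json.Unmarshal(payload, &c)
	return c, err
}
```

A simple token, by contrast, would be an opaque string that the server has to look up in a table it maintains itself.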

Unlike file-system-like systems, the etcd v3 model requires multiple lookups of the metadata. The worst-case lookup cost is the sum of the user’s total granted keys and intervals. The cost cannot be avoided because v3’s flat key space is completely different from Unix’s file system model (in which every inode includes permission metadata). In practice the cost won’t be a serious problem because the metadata is small enough to benefit from caching.
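Interval matching over a flat key space, and its worst-case cost, can be sketched as a linear scan over a user's grants. The types and function names below are hypothetical; treating an empty End as a single-key grant is a convention assumed for this sketch.

```go
package main

// keyRange is a half-open interval [Begin, End) over the flat key space.
// An empty End denotes a grant on the single key Begin (assumed convention).
type keyRange struct {
	Begin, End string
}

// contains reports whether key falls inside the range, using plain
// lexicographic string comparison.
func (r keyRange) contains(key string) bool {
	if r.End == "" {
		return key == r.Begin
	}
	return r.Begin <= key && key < r.End
}

// permitted scans every grant of a user, so the worst case visits all of
// the user's granted keys and intervals once.
func permitted(grants []keyRange, key string) bool {
	for _, g := range grants {
		if g.contains(key) {
			return true
		}
	}
	return false
}
```

For example, a grant of `keyRange{Begin: "/app/", End: "/app0"}` covers every key with the prefix `/app/`, because `0` is the byte that follows `/`.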