Instrumenting a library

Overview

To instrument a library, you’ll need to understand and create the following elements:

Core data structures - the information that is collected and sent to Zipkin
Trace identifiers - what tags for the information are needed so it can be reassembled in a logical order by Zipkin
- Generating identifiers - how to generate these IDs and which IDs should be inherited
- Communicating trace information - additional information that is sent to Zipkin along with the traces and their IDs.
Timestamps and duration - how to record timing information about an operation.Alright, ready? Here we go.

Core data structures

Core data structures are documented in detail in Thrift comments. Here’s a high-level description to get you started:

Annotation

An Annotation is used to record an occurrence in time. There’s a set of coreannotations used to define the beginning and end of an RPC request:

cs - Client Send. The client has made the request. This sets thebeginning of the span.
sr - Server Receive: The server has received the request and will startprocessing it. The difference between this and will be combination ofnetwork latency and clock jitter.
ss - Server Send: The server has completed processing and has sent therequest back to the client. The difference between this and sr will be theamount of time it took the server to process the request.
cr - Client Receive: The client has received the response from the server.This sets the end of the span. The RPC is considered complete when thisannotation is recorded.

When using message brokers instead of RPCs, the following annotations helpclarify the direction of the flow:

ms - Message Send: The producer sends a message to a broker.
mr - Message Receive: A consumer received a message from a broker.

Unlike RPC, messaging spans never share a span ID. For example, each consumerof a message is a different child span of the producing span.

Other annotations can be recorded during the request’s lifetime in order toprovide further insight. For instance adding an annotation when a server beginsand ends an expensive computation may provide insight into how much time isbeing spent pre and post processing the request versus how much time is spentrunning the calculation.

BinaryAnnotation

Binary annotations do not have a time component. They are meant to provide extrainformation about the RPC. For instance when calling an HTTP service, providingthe URI of the call will help with later analysis of requests coming into theservice. Binary annotations can also be used for exact match search in theZipkin Api or UI.

EndpointAnnotations and binary annotations have an endpoint associated with them. With twoexceptions, this endpoint is associated with the traced process. For example, theservice name drop-down in the Zipkin UI corresponds with Annotation.endpoint.serviceNameor BinaryAnnotation.endpoint.serviceName. For the sake of usability, the cardinalityof Endpoint.serviceName should be bound. For example, it shouldn’t include variablesor random numbers.

Span

A set of Annotations and BinaryAnnotations that correspond to a particular RPC.Spans contain identifying information such as traceId, spanId, parentId, andRPC name.

Spans are usually small. For example, the serialized form is often measured inKiB or less. When spans grow beyond orders of KiB, other problems occur, such ashitting limits like Kafka message size (1MiB). Even if you can raise messagelimits, large spans will increase the cost and decrease the usability of thetracing system. For this reason, be conscious to store data that helps explainsystem behavior, and don’t store data that doesn’t.

Trace

A set of spans that share a single root span. Traces are built by collecting allSpans that share a traceId. The spans are then arranged in a tree based onspanId and parentId thus providing an overview of the path a request takesthrough the system.

Trace identifiers

In order to reassemble a set of spans into a full trace three pieces ofinformation are required. Trace identifiers can be 128-bit, but span identifierswithin a trace are always 64-bit.

Trace Id

The overall 64 or 128-bit ID of the trace. Every span in a trace shares this ID.

Span Id

The ID for a particular span. This may or may not be the same as thetrace id.

Parent Id

This is an optional ID that will only be present on child spans. That is thespan without a parent id is considered the root of the trace.

Let’s walk through how Spans are identified.

When an incoming request has no trace information attached, we generate a randomtrace ID and span ID. The span ID can be reused as the lower 64-bits of thetrace ID, but it can also be completely different.

If the request already has trace information attached to it, the service shoulduse that information as server receive and server send events are part of thesame span as the client send and client receive events

If the service calls out to a downstream service a new span is created as achild of the former span. It is identified by the same trace id, a new span id,and the parent id is set to the span id of the previous span. The new span idshould be 64 random bits.

Communicating trace information

Trace information needs to be passed between upstream and downstream services inorder to reassemble a complete trace. Five pieces of information are required:

Trace Id
Span Id
Parent Id
Sampled - Lets the downstream service know if it should record traceinformation for the request.
Flags - Provides the ability to create and communicate feature flags. This is howwe can tell downstream services that this is a “debug” request.

Check for the format.

Finagle provides mechanisms for passing this information with HTTP and Thriftrequests. Other protocols will need to be augmented with the information fortracing to be effective.

Instrumentation sampling decisions are made at the edge of the system

Downstream services must honour the sampling decision of the upstream system. Ifthere’s no “Sampled” information in the incoming request, the library shouldmake a decision on whether to sample this request, and include the decision infurther downstream requests. This simplifies the math when it comes tounderstanding what’s sampled and what isn’t. It also ensures that a request iseither fully traced, or not traced at all, making the sampling policy easier tounderstand and configure.

Note that the debug flag will force a trace to be sampled, regardless of anysampling rules. The debug flag also applies to storage tier sampling, which isconfigured on the server side of Zipkin.

HTTP Tracing

HTTP headers are used to pass along trace information.

The B3 portion of the header is so named for the original name of Zipkin:BigBrotherBird.

Ids are encoded as :

X-B3-TraceId: 128 or 64 lower-hex encoded bits (required)
X-B3-SpanId: 64 lower-hex encoded bits (required)
X-B3-Sampled: Boolean (either “1” or “0”, can be absent)
X-B3-Flags: “1” means debug (can be absent)

For more information on B3, please see its specification.

Thrift Tracing

Finagle clients and servers negotate whether they can handle extra informationin the header of the thrift message when a connection is established. Oncenegotiated trace data is packed into the front of each thrift message.

Timestamps and duration

Span recording is when timing information or metadata is structured and reportedto zipkin. One of the most important parts of this process is appropriatelyrecording timestamps and duration.

Timestamps are microseconds

All Zipkin timestamps are in epoch microseconds (not milliseconds). This valueshould use the most precise measurement available. For example, clock_gettimeor simply multiply epoch milliseconds by 1000. Timestamps fields are stored as64bit signed integers eventhough negative is invalid.

Microsecond precision primarily supports “local spans”, which are in-processoperations. For example, with higher precision, you can tell nuances of whathappened before something else.

All timestamps have faults, including clock skew between hosts and the chance ofa time service resetting the clock backwards. For this reason, spans shouldrecord their duration when possible.

Span duration is also microseconds

While it is possible to get nanosecond-precision timing information, Zipkin usesmicrosecond granularity. Here are some reasons why:

First, using the same unit as timestamps makes math easier. For example, if youare troubleshooting a span, it is easier to identify with terms in the same unit.

Next, the overhead of recording a span is often variable and can be microsecondsor more: suggesting a higher resolution than overhead can be distracting.

Future versions of Zipkin may revisit this topic, but for now, everything ismicroseconds.

When to set Span.timestamp and duration

Span.timestamp and duration should only be set by the host that started the span.

The simplest logic is generally this:

In this case, the client started the span, so it should record Span.timestamp andduration, and those values should match the difference between “cs” and “cr”. Theserver did not start this span, so it should not set Span.timestamp or duration.

Another common case is when a server starts a root span from an uninstrumentedclient, such as a web browser. It knows it should start a trace because none waspresent in B3 headers or similar. Since it started the trace, it should recordSpan.timestamp and duration on the root span.

Note: When a span is incomplete, you could set Span.timestamp, but not duration asthere’s not enough information to do that accurately.

What happens when Span.timestamp and duration are not set?

Span.timestamp and Span.duration are fields added in 2015, 3 years after Zipkinstarted. Not all libraries log these. When these fields are not set, Zipkin addsthem at query time (not collection time); this is not ideal.

The duration querywill not work as there’s no data to query. Also, local (in-process) spans aren’t requiredto have annotations, so they cannot be queried unless their timestamp is set.

When duration isn’t set by instrumentation, Zipkin tries to derive duration at query time,it has to use the problematic method of timestamp math. Ex. if an NTP update happened insidethe span, the duration Zipkin caculates will be wrong.

Finally, there’s a desire for many to move to single-host spans. The migration pathtowards this is to split dual-host RPC spans into two. When instrumentation logs timestamponly for spans it owns, splitting collectors have a heuristic to distinguish a server-initiatedroot span from a client-initiated, dual-host one.

The bottom-line is that choosing not to record Span.timestamp and duration will resultin less accurate data and less functionality. Since it is very easy to record these authoritativelybefore reporting, all Zipkin instrumentation should do it or ask someone to help them do it.

One-way RPC Tracing

One-way is the same as normal RPC tracing, except there is no response anticipated.

In normal RPC tracing 4 annotations are used: “cs” “sr” (request) then “ss” “cr” (response).In one-way tracing, the first two are used “cs” “sr” as there is no response returned to the caller.

So, the client adds “cs” to a span and reports it to zipkin. Then, the server adds “sr” to thesame span and reports it. Neither side add Span.timestamp or duration because neither side know bothwhen the span started and finished.

Here’s a diagram of one-way RPC tracing:

   Client Tracer                                      Server Tracer
+------------------+                               +------------------+
| +--------------+ |     +-----------------+       | +--------------+ |
| | TraceContext |======>| Request Headers |========>| TraceContext | |
| +--------------+ |     +-----------------+       | +--------------+ |
+--------||--------+                               +--------||--------+
   start ||                                                 ||
         \/                                          finish ||
span(context).annotate("cs")                                \/
                                             span(context).annotate("sr")

Here’s an example of this process using the Brave Tracer:

Client side:

Server side:

// Parse the span from request headers
TraceContextOrSamplingFlags result =
    tracing.propagation().extractor(Request::getHeader).extract(request);
// Reuse the same span ids by joining that context
// start the server side and flush instead of processing a response
span.kind(Span.Kind.SERVER)
    .start().flush();
// The above will report to zipkin trace identifiers, a "sr" annotation with the
// endpoint of the server

The above flow assumes a tracer can “flush” a span, which simply sends the spanto Zipkin without attempting to calculate duration locally.

Message Tracing

Message Tracing is different than RPC tracing because the producer and consumerdon’t share span IDs.

In normal RPC tracing, client and server annotations go on the same span. Thisdoesn’t work for messaging because there may be multiple consumers for a givenmessage. The trace context propagated to the consumer is the parent.

Similar to one-way RPC tracing, messaging tracing doesn’t have a response path:only two annotations are used “ms” and “mr”. Unlike one-way RPC tracing, it isfine to set Span.timestamp and duration as the producer and each consumer useseparate spans.

So, the producer adds “ms” to a span and reports it to zipkin. Then, eachconsumer creates a child span adding “mr” to it.

Here’s a diagram of Message tracing:

Here’s an example of this process using the Brave Tracer:

Producer side:

// Add trace identifiers to the outbound span
       .inject(span.context(), message);
producer.send(message);
// start and finish the producer side
span.kind(Span.Kind.PRODUCER)
    .remoteEndpoint(broker.endpoint())
    .start().finish();
// The above will report to zipkin trace identifiers, a "ms" annotation with the
// endpoint of the producer, and a "ma" (Message Address) with the endpoint of
// the broker

Consumer side:

Many consumers act in bulk, receiving many messages at the same time. It may behelpful to inject each consumer span’s trace context into its correspondingmessage headers. This allows a processor to create a child later, at the rightplace in the trace tree.

public ConsumerRecords<K, V> poll(long timeout) {
  ConsumerRecords<K, V> records = delegate.poll(timeout);
  for (ConsumerRecord<K, V> record : records) {
    handleConsumed(record);
  }
  return records;
}
void handleConsumed(ConsumerRecord record) {
  // notifies zipkin the record arrived
  Span span = startAndFinishConsumerSpan(record);
  // allows a processor to see the parent ID (the consumer trace context)
  injector.inject(span.context(), record.headers());
}