Getting started with OpenTelemetry and distributed tracing in .NET Core
Just show me the code
As always if you don’t care about the post I have upload the source code on my Github
A few months ago the first stable version of the OpenTelemetry client for dotnet was released and since then I have wanted to write a little bit about it.
OpenTelemetry is a set of APIs, SDKs, tooling and integrations that are designed for the creation and management of telemetry data such as traces, metrics, and logs. Today I’m going to focus on tracing.
From my standpoint, there are at least four main concepts about tracing and OpenTelemetry that you need to know.
Span and activities
A span is the building block that forms a trace, it has a unique identifier and represents a piece of the workflow in the distributed system.
Multiple spans are pieced together to create a trace. Traces are often viewed as a “tree” of spans that reflects the time that each span started and completed.
In dotnet a Span is represented by an Activity.
The OpenTelemetry client for dotnet is reusing the existing Activity and associated classes to represent the OpenTelemetry Span.
This means that users can instrument their applications/libraries to emit OpenTelemetry compatible traces by using just the .NET Runtime.
To create a Span in .NET we must first create a new activity:
private static readonly ActivitySource Activity = new(nameof(RabbitRepository));
And then call “StartActivity” to begin recording, everything that happens inside the using block will be recorded into that Span.
using (var activity = Activity.StartActivity("Process Message", ActivityKind.Consumer, parentContext.ActivityContext)){}
Propagators
A propagator allows us to extract and inject context across process boundaries.
This is typically required if you are not using any of the .NET communication libraries which has instrumentations already available which does the propagation (eg: HttpClient).
In such cases, context extraction and propagation is the responsibility of the library itself.
To create a propagator in .NET we must first create a TextMapPropagator
private static readonly TextMapPropagator Propagator = new TraceContextPropagator();
Then use the Inject and Extract methods for inter-process trace propagation.
The “Inject” method injects the Activity into a carrier. For example, into the headers of an HTTP request.
Propagator.Inject(new PropagationContext(activity.Context, Baggage.Current), props, InjectContextIntoHeader);
And the “Extract” method extracts the value from an incoming request. For example, from the headers of an HTTP request.
If a value can not be parsed from the carrier, for a cross-cutting concern, the implementation should not throw an exception and should not store a new value in the Context, in order to preserve any previously existing valid value.
var parentContext = Propagator.Extract(default, ea.BasicProperties, ExtractTraceContextFromBasicProperties);
Exporters
Let’s be honest emiting traces is kind of pointless if you don’t have a backend capable of aggregating the traces and displaying those traces in a friendly manner.
An exporter is how data gets sent to those different back-ends.
Generally, an exporter translates the internal format into another defined format.
These are the most well-known available trace exporters:
- Jaeger
- OTLP gRPC
- OTLP HTTP
- Zipkin
In this post I’m going to use Jaeger.
Attributes
Attributes are key:value pairs that provide additional information to a trace.
In .NET those are called Tags. We can add an attribute into an Activity like this:
activity?.SetTag("messaging.system", "rabbitmq");
activity?.SetTag("messaging.destination_kind", "queue");
activity?.SetTag("messaging.rabbitmq.queue", "sample");
I’m going to stop talking about OpenTelemetry theory because I could end up writing and entire post about it.
And to be honest I would end up paraphrasing of the official docs, so instead of that I’m going to point you towards the official documentation: https://opentelemetry.io/docs/concepts/
After talking a little bit about OpenTelemetry let me refocus on the real purpose of this post.
In this post I want to show you a practical example about how you can start using OpenTelemetry when you have a series of .NET services that communicate amongst them.
Example
I have built 4 apps beforehand, these apps are not doing any business logic operations because that’s not the goal here.
The important part is that all the apps are talking among them and also using other external services like Redis, MSSQL or Rabbit.
Here’s the diagram:
-
App1.WebApi is a NET6 Web API with 2 endpoints.
- The /http endpoint makes an HTTP request to the App2 "/dummy" endpoint.
- The /publish-message endpoint queues a message into a Rabbit queue named “sample”.
-
App2.RabbitConsumer.Console is a NET6 console application.
- Dequeues messages from the Rabbit “sample” queue and makes a HTTP request to the App3 "/sql-to-event" endpoint with the content of the message.
-
App3.WebApi is a NET6 Web API with 2 endpoints
- The /dummy endpoint returns a fixed “Ok” response.
- The /sql-to-event endpoint receives a message via HTTP POST, stores it in a MSSQL Server and afterwards publishes the message as an event into a RabbitMq queue named “sample_2”.
-
App4.RabbitConsumer.HostedService is a NET6 Worker Service.
- A Hosted Service reads the messages from the Rabbitmq “sample_2” queue and stores it into a Redis cache database.
Those apps are not using OpenTelemetry right now, so in the next sections we’re going to do a step-by-step guide about how to setup and use the OpenTelemetry client.
OpenTelemetry .NET Client
To get started with OpenTelemetry we’re going to need the following packages.
<PackageReference Include="OpenTelemetry" Version="1.2.0-rc1" />
<PackageReference Include="OpenTelemetry.Exporter.Jaeger" Version="1.2.0-rc1" />
<PackageReference Include="OpenTelemetry.Extensions.Hosting" Version="1.0.0-rc8" />
<PackageReference Include="OpenTelemetry.Instrumentation.AspNetCore" Version="1.0.0-rc8" />
<PackageReference Include="OpenTelemetry.Instrumentation.Http" Version="1.0.0-rc8" />
<PackageReference Include="OpenTelemetry.Instrumentation.SqlClient" Version="1.0.0-rc8" />
<PackageReference Include="OpenTelemetry.Instrumentation.StackExchangeRedis" Version="1.0.0-rc8" />
- The OpenTelemetry package is the core library.
- The OpenTelemetry.Exporter.Jaeger package allows us to export the traces to Jaeger.
- The OpenTelemetry.Extensions.Hosting package contains some extensions that allows us to add the dependencies into the DI container and configure the HostBuilder.
- The OpenTelemetry.Instrumentation.* packages are instrumentation libraries. These packages are instrumenting common libraries/functionalities/classes so we don’t have to do all the heavy lifting by ourselves. In our example we’re using the following ones:
- The OpenTelemetry.Instrumentation.AspNetCore instruments ASP.NET Core and collects telemetry about incoming web requests. This instrumentation also collects incoming gRPC requests using Grpc.AspNetCore.
- The OpenTelemetry.Instrumentation.Http instruments the System.Net.Http.HttpClient and System.Net.HttpWebRequest types and collects telemetry about outgoing HTTP requests.
- The OpenTelemetry.Instrumentation.SqlClient instruments the Microsoft.Data.SqlClient and System.Data.SqlClient types and collects telemetry about database operations.
- The OpenTelemetry.Instrumentation.StackExchangeRedis instruments the StackExchange.Redis type and collects telemetry about outgoing calls to Redis.
In the near future I expect to see more and more instrumentation libraries like these ones so we can instrument the most common dependencies with no effort at all, just install a nuget, add some config lines in the Startup and you’re good to go.
Adding OpenTelemetry on App1
1 . Setup the OpenTelemetry library
services.AddOpenTelemetryTracing(builder =>
{
builder.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.AddSource(nameof(PublishMessageController))
.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("App1"))
.AddJaegerExporter(opts =>
{
opts.AgentHost = Configuration["Jaeger:AgentHost"];
opts.AgentPort = Convert.ToInt32(Configuration["Jaeger:AgentPort"]);
opts.ExportProcessorType = ExportProcessorType.Simple;
});
});
This is the first app we’re instrumenting so I’m going to explain a little bit what we’re doing line by line
AddAspNetCoreInstrumentation()
Enables NET Core instrumentation.
AddHttpClientInstrumentation()
Enable HTTP Instrumentation.
The Api 1 makes an HTTP request to the App2, if we want to trace the HTTP call between these 2 apps we can do it simply by adding this extension method.
.AddSource(nameof(PublishMessageController))
The AddSource
method can be used to add a ActivitySource to the provider. Multiple AddSource can be called to add more than one span.
Why we need this? The Api 1 queues a message into a rabbit queue and we want to record this trace.
In the former paragraph we have seen that HTTP requests has built-in instrumentation via method extension, that is not the case for the RabbitMQ dependency. In order to continue the distributed transaction, we must create a new Activity.
SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("App1"))
A Resource is the immutable representation of the entity producing the telemetry.
With the SetResourceBuilder
method we’re configuring the Resource for the application.
AddJaegerExporter(opts =>
{
opts.AgentHost = Configuration["Jaeger:AgentHost"];
opts.AgentPort = Convert.ToInt32(Configuration["Jaeger:AgentPort"]);
});
This method sets Jaeger as the exporter where all the traces are going to be sent.
2 . Instrument dependency calls
Unlike the HTTP request, OpenTelemetry does not yet have support for automatic RabbitMq trace correlation.
The code below demonstrates how the “publish message” trace can be created. The code snippet adds the trace information to the enqueued message header, which will later be used to link both operations.
Specifically we are using a Propagator to inject the activity into the message header that is going to be queued to Rabbit, afterwards the consumer application will use another Propagator to extract the Activity and link the producer activity with the consumer activity.
Also we are also using the Tags attribute to store relevant metadata into the Activity.
[ApiController]
[Route("publish-message")]
public class PublishMessageController : ControllerBase
{
private static readonly ActivitySource Activity = new(nameof(PublishMessageController));
private static readonly TextMapPropagator Propagator = Propagators.DefaultTextMapPropagator;
private readonly ILogger<PublishMessageController> _logger;
private readonly IConfiguration _configuration;
public PublishMessageController(
ILogger<PublishMessageController> logger,
IConfiguration configuration)
{
_logger = logger;
_configuration = configuration;
}
[HttpGet]
public void Get()
{
try
{
using (var activity = Activity.StartActivity("RabbitMq Publish", ActivityKind.Producer))
{
var factory = new ConnectionFactory { HostName = _configuration["RabbitMq:Host"] };
using (var connection = factory.CreateConnection())
using (var channel = connection.CreateModel())
{
var props = channel.CreateBasicProperties();
AddActivityToHeader(activity, props);
channel.QueueDeclare(queue: "sample",
durable: false,
exclusive: false,
autoDelete: false,
arguments: null);
var body = Encoding.UTF8.GetBytes("I am app1");
channel.BasicPublish(exchange: "",
routingKey: "sample",
basicProperties: props,
body: body);
}
}
}
catch (Exception e)
{
_logger.LogError("Error trying to publish a message", e);
throw;
}
}
private void AddActivityToHeader(Activity activity, IBasicProperties props)
{
Propagator.Inject(new PropagationContext(activity.Context, Baggage.Current), props, InjectContextIntoHeader);
activity?.SetTag("messaging.system", "rabbitmq");
activity?.SetTag("messaging.destination_kind", "queue");
activity?.SetTag("messaging.rabbitmq.queue", "sample");
}
private void InjectContextIntoHeader(IBasicProperties props, string key, string value)
{
try
{
props.Headers ??= new Dictionary<string, object>();
props.Headers[key] = value;
}
catch (Exception ex)
{
_logger.LogError(ex, "Failed to inject trace context.");
}
}
}
Adding OpenTelemetry on App2
1 . Setup the OpenTelemetry library
The App2 is a console app so we have to setup the library in a different way, instead of using an IServiceCollection extension we’re creating directly a TraceProvider. Apart from that the setup looks pretty much the same.
Sdk.CreateTracerProviderBuilder()
.AddHttpClientInstrumentation()
.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("App2"))
.AddSource(nameof(Program))
.AddJaegerExporter(opts =>
{
opts.AgentHost = _configuration["Jaeger:AgentHost"];
opts.AgentPort = Convert.ToInt32(_configuration["Jaeger:AgentPort"]);
})
.Build();
2 . Instrument dependency calls
The HTTP call is being instrumented automatically thanks to the AddHttpClientInstrumentation
extension method, but this app also dequeues a message from Rabbit, so we want to instrument that part.
The code is pretty much the same as the one we have created for the Rabbit producer on App1, the only difference is that instead of injecting the activity we are to extracting it from the message header.
private static async Task ProcessMessage(BasicDeliverEventArgs ea,
HttpClient httpClient,
IModel rabbitMqChannel)
{
try
{
//Extract the activity and set it into the current one
var parentContext = Propagator.Extract(default, ea.BasicProperties, ExtractTraceContextFromBasicProperties);
Baggage.Current = parentContext.Baggage;
//Start a new Activity
using (var activity = Activity.StartActivity("Process Message", ActivityKind.Consumer, parentContext.ActivityContext))
{
var body = ea.Body.ToArray();
var message = Encoding.UTF8.GetString(body);
//Add Tags to the Activity
AddActivityTags(activity);
_logger.LogInformation("Message Received: " + message);
_ = await httpClient.PostAsync("/sql-to-event",
new StringContent(JsonSerializer.Serialize(message),
Encoding.UTF8,
"application/json"));
rabbitMqChannel.BasicAck(deliveryTag: ea.DeliveryTag, multiple: false);
}
}
catch (Exception ex)
{
_logger.LogError($"There was an error processing the message: {ex} ");
}
}
//Extract the Activity from the message header
private static IEnumerable<string> ExtractTraceContextFromBasicProperties(IBasicProperties props, string key)
{
try
{
if (props.Headers.TryGetValue(key, out var value))
{
var bytes = value as byte[];
return new[] { Encoding.UTF8.GetString(bytes) };
}
}
catch (Exception ex)
{
_logger.LogError($"Failed to extract trace context: {ex}");
}
return Enumerable.Empty<string>();
}
//Add Tags to the Activity
private static void AddActivityTags(Activity activity)
{
activity?.SetTag("messaging.system", "rabbitmq");
activity?.SetTag("messaging.destination_kind", "queue");
activity?.SetTag("messaging.rabbitmq.queue", "sample");
}
Adding OpenTelemetry on App3
1 . Setup the OpenTelemetry library
services.AddOpenTelemetryTracing(builder =>
{
builder.AddAspNetCoreInstrumentation()
.AddSource(nameof(RabbitRepository))
.AddSqlClientInstrumentation()
.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("App3"))
.AddJaegerExporter(opts =>
{
opts.AgentHost = Configuration["Jaeger:AgentHost"];
opts.AgentPort = Convert.ToInt32(Configuration["Jaeger:AgentPort"]);
});
});
The setup looks almost identical as the ones on the App1 and App2. There is only one thing worth mentioning here:
AddSqlClientInstrumentation()
This app is not making any HTTP request, instead of that is querying a database, so we’re using the SQL extension method to enable SQL instrumentation. We’re simply swapping the OpenTelemetry.Instrumentation.Http
library for the OpenTelemetry.Instrumentation.SqlClient
library.
This app is also queuing a message into a RabbitMq queue, but the code it’s exactly the same as the one I have showed for the app1 section.
Adding OpenTelemetry on App4
1 . Setup the OpenTelemetry library
services.AddOpenTelemetryTracing(builder =>
{
var provider = services.BuildServiceProvider();
IConfiguration config = provider
.GetRequiredService<IConfiguration>();
builder.AddAspNetCoreInstrumentation()
.AddHttpClientInstrumentation()
.Configure((sp, builder) =>
{
RedisCache cache = (RedisCache)sp.GetRequiredService<IDistributedCache>();
builder.AddRedisInstrumentation(cache.GetConnection());
})
.AddSource(nameof(Worker))
.SetResourceBuilder(ResourceBuilder.CreateDefault().AddService("App4"))
.AddJaegerExporter(opts =>
{
opts.AgentHost = config["Jaeger:AgentHost"];
opts.AgentPort = Convert.ToInt32(config["Jaeger:AgentPort"]);
opts.ExportProcessorType = ExportProcessorType.Simple;
});
});
This app dequeues a message from a RabbitMq queue and stores the message content into a Redis cache database.
The instrumentation part for dequeuing the message from RabbitMq looks exactly the same as the snippet I have put on the App2 section, so I’m going to skip that part.
The instrumentation for Redis is a little more problematic.
The issue when trying to instrument Redis is that the AddRedisInstrumentation()
extension method needs an instance of the Redis ConnectionMultiplexer
.
If you’re using an IDistributedCache
interface and the AddStackExchangeRedisCache
extension method to configure the Redis connection, like this one:
services.AddStackExchangeRedisCache(options =>
{
var connString =
$"{hostContext.Configuration["Redis:Host"]}:{hostContext.Configuration["Redis:Port"]}";
options.Configuration = connString;
});
You will realize that you can’t access the ConnectionMultiplexer
property because is not publicly accesible, so I’m writing an extension method that is using Reflection to obtain it.
public static class RedisCacheExtensions
{
public static ConnectionMultiplexer GetConnection(this RedisCache cache)
{
//ensure connection is established
typeof(RedisCache).InvokeMember("Connect", BindingFlags.Instance | BindingFlags.NonPublic | BindingFlags.InvokeMethod, null, cache, new object[] { });
//get connection multiplexer
var fi = typeof(RedisCache).GetField("_connection", BindingFlags.Instance | BindingFlags.NonPublic);
var connection = (ConnectionMultiplexer)fi.GetValue(cache);
return connection;
}
}
Also, as you can see I’m not using directly the AddRedisInstrumentation()
method, instead of that I’m wrapping the AddRedisInstrumentation()
methood with the Configure()
method.
The Configure()
method is an overload method that allows to get a hold of the IServiceProvider
instance.
Jaeger
After adding the OpenTelemetry library on these 4 apps we are going to generate some traffic and afterwards access Jaeger to start analyzing the traces that the apps are sending.
If we take a look at the entire trace, this is how it looks:
Let me break it up a little bit for you. That’s a zoomed image of the spans we have created in our trace
-
The first span (publish-message) corresponds to the App1 api endpoint, that span is created automatically when the api controller is executed thanks to the
AddAspNetCoreInstrumentation()
extension method. -
The second span (RabbitMq publish) is an Activity that we have created manually on the App1. You can see the code snippet on the “Adding OpenTelemetry on App1” section.
-
The third span (Process Message) is an Activity that we have created manually on the App2, also you can see the code snippet on the “Adding OpenTelemetry on App2” section.
-
The fourth span (HTTP POST) corresponds to the HTTP request that the App2 is making to the App3. That span is created automatically thanks to the
AddHttpClientInstrumentation()
extension method found on the App2 setup. -
The fifth span (sql-to-event) corresponds to the App3 sql-to-event api endpoint, that span is created automatically when the api controller is executed thanks to the
AddAspNetCoreInstrumentation()
extension method. -
The sixth span (sqlserver) corresponds to the SQL query that the App3 is doing, that span is created automatically thanks to the
AddSqlClientInstrumentation()
extension method. -
The seventh span (RabbitMq publish) is an Activity that we have created manually on the App3.
-
The eigth span (Process Message) is an Activity that we have created manually on the App4.
-
The last 2 spans correspond to the Redis instructions that we are doing on the App4, both spans are created automatically thanks to the
AddRedisInstrumentation()
extension method.
If we want more info on any span we can drill-down and inspect the metadata. For example if we inspect the “HTTP POST” span we can see more info about it.
Also, do you remember that in the App1 we added some extra Tags on the Activity?
activity?.SetTag("messaging.system", "rabbitmq");
activity?.SetTag("messaging.destination_kind", "queue");
activity?.SetTag("messaging.rabbitmq.queue", "sample");
Here is the result:
Jaeger is capable of building a dependency-graph after looking at the traces.
Also is capable of giving performance statistics.
And compare traces.
How to test the example apps
If you want to take a look at the 4 apps, I have uploaded everything on my GitHub repository
If you want to try for yourselves to execute this example, I have uploaded also a docker-compose file with everything you need, so you can run a compose up and you’re good to go.
But there is little caveat I wanted to mention in the docker-compose file.
With docker-compose you can control the order of the services startup and shutdown with the depends_on option, but it does not wait until a container is “ready”, it only waits until is running.
In my case that’s a problem because App2 and App4 need to wait for the rabbitMq container to be ready, to avoid this problem the compose file is overwriting the “entrypoint” for both apps and executing a shell script that makes both apps sleep 30 seconds before starting up.
Here’s the docker-compose file:
version: '3.4'
networks:
tracing:
name: tracing-network
services:
rabbitmq:
image: rabbitmq:3.6.15-management
ports:
- 15672:15672
- 5672
networks:
- tracing
sqlserver:
image: mcr.microsoft.com/mssql/server:2019-GA-ubuntu-16.04
environment:
- ACCEPT_EULA=Y
- SA_PASSWORD=Pass@Word1
ports:
- 1433
networks:
- tracing
redis:
image: redis:6.2.1
ports:
- 6379:6379
networks:
- tracing
jaeger:
image: jaegertracing/all-in-one
container_name: jaeger
restart: unless-stopped
ports:
- 5775:5775/udp
- 5778:5778
- 6831:6831/udp
- 6832:6832/udp
- 9411:9411
- 14268:14268
- 16686:16686
networks:
- tracing
app1:
build:
context: ./App1.WebApi
ports:
- "5000:80"
networks:
- tracing
depends_on:
- rabbitmq
- jaeger
- app3
environment:
Jaeger__AgentHost: jaeger
Jaeger__AgentPort: 6831
RabbitMq__Host: rabbitmq
App3Endpoint: http://app3/dummy
app2:
stdin_open: true
tty: true
build:
context: ./App2.RabbitConsumer.Console
networks:
- tracing
depends_on:
- rabbitmq
- jaeger
- app3
entrypoint: ["./wait.sh", "30", "dotnet", "App2.RabbitConsumer.Console.dll"]
environment:
Jaeger__AgentHost: jaeger
Jaeger__AgentPort: 6831
RabbitMq__Host: rabbitmq
App3UriEndpoint: http://app3
app3:
build:
context: ./App3.WebApi
ports:
- "5001:80"
networks:
- tracing
depends_on:
- rabbitmq
- jaeger
- sqlserver
environment:
Jaeger__AgentHost: jaeger
Jaeger__AgentPort: 6831
RabbitMq__Host: rabbitmq
SqlDbConnString: server=sqlserver;user id=sa;password=Pass@Word1;
app4:
build:
context: ./App4.RabbitConsumer.HostedService
networks:
- tracing
depends_on:
- rabbitmq
- jaeger
- redis
entrypoint: ["./wait.sh", "30", "dotnet", "App4.RabbitConsumer.HostedService.dll"]
environment:
Jaeger__AgentHost: jaeger
Jaeger__AgentPort: 6831
RabbitMq__Host: rabbitmq
Redis__Host: redis
Redis__Port: 6379