Topology labels in Istio metrics

2023-06-15

Nowadays, it’s very common to have cluster nodes deployed across multiple locations. If that’s your case, you might be interested in how traffic between workloads is distributed across those locations. There are multiple ways to monitor this, but let’s try to put the information directly into Istio metrics by extending the metric dimensions.

Metadata exchange

Istio usually reports source_* and destination_* labels which contain information about which client sent a request (source) to which server (destination). To populate this data, the Istio proxy needs to exchange information between the two peers. That’s done via the metadata exchange filter.

When a client starts a request, the proxy reads metadata from context information stored in flat buffers and inserts it into the request as the x-envoy-peer-metadata HTTP header. The server then simply extracts the metadata header, which means the server already has all the information from both peers. To complete the exchange, the server uses exactly the same approach and sends its own metadata back to the client in a response header.

(Diagram: the client adds the x-envoy-peer-metadata header to the request; the server extracts it and sends its own metadata back to the client in the x-envoy-peer-metadata response header.)

The important point for us here is that all of this metadata can be used to extend metrics.

Transferring zone information between peers

Luckily, locality information already exists in the node attributes. However, every coin has two sides, and for us it means that locality is not exchanged between peers. So let’s do it on our own!

We will use the same approach as in metadata exchange described above.

Client:

  1. read the node.locality.zone attribute, which we will use for source_zone
  2. insert the zone into a custom HTTP header, for example x-envoy-peer-zone
  3. dispatch the request to the server

Server:

  1. check if x-envoy-peer-zone exists (if so, we know we are the destination)
  2. extract the header
  3. set the downstream_peer_zone attribute, which we will use in our metrics

To implement this use case we will write a very simple WASM extension. I’ll use Rust, but the language choice doesn’t matter that much here.

First of all, let’s define a name for our exchange header:

const PEER_LOCALITY_EXCHANGE_HEADER: &str = "x-envoy-peer-zone";

Next, we need to catch the request and create the header. For that we will use on_http_request_headers. But there is a twist: client and server will use the same method, so we need to distinguish which peer we are. Luckily, that’s very easy.

If our exchange header is not set, then we are the source, so we can read the zone from the ABI and create the header. If the exchange header is set, then we are on the destination (server) side and we need to extract the metadata and set the downstream_peer_zone attribute.

    fn on_http_request_headers(&mut self, _: usize, _: bool) -> Action {
        match self.get_http_request_header(PEER_LOCALITY_EXCHANGE_HEADER) {
            // Header is present: we are the destination (server).
            Some(v) => {
                // Remove the exchange header so it doesn't leak to the application.
                self.set_http_request_header(PEER_LOCALITY_EXCHANGE_HEADER, None);
                // Store the client's zone for use in metrics.
                self.set_property(vec!["downstream_peer_zone"], Some(v.as_bytes()));
            }
            // Header is absent: we are the source (client).
            None => match self.get_property(vec!["node", "locality", "zone"]) {
                Some(v) => {
                    self.set_http_request_header(
                        PEER_LOCALITY_EXCHANGE_HEADER,
                        Some(str::from_utf8(&v).unwrap()),
                    );
                }
                None => error!("unable to set locality attribute for downstream peer"),
            },
        }

        Action::Continue
    }

This already works, but metrics are generated from both sides (destination and source reporters), so we need to do the same thing for response headers as well.

    fn on_http_response_headers(&mut self, _: usize, _: bool) -> Action {
        match self.get_http_response_header(PEER_LOCALITY_EXCHANGE_HEADER) {
            // Header is present: we are back on the client side.
            Some(v) => {
                self.set_http_response_header(PEER_LOCALITY_EXCHANGE_HEADER, None);
                // Store the server's zone for use in metrics.
                self.set_property(vec!["upstream_peer_zone"], Some(v.as_bytes()));
            }
            // Header is absent: we are the server sending our zone back.
            None => match self.get_property(vec!["node", "locality", "zone"]) {
                Some(v) => {
                    self.set_http_response_header(
                        PEER_LOCALITY_EXCHANGE_HEADER,
                        Some(str::from_utf8(&v).unwrap()),
                    );
                }
                None => error!("unable to set locality attribute for upstream peer"),
            },
        }

        Action::Continue
    }
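
For completeness, both handlers live in a type implementing the HttpContext trait from the proxy-wasm Rust SDK, registered roughly like this. This is a sketch only: the struct name LocalityExchange is illustrative, and the actual wiring may differ from the full implementation linked below.

    use log::error;
    use proxy_wasm::traits::{Context, HttpContext};
    use proxy_wasm::types::{Action, LogLevel};

    // Illustrative name; holds no state, all work happens in the callbacks.
    struct LocalityExchange;

    impl Context for LocalityExchange {}

    impl HttpContext for LocalityExchange {
        // on_http_request_headers and on_http_response_headers
        // from the snippets above go here.
    }

    proxy_wasm::main! {{
        proxy_wasm::set_log_level(LogLevel::Info);
        // Create one HTTP context per stream.
        proxy_wasm::set_http_context(|_, _| -> Box<dyn HttpContext> {
            Box::new(LocalityExchange)
        });
    }}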

Easy!

A link to the full implementation and build files is on my GitHub: github.com/kirecek/wasm-locality-attribute.

After we deploy the WASM extension, all istio-proxies will contain two new attributes, downstream_peer_zone and upstream_peer_zone, which we can use to extend metrics.
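
One way to deploy such an extension is Istio’s WasmPlugin API (available since Istio 1.12). A minimal sketch, assuming the compiled module has been pushed to an OCI registry; the image URL, name, and namespace are illustrative:

apiVersion: extensions.istio.io/v1alpha1
kind: WasmPlugin
metadata:
  name: locality-attribute
  namespace: istio-system   # root namespace applies the plugin mesh-wide
spec:
  url: oci://ghcr.io/example/wasm-locality-attribute:latest   # illustrative image
  phase: AUTHN              # run before the stats filter generates metrics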

Adding zone information to metrics

Once the zone attribute is available, the only thing left is to add it to our metrics. This can be done by patching the stats EnvoyFilter or by using the Telemetry API, which I am going to use and also recommend.

Since we are generating the zone attribute from both peers, we need to distinguish where we are generating the labels.

On the client (the source of the network traffic), we use the node’s own zone attribute as the source and the upstream peer’s zone as the destination:

      - match:
          metric: REQUEST_COUNT
          mode: CLIENT
        tagOverrides:
          source_zone:
            value: node.locality.zone
          destination_zone:
            value: upstream_peer_zone

On the server (the destination of the network traffic), we must use the node’s zone as the destination and the downstream peer’s zone as the source:

      - match:
          metric: REQUEST_COUNT
          mode: SERVER
        tagOverrides:
          source_zone:
            value: downstream_peer_zone
          destination_zone:
            value: node.locality.zone

Full configuration:

apiVersion: telemetry.istio.io/v1alpha1
kind: Telemetry
metadata:
  name: example
spec:
  metrics:
    - providers:
        - name: prometheus
      overrides:
      - match:
          metric: REQUEST_COUNT
          mode: SERVER
        tagOverrides:
          source_zone:
            value: downstream_peer_zone
          destination_zone:
            value: node.locality.zone
      - match:
          metric: REQUEST_COUNT
          mode: CLIENT
        tagOverrides:
          source_zone:
            value: node.locality.zone
          destination_zone:
            value: upstream_peer_zone

Use ALL_METRICS instead of REQUEST_COUNT to affect all metrics.

And that’s it. After everything is deployed, you should shortly observe the new labels in Istio metrics:

kubectl exec -it deploy/${INJECTED_WORKLOAD} -- curl localhost:15000/stats/prometheus | grep istio_requests
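
If everything works, the new labels should show up on the request metrics, roughly like this (label set abbreviated, zone values illustrative):

istio_requests_total{source_zone="europe-west3-a",destination_zone="europe-west3-b",...}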

In case the new dimensions would break the metrics, you need to set extraStatTags in the Istio global config or as a pod annotation, for example sidecar.istio.io/extraStatTags: source_zone,destination_zone.
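
A sketch of the mesh-wide variant, set through the IstioOperator mesh config (adjust to however you manage your installation):

apiVersion: install.istio.io/v1alpha1
kind: IstioOperator
spec:
  meshConfig:
    defaultConfig:
      extraStatTags:
        - source_zone
        - destination_zone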