Dynamic extensibility and Protocol Buffers

Protocol Buffers (protobuf) are Google’s wire-efficient, statically typed, language-independent data serialization format. We use protobuf in the Envoy proxy to define its v2 APIs, also known as the universal data plane API for network proxies.

In this post, I’ll dive into some of the nuances and trade-offs of dynamically extending protobufs, that is, embedding in a protobuf an opaque message field whose type is unknown at compile time. I’ll focus on the Envoy project, where we recently explored this trade-off, but the discussion should apply to any scenario that requires embedding opaque configuration.

Envoy extensibility requirements

One of the key features of Envoy is its extensibility. Each request, stream or connection traverses a configurable stack of L4/L7 filters. These filters can inspect or mutate traffic, for example by inserting a header, calling out to an authentication service or transcoding between protocols. Filters follow a well-defined API, and any Envoy consumer may link in their own custom filters (e.g. a filter containing organization-specific business logic) and configure those custom filters via the data plane API. In addition to its L4/L7 filters, Envoy has a plugin architecture for logging, tracing and statistics output.

The data plane API defines fixed message types for Envoy’s built-in capabilities and filters in .proto files at https://github.com/envoyproxy/data-plane-api/tree/master/api. For example, Envoy’s RouteConfiguration message describes a route table, mapping from virtual hosts and paths to routing actions.

The message types corresponding to the configuration for Envoy’s core capabilities are specified in our GitHub repository and will be extended as Envoy’s capabilities grow. In a configuration update, however, an Envoy user will want to specify both the configuration of Envoy’s core capabilities and that of their own custom filters.

Imagine Acme Corp has written an AcmeWidget filter that initiates an RPC to an authentication service on each request. The custom filter’s configuration will itself be defined in protobuf, for example as an AuthService message.

This proto may be proprietary and is unlikely to be hosted in Envoy’s data plane API repository. Yet we need some way to encode the values of AuthService messages in Envoy’s configuration updates without knowing their static message type. Protobuf provides two “well-known” message types for this form of opaque configuration embedding: Any and Struct.

google.protobuf.Struct

Struct is the easier of the two message types to appreciate for this role, as it is simply a proto representation of a JSON object. Since proto3 also has a canonical JSON representation, any proto3 message can be mechanically transformed to JSON and embedded in a field of this type.
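As a minimal sketch of what “a proto representation of a JSON object” means, the Python below maps a parsed JSON value onto the shape of Struct: a map<string, Value> called fields, where Value is a oneof over null_value, number_value, string_value, bool_value, struct_value and list_value. Plain dicts stand in for generated message classes, and the to_value and to_struct helpers (and the max_retries field) are hypothetical; real code would use a protobuf library’s Struct type.

```python
import json

def to_value(v):
    """Map a JSON-compatible Python value onto the google.protobuf.Value oneof.

    The result is a dict with exactly one key, named after the oneof member
    that Value would set. This mirrors the schema; it is not a library API.
    """
    if v is None:
        return {"null_value": "NULL_VALUE"}
    if isinstance(v, bool):  # check bool first: bool is a subclass of int
        return {"bool_value": v}
    if isinstance(v, (int, float)):
        return {"number_value": float(v)}  # Value only has a double kind
    if isinstance(v, str):
        return {"string_value": v}
    if isinstance(v, list):
        return {"list_value": {"values": [to_value(x) for x in v]}}
    if isinstance(v, dict):
        return {"struct_value": to_struct(v)}
    raise TypeError(f"not representable as a Struct value: {type(v).__name__}")

def to_struct(obj):
    """google.protobuf.Struct is a single map<string, Value> named `fields`."""
    return {"fields": {k: to_value(v) for k, v in obj.items()}}

# JSON form of a hypothetical AcmeWidget config (max_retries is made up).
config_json = '{"cluster": "foo", "auth_type": "JWT", "max_retries": 3}'
print(to_struct(json.loads(config_json)))
```

Every number becomes a double because Value has no integer kind, which is one place where a Struct embedding loses type information that the original message schema carried.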

This is a very flexible type that brings the advantages of dynamic typing to protobuf. We use it in Envoy today to allow arbitrary filter configurations to be embedded.

A concrete example of the text proto representation of the AcmeWidget configuration embedded in Filter would be:
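A hedged sketch of how such a text proto comes out: the Python below renders a dict in the text format of a Struct, assuming a Filter message with a name string and a Struct field called config (both field names are assumptions here, as is the acme.widget filter name). It handles only string, number, bool and nested-object values and does no string escaping.

```python
def render_value(value, indent: int) -> str:
    """Render the body of a google.protobuf.Value in protobuf text format."""
    pad = "  " * indent
    if isinstance(value, bool):  # before the number check: bool is an int subclass
        return f"{pad}bool_value: {'true' if value else 'false'}\n"
    if isinstance(value, (int, float)):
        return f"{pad}number_value: {float(value)}\n"
    if isinstance(value, str):
        return f'{pad}string_value: "{value}"\n'  # no escaping in this sketch
    if isinstance(value, dict):
        return f"{pad}struct_value {{\n{render_struct(value, indent + 1)}{pad}}}\n"
    raise TypeError(f"unsupported value type: {type(value).__name__}")

def render_struct(obj: dict, indent: int) -> str:
    """Render a Struct's map<string, Value> as repeated `fields` entries."""
    pad = "  " * indent
    out = ""
    for key, value in obj.items():
        out += f"{pad}fields {{\n"
        out += f'{pad}  key: "{key}"\n'
        out += f"{pad}  value {{\n"
        out += render_value(value, indent + 2)
        out += f"{pad}  }}\n"
        out += f"{pad}}}\n"
    return out

def filter_text(name: str, config: dict) -> str:
    """Text proto for a hypothetical Filter { string name; Struct config; }."""
    return f'name: "{name}"\nconfig {{\n{render_struct(config, 1)}}}\n'

print(filter_text("acme.widget", {"cluster": "foo", "auth_type": "JWT"}), end="")
```

Two scalar settings already expand into a dozen lines of fields, key and value wrappers, which is the readability cost noted among the trade-offs below.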

This has worked well to date, but comes with a set of trade-offs that are part of the flexible dynamic typing package:

It’s not possible, without Envoy-specific logic, to statically type check an Envoy configuration in a way that establishes the type correctness of its embedded opaque filter configs. Instead, when Envoy ingests its configuration, it determines at runtime the protobuf type corresponding to each filter, attempts to convert the Struct to this protobuf type, and raises an exception on failure. External tools are unlikely to be able to perform the same operation, since the mapping from filter name to schema is not standardized. However, external tools can display and manipulate filter configurations dynamically, with no prior knowledge of the underlying type, since these are just JSON objects. You can also round-trip from a binary proto3 representation to the canonical proto3 JSON representation and back without any knowledge of the per-filter protobuf schemas (also known as protobuf descriptors).
This representation is not wire-efficient. The inefficiency versus regular protobuf is clear from the fact that field names are repeated in every instance, e.g. for AcmeWidget we would have {"cluster": "foo", "auth_type": "JWT"} on the wire. With a known protobuf descriptor, there is no need to place the “cluster” or “auth_type” field name bytes on the wire; a small numeric field tag identifies each field instead. This is part of the reason why protobufs are 3–10x smaller than XML (in addition to the efficient binary encoding). This is not a significant concern for Envoy configuration today, as configurations are typically small and belong to Envoy’s control plane, where performance concerns do not dominate to the same extent as on the data plane. It may become a concern as we scale to very large configurations.
The official language-specific protobuf libraries do not have first-class support for converting between Struct and arbitrary protobuf message types; instead, it’s always necessary to round-trip via JSON (de)serialization. This has performance costs, but as above, these are not first-class concerns in Envoy today.
The text proto format above is not particularly pleasant to read or write.
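To put rough numbers on the wire-efficiency trade-off, the sketch below hand-encodes the AcmeWidget config both ways using the protobuf wire format rules: each field is prefixed by a varint tag of (field_number << 3) | wire_type, and strings are length-delimited (wire type 2). It assumes a hypothetical AuthService schema with string cluster = 1 and string auth_type = 2; in practice auth_type could well be an enum, which would shrink the typed encoding further.

```python
import json

def encode_varint(n: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)

def length_delimited(field_number: int, payload: bytes) -> bytes:
    """Encode one wire-type-2 field: tag, length, then the raw payload."""
    tag = (field_number << 3) | 2
    return encode_varint(tag) + encode_varint(len(payload)) + payload

def encode_typed(cluster: str, auth_type: str) -> bytes:
    # Hypothetical schema: message AuthService { string cluster = 1; string auth_type = 2; }
    return length_delimited(1, cluster.encode()) + length_delimited(2, auth_type.encode())

def encode_struct(obj: dict) -> bytes:
    # Struct { map<string, Value> fields = 1; }; each map entry is { key = 1; value = 2; }
    # and Value.string_value is field 3. Only string values are handled here.
    out = b""
    for key, val in obj.items():
        value_msg = length_delimited(3, val.encode())
        entry = length_delimited(1, key.encode()) + length_delimited(2, value_msg)
        out += length_delimited(1, entry)
    return out

config = {"cluster": "foo", "auth_type": "JWT"}
typed = encode_typed(config["cluster"], config["auth_type"])
struct_bytes = encode_struct(config)
compact_json = json.dumps(config, separators=(",", ":")).encode()
print(f"typed: {len(typed)} bytes, Struct: {len(struct_bytes)} bytes, JSON: {len(compact_json)} bytes")
```

Under these assumptions the typed message is 10 bytes, the binary Struct is 38 bytes and even compact JSON is 35 bytes: the Struct pays for the field-name strings plus an extra tag and length for every map entry and Value wrapper.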

google.protobuf.Any

The Any message type embeds a binary serialized protobuf, together with type information, inside the field of another protobuf. Internally, it is just a byte array with the wire format protobuf serialization of the embedded message and a string containing a type URL. The type URL is essentially a string containing the type name of the form type.googleapis.com/packagename.messagename. If we had used Any, the above Filter definition would have looked like:
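The structure just described fits in a few lines of Python. AnyMessage mirrors the two fields of google.protobuf.Any (string type_url = 1; bytes value = 2), and the helpers follow the convention that the fully qualified type name is the last path segment of the URL. The helper names and the acme.widget.AuthService type name are hypothetical; the official libraries ship their own pack/unpack helpers.

```python
from dataclasses import dataclass

TYPE_URL_PREFIX = "type.googleapis.com/"

@dataclass
class AnyMessage:
    """Plain-Python stand-in for google.protobuf.Any."""
    type_url: str
    value: bytes  # wire-format serialization of the embedded message

def pack(full_name: str, serialized: bytes) -> AnyMessage:
    return AnyMessage(TYPE_URL_PREFIX + full_name, serialized)

def type_name(msg: AnyMessage) -> str:
    # The fully qualified message name is everything after the last '/'.
    return msg.type_url.rsplit("/", 1)[-1]

def unpack(msg: AnyMessage, expected_full_name: str) -> bytes:
    """Return the embedded bytes if the Any holds the expected type."""
    if type_name(msg) != expected_full_name:
        raise TypeError(f"Any holds {type_name(msg)}, not {expected_full_name}")
    return msg.value

# Serialized bytes of the hypothetical AuthService{cluster: "foo", auth_type: "JWT"}.
packed = pack("acme.widget.AuthService", b"\n\x03foo\x12\x03JWT")
print(packed.type_url)
print(unpack(packed, "acme.widget.AuthService"))
```

Turning value back into named fields still requires the AuthService descriptor; the type URL only says which descriptor to use.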

A concrete example of the text proto representation of the AcmeWidget configuration embedded in Filter would now be:

While this looks similar to the Struct example, consider these differences:

Since the embedded proto has a compact serialized representation, this is almost as efficient as just in-lining the embedded protobuf, i.e. close to optimal.
There is no meaningful way to make sense of the embedded proto without its schema, i.e. its protobuf descriptor. For the Envoy binary proper, this is not a consideration, since all filters are statically linked and hence their protobuf descriptors are available. However, consider an independent Web app with a UI for building and visualizing Envoy configurations. It is reasonable to expect such an app to have the protobuf descriptors for Envoy’s core data plane API available, but it won’t know about the AcmeWidget protobuf descriptor. Supporting AcmeWidget would add complexity to the Web app, since Acme Corp would first need to compile the protobuf descriptor objects and upload them. From experience, we’ve found that this adds friction to operating Envoy; we hit it when we required descriptors for the gRPC transcoder filter.
The type URL provides information that can be used to automate static checking of both the Envoy configuration and its embedded filters. The caveat regarding the availability of protobuf descriptors in the above point holds here as well.
Messages can be efficiently (de)serialized, without JSON round-tripping.
There is a pretty text proto representation, much cleaner than that for Struct embeddings. This is useful if you want to use text proto as a configuration format for Envoy; however, we generally recommend YAML, since the text proto format has not been standardized and is not officially supported or documented in open source protobuf.
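The static-checking point, and its dependence on descriptor availability, can be illustrated with a small checker. It takes the type name from the type URL, scans the top-level field numbers of the embedded bytes (the wire format allows this without a schema), and compares them against a registry of known message types. The registry, standing in for compiled protobuf descriptors, and its acme.widget.AuthService entry are hypothetical.

```python
def read_varint(buf: bytes, pos: int):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7

def top_level_field_numbers(buf: bytes) -> list:
    """List the field numbers in a serialized message; no schema needed.

    Only varint (0) and length-delimited (2) wire types are handled here.
    """
    pos, numbers = 0, []
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 0x7
        if wire_type == 0:
            _, pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            pos += length
        else:
            raise ValueError(f"wire type {wire_type} not handled in this sketch")
        numbers.append(number)
    return numbers

def check_any(type_url: str, value: bytes, registry: dict) -> str:
    """Check an Any-embedded config against a name -> {field numbers} registry."""
    name = type_url.rsplit("/", 1)[-1]
    if name not in registry:
        return f"cannot check {name}: no descriptor available"
    unknown = sorted(set(top_level_field_numbers(value)) - registry[name])
    return f"{name}: unknown fields {unknown}" if unknown else f"{name}: ok"

url = "type.googleapis.com/acme.widget.AuthService"
payload = b"\n\x03foo\x12\x03JWT"               # fields 1 and 2
registry = {"acme.widget.AuthService": {1, 2}}  # hypothetical descriptor info
print(check_any(url, payload, registry))        # tool that has the descriptor
print(check_any(url, payload, {}))              # Web app without it
print(check_any(url, payload + b"\x38\x01", registry))  # extra varint field 7
```

Unknown fields are legal in protobuf, so a real configuration checker might warn rather than fail on them; the important part is that the last case can only be detected by a tool holding the descriptor.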

Edit (2018-02-09): An additional consideration we recently discovered when using Any is that, since the type URL of an embedded message is serialized inside the Any object, any package namespace change to a message embedded in Any breaks protobuf wire compatibility. This is because the type URL is derived from the embedded message’s package namespace. This does not occur with Struct, since the application determines the underlying type itself, independently of protobuf package namespacing.
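A minimal sketch of why a package rename breaks Any but not Struct: a receiver resolves an Any by the type name in its URL, so a config serialized while the message lived in a hypothetical acme.v1 package no longer resolves once the receiver only knows acme.v2, even though the payload bytes are unchanged. With Struct, the application picks the schema from its own filter name. All names and registries here are hypothetical.

```python
TYPE_URL_PREFIX = "type.googleapis.com/"

def resolve_any(type_url: str, registry: dict) -> str:
    """Find the handler for an Any using the fully qualified name in its URL."""
    name = type_url.rsplit("/", 1)[-1]
    if name not in registry:
        raise LookupError(f"unknown type {name}")
    return registry[name]

# Config written before the rename: the old package is baked into the type URL.
stored_type_url = TYPE_URL_PREFIX + "acme.v1.AuthService"

# Receiver built after moving AuthService from package acme.v1 to acme.v2.
any_registry = {"acme.v2.AuthService": "AuthService handler"}
try:
    resolve_any(stored_type_url, any_registry)
except LookupError as err:
    print(f"Any: {err}")

# Struct embedding: the schema is chosen by the application from the filter
# name, which a protobuf package rename does not touch.
struct_registry = {"acme.widget": "AuthService handler"}
print(f"Struct: {struct_registry['acme.widget']}")
```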

Which should you use?

We adopted Struct for our filter, stats, logging and tracing extension points early in our design of the Envoy data plane API. This was largely due to the advantages of a schema-less representation (look ma, no proto descriptors!). It’s easy to generate Envoy configurations and dump them.

Elsewhere in the data plane API, when describing gRPC services in which a number of different resource types could be embedded, we opted to use Any. In this situation, we needed to embed a well known set of protos that also lived in the data plane API repository. There was no concern about protobuf descriptor availability here and the efficiency advantages came for free. We could have used a oneof as well here, at the minor expense of having to update its definition each time we wanted to add a new type.

It would be possible to have the benefit of both Any and Struct by structuring the Filter config as follows:
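One way to get the benefit of both is a oneof that holds either a Struct or an Any. The Python below is a hedged sketch of that shape and of how a consumer might branch on it; the Filter class and its config and typed_config field names are assumptions for illustration, with a dict standing in for Struct and a small dataclass for Any.

```python
import json
from dataclasses import dataclass
from typing import Optional

@dataclass
class TypedConfig:
    """Stand-in for google.protobuf.Any."""
    type_url: str
    value: bytes

@dataclass
class Filter:
    """Hypothetical Filter { string name; oneof { Struct config; Any typed_config; } }."""
    name: str
    config: Optional[dict] = None
    typed_config: Optional[TypedConfig] = None

def describe(f: Filter) -> str:
    # A real protobuf oneof holds at most one member; this plain-Python
    # model cannot enforce that, so check it explicitly.
    if f.config is not None and f.typed_config is not None:
        raise ValueError("both config and typed_config are set")
    if f.typed_config is not None:
        type_name = f.typed_config.type_url.rsplit("/", 1)[-1]
        return f"{f.name}: {type_name}, {len(f.typed_config.value)} bytes"
    if f.config is not None:
        return f"{f.name}: JSON {json.dumps(f.config, sort_keys=True)}"
    return f"{f.name}: no config"

print(describe(Filter("acme.widget", config={"cluster": "foo", "auth_type": "JWT"})))
print(describe(Filter("acme.widget", typed_config=TypedConfig(
    "type.googleapis.com/acme.widget.AuthService", b"\n\x03foo\x12\x03JWT"))))
```

Each consumer of such a field needs both code paths, one for JSON-like configs and one for typed binary configs.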

Pushing this design concept further, Lizan Zhou suggested that we use Any as Envoy’s basic opaque embedding type, and then embed a Struct within an Any to achieve an arrangement similar to the one above. This is a super cool idea, essentially nested protobuf types all the way down. Any embedded protobuf with the type URL type.googleapis.com/google.protobuf.Struct could be interpreted by Envoy as a Struct, while retaining the option of the wire-efficient Any when not embedding in this way. This would give Envoy end users maximum flexibility to make the above trade-off for themselves. A concrete example of this double nesting is:

It’s likely we’ll adopt one of the above combined Any/Struct approaches at some point in the future to obtain the best of both worlds. For now, we have frozen our core data plane APIs in preparation for production readiness in the Envoy 1.5 release. When we do make this switch, it will need to be backwards compatible, while keeping the mechanism consistent across our extensible APIs.

Protobuf provides some powerful mechanisms for embedding opaque configuration inside its statically typed message schemas. Choosing the right approach for a project requires awareness of the trade-offs between these mechanisms and how they can be combined. We would have found a post with the above details invaluable when making this design decision in the Envoy project; hopefully, sharing these lessons learned will benefit the community.

Update (2020-06-24)

Envoy has used the combined Any/Struct approach since the Envoy 1.12.0 release in January 2020. Rather than embedding a plain Struct inside an Any, we chose to embed a TypedStruct in Any. TypedStruct carries its own type URL, similar to the one in Any, alongside the embedded Struct. This type URL allows the protobuf descriptor for the configuration to be determined unambiguously, which was useful in decoupling the names of Envoy’s extensions from their types. The largest wins came in Envoy’s v3 xDS APIs, where we were able to abandon untyped Struct extension fields in favor of a typed Any, while still supporting JSON-like configuration objects.
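A hedged sketch of how a consumer might resolve a TypedStruct carried in an Any: the outer type URL says “this is a TypedStruct”, and the TypedStruct’s own type_url names the real configuration type, whose schema is then used to check the JSON-like value. TypedStruct’s two fields (type_url and a Struct value) follow the udpa.type.v1.TypedStruct message; everything else, including the field-name schema registry and the acme.widget.AuthService type, is hypothetical and uses plain Python objects rather than protobuf classes.

```python
from dataclasses import dataclass
from typing import Union

TYPED_STRUCT_NAME = "udpa.type.v1.TypedStruct"

@dataclass
class TypedStruct:
    type_url: str  # names the real configuration message type
    value: dict    # JSON-like body, i.e. a Struct

@dataclass
class AnyConfig:
    type_url: str
    value: Union[TypedStruct, bytes]  # decoded TypedStruct, or raw message bytes

def _name(type_url: str) -> str:
    return type_url.rsplit("/", 1)[-1]

def resolve_config(cfg: AnyConfig, schemas: dict):
    """Return (config type name, config body) for an Any-embedded extension config."""
    if _name(cfg.type_url) == TYPED_STRUCT_NAME:
        inner = cfg.value
        name = _name(inner.type_url)  # the real type comes from inside the TypedStruct
        unknown = set(inner.value) - schemas[name]
        if unknown:
            raise ValueError(f"{name}: unknown fields {sorted(unknown)}")
        return name, inner.value
    # A plain typed Any: the bytes would be decoded with this type's descriptor.
    return _name(cfg.type_url), cfg.value

schemas = {"acme.widget.AuthService": {"cluster", "auth_type"}}  # hypothetical
json_like = AnyConfig(
    "type.googleapis.com/" + TYPED_STRUCT_NAME,
    TypedStruct("type.googleapis.com/acme.widget.AuthService",
                {"cluster": "foo", "auth_type": "JWT"}),
)
binary = AnyConfig("type.googleapis.com/acme.widget.AuthService", b"\n\x03foo\x12\x03JWT")
print(resolve_config(json_like, schemas))
print(resolve_config(binary, schemas))
```

Either way the consumer arrives at the same type name, independent of what the extension instance happens to be called.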

Acknowledgements: The above survey of the Any vs. Struct trade-offs was informed by helpful discussions with John Millikin and Lizan Zhou, many thanks. Thanks also to mattklein123 for the many PR reviews and discussions on this subject as we protoized Envoy’s data plane API.

Disclaimer: The opinions stated here are my own, not those of my company (Google).

External C++ dependency management in Bazel