Schema Registry

    • A "server-side" approach in which producers and consumers inform the system which data types can be transmitted via the topic. With this approach, the messaging system enforces type safety and ensures that producers and consumers remain synced.

    Both approaches are available in Pulsar, and you're free to adopt one or the other or to mix and match on a per-topic basis.

    • For the "client-side" approach, producers and consumers can send and receive messages consisting of raw byte arrays and leave all type safety enforcement to the application on an "out-of-band" basis.

    • For the "server-side" approach, Pulsar has a built-in schema registry that enables clients to upload data schemas on a per-topic basis. Those schemas dictate which data types are recognized as valid for that topic.
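
    To make the client-side approach concrete, the following is a minimal sketch in plain Java (no Pulsar dependency) of what "out-of-band" type safety amounts to: producer and consumer must agree, outside the messaging system, on how a raw byte array is laid out. The RawSensorReading class and its 12-byte layout are hypothetical and chosen only for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical message type. The wire layout below is an application-level
// convention that the messaging system knows nothing about.
final class RawSensorReading {
    final int sensorId;
    final double temperature;

    RawSensorReading(int sensorId, double temperature) {
        this.sensorId = sensorId;
        this.temperature = temperature;
    }

    // Producer side: 4 bytes (int id) followed by 8 bytes (double value).
    byte[] toBytes() {
        return ByteBuffer.allocate(Integer.BYTES + Double.BYTES)
                .putInt(sensorId)
                .putDouble(temperature)
                .array();
    }

    // Consumer side: must apply the same layout, or it will misread the payload.
    static RawSensorReading fromBytes(byte[] payload) {
        if (payload.length != Integer.BYTES + Double.BYTES) {
            throw new IllegalArgumentException("unexpected payload length: " + payload.length);
        }
        ByteBuffer buf = ByteBuffer.wrap(payload);
        return new RawSensorReading(buf.getInt(), buf.getDouble());
    }
}
```

    The length check is the only safeguard in this sketch. A payload of the right size but a different meaning would be decoded without complaint, which is the kind of mismatch the server-side approach is meant to catch.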

    Schemas are automatically uploaded when you create a typed Producer with a Schema. Additionally, Schemas can be manually uploaded to, fetched from, and updated via Pulsar's REST API.
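
    As a rough sketch of the manual route, the snippet below builds (but does not send) HTTP requests for fetching and uploading a topic's schema using only the JDK's java.net.http API. The endpoint layout /admin/v2/schemas/{tenant}/{namespace}/{topic}/schema, the admin address, and the JSON body shape are assumptions here and should be checked against the REST API reference for your Pulsar version. The class and method names are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper; not part of any Pulsar library.
final class SchemaRestRequests {
    // Assumed endpoint layout: verify against your Pulsar version's REST docs.
    static URI schemaUri(String adminBase, String tenant, String namespace, String topic) {
        return URI.create(adminBase + "/admin/v2/schemas/"
                + tenant + "/" + namespace + "/" + topic + "/schema");
    }

    // Request that would fetch the schema currently attached to the topic.
    static HttpRequest fetchRequest(String adminBase, String tenant,
                                    String namespace, String topic) {
        return HttpRequest.newBuilder(schemaUri(adminBase, tenant, namespace, topic))
                .GET()
                .build();
    }

    // Request that would upload (or update) the topic's schema.
    // jsonBody is assumed to look like {"type":"STRING","schema":"","properties":{}}.
    static HttpRequest uploadRequest(String adminBase, String tenant,
                                     String namespace, String topic, String jsonBody) {
        return HttpRequest.newBuilder(schemaUri(adminBase, tenant, namespace, topic))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }
}
```

    To actually send one of these requests, pass it to HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()) against a running broker's admin address (for example http://localhost:8080, again an assumption).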

    Pulsar schemas are applied and enforced at the topic level (schemas cannot be applied at the namespace or tenant level). Producers and consumers upload schemas to Pulsar brokers.

    Pulsar schemas are fairly simple data structures that consist of:

    • A name. In Pulsar, a schema's name is the topic to which the schema is applied.
    • A payload, which is a binary representation of the schema.
    • User-defined properties as a string/string map. Usage of properties is wholly application-specific. Possible properties include the Git hash associated with a schema or an environment name such as dev.
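
    The three parts listed above can be pictured as a small value class. The following is only an illustrative sketch in plain Java: SchemaSketch is a hypothetical name, not Pulsar's own schema type, and the topic name, payload bytes, and property values are placeholders.

```java
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration class; this is not a Pulsar API type.
final class SchemaSketch {
    final String name;                    // the topic the schema is applied to
    final byte[] payload;                 // binary representation of the schema
    final Map<String, String> properties; // application-specific string/string map

    SchemaSketch(String name, byte[] payload, Map<String, String> properties) {
        this.name = name;
        this.payload = payload.clone();
        // Defensive copy, exposed as read-only.
        this.properties = Collections.unmodifiableMap(new LinkedHashMap<>(properties));
    }

    // Builds an example instance with placeholder values.
    static SchemaSketch example() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("git-hash", "abc1234");   // placeholder value
        props.put("environment", "dev");
        byte[] payload = "{\"type\":\"record\"}".getBytes(StandardCharsets.UTF_8); // placeholder
        return new SchemaSketch("sensor-data", payload, props);
    }
}
```

    The properties map is copied and made read-only in this sketch so that a schema value cannot change after it is built. That is a design choice of the example, not a statement about how Pulsar stores schemas.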

    To illustrate how schema versioning works, let's walk through an example. Imagine that a Pulsar client creates a typed producer for a topic, attempts to connect to Pulsar, and begins sending messages.

    The table below lists the possible scenarios when this connection attempt occurs and what will happen in light of each scenario:

    The following formats are supported by the Pulsar schema registry:

    • None. If no schema is specified for a topic, producers and consumers will handle raw bytes.
    • String (used for UTF-8-encoded strings)
    • JSON

    For usage instructions, see the documentation for the Java client library.