State Schema Evolution
This page provides an overview of how you can evolve your state type's data schema. The current restrictions vary across different types and state structures (ValueState, ListState, etc.).
Note that the information on this page is relevant only if you are using state serializers that are generated by Flink's own type serialization framework. That is, when declaring your state, the provided state descriptor is not configured to use a specific TypeSerializer or TypeInformation, in which case Flink infers information about the state type:
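For example, a state declaration along these lines leaves serializer selection entirely to Flink (a minimal sketch; `MyPojoType` and the surrounding function are hypothetical):

```java
import org.apache.flink.api.common.functions.RichFlatMapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.util.Collector;

// MyPojoType is a hypothetical POJO used as the state type.
public class MyStatefulFunction extends RichFlatMapFunction<MyPojoType, MyPojoType> {

    private transient ListState<MyPojoType> state;

    @Override
    public void open(Configuration parameters) {
        // No TypeSerializer or TypeInformation is configured here, so Flink
        // infers the type information from MyPojoType.class and generates
        // the serializer itself -- the prerequisite for schema evolution.
        ListStateDescriptor<MyPojoType> descriptor =
            new ListStateDescriptor<>("state-name", MyPojoType.class);
        state = getRuntimeContext().getListState(descriptor);
    }

    @Override
    public void flatMap(MyPojoType value, Collector<MyPojoType> out) throws Exception {
        state.add(value);
    }
}
```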
Under the hood, whether or not the schema of state can be evolved depends on the serializer used to read / write persisted state bytes. Simply put, a registered state's schema can only be evolved if its serializer properly supports it. This is handled transparently by serializers generated by Flink's type serialization framework (the current scope of support is listed below).
If you intend to implement a custom TypeSerializer for your state type and would like to learn how to implement the serializer to support state schema evolution, please refer to Custom State Serialization. The documentation there also covers necessary internal details about the interplay between state serializers and Flink's state backends to support state schema evolution.
To evolve the schema of a given state type, you would take the following steps:
- Take a savepoint of your Flink streaming job.
- Update state types in your application (e.g., modifying your Avro type schema).
- Restore the job from the savepoint. When accessing state for the first time, Flink will assess whether or not the schema had been changed for the state, and migrate the state schema if necessary.

The process of migrating state to adapt to changed schemas happens automatically, and independently for each state. This process is performed internally by Flink by first checking if the new serializer for the state has a different serialization schema than the previous serializer; if so, the previous serializer is used to read the state to objects, which are then written back to bytes again with the new serializer.
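Operationally, the savepoint-and-restore cycle can be driven with Flink's CLI; a minimal sketch, assuming a running job ID and a savepoint directory (both hypothetical here):

```bash
# 1. Trigger a savepoint for the running job (job ID is hypothetical).
bin/flink savepoint a5f2e8b3c1d4 /tmp/savepoints

# 2. Stop the job and rebuild the application with the updated state type
#    (e.g., the modified Avro schema or POJO class).

# 3. Resubmit the job, restoring from the savepoint path printed in step 1.
bin/flink run -s /tmp/savepoints/savepoint-a5f2e8-0123456789ab target/my-job.jar
```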
Further details about the migration process are beyond the scope of this documentation.
Supported data types for schema evolution
Currently, schema evolution is supported only for POJO and Avro types. Therefore, if you care about schema evolution for state, it is currently recommended to always use either POJO or Avro for state data types.
There are plans to extend support to more composite types.
POJO types
Flink supports evolving the schema of POJO types, based on the following set of rules:
- New fields can be added. The new field will be initialized to the default value for its type, as defined by Java.
- Declared field types cannot change.
- The class name of the POJO type cannot change, including the namespace of the class.

Note that the schema of POJO type state can only be evolved when restoring from a previous savepoint with Flink versions newer than 1.8.0. When restoring with Flink versions older than 1.8.0, the schema cannot be changed.
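As an illustration, consider a hypothetical POJO evolved by adding a field; state written with the old version restores cleanly, with the new field taking Java's default value:

```java
// Version 1 of a hypothetical state type, as written to the savepoint.
public class PersonPojo {
    public String name;
    public int age;

    public PersonPojo() {}
}

// Version 2, used when restoring. Only a new field was added, which is
// allowed: restored instances will have 'email' set to Java's default
// value for its type (null). The class name and package must not change,
// and the declared types of 'name' and 'age' must stay the same.
public class PersonPojo {
    public String name;
    public int age;
    public String email; // newly added field

    public PersonPojo() {}
}
```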
Avro types
Flink fully supports evolving the schema of Avro type state, as long as the schema change is considered compatible by Avro's rules for schema resolution.

One limitation is that Avro generated classes used as the state type cannot be relocated or have different namespaces when the job is restored.
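For example, adding a field with a default value is a compatible change under Avro's schema resolution rules. A hypothetical record written with this schema:

```json
{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"}
  ]
}
```

can be restored with this evolved schema, where old records fall back to the new field's default (note that the record keeps the name com.example.Person, per the relocation limitation above):

```json
{
  "type": "record",
  "name": "Person",
  "namespace": "com.example",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "email", "type": ["null", "string"], "default": null}
  ]
}
```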
Attention: Schema evolution of keys is not supported.

Example: The RocksDB state backend relies on binary object identity, rather than on the hashCode method implementation. Any change to the key object's structure could lead to non-deterministic behaviour.
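A small sketch of why this matters (the key type is hypothetical): RocksDB locates state entries by comparing serialized key bytes, so any change to the key's layout breaks lookups of previously written entries.

```java
// Hypothetical key type used with keyBy(). RocksDB matches state entries
// by their serialized key bytes, not by hashCode()/equals().
public class DeviceKey {
    public String deviceId;

    // Adding, removing, or retyping a field here would change the
    // serialized byte layout of the key, so entries written before the
    // change could no longer be found -- which is why key schema
    // evolution is not supported.
}
```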
Attention: Kryo cannot be used for schema evolution.
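To avoid state types silently falling back to Kryo (and thereby losing schema evolution), one option is to disable Flink's generic-type fallback so such types fail fast; a minimal sketch:

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KryoGuard {
    public static void main(String[] args) {
        StreamExecutionEnvironment env =
            StreamExecutionEnvironment.getExecutionEnvironment();

        // Causes an exception to be thrown wherever a type would otherwise
        // be serialized with Kryo, instead of silently falling back to it.
        env.getConfig().disableGenericTypes();
    }
}
```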