Avro – Unions

If the value is a boolean:

{
	"fieldOne": "hello world",
	"fieldTwo": true,
	"foobar": {
		"boolean": false
	}
}

If the value is null:

{
	"fieldOne": "hello world",
	"fieldTwo": true,
	"foobar": null
}

The above is how a field looks in Avro's JSON output when that field is defined as a union, one of Avro's complex types. The example below declares a schema whose value may be either null or boolean.

{
  "type": [
    "null", 
    "boolean"
  ]
}
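The wrapping rule behind the two outputs above can be sketched in a few lines of Python. This is only an illustration of Avro's JSON encoding for unions, not Avro's own code; the function name encode_union_value is made up for this example.

```python
import json


def encode_union_value(value, branch_name):
    """Mimic how Avro's JSON encoding represents a union value.

    Hypothetical helper for illustration: null stays a bare JSON null,
    any other branch is wrapped in a single-key object named after
    the branch type.
    """
    if value is None:
        return None
    return {branch_name: value}


# Field names taken from the sample records above.
record = {
    "fieldOne": "hello world",
    "fieldTwo": True,
    "foobar": encode_union_value(False, "boolean"),
}
print(json.dumps(record, indent=2))
```

Passing None instead of False leaves "foobar" as a plain null, which matches the second sample record.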

Kafka Consumer Error – AVRO with Field Type as a Union of null and boolean

Recently I ran into an issue consuming Kafka Avro messages. Everything went well up to the deserialization of the record from the Kafka topic. Things fell apart once the record was serialized into JSON, and that is as far as it got.

I was scratching my head over why this odd behavior was happening. The message published on the Kafka topic followed the defined Avro schema as expected. There were no deviations.

Somehow, when the record was converted to JSON, an exception was thrown. It was always at the same spot, and the program would not go any further.

Debugging the flow of the program showed me that what should have been this:

{
	"fieldOne": "hello world",
	"fieldTwo": true,
	"foobar": false
}

Became something like this:

{
	"fieldOne": "hello world",
	"fieldTwo": true,
	"foobar": {
		"boolean": false
	}
}

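And that is why the program was throwing an error! The value of foobar was expected to be a plain boolean, but it arrived as an object wrapping the boolean.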
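To make the difference between the two shapes concrete, here is a minimal Python sketch that turns the wrapped form back into the plain form. It is a schema-unaware heuristic and not what the Avro library does; the name unwrap_unions and the PRIMITIVE_BRANCHES set are assumptions for this example. A real record that has exactly one field named "boolean" would be unwrapped by mistake.

```python
import json

# Assumed list of union branch names to unwrap (Avro primitive type names).
PRIMITIVE_BRANCHES = {"boolean", "int", "long", "float", "double", "string", "bytes"}


def unwrap_unions(node):
    """Recursively replace {"<type name>": value} wrappers with the bare value."""
    if isinstance(node, dict):
        if len(node) == 1:
            key, value = next(iter(node.items()))
            if key in PRIMITIVE_BRANCHES:
                return unwrap_unions(value)
        # An ordinary object: keep the keys, process the values.
        return {key: unwrap_unions(value) for key, value in node.items()}
    if isinstance(node, list):
        return [unwrap_unions(item) for item in node]
    # Strings, numbers, booleans and null pass through unchanged.
    return node


wrapped = json.loads(
    '{"fieldOne": "hello world", "fieldTwo": true, "foobar": {"boolean": false}}'
)
print(json.dumps(unwrap_unions(wrapped), indent=2))
```

A null union value needs no unwrapping, since it is already a bare null.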

ANSWER

My solution was to remove null from the field's type and make the field strictly boolean. What I had originally wanted was for this field to start out as null and then be set to either true or false further down the line. With the field strictly boolean, that initial null state is no longer possible.

After the AVRO schema was updated, the error went away.
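The schema change amounts to dropping the "null" branch from the field declaration. The Python sketch below shows that edit on a field declaration held as a dict. The helper name drop_null_branch is hypothetical, and the field declaration is reconstructed from the samples above rather than copied from my actual schema.

```python
import json


def drop_null_branch(field):
    """Return a copy of an Avro field declaration without the "null" union branch."""
    updated = dict(field)
    field_type = field["type"]
    if not isinstance(field_type, list):
        return updated  # not a union, nothing to change
    remaining = [branch for branch in field_type if branch != "null"]
    # A union left with a single branch collapses to that plain type.
    updated["type"] = remaining[0] if len(remaining) == 1 else remaining
    # A null default is no longer valid once "null" is gone from the type.
    if "default" in updated and updated["default"] is None:
        del updated["default"]
    return updated


before = {"name": "foobar", "type": ["null", "boolean"]}
after = drop_null_branch(before)
print(json.dumps(after))
```

With the union gone, the JSON output carries the boolean directly, with no wrapper object.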

This was reported on the Apache Avro Jira. More details about this problem are here: https://www.joseyamut.xyz/2020/07/29/de-serializing-kafka-messages-with-union-defined-field/