Well, I plan to re-publish Kafka records that my app is consuming to another Kafka topic on another host (internal only). This sort of lets other groups consume data from this topic without having to create connections outside of the private network, since the original data source is a 3rd party vendor. They can then also do their own transformations or other business logic on it as if they were consuming from the original source. Delay from source to re-publish should be minimal. Will still be close to real time.
What I normally do when producing records is not to set a partition by default. Would this be okay when re-publishing records? Will there be conflict with original record partition number to the target topic partition?
How many partitions does your target topic have? Do you have freedom to create as many partitions as the source?
(1) If you can create 1:1 partitions then my suggestion is to retain the original partition number of that record when you publish to the destination topic.
(2) You can also assign the partition in a round-robin way when you have lesser partitions on your target topic. Save metadata info of that record in the headers, such as original partition assignment and etc. Perhaps that will be useful down the line for your consumers.
(3) Lastly, you can NOT assign a partition number. Instead, let Kafka do the magic for you. When no partitions are set, Kafka will determine the next partition in a round robin method. You can do this with a null value instead of an int for partition number.
Bottom line is that it depends on the project requirements, I suppose, or more so on the infrastructure of your destination Kafka.