Re-publishing Kafka records without setting a partition number?

Well, I plan to re-publish Kafka records that my app consumes to another Kafka topic on another host (internal only). This lets other groups consume the data without creating connections outside of the private network, since the original data source is a 3rd-party vendor. They can then apply their own transformations or other business logic as if they were consuming from the original source. The delay from source to re-publish should be minimal, so the data will still be close to real time.

What I normally do when producing records is not set a partition explicitly. Would this be okay when re-publishing records, or will the original record's partition number conflict with the partition assignment in the target topic?

ANSWER

How many partitions does your target topic have? Do you have freedom to create as many partitions as the source?

(1) If you can create partitions 1:1 with the source, then my suggestion is to retain the original partition number of the record when you publish to the destination topic.

(2) You can also assign the partition in a round-robin way when you have fewer partitions on your target topic. Save metadata about the record in the headers, such as the original partition assignment. Perhaps that will be useful down the line for your consumers.

(3) Lastly, you can simply NOT assign a partition number and let Kafka do the magic for you. When no partition is set, the producer's default partitioner picks one (hashing the record key if present, otherwise distributing records across partitions). You can do this by passing a null value instead of an int for the partition number.
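
A minimal sketch of such a re-publish loop with the Java client (topic names, the header key, and connection properties are illustrative; pick whichever partition option fits your setup):

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class Republisher {
    public static void main(String[] args) {
        // Consumer for the source topic and producer for the internal destination
        // (bootstrap.servers, group.id, (de)serializers, etc. omitted here).
        Properties consumerProps = new Properties();
        Properties producerProps = new Properties();

        try (KafkaConsumer<String, byte[]> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, byte[]> producer = new KafkaProducer<>(producerProps)) {

            consumer.subscribe(List.of("vendor-topic"));
            while (true) {
                ConsumerRecords<String, byte[]> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, byte[]> record : records) {
                    // Option 1: keep the original partition (requires a 1:1 partition layout).
                    // Integer partition = record.partition();

                    // Option 2/3: pass null and let the destination cluster's partitioner decide,
                    // while keeping the original assignment as a header for downstream consumers.
                    Integer partition = null;

                    ProducerRecord<String, byte[]> out = new ProducerRecord<>(
                            "internal-topic", partition, record.key(), record.value(), record.headers());
                    out.headers().add("x-original-partition",
                            Integer.toString(record.partition()).getBytes());

                    producer.send(out);
                }
            }
        }
    }
}
```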

The bottom line is that it depends on the project requirements, I suppose, or more so on the infrastructure of your destination Kafka cluster.

How can we design a view count for a video website like YouTube that needs to be highly optimized?

As per my understanding:

We can have a scalable backend microservice MS1 exposing an API. The client calls the API whenever a user plays a video. This microservice uses a sharded cache C1 and a message broker MB1. The cache C1 maintains the view count per video as <VideoId, VideoCount>; for every new request it increments the count in C1 and adds the request <VideoId, UserId> to the message broker MB1.
On the other side of the message broker MB1, a service MS2 persists the request to the database DB1. If data is missing from the sharded cache C1, it is fetched back from MS2.
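
To make the request path concrete, here is a rough sketch of MS1's handler, with an in-memory map standing in for the sharded cache C1 and a simple queue standing in for the broker MB1 (these stand-ins and all names are illustrative only, not the actual infrastructure):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class ViewCountService {

    // Stand-in for the sharded cache C1: VideoId -> VideoCount.
    private final Map<String, Long> cache = new ConcurrentHashMap<>();

    // Stand-in for the message broker MB1: (VideoId, UserId) view events for MS2 to persist in DB1.
    private final BlockingQueue<String[]> broker = new LinkedBlockingQueue<>();

    // Called by the API endpoint whenever a user plays a video.
    public long recordView(String videoId, String userId) throws InterruptedException {
        // Increment the count in the cache (atomic per key).
        long newCount = cache.merge(videoId, 1L, Long::sum);
        // Hand the raw event to the broker; MS2 consumes it and updates DB1 asynchronously.
        broker.put(new String[] {videoId, userId});
        return newCount;
    }

    public static void main(String[] args) throws InterruptedException {
        ViewCountService ms1 = new ViewCountService();
        System.out.println(ms1.recordView("video-42", "user-1")); // 1
        System.out.println(ms1.recordView("video-42", "user-2")); // 2
    }
}
```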

Recently, I was in an interview where the interviewer asked me to design this view count so that it is scalable. His only concern was the millions of connections created to cache C1 in the case of millions of requests.
I was under the impression that cache C1 is scalable, so that would not be an issue.

I have designed a similar thing before in a project, including like and dislike counts, so I tried to explain it to him in the same way, but he wasn't convinced.
I tried to find a standard approach or algorithm to optimize it further, but I was unable to find anything on Google, so here I am. Kindly help me: have I done anything wrong?

Go to Source
Author: Lovin

Integrating HTTP / Webhooks with Message Queues

I'm working on a project which integrates several applications, mostly SaaS applications. The SaaS solutions all offer the possibility to hook into their internal event systems with webhooks. A webhook gives us the ability to send a message to a single system, but we have to create multiple webhooks to send a single event to several systems.

My idea is to implement a message bus as centralized middleware, but the problem is that the SaaS solutions only provide integration over HTTP(S) and not with protocols like AMQP.

RabbitMQ, for example, provides the possibility to publish to a topic over HTTP. Consuming over HTTP is also possible, but once a message has been fetched it is either removed from the queue or kept (requeued).
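
For illustration, publishing and fetching over RabbitMQ's management HTTP API look roughly like this (host, port, credentials, exchange and queue names are placeholders; the ackmode field in the fetch request controls whether the message is removed or requeued, and details may differ between RabbitMQ versions):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Base64;

public class RabbitHttpExample {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String auth = "Basic " + Base64.getEncoder().encodeToString("guest:guest".getBytes());

        // Publish a message to an exchange in the default vhost ("/" is URL-encoded as %2F).
        HttpRequest publish = HttpRequest.newBuilder()
                .uri(URI.create("http://rabbitmq.internal:15672/api/exchanges/%2F/events/publish"))
                .header("Authorization", auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"properties\":{},\"routing_key\":\"saas.event\",\"payload\":\"hello\",\"payload_encoding\":\"string\"}"))
                .build();
        System.out.println(http.send(publish, HttpResponse.BodyHandlers.ofString()).body());

        // Fetch a message from a queue; ackmode decides whether it is removed or requeued.
        HttpRequest get = HttpRequest.newBuilder()
                .uri(URI.create("http://rabbitmq.internal:15672/api/queues/%2F/saas-events/get"))
                .header("Authorization", auth)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(
                        "{\"count\":1,\"ackmode\":\"ack_requeue_false\",\"encoding\":\"auto\"}"))
                .build();
        System.out.println(http.send(get, HttpResponse.BodyHandlers.ofString()).body());
    }
}
```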

Does anybody have a good solution to bridge the gap between HTTP and AMQP? I thought about small consumer services which subscribe to a topic and then forward the messages to the RESTful APIs.

We are currently trying to avoid a huge enterprise service bus/iPaaS project. I know that this could be one of the best approaches, but due to internal decisions, project time, costs, and so on, it's not a possibility for the moment.

One of our requirements is guaranteed delivery, so that no message is lost.
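
To make that bridge idea concrete, here is one possible shape for such a consumer, assuming the RabbitMQ Java client and manual acknowledgements so that a message is only removed from the queue after the target API has accepted it (host, queue name, and target URL are placeholders):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import com.rabbitmq.client.DeliverCallback;

public class HttpBridgeConsumer {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq.internal"); // placeholder host

        HttpClient http = HttpClient.newHttpClient();

        Connection connection = factory.newConnection();
        Channel channel = connection.createChannel();
        channel.queueDeclare("saas-target-a", true, false, false, null);

        DeliverCallback onMessage = (consumerTag, delivery) -> {
            String body = new String(delivery.getBody(), StandardCharsets.UTF_8);
            try {
                HttpRequest request = HttpRequest.newBuilder()
                        .uri(URI.create("https://saas-a.example.com/api/events")) // placeholder URL
                        .header("Content-Type", "application/json")
                        .POST(HttpRequest.BodyPublishers.ofString(body))
                        .build();
                HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());

                if (response.statusCode() / 100 == 2) {
                    // Ack only after the SaaS API accepted the event, so no message is lost.
                    channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
                } else {
                    // Requeue so the delivery can be retried later.
                    channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
                }
            } catch (Exception e) {
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, true);
            }
        };

        // autoAck = false: deliveries stay unacknowledged until the forward succeeded.
        channel.basicConsume("saas-target-a", false, onMessage, consumerTag -> { });
    }
}
```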

Thanks for your suggestions.

Go to Source
Author: Gulliva

Is it a bad design to have 50K bindings on a single RabbitMQ queue?

We are designing a new feature in our system where consumers (consumer == internal application) need to receive messages about changes in items they are interested in.
From the statistics we gathered, the maximum number of items a single consumer can be interested in is 50K (on average it would be ~15K).
Initial tests show that this works OK and RabbitMQ handles it, but when we delete such a queue (for example, when we scale down the system and shut down one of the instances), it takes a few minutes for it to be deleted and the RabbitMQ management portal becomes unresponsive.

Does it make sense to have so many bindings or is it a bad design?

  • We'll have around 50 instances of the consumers, each with its own queue, which is not persistent and should be auto-deleted when the consumer shuts down
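
For reference, here is a sketch of how such a per-instance queue with tens of thousands of bindings would be declared with the RabbitMQ Java client (exchange, queue name, and routing-key scheme are illustrative):

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;

public class InterestQueueSetup {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("rabbitmq.internal"); // placeholder host

        try (Connection connection = factory.newConnection();
             Channel channel = connection.createChannel()) {

            channel.exchangeDeclare("item-changes", "direct", true);

            // Non-durable, auto-delete queue owned by this consumer instance.
            String queue = channel.queueDeclare("consumer-7-items", false, false, true, null).getQueue();

            // One binding per item of interest -- up to ~50K per consumer.
            for (int itemId = 0; itemId < 50_000; itemId++) {
                channel.queueBind(queue, "item-changes", "item." + itemId);
            }
        }
    }
}
```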

Go to Source
Author: Tamir Dresher