Kafka Connect
What is Kafka Connect?
At first glance, it wasn’t obvious to me why we need Kafka Connect. From the name alone, it seems like just another feature or tool like Kafka Bridge; honestly, I couldn’t tell what differences they would bring, as they all sound like they do the same thing. And there is also a custom resource called KafkaConnector, which adds to the confusion.
While Kafka Bridge is essentially an HTTP server that translates HTTP requests into the Kafka API, Kafka Connect serves a much more dynamic purpose.
Kafka Connect acts as a host for plugins built to connect to external systems. It’s a mini control plane that spins up workers to house the connectors (the KafkaConnector resources). So you may ask: why do we need these when we can build our own producer or consumer, which basically does the same thing? I have thought about this as well, and my take is reusability. As long as you are using Kafka, chances are you will need to pull data from a source or push data to a sink directly, without much logic involved; this is exactly what Kafka Connect is great for.
There are many open-source connectors built for Kafka Connect that talk to different external systems, e.g. Google Pub/Sub, AWS S3, Slack, Telegram, etc. All we need to do is configure Kafka Connect to pull the artifacts from their respective repositories (e.g. Maven, Nexus, GitHub releases); it will automatically build a new Kafka Connect image that includes those artifacts, and spin up the workers using that image.
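As a sketch, the build section of a Strimzi KafkaConnect resource declares where the built image goes and which artifacts to pull; the image name, secret name, and artifact URL below are placeholders, not real endpoints:

```yaml
# Fragment of a Strimzi KafkaConnect spec; names and URLs are illustrative only.
spec:
  build:
    output:
      type: docker
      image: registry.example.com/my-connect:latest  # where the freshly built image is pushed
      pushSecret: registry-credentials               # docker-registry secret used for the push
    plugins:
      - name: my-connector
        artifacts:
          - type: jar                                # Strimzi also accepts tgz, zip, maven, other
            url: https://example.com/my-connector.jar
```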
Sink to Google Pub/Sub
In this example, we will use Strimzi and the Pub/Sub Kafka connector built by the open-source community at GCP. We will extract data from Kafka and forward it to Google Pub/Sub. Kafka Connect will fetch the JAR file, build an image that includes it, and push that image to the registry we configured in the build output section. It will then spin up the workers using this image. Note that I have also mounted my Google service account private key secret as a volume; this will come in handy later.
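A minimal KafkaConnect resource for this setup could look like the following. The cluster name, registry, secret names, and the exact connector release URL are assumptions for illustration; pin a real release of the Pub/Sub Group Kafka Connector in practice:

```yaml
# Sketch of a Strimzi KafkaConnect resource; names, registry, and URLs are placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnect
metadata:
  name: my-connect
  annotations:
    strimzi.io/use-connector-resources: "true"  # let KafkaConnector CRs manage connectors
spec:
  replicas: 1
  bootstrapServers: my-cluster-kafka-bootstrap:9092
  build:
    output:
      type: docker
      image: registry.example.com/my-connect:latest  # built image is pushed here
      pushSecret: registry-credentials
    plugins:
      - name: pubsub-kafka-connector
        artifacts:
          - type: jar
            # Illustrative URL pattern for the GCP-community connector releases
            url: https://github.com/googleapis/java-pubsub-group-kafka-connector/releases/download/v1.2.0/pubsub-group-kafka-connector-1.2.0.jar
  externalConfiguration:
    volumes:
      # Mounted under /opt/kafka/external-configuration/gcp-credentials in the workers
      - name: gcp-credentials
        secret:
          secretName: gcp-sa-key
```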
Next, we need to create a KafkaConnector instance. Here, we point the GCP credential to the secret file that we mounted earlier in the KafkaConnect custom resource.
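A sketch of that KafkaConnector resource is below. The connector class and the `cps.*` / `gcp.credentials.file.path` options come from the Pub/Sub Group Kafka Connector; the topic, project, and key file names are placeholders:

```yaml
# Sketch of a Strimzi KafkaConnector for the Pub/Sub sink; names are placeholders.
apiVersion: kafka.strimzi.io/v1beta2
kind: KafkaConnector
metadata:
  name: pubsub-sink
  labels:
    strimzi.io/cluster: my-connect   # must match the KafkaConnect resource name
spec:
  class: com.google.pubsub.kafka.sink.CloudPubSubSinkConnector
  tasksMax: 1
  config:
    topics: my-topic                 # Kafka topic to drain
    cps.project: my-gcp-project      # destination GCP project
    cps.topic: my-pubsub-topic       # destination Pub/Sub topic
    # Path of the mounted service account key inside the worker pod
    gcp.credentials.file.path: /opt/kafka/external-configuration/gcp-credentials/key.json
```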
Now, if we send some messages to the Kafka topic, we should see the same messages arrive in the Google Pub/Sub topic.
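One way to verify end to end, assuming a Strimzi broker pod and a Pub/Sub subscription already exist (pod, topic, and subscription names below are placeholders):

```shell
# Produce a test message into the Kafka topic from inside a broker pod
kubectl exec -it my-cluster-kafka-0 -- bin/kafka-console-producer.sh \
  --bootstrap-server localhost:9092 --topic my-topic
# (type a message, e.g. "hello from kafka", then Ctrl-C)

# Pull from a subscription attached to the Pub/Sub topic to confirm delivery
gcloud pubsub subscriptions pull my-pubsub-sub --auto-ack --limit=1
```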
Happy Kafka!