Apache Kafka

This page describes how to set up Apache Kafka as the data source for a Rockset collection. It uses the Kafka Connect plugin for Rockset, a sink connector, to forward data from Kafka topics into a Rockset collection.

Introduction

This setup assumes that you already have Kafka running and configured. You will run Kafka Connect to forward data from Kafka into a Rockset collection. Instructions for setting up the Rockset connector plugin with Apache Kafka Connect are in the kafka-connect-rockset repository.
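Before configuring the connector, it can help to confirm that the Kafka Connect worker can see the Rockset plugin. The sketch below is one way to do that, under stated assumptions: the worker's REST interface is reachable at http://localhost:8083 (the Kafka Connect default port; adjust for your deployment), and the check uses the GET /connector-plugins endpoint, which lists installed plugin classes. The class name is the one used in the example configuration on this page; the helper names has_plugin and list_plugins are illustrative only.

```python
import json
import urllib.request

# Assumption: the Connect worker's REST interface listens on the default port 8083.
CONNECT_URL = "http://localhost:8083"
# Connector class taken from the example configuration on this page.
ROCKSET_CLASS = "rockset.RocksetSinkConnector"


def has_plugin(plugins, connector_class=ROCKSET_CLASS):
    """Return True if connector_class appears in a /connector-plugins response."""
    return any(p.get("class") == connector_class for p in plugins)


def list_plugins(connect_url=CONNECT_URL):
    """Fetch the list of installed connector plugins from the Kafka Connect REST API."""
    with urllib.request.urlopen(f"{connect_url}/connector-plugins") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Against a running worker, has_plugin(list_plugins()) returns True only when the Rockset connector class is among the installed plugins.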

Create a Collection

In the Rockset console, you can create a collection from Workspace > Collections > Create Collection.

Alternatively, you can create a collection from the CLI by running the following command:

$ rock create collection my-kafka-collection 

Collection "my-kafka-collection" was created successfully.

Configuration

Once you have created a collection, configure the Rockset Kafka Connect plugin to forward data to it. You must also provide a valid API key and API server URL in the plugin configuration. The available configuration options are described in the Rockset connector documentation.

An example configuration is shown below:

{
    "name": "rockset-sink",
    "config": {
        "connector.class": "rockset.RocksetSinkConnector",
        "tasks.max": "20",
        "rockset.task.threads": "5",
        "topics": "your-kafka-topics separated by commas",
        "rockset.workspace": "your-rockset-workspace",
        "rockset.collection": "your-rockset-collection",
        "rockset.apikey": "your-api-key",
        "rockset.apiserver.url": "https://api.rs2.usw2.rockset.com",
        "format": "json"
    }
}
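One way to apply a configuration like this is to submit it to the Kafka Connect REST API on a worker running in distributed mode. The sketch below is a minimal illustration rather than the only method. It assumes the worker's REST interface is at http://localhost:8083. The helper names build_connector_payload and register_connector, the topic names, and the placeholder credentials are hypothetical and not part of the Rockset plugin.

```python
import json
import urllib.request


def build_connector_payload(topics, workspace, collection, api_key,
                            api_server_url, name="rockset-sink"):
    """Build the JSON body for the connector, mirroring the example configuration."""
    return {
        "name": name,
        "config": {
            "connector.class": "rockset.RocksetSinkConnector",
            "tasks.max": "20",
            "rockset.task.threads": "5",
            # Kafka Connect expects the topic list as a comma-separated string.
            "topics": ",".join(topics),
            "rockset.workspace": workspace,
            "rockset.collection": collection,
            "rockset.apikey": api_key,
            "rockset.apiserver.url": api_server_url,
            "format": "json",
        },
    }


def register_connector(payload, connect_url="http://localhost:8083"):
    """POST the payload to Kafka Connect's /connectors endpoint to create the connector."""
    request = urllib.request.Request(
        f"{connect_url}/connectors",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder values: replace with your own topics, workspace, and API key.
payload = build_connector_payload(
    topics=["orders", "payments"],  # hypothetical topic names
    workspace="your-rockset-workspace",
    collection="my-kafka-collection",
    api_key="your-api-key",
    api_server_url="https://api.rs2.usw2.rockset.com",
)
```

Calling register_connector(payload) against a running worker would then create the connector, and a GET request to /connectors/rockset-sink/status on the same worker reports whether its tasks are running.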

For a detailed walkthrough of setting up and operating this configuration, refer to the blog post on real-time analytics with Kafka and Rockset.