Data Format > Time Series Data

Time Series Data

This document describes how to create collections over event data and setup time-based retention to automatically delete old events from the collection.

Event Data

Event data represents a sequence of events, each paired with a time field. The time field represents when the event occurred. Web search logs, web site clicks, stock prices, weather data, security logs, and so on are all applicable event data set examples.

Sample weather data event (source):

dt represents the time of data collection in UTC

{
  "coord": {
    "lon": 139,
    "lat": 35
  },
  "sys": {
    "country": "JP",
    "sunrise": 1369769524,
    "sunset": 1369821049
  },
  "weather": [
    {
      "id": 804,
      "main": "clouds",
      "description": "overcast clouds",
      "icon": "04n"
    }
  ],
  "main": {
    "temp": 289.5,
    "humidity": 89,
    "pressure": 1013,
    "temp_min": 287.04,
    "temp_max": 292.04
  },
  "wind": {
    "speed": 7.31,
    "deg": 187.002
  },
  "rain": {
    "3h": 0
  },
  "clouds": {
    "all": 92
  },
  "dt": 1369824698,
  "id": 1851632,
  "name": "Shuzenji",
  "cod": 200
}

Such data sets are used extensively for statistics, monitoring, finance, forecasting, tracking application events and in any field that involve a time component. Refer here to learn more about event data and building applications around it.

Rockset allows you to create collections over event data, run queries efficiently and join event data with other data sets using standard SQL.

Retention

Event data sets tend to grow very quickly and users may want to set a time-based retention (eg. past 24 hours, past 7 days, past 30 days etc.) to automatically purge older events. Rockset allows efficient deletion of old events by setting a retention period while creating a collection.

Creating Event Data Collection

Prerequisites

  1. Setup Rockset Python client or Rock CLI.
  2. Integrate with Amazon S3 and copy the event data into a S3 bucket. You can also load streaming data into S3 using a service like Amazon Kinesis Data Firehose.

Alternatively, you can use REST API to write events to Rockset.

Example collection:

This collection represents two taxi ride events. pickup_time field represents when the ride started.

[{
    "ride_id": "ride1",
    "driver_id": 1,
    "pickup_time" : 1534406486170,
    "drop_time" : 1534406487540,
    "pickup": {
                 "latitude": 37.22,
                 "longitude": 122.77
              },
    "destination": {
                 "latitude" : 40.88,
                 "longitude" : 123.78
              },
    "cost": 20,
    "payment_id" : 1
},
{
    "ride_id": "ride2",
    "driver_id": 2,
    "pickup_time" : 1534406486150,
    "drop_time" : 1534406488440,
    "pickup": {
                 "latitude": 37.22,
                 "longitude": 122.77
              },
    "destination": {
                 "latitude" : 39.27,
                 "longitude" : 121.78
              },
    "cost": 30,
    "payment_id" : 2
}]

We will map pickup_time to event-time field in Rockset using CLI and Python client below. We will also set retention for this collection as 10 days.

Using CLI

$ rock create collection taxi_rides \
    --event-time-field pickup_time \
    --event-time-format milliseconds_since_epoch \
    --event-time-default-timezone UTC \
    --retention 10d
Collection "taxi_rides" was created successfully.

Using Python:

from rockset import Client
rs = Client()
first_event_collection = rs.Collection.create(
                        "taxi_rides",
                        event_time_field="pickup_time", 
                        event_time_format="milliseconds_since_epoch", 
                        event_time_default_timezone="UTC",
                        retention_secs=10*24*60*60)

Refer here for supported time formats and time zones.

Add document:

$ rock upload taxi_rides taxi_rides.json --format=JSON

Query:

  1. Select all rides since 1534406486170
> SELECT ride_id FROM taxi_rides WHERE _event_time >= TIMESTAMP_MICROS(1534406486170000);
+-----------+
| ride_id   |
|-----------|
| ride2     |
+-----------+
  1. Join on another collection that contains payment id to description mapping to count number of rides of each payment type since 1534406486150
> SELECT a.type, COUNT(b.driver_id) num \
        FROM taxi_rides b JOIN payment_types a ON a.id = b.payment_id \
        where b._event_time >= TIMESTAMP_MICROS(1534406486150000) \
        GROUP BY a.type ORDER BY num;
+-------+--------+
| num   | type   |
|-------+--------|
| 1     | cash   |
| 1     | credit |
+-------+--------+