This document describes how to create collections over event data and setup time-based retention to automatically delete old events from the collection.
Event data represents a sequence of events, each paired with a time field. The time field represents when the event occurred. Web search logs, web site clicks, stock prices, weather data, security logs, and so on are all applicable event data set examples.
Sample weather data event (source):
dt
represents the time of data collection in UTC
{
"coord": {
"lon": 139,
"lat": 35
},
"sys": {
"country": "JP",
"sunrise": 1369769524,
"sunset": 1369821049
},
"weather": [
{
"id": 804,
"main": "clouds",
"description": "overcast clouds",
"icon": "04n"
}
],
"main": {
"temp": 289.5,
"humidity": 89,
"pressure": 1013,
"temp_min": 287.04,
"temp_max": 292.04
},
"wind": {
"speed": 7.31,
"deg": 187.002
},
"rain": {
"3h": 0
},
"clouds": {
"all": 92
},
"dt": 1369824698,
"id": 1851632,
"name": "Shuzenji",
"cod": 200
}
Such data sets are used extensively for statistics, monitoring, finance, forecasting, tracking application events and in any field that involve a time component. Refer here to learn more about event data and building applications around it.
Rockset allows you to create collections over event data, run queries efficiently and join event data with other data sets using standard SQL.
Event data sets tend to grow very quickly and users may want to set a time-based retention (eg. past 24 hours, past 7 days, past 30 days etc.) to automatically purge older events. Rockset allows efficient deletion of old events by setting a retention period while creating a collection.
Alternatively, you can use REST API to write events to Rockset.
Example collection:
This collection represents two taxi ride events. pickup_time
field represents when the ride started.
[{
"ride_id": "ride1",
"driver_id": 1,
"pickup_time" : 1534406486170,
"drop_time" : 1534406487540,
"pickup": {
"latitude": 37.22,
"longitude": 122.77
},
"destination": {
"latitude" : 40.88,
"longitude" : 123.78
},
"cost": 20,
"payment_id" : 1
},
{
"ride_id": "ride2",
"driver_id": 2,
"pickup_time" : 1534406486150,
"drop_time" : 1534406488440,
"pickup": {
"latitude": 37.22,
"longitude": 122.77
},
"destination": {
"latitude" : 39.27,
"longitude" : 121.78
},
"cost": 30,
"payment_id" : 2
}]
We will map pickup_time
to event-time field in Rockset using CLI and Python client below. We will also set retention for this collection as 10 days.
$ rock create collection taxi_rides \
--event-time-field pickup_time \
--event-time-format milliseconds_since_epoch \
--event-time-default-timezone UTC \
--retention 10d
Collection "taxi_rides" was created successfully.
from rockset import Client
rs = Client()
first_event_collection = rs.Collection.create(
"taxi_rides",
event_time_field="pickup_time",
event_time_format="milliseconds_since_epoch",
event_time_default_timezone="UTC",
retention_secs=10*24*60*60)
Refer here for supported time formats and time zones.
Add document:
$ rock upload taxi_rides taxi_rides.json --format=JSON
Query:
> SELECT ride_id FROM taxi_rides WHERE _event_time >= TIMESTAMP_MICROS(1534406486170000);
+-----------+
| ride_id |
|-----------|
| ride2 |
+-----------+
> SELECT a.type, COUNT(b.driver_id) num \
FROM taxi_rides b JOIN payment_types a ON a.id = b.payment_id \
where b._event_time >= TIMESTAMP_MICROS(1534406486150000) \
GROUP BY a.type ORDER BY num;
+-------+--------+
| num | type |
|-------+--------|
| 1 | cash |
| 1 | credit |
+-------+--------+