Quick Start
Get started with Rockset and go from ingesting data to executing a query. This tutorial will guide you through creating a Rockset collection, loading a sample dataset, and then executing queries on that data. It will take around 10 minutes to complete.
The steps to get started:
- Sign Up for Rockset
- Create a Collection
- Execute a Simple Query
- Execute a Parameterized Query
- Execute a Query Lambda
#Sign Up for Rockset
You can register for a Rockset account using your GitHub account, Google account, or email address.
- It's free to get started with Rockset, and you can keep building for free indefinitely if you have less than 2 GiB of data. To register, visit https://console.rockset.com/create.
- Once you've successfully registered your account, you will be redirected to the Rockset Console where you will be prompted to set up your organization.
Once your organization is set up, you will be automatically redirected to the Collections tab of the Rockset Console, where we'll continue by creating your first collection.
#Create a Collection
A collection in Rockset is a set of Rockset documents. Similar to tables in traditional SQL databases, collections can be queried using SQL, either directly or using Query Lambdas.
In this tutorial, we will create two collections from public datasets hosted on AWS S3. One dataset is a sample of movies with information such as their genre, popularity, and revenue. The other dataset is a sample of movie ratings by user. Both datasets are publicly available.
#Create the Movies Dataset
- In the Collections tab of the Rockset Console, select the Create your first Collection button.
- Next, select Sample Datasets as the data source for your collection.
- Give your collection a name (for this tutorial, we'll use the names movies and movie_ratings for our two sample collections) and an optional description. Then, select the dataset Movies, and a source preview will automatically generate so you can explore the semi-structured JSON data in tabular form.
- Note that you will also be given options to apply field mappings on incoming data, select a retention policy, and select a Virtual Instance type. For this tutorial, we will not be applying any field mappings or specifying a retention policy. You may select the Virtual Instance of your choice (we'll use the c12.shared Shared Instance here since the sample datasets are relatively small, but we recommend using a Dedicated Instance to test production performance).
- Click Create to complete the creation of your collection. You should now see a new collection in state Created. It can take up to a minute for the collection to enter the Ready state, at which point you'll be able to explore the data and run queries against it.
Once the collection has entered the Ready state, you will enter the Collection Details view. Here, you can see a preview of your collection and the schema that has been inferred from the collection data, along with several statistics about your data.
#Create the Movie Ratings Dataset
Repeat all of the above steps for the Movie Ratings sample dataset. During collection creation, be sure to select the dataset Movie Ratings in the Sources section.
#Execute a Simple Query
Now, we will query these two collections using SQL. Navigate to the Query Editor tab of the Rockset Console to start writing and running SQL queries.
We've constructed a sample query below to suggest movies to a user based on their genre preference, ordered by popularity. Since genres is an array field (as a single movie may fit multiple genres), we'll need to perform an UNNEST to expand this array and create a record for each (genre, movie) pair. We'll also check against the movie_ratings collection to ensure that no movies the user has already rated are included in this list.
- In this query, we'll use the Action genre and user 100 (we'll generalize these literals in our next step). Copy the query below into the Query Editor.
SELECT
    m.id,
    m.title
FROM
    commons.movies m,
    UNNEST(m.genres) AS genres
WHERE
    genres.name = 'Action'
    AND m.id NOT IN (
        SELECT
            r.movieId
        FROM
            commons.movie_ratings r
        WHERE
            r.userId = '100'
    )
ORDER BY
    m.popularity DESC;
- Click Run to execute the query!
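To make the query's logic concrete, here is a minimal Python sketch that mirrors it on in-memory data. The sample records are made up for illustration (only the field names id, title, popularity, genres, userId, and movieId come from the query above), and the `recommend` function is a hypothetical helper, not part of Rockset.

```python
# Made-up sample records shaped like the fields the query references.
movies = [
    {"id": "1", "title": "Alpha", "popularity": 9.5,
     "genres": [{"name": "Action"}, {"name": "Drama"}]},
    {"id": "2", "title": "Beta", "popularity": 7.0,
     "genres": [{"name": "Comedy"}]},
    {"id": "3", "title": "Gamma", "popularity": 8.2,
     "genres": [{"name": "Action"}]},
]
movie_ratings = [{"userId": "100", "movieId": "3"}]


def recommend(movies, ratings, genre, user_id):
    """Mirror the SQL: expand genres, filter, drop rated movies, sort."""
    # Subquery: ids of movies this user has already rated.
    seen = {r["movieId"] for r in ratings if r["userId"] == user_id}
    # UNNEST: one (movie, genre) row per element of the genres array.
    rows = [(m, g) for m in movies for g in m["genres"]]
    # WHERE: keep rows with the requested genre whose movie is not in `seen`.
    matches = [m for m, g in rows if g["name"] == genre and m["id"] not in seen]
    # ORDER BY popularity DESC.
    matches.sort(key=lambda m: m["popularity"], reverse=True)
    return [{"id": m["id"], "title": m["title"]} for m in matches]


print(recommend(movies, movie_ratings, "Action", "100"))
# [{'id': '1', 'title': 'Alpha'}]
```

Movie 3 matches the genre but is dropped because user 100 has already rated it, which is the role the NOT IN subquery plays in the SQL.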
#Execute a Parameterized Query
You can use query parameters to safely specify literal values in your SQL. We can add parameters to the SQL query to specify the genre and the userId at runtime.
- To add a parameter in the Query Editor, select the Parameters tab below the SQL editing area, next to the Results tab. Click Add Parameter to create a new parameter. For our example query, we will create two new parameters: genre with type string and value Action, and userId with type string and value 100.
- We'll also need to tweak the SQL statement from earlier to incorporate the parameters we just created. Here's the new SQL statement with the parameters:
SELECT
    m.id,
    m.title
FROM
    commons.movies m,
    UNNEST(m.genres) AS genres
WHERE
    genres.name = :genre
    AND m.id NOT IN (
        SELECT
            r.movieId
        FROM
            commons.movie_ratings r
        WHERE
            r.userId = :userId
    )
ORDER BY
    m.popularity DESC;
- Click Run to execute the query!
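The reason named parameters are safer than pasting literals into SQL can be illustrated outside Rockset. The sketch below uses Python's built-in sqlite3 module, which happens to accept the same `:name` placeholder style. It is an analogy only: the `movie_genres` table is a hypothetical, pre-flattened stand-in for the UNNEST step, and the rows are made up.

```python
import sqlite3

# In-memory database with made-up rows; not Rockset, just an analogy.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE movie_genres (movie_id TEXT, title TEXT, genre TEXT);
    CREATE TABLE movie_ratings (userId TEXT, movieId TEXT);
    INSERT INTO movie_genres VALUES
        ('1', 'Alpha', 'Action'),
        ('3', 'Gamma', 'Action');
    INSERT INTO movie_ratings VALUES ('100', '3');
""")

QUERY = """
    SELECT movie_id, title
    FROM movie_genres
    WHERE genre = :genre
      AND movie_id NOT IN (
          SELECT movieId FROM movie_ratings WHERE userId = :userId
      )
"""


def run(genre, user_id):
    # Values are bound by name and never spliced into the SQL text.
    return conn.execute(QUERY, {"genre": genre, "userId": user_id}).fetchall()


print(run("Action", "100"))
# [('1', 'Alpha')]

# A hostile value is compared as a plain string, so it matches nothing.
print(run("Action' OR '1'='1", "100"))
# []
```

Because the value travels separately from the statement, the quote characters in the second call cannot change the structure of the query.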
#Execute a Query Lambda
Query Lambdas are named, parameterized SQL queries stored in Rockset that can be executed from a dedicated REST endpoint. Using Query Lambdas, you can save your SQL queries as separate resources in Rockset. We generally recommend using Query Lambdas to build applications backed by Rockset as opposed to querying with raw SQL directly from application code.
In this tutorial, we will create a Query Lambda using our parameterized query from the last step.
- Click Create Query Lambda at the top of the Query Editor.
- Keep the values of genre and userId from the previous step as the default parameter values, then select Create Query Lambda.
- Next, we'll need to create an API Key to authenticate our request to the Query Lambda's dedicated REST endpoint. Navigate to the API Keys tab of the Rockset Console to create an API key.
- Once your API key has been successfully created, navigate to the Query Lambdas tab of the Rockset Console and select your newly created Query Lambda. Here, you will find instructions on how to execute the Query Lambda from application code under the Trigger Executions section.
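- Open your local terminal to execute the Query Lambda. The example below calls a Query Lambda named getRecommendedMovies in the commons workspace; substitute your own endpoint URL and API key.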
curl --request POST \
  --url https://api.rs2.usw2.rockset.com/v1/orgs/self/ws/commons/lambdas/getRecommendedMovies/versions/23317f11c43bf2cb \
  -H 'Authorization: ApiKey F9ak2aTRcbsaS8falTg4ltcHmdolTdjO8gxivXj62bIzFe0wX3sNY3lyy7jcX5iQ' \
  -H 'Content-Type: application/json' \
  -d '{
    "parameters": [
      {
        "name": "genre",
        "type": "string",
        "value": "Action"
      },
      {
        "name": "userId",
        "type": "string",
        "value": "100"
      }
    ]
  }' \
  | python -m json.tool
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 22356    0 22140  100   216  68544    668 --:--:-- --:--:-- --:--:-- 69213
{
    "aliases": [],
    "collections": [
        "commons.movie_ratings",
        "commons.movies"
    ],
    "column_fields": [
        {
            "name": "id",
            "type": ""
        },
        {
            "name": "title",
            "type": ""
        }
    ],
    "query_id": "d32726b4-ebb2-4ced-8c73-3d21f3ac7170:MQfnDt2:0",
    "results": [
        {
            "id": "155",
            "title": "The Dark Knight"
        },
        {
            "id": "22",
            "title": "Pirates of the Caribbean: The Curse of the Black Pearl"
        },
        {
            "id": "11",
            "title": "Star Wars"
        },
        ...
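From application code, the same request can be assembled with Python's standard library. This is a rough sketch rather than an official client: the URL and payload shape are copied from the curl example above, `YOUR_API_KEY` is a placeholder, and `build_request` and `titles` are hypothetical helper names introduced for illustration.

```python
import json
import urllib.request

# Endpoint copied from the curl example; replace it with your own Query Lambda URL.
LAMBDA_URL = (
    "https://api.rs2.usw2.rockset.com/v1/orgs/self/ws/commons/lambdas/"
    "getRecommendedMovies/versions/23317f11c43bf2cb"
)


def build_request(api_key, genre, user_id):
    """Build the POST request with the same headers and body as the curl call."""
    payload = {
        "parameters": [
            {"name": "genre", "type": "string", "value": genre},
            {"name": "userId", "type": "string", "value": user_id},
        ]
    }
    return urllib.request.Request(
        LAMBDA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"ApiKey {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def titles(response_body):
    """Pull the title of each row out of the "results" array of a response."""
    return [row["title"] for row in json.loads(response_body)["results"]]


request = build_request("YOUR_API_KEY", "Action", "100")

# Sending the request needs a valid API key and network access:
# with urllib.request.urlopen(request) as response:
#     print(titles(response.read()))
```

Keeping request construction separate from sending makes it easy to inspect the headers and body before any network call is made.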
#Next Steps
#Invite Other Users
Invite members of your team to your organization using their email addresses in the Users tab of the Rockset Console. You can choose whether each new user should have Administrator, Member, or Read-Only access to Rockset. Learn more about user management here.
#Join the Rockset Community Slack
Join us in our Slack community and share what you are looking to build with Rockset. We’re hanging out and ready to answer your questions.
#Keep Building!
Check out some of the pages below to continue exploring Rockset: