SQL Clients > Jupyter

Jupyter

This page describes how to setup a Jupyter notebook and configure it for use with Rockset.

The steps below are intended for use with Python 3. They require that you have a Rockset account and have installed the latest Rockset Python client. Refer to the Python page for instructions on installing the Python client.

Introduction

Jupyter notebooks are an effective tool for data scientists to iterate on their work and share it with other data scientists. Rockset has deep integration with the Jupyter notebook workflow. There is deep SQL Magic and ipython-sql integration that lets you run SQL queries directly in your notebooks, turn the results into Pandas dataframes. There is also a Rockset Python client integration to do more advanced operations including creating collections and adding, removing, or modifying documents within collections.

Installation

Local Installation

You must first install the Rockset Python client. After that, you can install the Jupyter package by following the official guide. Ensure that you install the version of Jupyter for Python 3. In addition to Jupyter, you will need a couple of extra packages that can be installed as follows:

$ python3 -m pip install ipython-sql

If you’re using system Python, you may need to sudo before you can run the above commands. It is recommended to instead create a virtualenv.

Once you have installed Jupyter and the ipython-sql dependency, you should be able to run it by starting it from the terminal or command prompt.

$ jupyter notebook

Once you’re in the notebook environment, open up a new notebook and test that the installed version of the Python interpreter is 3.x.

import sys
sys.version

Interpreter version in Jupyter

If the version does not say 3.x, you may need to install a Python 3 kernel for your Jupyter notebook. This can be done using:

$ python3 -m pip install ipykernel
$ python3 -m ipykernel install --user

When it is installed, you should restart your Jupyter notebook and check that the new Python 3 kernel is available to be selected.

Conda Installation

If you’re using Conda enviroments with Jupyter Notebooks, then you can install Rockset using conda install.

$ conda install --yes --channel conda-forge --channel rockset rockset

If you have multiple Conda environments then you can pass in the name of the conda environment with the conda install --name option.

$ conda env list
$ conda install --name YOUR-CONDA-ENV-NAME --yes --channel conda-forge --channel rockset rockset

You can run these commands from your Jupyter notebook using the (!) shell escape:

Conda install from Jupyter

Running in Docker

You can also run a dockerized Jupyter installation. If you don’t have docker installed, install docker using the instructions on the official page. Once you have docker installed, you can use any of the images listed in Jupyter Docker Stacks.

If you’re using the image: jupyter/minimal-notebook, you can bring that up as follows.

$ docker run -p 8888:8888 jupyter/minimal-notebook

By default, you will lose your notebook when the docker container is terminated. In order to run it with persistent storage, you should mount a host volume into the container. You can pass the -v commandline argument to docker run.

$ chmod 777 /some/host/folder/for/work
$ docker run -p 8888:8888 jupyter/minimal-notebook -v /some/host/folder/for/work:/home/jovyan/work

Once the notebook is running, you can run the following command in the first cell in order to prepare it for use with Rockset.

!pip install rockset ipython-sql

Testing your Setup

Once you have installed Jupyter, you can run the following commands in a cell to ensure that you have it setup correctly. Replace your-api-key with your own API key that you have generated from the Rockset console. (Refer to the quickstart if you are unfamiliar with using the Rockset console.)

from rockset import Client, Q, F
%load_ext sql
%config SqlMagic.autocommit = False
%sql rockset://your-api-key@api.rs2.usw2.rockset.com/
%sql SELECT 1

Check the output of the cell for any errors. If no exceptions are thrown, you’re all set and Rockset is correctly configured for use with your notebook.

Usage

SQL Magic

Prior to using the %sql or %%sql magic syntax to run SQL statements within your Jupyter notebook, you must execute a few statements to establish a connection to Rockset.

%load_ext sql
%config SqlMagic.autocommit = False
%sql rockset://your-api-key@api.rs2.usw2.rockset.com/

Once the connection is successfully established, you can run queries using either %sql or %%sql in a cell.

%sql SELECT * from _events LIMIT 10
%%sql
SELECT
    *
FROM
    _events
LIMIT 10

Both of the above examples are identical except that the latter allows running multi-line SQL statements. You can run queries against your own collections in this manner.

Python Client

You can also interact with Rockset using the Python client and query builder. Using the Python client allows you to read results using cursors and higher level objects, as well as allowing you to create/delete collections, add/remove/modify documents in a collection, etc. The query builder provides an abstraction over SQL syntax to build queries using Python constructs instead of raw SQL statements.

from rockset import Client, Q, F
rs = Client()
result = rs.sql(Q('_events').select(F['details']).limit(10))
for r in result:
    print(r)

For detailed information on using the Python Query Builder, you can refer to the Python reference.

Dataframes

You can also turn your results into a Pandas Dataframe for easy wrangling and/or visualization. If you are using the one-line %sql syntax, you can do the following:

result = %sql SELECT * from _events LIMIT 1
result.DataFrame()

Magic to Dataframe

If you’re using the %%sql syntax, you can do this instead:

%%sql result <<

SELECT * from "_events" LIMIT 1

# next cell
result.DataFrame()

Magic to Dataframe 2

Next Steps

Once you have Rockset with SQL Magic configured with your Jupyter notebook, you can consider using the Rockset Python client which will let you build queries iteratively and use Python syntax throughout.