This page describes how to setup a Jupyter notebook and configure it for use with Rockset.
The steps below are intended for use with Python 3. They require that you have the latest Rockset Python SDK installed. Refer to the Python page for instructions on installing the Python SDK.
Jupyter notebooks are an effective tool for data scientists to iterate on their work and share it with other data scientists. Rockset has deep integration with the Jupyter notebook workflow. There is deep SQL Magic and ipython-sql integration that lets you run SQL queries directly in your notebooks, turn the results into Pandas dataframes. There is also a Rockset Python SDK integration to do more advanced operations including creating collections and adding, removing, or modifying documents within collections.
You must first install the Rockset Python SDK. After that, you can install the Jupyter package by following the official guide. Ensure that you install the version of Jupyter for Python 3. In addition to Jupyter, you will need a couple of extra packages that can be installed as follows:
$ python3 -m pip install ipython-sql
If you're using system Python, you may need to
sudo before you can run the above commands. It is
recommended to instead create a virtualenv.
Once you have installed Jupyter and the ipython-sql dependency, you should be able to run it by starting it from the terminal or command prompt.
$ jupyter notebook
Once you're in the notebook environment, open up a new notebook and test that the installed version of the Python interpreter is 3.x.
import sys sys.version
If the version does not say 3.x, you may need to install a Python 3 kernel for your Jupyter notebook. This can be done using:
$ python3 -m pip install ipykernel $ python3 -m ipykernel install --user
When it is installed, you should restart your Jupyter notebook and check that the new Python 3 kernel is available to be selected.
If you're using Conda enviroments with Jupyter Notebooks, then you can install Rockset using
$ conda install --yes --channel conda-forge --channel rockset rockset
If you have multiple Conda environments then you can pass in the name of the conda environment with
conda install --name option.
$ conda env list $ conda install --name YOUR-CONDA-ENV-NAME --yes --channel conda-forge --channel rockset rockset
You can run these commands from your Jupyter notebook using the (!) shell escape:
#Running in Docker
You can also run a dockerized Jupyter installation. If you don't have docker installed, install docker using the instructions on the official page. Once you have docker installed, you can use any of the images listed in Jupyter Docker Stacks.
If you're using the image: jupyter/minimal-notebook, you can bring that up as follows.
$ docker run -p 8888:8888 jupyter/minimal-notebook
By default, you will lose your notebook when the docker container is terminated. In order to run it
with persistent storage, you should mount a host volume
into the container. You can pass the
-v commandline argument to
$ chmod 777 /some/host/folder/for/work $ docker run -p 8888:8888 jupyter/minimal-notebook -v /some/host/folder/for/work:/home/jovyan/work
Once the notebook is running, you can run the following command in the first cell in order to prepare it for use with Rockset.
$ pip install rockset ipython-sql
#Testing your Setup
Once you have installed Jupyter, you can run the following commands in a cell to ensure that you
have it setup correctly. Replace
your-api-key with your own API key that you have generated from
the Rockset console. (Refer to the quickstart if you
are unfamiliar with using the Rockset console.)
from rockset import Client, Q, F %load_ext sql %config SqlMagic.autocommit = False %sql rockset://firstname.lastname@example.org/ %sql SELECT 1
Check the output of the cell for any errors. If no exceptions are thrown, you're all set and Rockset is correctly configured for use with your notebook.
Prior to using the
%%sql magic syntax to run SQL statements within your Jupyter
notebook, you must execute a few statements to establish a connection to Rockset.
%load_ext sql %config SqlMagic.autocommit = False %sql rockset://email@example.com/
Once the connection is successfully established, you can run queries using either
in a cell.
%sql SELECT * from _events LIMIT 10
%%sql SELECT * FROM _events LIMIT 10
Both of the above examples are identical except that the latter allows running multi-line SQL statements. You can run queries against your own collections in this manner.
You can also interact with Rockset using the Python SDK and query builder. Using the Python SDK allows you to read results using cursors and higher level objects, as well as allowing you to create/delete collections, add/remove/modify documents in a collection, etc. The query builder provides an abstraction over SQL syntax to build queries using Python constructs instead of raw SQL statements.
from rockset import Client, Q, F rs = Client() result = rs.sql(Q('_events').select(F['details']).limit(10)) for r in result: print(r)
For detailed information on using the Python Query Builder, you can refer to the Python reference.
You can also turn your results into a
Pandas Dataframe for
easy wrangling and/or visualization. If you are using the one-line
%sql syntax, you can do the
result = %sql SELECT * from _events LIMIT 1 result.DataFrame()
If you're using the
%%sql syntax, you can do this instead:
%%sql result << SELECT * from "_events" LIMIT 1 # next cell result.DataFrame()
Once you have Rockset with SQL Magic configured with your Jupyter notebook, you can consider using the Rockset Python SDK which will let you build queries iteratively and use Python syntax throughout.