Usage

Rockset’s entire Python Client is contained within a single Python module called rockset.

Note

If you haven’t installed the Rockset Python Client, follow instructions in the Quickstart doc.

The two most commonly used part of Rockset’s Python Client are:

  1. Client object: Client

    Used to connect securely to Rockset.

  2. Query builder: Q, F

    Used to build powerful and complex Rockset queries using Python expressions.

Example

Write a Python program to select all documents from collection hello_world which has a field called name whose value is equal to "Jim Gray":

# connect to Rockset
from rockset import Client, Q, F
rs = Client()

# build a query object
q = Q('hello_world').where(F['name'] == 'Jim Gray')
results = rs.sql(q)

Refer to the `Query`_ module for more detailed documentation on how to construct more complex queries using the query builder.


Connect

Usage

Client objects allow you to connect securely to the Rockset service. All other API calls require a valid Client object.

In order to create a Client object, you will need a valid Rockset API key. If you have access to the Rockset Console, then you can use the console to create an API key. If not, please contact the Rockset team at support@rockset.io

from rockset import Client

# connect securely to Rockset production API servers
client = Client(api_server='api.rs2.usw2.rockset.com',
                api_key='XKQL6YCU0zDUglhWHPMDDmDYyMxDHrASGk5apCnn3A07twh')

You can manage your api_key credentials using the rock command-line tool. Run the rock configure tool to setup one or more api_key credentials and select the one that you want all rock commands and the Python Rockset Client to use. Once setup, you should expect the following to work.

from rockset import Client

# connect to the active credentials profile
# you can see see the active profile by running ``rock configure ls``
rs = Client()

# connect to credentials profile 'prod' as defined by ``rock configure``
rs = Client(profile='prod')

Example

Connect to Rockset API server and then subsequently use the client object to retrieve collections.

from rockset import Client

# connect securely to Rockset dev API server
rs = Client(api_server='api-us-west-2.rockset.io',
            api_key='adkjf234rksjfa23waejf2')

# list all collections in the account that I have access to
all_collections = rs.Collection.list()

# create a new collection; returns a collection object
new_collection = rs.Collection.create('customer_info')

# get details of an existing collection as a collection object
users = rs.retrieve('users')

Load Data

Usage

Collection objects repesents a single Rockset collection. These objects are generally created using a Rockset Client_ object using methods such as:

from rockset import Client

# connect to Rockset
rs = Client(api_key=...)

# create a new collection
user_events = rs.Collection.create('user-events')

# retrieve an existing collection
users = rs.retrieve('users')

You can add documents to the collection using the add_docs() method. Each document in a collection is uniquely identified by its _id field.

If documents added have _id fields that match existing documents, then their contents will be merged. Otherwise, the new documents will be added to the collection.

You can remove documents from a collection using the remove_docs() method.

Refer to the Query_ module for documentation and examples on how to query collections.

Example

from rockset import Client, Q, F

# connect securely to Rockset
rs = Client()

# retrieve the relevant collection
emails = rs.Collection.retrieve('emails')

# look for all emails to johndoe that contains the term 'secret'
johndoe_secret_q = Q('emails').where(
    (F["to"].startswith('johndoe@')) & (F["body"][:] == 'secret')
)

# query the collection
docs = rs.query(query=johndoe_secret_q).results()

Create a new collection

Creating a collection using the Client_ object is as simple as calling client.Collection.create("my-new-collection"):

from rockset import Client
rs = Client()
new_collection = rs.Collection.create("my-new-collection")

# create a collection in a workspace
event-data-collection = rs.Collection.create("leads",
                                             workspace="marketing")

# create a collection and map timestamp field to event-time
event-data-collection = rs.Collection.create("event-data-collection",
                        event_time_field="timestamp",
                        event_time_format="milliseconds_since_epoch",
                        event_time_default_timezone="UTC")

Creating a collection with a retention duration of 10 days:

from rockset import Client
rs = Client()
new_collection_with_retention = rs.Collection.create("my-event-collection",
                                                retention_secs=10*24*60*60)

List all collections

List all collections in a workspace using the Client_ object using:

from rockset import Client
rs = Client()

# list all collections in the commons workspace
collections = rs.list()

# list all collections in another workspace
collections = rs.list(workspace='marketing')

Retrieve an existing collection

Retrive a collection to run various operations on that collection such as adding or removing documents or executing queries:

from rockset import Client
rs = Client()
users = rs.retrieve('users')

# retrieve a collection in a workspace
users = rs.retrieve('users', workspace='marketing')

Describe an existing collection

The describe method can be used to fetch all the details about the collection such as what data sets act as the collection’s sources, various performance and usage statistics:

from rockset import Client
rs = Client()
users = rs.retrieve('users')
print(users.describe())

Drop a collection

Use the drop() method to remove a collection permanently from Rockset.

Note

This is a permanent and non-recoverable operation. Beware.

from rockset import Client
rs = Client()
users = rs.retrieve('users')
users.drop()

Add documents to a collection

Python dicts can be added as documents to a collection using the add_docs method. Documents are uniquely identified by the _id field. If an input document does not have an _id field, then an unique id will be assigned by Rockset.

If the _id field of an input document does not match an existing document, then a new document will be created.

If the _id field of an input document matches an existing document, then the new document will be merged with the existing document:

from rockset import Client
import json

rs = Client()
users = rs.Collection.retrieve('users')
with open('my-json-array-of-dicts.json') as data_fh:
    ret = users.add_docs(json.load(data_fh))

Delete documents from a collection

Remove documents from a collection using the remove_docs method:

from rockset import Client

rs = Client()
users = rs.Collection.retrieve('users')
users_to_remove = ['user007', 'user042', 'user435']
docs_to_remove = [{'_id': u} for u in users_to_remove]
ret = users.remove_docs(docs_to_remove)

Query using Query Builder

Usage

Query module contains a set of APIs that allows you to compose powerful queries over collections.

This module comprises of two major components:

  • QQuery Builder

    Used to compose complex and powerful queries.

>>> from rockset import Q
>>> q = Q('hello-world').limit(10)
>>> (sqltxt, sqlargs) = q.sql()
>>> print(sqltxt)
SELECT *
FROM "hello-world"
LIMIT 10
>>>
  • FField Reference

    Used to construct field expressions that refer to particular fields within a document.

>>> from rockset import F
>>> (F['answer'] == 42).sqlexpression()
'"answer" = 42'
>>>

Example

from rockset import Client, Q, F

# connect to Rockset
api_key = 'adkjf234rksjfa23waejf2'
rs = Client(api_key=...)

# fetch user whose "_id" == "u42"
u = rs..query(Q('users').where(F["_id"] == "u42"))

# fetch the 100 oldest users in the 'users' collection
q = Q('users').highest(100, F["age"])
old100 = rs.query(q)

# find the average rating of all songs by "The Beatles"
q = Q('songs').where(F["artist"] == "The Beatles").select(F["rating"].avg())
avg_rating = rs.query(q)

Query Operators: Overview

rockset.Q(query, alias=None)[source]

All query objects are constructed using the Q(<collection-name>) query builder construct and are then followed by a chain of query operators to build the full query expression.

  • Constructor

    • Q: Specify the collection to be queried

    >>> # return all documents in the logins collection.
    >>> q = Q('logins')
    
  • Filter queries

    • where: Classic selection operator to only return documents that match the given criteria. Use F to construct field expressions to specify the selection criteria.

    >>> # return all docs in logins where field "user_id" is equal to "u42"
    >>> q = Q('logins').where(F['user_id'] == 'u42')
    
  • Projection

    • select: Specify the list of desired fields to be returned

    >>> # will return the fields "user_id" and "login_ip" from all docs in logins
    >>> q = Q('logins').select(F['user_id'], F['login_ip'])
    
  • Pagination

    • limit : Specify limit with skip support

    >>> # return 10 documents from logins after skipping the first 40 results
    >>> q = Q('logins').limit(10, skip=40)
    
  • Sorting

    • highest, lowest: Find the top N or the bottom N

    >>> # will return 10 documents with the most recent "login_time"
    >>> q = Q('logins').highest(10, F['login_time'])
    
  • Aggregation

    • aggregate: Group by and aggregate fields

    >>> # will aggregate all documents in logins by "user_id",
    >>> # and return "user_id", max("login_time") and count(*) after aggregation.
    >>> Q('logins').aggregate(F['user_id'], F['login_time'].max(), F.count())
    
  • Joins

    • join: Regular JOIN operator

    • TODO: docs coming soon

    • lookup: LEFT OUTER JOIN operator

    • TODO: docs coming soon

    • apply: Graph traversal operator

    • TODO: docs coming soon

Field Expressions Overview

rockset.F = <rockset.query.FieldRef object>

F is a field reference object that helps in building query expressions natively using Python language expressions. F uses Python operator overloading heavily and operations on field references generate Query_ objects that can be used in conjunction with Q to build compose complex queries.

  • Value comparators

    • ==, !=, <, <=, >, >=:

    >>> # match all docs where "first_name" is equal to "Jim"
    >>> F["first_name"] == "Jim"
    >>> # match all docs where "rating" is greater than or equal to 4.5
    >>> F["rating"] >= 4.5
    >>> # match all docs where "title" text is lexographcially greater than "Star Wars"
    >>> F["title"] >= "Star Wars"
    
  • String functions

    • startswith, like: Prefix and classic SQL LIKE expressions

    >>> # match all docs where "title" starts with "Star Wars"
    >>> F["title"].startswith("Stars Wars")
    >>> # match all docs where "title" contains the word "Wars"
    >>> F["title"].like("% Wars %")
    
  • Boolean compositions

    • &, |, ~: AND, OR and NOT expressions

    >>> # match all records with "rating" >= 4.5 AND "title" starts with "Star Wars"
    >>> e1 = (F["rating"] >= 4.5) & F["title"].startswith("Star Wars")
    >>> # match all records with "director" == "George Lucas" OR "title" starts with "Star Wars"
    >>> e2 = (F["director"] == 'George Lucas') | F["title"].startswith("Star Wars")
    >>> # match all records that are not included in expressions e1 or e2
    >>> e1e2_complement = ~(e1 | e2)
    
  • Field aggregations

    • avg, collect, count, countdistinct, max, min, sum

    >>> # count(*)
    >>> F.count()
    >>> # min(login_time)
    >>> F["login_time"].min()
    >>> # max(login_time) as last_login_time
    >>> F["login_time"].max().named('last_login_time')
    
  • Nested documents and arrays

    • []: The [] notation can be used to refer to fields within nested documents and arrays.

    • Consider a collection where documents looked like this example below.

    {
      "_id": {"u42"},
      "name": {
          "first": "James",
          "middle": "Nicholas",
          "last": "Gray" },
      "tags": [
          "ACID",
          "database locking",
          "two phase commits",
          "five-minute rule",
          "data cube",
          "turing award" ]
     }
    
    • Example field references to access nested documents and arrays:

    >>> # expression to find all documents where field "name" contains a
    >>> # nested field "middle" with value equal to "Nicholas"
    >>> F["name"]["middle"] == "Nicholas"
    >>>
    >>> # similarly, for array fields, you can specify the array offset.
    >>> # expression to find all documents where the first "tags" field
    >>> # is equal to "ACID"
    >>> F["tags"][0] == "ACID"
    
    • In order to match against any element within an array field, you can use Python’s empty slice [:] notation.

    >>> # expression to find all documents where the "tags" array field
    >>> # contains "ACID" as one of the elements
    >>> F["tags"][:] == "ACID"
    >>> # find all documents where one of the "tags" is "turing award"
    >>> F["tags"][:] == "turing award"
    

Query Operator: Filters

Where operator

Syntax: <Query>.where(<Query>)

where allows you to chain a new query object as a conjuntion. In most cases, field reference expressions are sufficient, but where comes in especially handy when you want to sub-select documents following another operation such as a sort or an aggregation.

Examples:

# find all "Jim"s who are in the top 100 highest scorers
Q('players')  \
.highest(100, F["score"])  \
.where(F["first_name"] == "Jim")
Query.where(query)[source]

Returns a new query object that when executed will only return documents that match the current query object AND the query object provided as input.

Parameters

query (Query) – the conjunct query object

Returns

new query object that returns documents in self AND query

Return type

Query

Query Operator: Projection

Select operator

Syntax: <Query>.select(<field_ref> [, <field_ref> [, ...]])

Allows you to specify the fields that you wish to include in the query results.

Examples:

Q('authors') \
.where(F["last_name"] == "Gray")  \
.select(F["first_name"], F["last_name"], F["age"])
Query.select(*fields)[source]

Returns a new query object that when executed will only include the list of fields provided as input.

Parameters

fields (list of FieldRef) – fields you wish to select

Returns

new query object that includes the desired field selection

Return type

Query

Query Operator: Pagination

Limit operator

Syntax: <Query>.limit(<max_results> [, <skip_count>])

Limit operator allows you to perform pagination queries and positional filters.

Examples:

# find the "_id" field of the 5 most recently uploaded documents
# since the default sorting is more recently updated first,
# this query will simply be:
Q('uploads').limit(5)

# fetch a third batch of 100 results, for all users older than 18
# i.e., skip the first 200 results
Q('uploads').where(F["age"] >= 18).limit(100, skip=200)
Query.limit(limit, skip=0)[source]

Returns a new query object that when executed will only return a subset of the results. The query when executed will return no more than limit results after skipping the first skip number of results. The limit operator is most commonly used for pagination.

Parameters
  • limit (int) – maximum number of results to return

  • skip (int) – the number of results to skip

Returns

new query object that only returns the desired subset

Return type

Query

Query Operator: Sorting

Highest, Lowest operators

Syntax: <Query>.highest(N, <field_ref> [, <field_ref> [, ...]]), <Query>.lowest(N, <field_ref> [, <field_ref> [, ...]])

Examples:

Q(F["last_name"] == "Gray").highest(5, F["score"], F["first_name"])
Q(F["last_name"] == "Gray").lowest(10, F["salary"])
Query.highest(limit, *fields)[source]

Returns a new query object that when executed will sort the results from the current query object by the list of fields provided as input in descending order and return top N defined by the limit parameter.

Parameters
  • limit (int) – top N results you wish to fetch

  • fields (list of FieldRef) – fields you wish to sort

  • by (descending) –

Returns

new query object that returns top N descending

Return type

Query

Query.lowest(limit, *fields)[source]

Returns a new query object that when executed will sort the results from the current query object by the list of fields provided as input in ascending order and return top N defined by the limit parameter.

Parameters
  • limit (int) – top N results you wish to fetch

  • fields (list of FieldRef) – fields you wish to sort

  • by (ascending) –

Returns

new query object that returns top N ascending

Return type

Query

Query Operator: Aggregation

Aggregate operator and field ref aggregate functions

Syntax: <Query>.aggregate(<field_ref> [, <field_ref> [, ...]])

Field reference objects can also include any of the following aggregation functions:

  • min

  • max

  • avg

  • sum

  • count

  • countdistinct

  • approximatecountdistinct

  • collect

You can also optionally provide a field name alias in the field reference using the named function. This comes in especially handy for the aggregated fields.

Examples:

# find min and max salaries broken down by age
Q('employees').aggregate(F["age"], F["salary"].min(), F["salary"].max())
# will return documents such as:
# {"age", "18", "min(salary)": 50000, "max(salary)": 150000}
# {"age", "19", "min(salary)": 50000, "max(salary)": 152000}

# example using field name alias
Q('employees').aggregate(F["age"], F["salary"].avg().named("avg_salary"))
# will return documents such as:
# {"age", "18", "avg_salary": 82732}
Query.aggregate(*fields)[source]

Returns a new query object that when executed will aggregate results from the current query object by the list of fields provided as input.

Field reference objects can include one of the supported aggregate functions such as max, min, avg, sum, count, countdistinct, approximatecountdistinct, collect as follows: <field_ref>.max(), <field_ref>.min(), … .

The list of fields provided as input can contain a mix of field references that include an aggregate function and field references that does not.

Parameters

fields (list of FieldRef) – fields you wish to aggregate by

Returns

new query object that includes the desired field aggregations

Return type

Query

FieldRef.min()[source]

Returns a new FieldRef that represents a min() aggregation of the given field.

Returns

FieldRef object that represents the desired min aggregation.

Return type

AggFieldRef

FieldRef.max()[source]

Returns a new FieldRef that represents a max() aggregation of the given field.

Returns

FieldRef object that represents the desired max aggregation.

Return type

AggFieldRef

FieldRef.avg()[source]

Returns a new FieldRef that represents an avg() aggregation of the given field.

Returns

FieldRef object that represents the desired avg aggregation.

Return type

AggFieldRef

FieldRef.sum()[source]

Returns a new FieldRef that represents a sum() aggregation of the given field.

Returns

FieldRef object that represents the desired sum aggregation.

Return type

AggFieldRef

FieldRef.count()[source]

Returns a new FieldRef that represents a count() aggregation of the given field.

Returns

FieldRef object that represents the desired count aggregation.

Return type

AggFieldRef

FieldRef.countdistinct()[source]

Returns a new FieldRef that represents a countdistinct() aggregation of the given field.

Returns

FieldRef object that represents the desired countdistinct aggregation.

Return type

AggFieldRef

FieldRef.approximatecountdistinct()[source]

Returns a new FieldRef that represents a approximatecountdistinct() aggregation of the given field.

Returns

FieldRef object that represents the desired approximatecountdistinct aggregation.

Return type

AggFieldRef

FieldRef.collect()[source]

Returns a new FieldRef that represents a collect() aggregation of the given field.

Returns

FieldRef object that represents the desired collect aggregation.

Return type

AggFieldRef

Query Operator: Joins

Lookup operator

Syntax: <Query>.lookup(<local_field_ref> [, <target_field_ref>] [, <target_query_ref>] [, <new_field_ref>])

Lookup operator allows you to perform a LEFT OUTER JOIN between results of the current <Query> object and the results of the <target_query_ref>.

The LEFT OUTER JOIN operation will be performed on the <local_field_ref> from the results of the current <Query> object and the <target_field_ref> in <target_query_ref>.

The output of the JOIN operation will be presented as an array value within the <new_field_ref> in the post JOIN results.

Examples:

# Assume you have following 2 collections:
#
# "employees" collection:
#   {"_collection": "emp", "_id": "e42",
#       "name": "Jim Gray", "deptId": "d7" }
#   {"_collection": "emp", "_id": "e43",
#       "name": "Peter Parker", "deptId": "d8" }
#   {"_collection": "emp", "_id": "e44",
#       "name": "Jane Doe", "deptId": null }
#   {"_collection": "emp", "_id": "e45",
#       "name": "John Smith" }
#
# "dept" collection (only has "d7" but no "d8"):
#   {"_collection": "dept", "_id": "d7", "name": "eng" }
#

# fetch all employees along with their relevant dept records
rs = Client()
q = Q("emp").lookup(F["deptId"], target_field=F["_id"],
        target_query=Q("dept"))
everyone = rs.query(q)

# the above query will return the following results:
#
# [ {"_collection": "emp", "_id": "e42",
#       "name": "Jim Gray", "deptId": "d7",
#       "deptId:lookup":
#           [ {"_collection": "dept", "_id": "d7", "name": "eng"} ]
#   },
#   {"_collection": "emp", "_id": "e43",
#       "name": "Peter Parker", "deptId": "d8",
#       "deptId:lookup": [ ]
#   },
#   {"_collection": "emp", "_id": "e44",
#       "name": "Jane Doe", "deptId": null,
#       "deptId:lookup": null
#   },
#   {"_collection": "emp", "_id": "e45",
#       "name": "John Smith"
#   }
# ]

Another example:

# another query on the same two collections described above
# example to fetch name of employee "u42" along with their dept name
#
# first define target_query to only select dept id (JOIN field) and name
target_query = Q("dept").select(F["_id"], F["name"])

#
# LEFT OUTER JOIN between employee "u42" and target_query
# Note: default target_field '_id' works here and is not specified
jim = Q("emp").where(F["_id"] == "u42").lookup(
          local_field=F["deptId"],
          target_query=target_query,
          new_field=F["dept"])

#
# only select relevant fields post JOIN
jim = jim.select(F["name"], F["dept"])

# execute the query
rs = Client()
everyone = rs.query(jim)

# the above query will return the following results:
# [ { "name": "Jim Gray", "dept": [ {"_id": "d7", "name": "eng"} ]
#   }
# ]
Query.lookup(local_field, target_field=None, target_query=None, new_field=None)[source]

Lookup allows you to perform a LEFT OUTER JOIN between the results of the current query object and the results of the target_query provided as input. The LEFT OUTER JOIN operation will be performed on the local_field from the results of the current query object and the target_field field in target_query. All results from the target_query whose target_field value matches the local_field value, will be presented as an array value within the new_field in the post JOIN results.

local_field: For every result document in the current query object, the value of the local_field is JOINed against the target_field in the target_query. All the documents that match from the target_query are presented in a new field whose field name defined by the new_field parameter.

This field is mandatory.

target_field: The field from the target_query results against which the local_field should be JOINed with.

This field is optional.

Default value for target_field is F["_id"] field.

target_query: Defined as a Query object, whose results will be used to JOIN against the local_field. The results from the target_query should include the target_field.

All fields selected from the target_query will be present within the new_field in the post JOIN results.

This field is optional.

Default value for target_query is Q(<source-collection>), which will return all documents in the current collection.

new_field: New field in every result document will contain an array value of all the matching results from the target_query.

If the local field is undefined in a result document, then the new field will also be undefined in that result document.

If the local field value is null in a result document, then the new field will also be null in that result document.

If the local field value is defined and not null in a result document, then the new field will have an array of all the documents from the target query results whose target field matches the local field value.

The new field will be an empty array if there are no matches.

This new field name is optional and defaults to local field name concatenated with :lookup.

Note

If the target_query does not contain target_field, then there will not be any matches with the local_field value, and thus the new_field will be an empty array for all results.

Parameters
  • local_field (FieldRef) – local field you wish to perform the LEFT OUTER JOIN. This is a required parameter.

  • target_field (FieldRef) – target field in the target query against which you wish to perform the LEFT OUTER JOIN. This is an optional parameter. Default value: F['_id'] field

  • target_query (Query) – defines a Query object, whose results will be used to match against the local_field. This is an optional parameter. Defaults to Q(<source-collection>)

  • new_field (FieldRef) – defines a new field where the results of LEFT OUTER JOIN will be present. This is an optional parameter. Defaults to local field name concatenated with “:lookup”.

Returns

new query object that includes the desired LEFT OUTER JOIN

Return type

Query

Apply operator

Syntax: <Query>.apply(<field_ref>)

Apply operator matches all documents whose <field_ref> value matches the results of the given <Query>.

Apply operators enable graph queries that allows you to perform graph traversals across related fields in different collections without requiring any special data shaping or schema modeling or graph specific indexing.

The common pattern for using apply will look as follows:

Step 1: Search, process and filter for relevant documents

Step 2: Select the field that defines the source vertex

Step 3: Apply over the field that defines the destination vertex

Step 4: Go back to Step 1 for another graph traversal hop, if required.

Examples:

# Assume you have following 2 collections:
#
# "bad_ips" collection:
#   {"_collection": "bad_ips", "ip_address": "106.6.6.6", "last_seen": ... }
#   {"_collection": "bad_ips", "ip_address": "107.6.6.6", "last_seen": ... }
#
# "login_attempts" collection:
#   {"_collection": "login_attempts", "login_ip": "72.43.99.108", ... }
#   {"_collection": "login_attempts", "login_ip": "106.6.6.6", ... }
#
# You can model the above 2 collections as a graph, where there exists
# an edge between every login_attempts.login_ip and the corresponding
# bad_ips.ip_address
#
# The following query will find all login_attempts from any of the
# bad_ips:
rs = Client()
results = rs.query( Q("bad_ips")  \
                         .select(F["ip_address"])  \
                         .apply(F["login_ip"], Q("login_attempts")) )

# the slow and inefficient way to achive the same would be to
# materialize the list of all bad_ips and then building out a
# new query object which could potentially be quite big.
# DO NOT USE THIS. USE APPLY OPERATOR INSTEAD.
bad_ips = rs.query(Q("bad_ips"))
suspicious_logins = None
for bad_ip in bad_ips:
    is_login_ip_bad = (F["login_ip"] == bad_ip["ip_address"])
    if suspicious_logins:
        suspicious_logins |= is_login_ip_bad
    else:
        suspicious_logins = is_login_ip_bad

# execute the query against "login_attempts"
results = rs.query(Q("login_attempts").where(suspicious_logins))

Another example:

# Assume you have following 3 collections:
# "users"
# {"_id": "u42", "name": "Patrick", "zip": "94107"}
# {"_id": "u67", "name": "Paul", "zip": "94306"}
#
# "merchants"
# {"_id": "m2345", "name": "Muddy Waters", "zip": "53706"}
# {"_id": "m8442", "name": "Beli Deli", "zip": "94002"}
#
# "payments"
# {"_id": "p23rsdkjkapw3", "payer": "u42", "merchant": "m2345", "amount": 17.45, "time": 1488244768}
# {"_id": "qase8432akdfa", "payer": "u67", "merchant": "m8442", "amount": 42.72, "time": 1488244807}
# {"_id": "r23raskdfa235", "payer": "u42", "merchant": "m8442", "amount": 11.83, "time": 1488244942}
# {"_id": "s06slkgeka92s", "payer": "u42", "merchant": "m8442", "amount": 10000, "time": 1488244944}
#

# find all merchants in zip code 53706,
# then find all of their payments,
# then find the top 100 users by total amount spent
merchants = Q("merchants").where(F["zip"] == "53706")
payments  = merchants.select(F["_id"]).apply(F["merchant"], Q("payments"))
all_users = payments.aggregate(F["payer"], F["amount"].sum().named("total_amount"))
top_users = all_users.highest(10, F["total_amount"])

# execute the top_users query
rs = Client()
results = rs.query(top_users)
Query.apply(to_field, target_query=None)[source]

Returns a new query object that when executed will match all documents where values of the to_field provided as input will match any of the results from the current query object.

The current query object is expected to have a single field in them, and is commonly achieved using the select operator.

Read more about the apply operator in the Graph queries section.

Parameters
  • to_fields (FieldRef) – field you wish to match the results with

  • target_query (Query) – Optional. Use this for cross-collection graph queries, where the apply needs to work on a different collection than what the current query object is referring to.

Example

Find all login IP addresses for user ‘u42’, and then find all activity logs from any of those IP addresses:

Q('logins')
  .where(F['user'] == 'u42')
  .select(F['source_ip'])
  .apply(F['ip_address'], Q('activity_logs'))
Returns

new query object that includes the desired apply operation

Return type

Query


Field Expression: Value Comparators

Equality operator: ==

Supported types: int, float, bool, str

Syntax: <field_ref> == <value>

Examples:

F["first_name"] == "Jim"
F["year"] == 2017
F["score"] == 5.0
F["tags"][:] == 'critical'

Value comparators: <, <=, >=, >

Supported types: int, float, str

Syntax: <field_ref> < <value>, <field_ref> <= <value>, <field_ref> >= <value>, <field_ref> > <value>

Examples:

F["year"] < 2000
F["year"] >= 2007
F["rating"] >= 4.5
F["title"] >= "Star Wars"

Prefix operator: startswith

Supported types: str

Syntax: <field_ref>.startswith(<prefix>)

Examples:

F["first_name"].startswith("Ben")

Like operator: like

Supported types: str

Syntax: <field_ref>.like(<pattern>)

Examples:

F["address"].like("%State St%")

Field alias: named

Supported types: All field references

Syntax: <field_ref>.named(<new-field-name>)

Examples:

F["full_name"].named("name")
F["login_time"].max().named("last_login_time")

Field existence: is_defined

Supported types: All

Syntax: <field_ref>.is_defined()

Field existence tested with <field_ref>.is_defined() will match all documents where the field is defined, even if it has a null value.

Null comparison: is_not_null

Supported types: All

Syntax: <field_ref>.is_not_null()

Field expression <field_ref>.is_not_null() will match all documents where the field is defined and has a non-null value.

Tip

There is no is_null() because of the potential confusion of calling is_null() on an undefined field. Use ~<field_ref>.is_not_null() or <field_ref>.is_defined() & ~<field_ref>.is_not_null() depending on your use case.

Nested operator

Syntax: <field_ref>.nested(<Query>)

nested operator makes it easy to work with nested array of documents.

Example:

# find all books authored by 'Jim Gray'
F["authors"].nested((F["first_name") == "Jim") & (F["last_name"] == "Gray"))

# find all users who logged in from given IP on June 06, 2006
F["logins"].nested((F["ipv4"] == "10.6.6.6") & (F["login_date"] == "2006-06-06"))

Field Expression: Boolean Compositions

Three different boolean operators (&, |, and ~) are overloaded to allow easy composition of boolean operators.

Note

The boolean operators are NOT and, or, and not, as those are special and cannot be overridden in Python.

AND operator: & (intersection)

Syntax: <Query object> & <Query object>

Examples:

# find all documents where field tags contains the "turing award"
# and the age is greater than 40
(F["tags"][:] == "turing award") & (F["age"] > 40)

OR operator: | (union)

Syntax: <Query object> | <Query object>

Examples:

# find all documents where the first_name is "jim"
# or last_name is "gray"
(F["first_name"] == "jim") | (F["last_name"] == "gray")

NOT operator: ~ (negation)

Syntax: ~<Query object>

Examples:

# find all documents whose title does not contain the term "confidential"
~F["title"][:] == "confidential"