❗Private Preview: Contact Rockset support to enable this feature.
The ES() table function allows you to embed Elasticsearch's DSL query natively within a Rockset SQL query. When you are migrating from Elasticsearch to Rockset, you can use the ES() table function to run all of your Elasticsearch queries without having to manually translate them from Elasticsearch DSL to native Rockset SQL.
Syntax
The ES() table function takes two input parameters: the collection name as a string parameter and the Elasticsearch DSL query as an object/map. Similar to any other table function, the ES() function needs to be used in the FROM clause of your SQL query.
SELECT
*
FROM
ES('<collection-name>', <Elasticsearch DSL Query>)
Examples
SELECT
*
FROM
ES('_events',
{'query': {'term': {'userEmail': {'value': '[email protected]'}}}}
)
Collection names have to be passed in as a string. You can use the dot-notation to specify the workspace name and use nested double-quotes to escape special characters (eg: -
) inside workspace names or collection names.
Here are some examples with workspace names and collection names:
SELECT * FROM ES('_events', ...)
SELECT * FROM ES('commons._events', ...)
SELECT * FROM ES('"pre-prod"."user-logins"', ...)
If you have Elasticsearch DSL query as an map or an object in your application, then encode that as a JSON string and then use the JSON_PARSE() function in your SQL query.
Example parameterized SQL when your Elasticsearch DSL is a JSON-encoded string:
SELECT
*
FROM
ES('_events',
JSON_PARSE('{"query": {"term": {"userEmail": {"value": "[email protected]"}}}}')
)
Please note that in the example above how single-quotes and double-quotes are used. In JSON encoded strings, double quotes are used to denote strings. In SQL, single-quotes denote strings. So, please keep that in mind when constructing these types of queries by hand or programmatically.
Elasticsearch query example
If you have an ElasticSearch term
query such as the one below:
GET /_search
{
"query": {
"term": {
"user_id": {
"value": "kimchy"
}
}
}
}
You can simply run that exact same query in Rockset as follows assuming the same data is stored in a Rockset collection named foo
SELECT
*
FROM
ES('foo',
{
'query': {
'term': {
'user_id': {
'value': 'kimchy'
}
}
}
}
)
Please note the single-quotes to denote strings in the second parameter.
SQL Template for App Integration
When you want to automatically translate Elasticsearch DSL queries from your application to Rockset SQL, you can use the following parameterized query as your SQL template. This query takes two parameters, one for the collection name and the other for the Elasticsearch's DSL query encoded as a JSON string.
SELECT * FROM ES(:collection, JSON_PARSE(:esquery))
Performance
ES() function takes the input Elasticsearch DSL query and compiles it directly into Rockset’s internal IR code. This is the exact same IR code that all Rockset SQL queries gets compiled into before Rockset’s query optimization and query planning is done. So, queries that run through the ES() function will suffer no performance penalties when compared against the equivalent SQL query that natively runs on Rockset.
For example the query execution plan and hence the query performance of the following two queries will be exactly the same.
Query using the ES() function:
SELECT
*
FROM
ES('foo',
{
'query': {
'term': {
'user_id': {
'value': 'kimchy'
}
}
}
}
)
Same query written natively in SQL
SELECT
*
FROM
`foo` f
WHERE
f.user_id = 'kimchy'
When you run EXPLAIN on both those queries it will yield the same execution plan as below
select *:$2
reshuffle on_aggregators(1)
add fields on commons.foo: fields($2=*)
index filter on commons.foo: fields($1=user_id), query($1:string['kimchy'])
Elasticsearch Join Queries
Elasticsearch join queries (nested, has_child, has_parent
) are not supported out of the box. Such queries will have to be broken down into smaller ES fragments and then combined with native SQL JOIN functionality available in Rockset.
Here is an example query in Rockset that JOINs the results of two Elasticsearch term
queries in a single SQL query.
-- Find all purchases of product_id = 42 from users who live in Palo Alto
SELECT
*
FROM
ES('users',
{
'query': {
'term': {
'city': {
'value': 'Palo Alto'
}
}
}
}
)
JOIN
ES('purchases',
{
'query': {
'term': {
'product_id': {
'value': 42
}
}
}
}
) ON users.user_id = purchases.customer_id
Elasticsearch Sort Search Result Queries
sort
keyword is supported and the results are guaranteed to be in the desired order even if the SQL wrapper is missing an explicit ORDER BY clause.
For example, the following three queries will return the exact same results:
Query using sort
keyword within Elasticsearch DSL
SELECT
*
FROM
ES('foo',
{
'sort' : [
{ 'post_date' : {'order' : 'asc'}},
'user',
{ 'name' : 'desc' },
{ 'age' : 'desc' }
],
'query' : {
'term' : { 'user' : 'kimchy' }
}
}
)
Equivalent query using an explicit ORDER BY clause instead of the sort
keyword within Elasticsearch DSL
SELECT
*
FROM
ES('foo',
{
'query' : {
'term' : { 'user' : 'kimchy' }
}
}
)
ORDER BY
post_date ASC, user, name DESC, age DESC
Equivalent native SQL query:
SELECT
*
FROM
foo
WHERE
user = 'kimchy'
ORDER BY
post_date ASC, user, name DESC, age DESC
Elasticsearch Aggregations
ES() function has limited support for aggs
keyword. Only 1 top level aggregation is supported with only 1 level of sub-aggregation. Multiple sub-aggregations are allowed.
Here is aggregation query example that is supported by ES()
SELECT
*
FROM
ES('foo',
{
'aggs': {
'my-agg-name': {
'terms': {
'field': 'my-field'
},
'aggs': {
'avg-other-field': {
'avg': {
'field': 'my-other-field'
}
},
'max-third-field': {
'max': {
'field': 'my-third-field'
}
}
}
}
}
}
)
The query above will return the exact results as the following SQL query.
SELECT
COUNT(*) AS _count,
AVG("my-other-field") AS "avg-other-field",
MAX("my-third-field") AS "max-third-field"
FROM
foo
GROUP BY
"my-field"
Elasticsearch Term-level Queries
- Supported keywords:
exists, ids, prefix, range, regexp, term, wildcard
- Unsupported keywords:
fuzzy, term_set
Elasticsearch Compound Queries
bool
: supported without scoringmust
is supported the same way asfilter
should
is only supported withminimum_should_match=1
boosting, constant_score, dis_max, function_score
are not supported.
Elasticsearch Full Text Queries
- Limited support:
match, match_phrase, multi_match
- Requires tokenizing string data during ingestion.
multi_match
requires listing the match fields explicitly.
Known Limitations
ES
function doesn’t supportscore
, which means any keywords involves calculating scoring are not supported, such asboosting
,constant_score
,dis_max
, orfunction_score
.- Fuzzy search is not supported.
match
requires tokenizing the string data during ingest time. Or usingprefix
orregex
function as replacement.multi_match
includesmatch
limitation, and requires explicitly listing the fields to be matched.query_string
andsimple_query_string
are not supported for Lucene queries.should
is only supported withminimum_should_match=1