What is Rockset?
Rockset is a real-time analytics solution enabling low latency search, aggregations, and joins on massive semi-structured data, without operational burden. Rockset automatically indexes your data – structured, semi-structured, geo and time series data – for real-time search and analytics at scale.
Using Rockset, you can create personalized user experiences, build real-time decision systems, serve IoT applications, and more, with a real-time indexing database that can power sub-second queries at massive scale.
Rockset has pre-built native integrations for MongoDB, DynamoDB, Kafka, Kinesis, S3 and GCS. Once you connect a source by following our step-by-step tutorials, Rockset automatically loads your data within seconds and you can begin running SQL queries immediately (new data is queryable with a p95 latency of 2 seconds). Rockset first bulk loads the existing data and then switches to continuous ingest to stay in sync with your source, ingesting millions of events per second. No ETL tools are required.
You can read more about how each individual data source integration works below:
Note that even if your desired data source is not currently supported or if you prefer to build your own custom connector, you can always use the Write API to manually load your data.
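As a rough illustration of loading data manually, the sketch below builds (but does not send) an HTTP request that adds two documents to a collection. The host name, workspace, collection name, endpoint path, and `ApiKey` header format are assumptions made for this example, and `build_add_documents_request` is a hypothetical helper; check the Write API reference for the exact values for your account.

```python
import json
import urllib.request

# Assumed placeholders: substitute your own region host, workspace,
# collection, and API key.
API_SERVER = "https://api.usw2a1.rockset.com"
WORKSPACE = "commons"
COLLECTION = "events"
API_KEY = "YOUR_API_KEY"


def build_add_documents_request(docs):
    """Build (but do not send) a POST request that adds documents to a collection."""
    # Assumed endpoint path for adding documents to a collection.
    url = f"{API_SERVER}/v1/orgs/self/ws/{WORKSPACE}/collections/{COLLECTION}/docs"
    body = json.dumps({"data": docs}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {API_KEY}",
            "Content-Type": "application/json",
        },
    )


# Documents do not need to share a schema: fields and types can differ.
request = build_add_documents_request([
    {"user_id": 42, "event": "click", "tags": ["promo", "mobile"]},
    {"user_id": 43, "event": "purchase", "amount": 19.99},
])

# To actually send the request: urllib.request.urlopen(request)
```

Building the request separately from sending it makes the payload easy to inspect or test before any data leaves your machine.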
Learn more about how real-time updates are architected in Rockset here.
Rockset does not require your data to have a schema ahead of time, because data is ingested schemalessly. Smart schemas are then generated automatically from the exact fields and types present in the ingested data. A smart schema represents semi-structured data, nested objects and arrays, mixed types, and nulls, which enables relational SQL queries over all of these constructs. You can also define your own transformations by using Field Mappings to create new fields by applying SQL expressions to incoming data.
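To make the idea of a schema derived from the data itself more concrete, here is a minimal sketch that walks a handful of documents and records which types appear at each field path. It is not Rockset's implementation; `infer_schema`, the dotted path notation, and the type names are all assumptions chosen for illustration.

```python
from collections import defaultdict


def infer_schema(docs):
    """Map each field path to the sorted list of type names observed for it."""
    schema = defaultdict(set)

    def walk(value, path):
        if isinstance(value, dict):
            if path:
                schema[path].add("object")
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            schema[path].add("array")
            for item in value:
                walk(item, f"{path}[*]")
        elif value is None:
            schema[path].add("null")
        elif isinstance(value, bool):  # checked before int: bool subclasses int
            schema[path].add("bool")
        elif isinstance(value, int):
            schema[path].add("int")
        elif isinstance(value, float):
            schema[path].add("float")
        else:  # anything else is treated as a string in this toy example
            schema[path].add("string")

    for doc in docs:
        walk(doc, "")
    return {path: sorted(types) for path, types in schema.items()}


# Two documents with mixed types, a null, a nested object, and an array.
docs = [
    {"id": 1, "name": "Ada", "address": {"zip": "94105"}, "tags": ["a", "b"]},
    {"id": "2", "name": None, "address": {"zip": 94105}},
]
schema = infer_schema(docs)
for path in sorted(schema):
    print(path, schema[path])
```

Note that a field such as `id` ends up with more than one type, which is the "mixed types" case described above.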
Learn more about how smart schemas are generated in Rockset here.
#Full SQL Support
Rockset supports full SQL, including aggregations, filtering, windowing, and joins, over all types of fields, including heavily nested objects and arrays, on any semi-structured data. This brings the full expressiveness and flexibility of SQL to data from supported data sources, even if those sources don't natively support SQL.
See our SQL Reference for the full list of all functions available when writing SQL queries on Rockset.
Query Lambdas are named, parameterized SQL queries stored in Rockset that can be executed from a dedicated REST endpoint. With Query Lambdas, you can save and enforce version control for your SQL queries and integrate them into your CI/CD workflows.
Using the Rockset CLI, you can also create, manage, and deploy your Query Lambdas directly from your local computer. Query Lambdas are also fully supported in Rockset’s official client libraries and the Rockset API.
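As a sketch of what calling a Query Lambda over REST might look like, the example below builds (but does not send) a request that executes a tagged Query Lambda with one parameter. The host, endpoint path, request body shape, and the workspace, lambda, and tag names are assumptions for illustration, and `build_query_lambda_request` is a hypothetical helper; consult the Rockset API reference for the authoritative format.

```python
import json
import urllib.request

# Assumed placeholders: substitute your own region host and API key.
API_SERVER = "https://api.usw2a1.rockset.com"
API_KEY = "YOUR_API_KEY"


def build_query_lambda_request(workspace, name, tag, parameters):
    """Build (but do not send) a POST request that executes a tagged Query Lambda.

    `parameters` maps a parameter name to a (type, value) pair.
    """
    # Assumed endpoint path for executing a Query Lambda by tag.
    url = f"{API_SERVER}/v1/orgs/self/ws/{workspace}/lambdas/{name}/tags/{tag}"
    body = json.dumps({
        "parameters": [
            {"name": key, "type": type_name, "value": str(value)}
            for key, (type_name, value) in parameters.items()
        ]
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"ApiKey {API_KEY}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical lambda "top_products" in workspace "commons", tagged "production".
request = build_query_lambda_request(
    "commons", "top_products", "production", {"min_sales": ("int", 100)}
)

# To actually send the request: urllib.request.urlopen(request)
```

Because the SQL text lives in Rockset rather than in the application, the client only supplies the lambda name, a tag or version, and parameter values.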
Watch our tutorial on how to build applications using Query Lambdas here.
#How does it work?
All fields, including deeply nested fields, are automatically indexed in a Converged Index™ as each record is ingested. A Converged Index™ combines three indexes: an inverted index, a columnar index, and a row index. This allows analytical queries on large datasets to return in milliseconds. Using Rockset, you never have to manually define or create your indexes, or update them over time. Rockset can also be customized to optimize cost for massive-scale applications.
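The toy example below illustrates why keeping three indexes over the same data is useful: each one serves a different query shape. It is an in-memory sketch over flat documents only, not a description of how Rockset stores data; `build_indexes` and the dictionary layouts are assumptions for illustration.

```python
from collections import defaultdict


def build_indexes(docs):
    """Index every field of every document three ways (toy, in-memory, flat docs only)."""
    row_index = {}                      # doc id -> full document (point lookups)
    columnar_index = defaultdict(dict)  # field -> {doc id: value} (scans, aggregations)
    inverted_index = defaultdict(set)   # (field, value) -> doc ids (selective filters)
    for doc_id, doc in docs.items():
        row_index[doc_id] = doc
        for field, value in doc.items():
            columnar_index[field][doc_id] = value
            inverted_index[(field, value)].add(doc_id)
    return row_index, columnar_index, inverted_index


docs = {
    1: {"city": "SF", "amount": 10},
    2: {"city": "NYC", "amount": 25},
    3: {"city": "SF", "amount": 5},
}
row_index, columnar_index, inverted_index = build_indexes(docs)

# Selective filter: find matching ids in the inverted index,
# then fetch the full documents from the row index.
sf_ids = sorted(inverted_index[("city", "SF")])
sf_docs = [row_index[doc_id] for doc_id in sf_ids]

# Aggregation: scan a single column without touching other fields.
total_amount = sum(columnar_index["amount"].values())

print(sf_ids, total_amount)  # prints: [1, 3] 40
```

A highly selective query benefits from the inverted index, while a wide aggregation benefits from the columnar layout; having both available lets a query planner pick whichever is cheaper.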
You can read more about how Rockset builds a Converged Index™ and other design concepts in Rockset’s Architecture Whitepaper.
#Scale Compute and Storage Independently
Using Rockset, you can scale compute and storage resources independently for the best price-performance. As your data size grows, you can choose exactly the right amount of compute for the query performance you need at any given time. Hot storage and ingest are charged at a fixed rate, while compute resources are charged based on your Virtual Instance Type.
See Rockset’s full pricing model here.
#Serverless Auto-Scaling in the Cloud
Rockset uses a modern, cloud-native Aggregator Leaf Tailer (ALT) architecture which auto-scales in the cloud and automates cluster provisioning and index management. You will never need to provision capacity or manage servers, significantly minimizing any operational overhead.
You can read more about how Rockset uses the ALT architecture, and about its performance benchmarks, in the Evaluating Data Latency for Real-Time Databases white paper.
All data is encrypted using AES-256 at rest and SSL in transit. In addition, you can mask sensitive information using field mappings at the time of ingest. You can read more about our security features including SAML, OAuth, and Okta for single sign-on in the Security section (/iam) of our documentation. See our Data Privacy Addendum here.
Support for AWS VPC deployments is also available by contacting firstname.lastname@example.org.
To get started, you can create a Rockset account here. See our Quick Start guide to get a taste of Rockset by running queries on some sample data. Or, learn how to start loading your data by connecting your data source here!