[LlamaIndex](🔗) is an open-source framework for developing applications powered by language models. LlamaIndex offers tools that facilitate data ingestion, structuring, and storage for LLM-backed apps.

We'll walk through a demonstration of how to use Rockset as a vector store in LlamaIndex.

# Tutorial

In this example, we'll use [OpenAI's](🔗) `text-embedding-ada-002` model to generate embeddings and Rockset as the vector store that holds them. We'll ingest text from a file and ask questions about its content.

## Setting Up Your Environment

  1. Create and save a new API key by navigating to the [API Keys tab of the Rockset Console](🔗), and set the `ROCKSET_API_KEY` environment variable to it. Find your API server [here](🔗) and set the `ROCKSET_API_SERVER` environment variable. Set the `OPENAI_API_KEY` environment variable to your OpenAI API key.

  2. Install the dependencies: the `llama-index` and `rockset` Python packages (for example, `pip3 install llama-index rockset`).


  1. LlamaIndex allows you to ingest data from a variety of sources. For this example, we'll read from a text file named `constitution.txt`, which contains the text of the United States Constitution, found [here](🔗).
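Before ingesting anything, a quick check like the one below can confirm that the environment variables from step 1 are visible to Python. It is a minimal sketch using only the standard library. The helper name `missing_env_vars` is hypothetical, and `ROCKSET_API_KEY` is assumed to be the variable holding the API key, since it is the default named in the Configuration section.

```python
import os

# Variables this tutorial relies on. ROCKSET_API_KEY is an assumption based on
# the defaults listed under Configuration.
REQUIRED_VARS = ("ROCKSET_API_KEY", "ROCKSET_API_SERVER", "OPENAI_API_KEY")


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the names in `required` that are unset or empty in `env`."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments inspects the current process environment; an empty list means all three variables are set.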

## Data Ingestion

  1. Use LlamaIndex's `SimpleDirectoryReader` class to convert the text file to a list of `Document` objects.


  1. Instantiate the LLM and service context.


  1. Instantiate the vector store and storage context.


  1. Add documents to the `llamaindex_demo` collection and create an index.
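The ingestion steps above can be sketched as a single function. This is a minimal sketch, not the tutorial's original listing. The function name `build_index`, the model name `gpt-3.5-turbo`, and the temperature are illustrative assumptions; the import paths assume the pre-0.10 `llama_index` package layout (where `ServiceContext` is available); and the `llamaindex_demo` collection is assumed to already exist in Rockset.

```python
from pathlib import Path


def build_index(text_path, collection="llamaindex_demo"):
    """Load one text file, embed it, and store the nodes in a Rockset collection."""
    path = Path(text_path)
    if not path.is_file():
        raise FileNotFoundError(f"No such file: {path}")

    # Imports are deferred so the path check above works without the packages.
    # These paths assume the pre-0.10 `llama_index` package layout.
    from llama_index import (
        ServiceContext,
        SimpleDirectoryReader,
        StorageContext,
        VectorStoreIndex,
    )
    from llama_index.llms import OpenAI
    from llama_index.vector_stores import RocksetVectorStore

    # Convert the text file to a list of Document objects.
    docs = SimpleDirectoryReader(input_files=[str(path)]).load_data()

    # Instantiate the LLM and service context (illustrative model settings).
    llm = OpenAI(temperature=0.8, model="gpt-3.5-turbo")
    service_context = ServiceContext.from_defaults(llm=llm)

    # Instantiate the vector store and storage context.
    vector_store = RocksetVectorStore(collection=collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    # Embed the documents, add them to the collection, and build the index.
    index = VectorStoreIndex.from_documents(
        docs, storage_context=storage_context, service_context=service_context
    )
    return index, service_context
```

Calling `build_index("constitution.txt")` returns the index together with the service context, so both can be reused when querying.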



## Querying

  1. Ask a question about your document and generate a response.


  1. Run the program.
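The query step above can be sketched as a small helper. It assumes `index` is a LlamaIndex index such as the one created during ingestion; the helper name `ask` is hypothetical, and the optional `service_context` argument is only forwarded when one is supplied.

```python
def ask(index, question, service_context=None):
    """Run a natural-language question against the index and return the answer text."""
    # as_query_engine() and query() are methods of the LlamaIndex index and
    # query engine; the service context is forwarded only when provided.
    kwargs = {} if service_context is None else {"service_context": service_context}
    response = index.as_query_engine(**kwargs).query(question)
    return str(response)
```

For example, `print(ask(index, "What is the duty of the president?"))` prints the generated answer; the question itself is only an illustration.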



# Metadata Filtering

Metadata filtering restricts retrieval to documents whose metadata matches the filters you specify.

  1. Add nodes to your vector store and create an index.


  1. Define metadata filters.


  1. Retrieve relevant documents that satisfy the filters.
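The three steps above might look like the following sketch. The sample nodes, the `type` metadata key, and the function names are all hypothetical, and the import paths assume the pre-0.10 `llama_index` package layout. The plain-Python `matches` helper is included only to illustrate what an exact-match filter means; it is not how the vector store evaluates filters.

```python
# Illustrative sample data (not from the original tutorial).
SAMPLE_NODES = [
    {"text": "Apples are red", "metadata": {"type": "fruit"}},
    {"text": "Carrots are orange", "metadata": {"type": "veggie"}},
    {"text": "Brownies are brown", "metadata": {"type": "dessert"}},
]


def matches(metadata, criteria):
    """Reference semantics for exact matching: every criterion must be equal."""
    return all(metadata.get(key) == value for key, value in criteria.items())


def build_filterable_index(collection="llamaindex_demo"):
    """Add the sample nodes to the vector store and create an index."""
    from llama_index import StorageContext, VectorStoreIndex
    from llama_index.schema import TextNode
    from llama_index.vector_stores import RocksetVectorStore

    nodes = [TextNode(text=n["text"], metadata=n["metadata"]) for n in SAMPLE_NODES]
    vector_store = RocksetVectorStore(collection=collection)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    return VectorStoreIndex(nodes, storage_context=storage_context)


def retrieve_filtered(index, query, criteria):
    """Retrieve nodes relevant to `query` whose metadata matches `criteria`."""
    from llama_index.vector_stores.types import ExactMatchFilter, MetadataFilters

    # Define one exact-match filter per key/value pair.
    filters = MetadataFilters(
        filters=[ExactMatchFilter(key=k, value=v) for k, v in criteria.items()]
    )
    return index.as_retriever(filters=filters).retrieve(query)
```

With these helpers, `retrieve_filtered(index, "What colors are fruits?", {"type": "fruit"})` would be expected to return only nodes tagged as fruit.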



# Indexing from Collections

If nodes already exist in a collection, you can create an index from the collection.

  1. Instantiate the vector store.


  1. Instantiate the LLM and service context.


  1. Create the index.


  1. Ask a question.
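The steps above can be sketched as one function that builds an index over nodes already stored in Rockset. The function name `index_from_collection` and the model settings are illustrative assumptions, and the import paths assume the pre-0.10 `llama_index` package layout.

```python
def index_from_collection(collection, workspace="commons"):
    """Build an index over nodes that already exist in a Rockset collection."""
    if not collection:
        raise ValueError("collection name is required")

    # Deferred imports; paths assume the pre-0.10 `llama_index` package layout.
    from llama_index import ServiceContext, VectorStoreIndex
    from llama_index.llms import OpenAI
    from llama_index.vector_stores import RocksetVectorStore

    # Instantiate the vector store.
    vector_store = RocksetVectorStore(collection=collection, workspace=workspace)

    # Instantiate the LLM and service context (illustrative model settings).
    llm = OpenAI(temperature=0.8, model="gpt-3.5-turbo")
    service_context = ServiceContext.from_defaults(llm=llm)

    # No documents are passed: the index reads embeddings already in the store.
    index = VectorStoreIndex.from_vector_store(
        vector_store, service_context=service_context
    )
    return index, service_context
```

The returned index can then be queried like any other, for example with `index.as_query_engine(service_context=service_context).query("What is the duty of the president?")`.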



# Configuration

  • [<<glossary:Collection>>](🔗): Name of the collection to query (required).


  • [<<glossary:Workspace>>](🔗): Name of the workspace containing the collection. Defaults to `"commons"`.


  • [api_key](🔗): The API key to use to authenticate Rockset requests. Ignored if `client` is passed in. Defaults to the `ROCKSET_API_KEY` environment variable.


  • [api_server](🔗): The API server to use for Rockset requests. Ignored if `client` is passed in. Defaults to the `ROCKSET_API_SERVER` environment variable, or to `"https://api.use1a1.rockset.com"` if `ROCKSET_API_SERVER` is not set.


  • **client**: Rockset client object to use to execute Rockset requests. If not specified, a client object is constructed internally from the `api_key` parameter (or the `ROCKSET_API_KEY` environment variable) and the `api_server` parameter (or the `ROCKSET_API_SERVER` environment variable).


  • **embedding_col**: The name of the database field containing embeddings. Defaults to `"embedding"`.


  • **metadata_col**: The name of the database field containing node data. Defaults to `"metadata"`.


  • [distance_func](🔗): The metric used to compare vectors. Defaults to cosine similarity.
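To make the defaults listed above concrete, the sketch below spells out every option explicitly. `resolve_connection` is a hypothetical helper that mirrors the documented fallbacks for `api_key` and `api_server`; it is an illustration, not the library's own code. The `RocksetVectorStore.DistanceFunc.COSINE_SIM` member and the import path are assumptions based on the pre-0.10 `llama_index` package layout.

```python
import os

DEFAULT_API_SERVER = "https://api.use1a1.rockset.com"


def resolve_connection(api_key=None, api_server=None, env=None):
    """Mirror the documented fallbacks for `api_key` and `api_server`."""
    env = os.environ if env is None else env
    return {
        # Explicit argument first, then the environment variable.
        "api_key": api_key or env.get("ROCKSET_API_KEY"),
        # Explicit argument, then the environment variable, then the default.
        "api_server": api_server
        or env.get("ROCKSET_API_SERVER")
        or DEFAULT_API_SERVER,
    }


def make_vector_store(collection, api_key=None, api_server=None):
    """Construct a RocksetVectorStore with every documented option spelled out."""
    # Deferred import; path assumes the pre-0.10 `llama_index` package layout.
    from llama_index.vector_stores import RocksetVectorStore

    return RocksetVectorStore(
        collection=collection,
        workspace="commons",
        embedding_col="embedding",
        metadata_col="metadata",
        distance_func=RocksetVectorStore.DistanceFunc.COSINE_SIM,
        **resolve_connection(api_key, api_server),
    )
```

Because every value passed here equals its documented default, `make_vector_store("llamaindex_demo")` should behave like `RocksetVectorStore(collection="llamaindex_demo")`; change individual arguments only where your collection differs.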