[LangChain](🔗) is an open-source framework for developing applications powered by language models. It offers modular, easy-to-use components that can be pieced together into chains for building language-based applications.

LangChain components can preprocess data or split it into chunks, embed those chunks with an embedding model, and run similarity search over the embeddings in a vector database. LangChain offers several features that simplify managing and optimizing the use of language models:

  • Access to pre-trained LLMs from OpenAI, Hugging Face, Cohere, and more

  • Tools for preprocessing text and code

  • Vector stores, including Rockset, for application serving

  • Off-the-shelf chains to build applications

As a real-time search and analytics database, Rockset uses indexing to deliver scalable and performant personalization, product search, semantic search, chatbot applications, and more. Because Rockset is purpose-built for real-time workloads, you can build these responsive applications on constantly updating, streaming data. By integrating Rockset with LangChain, you can use LLMs on your own real-time data for production-ready vector search applications.

We'll walk through a demonstration of [how to use Rockset as a vector store in LangChain](🔗). To get started, make sure you have access to a Rockset account and an API key available.

## Setting Up Your Environment

  1. Use the Rockset console to create a [<<glossary:Collection>>](🔗) with the [<<glossary:Write API>>](🔗) as your source. In this walkthrough, we create a collection named `langchain_demo`. Configure an [<<glossary:Ingest Transformation>>](🔗) that uses [`VECTOR_ENFORCE`](🔗) to define your embeddings field and take advantage of performance and storage optimizations.


  2. Create and save a new [API Key](🔗) by navigating to the [API Keys tab of the Rockset Console](🔗). For this example, we assume you are using the `Oregon(us-west-2)` region.

  3. Install the [Rockset Python client](🔗) and additional dependencies to work with LangChain and OpenAI.


  4. This tutorial uses [OpenAI](🔗) to create embeddings. You will need to create an OpenAI account and get an API key. Set the API key as the `OPENAI_API_KEY` environment variable.
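
The setup steps above can be sketched in Python. This is a minimal sketch under stated assumptions, not the exact configuration from the Rockset console: `build_ingest_transformation` and `missing_packages` are hypothetical helpers, the field name `description_embedding` and the dimension 1536 (the output size of OpenAI's `text-embedding-ada-002` model) are assumptions, and the package list depends on your LangChain version.

```python
import importlib.util


def build_ingest_transformation(embedding_field: str, dimension: int) -> str:
    """Return ingest transformation SQL that validates the embedding field.

    VECTOR_ENFORCE(array, length, type) is documented to keep the array only
    when it has the expected length and element type.
    """
    return (
        "SELECT _input.* EXCEPT(_meta), "
        f"VECTOR_ENFORCE(_input.{embedding_field}, {dimension}, 'float') "
        f"AS {embedding_field} "
        "FROM _input"
    )


def missing_packages(required: dict) -> list:
    """Return the pip names of packages whose import module is not installed."""
    return [
        pip_name
        for module, pip_name in required.items()
        if importlib.util.find_spec(module) is None
    ]


# Assumed dependency set (import name -> pip name); adjust to your versions.
REQUIRED = {
    "rockset": "rockset",
    "langchain": "langchain",
    "langchain_community": "langchain-community",
    "langchain_openai": "langchain-openai",
    "langchain_text_splitters": "langchain-text-splitters",
    "tiktoken": "tiktoken",
}

# Paste the printed SQL into the collection's ingest transformation field.
print(build_ingest_transformation("description_embedding", 1536))

missing = missing_packages(REQUIRED)
if missing:
    print("pip install " + " ".join(missing))
```

The embedding dimension must match the model you use to create embeddings, so change 1536 if you pick a different model.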

## Using Rockset as a Vector Store

The following sections outline how to generate and store vector embeddings in Rockset and how to search across those embeddings to find documents similar to your search queries.

### 1. Define Key Variables
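
A minimal sketch of the variables this walkthrough relies on. The field names `description` and `description_embedding`, the `ROCKSET_API_KEY` environment variable, and the `make_rockset_client` helper are assumptions; the region constant corresponds to Oregon (us-west-2).

```python
import os

# Names reused in the following steps. TEXT_KEY and EMBEDDING_KEY are assumed
# field names; EMBEDDING_KEY must match the field in your ingest transformation.
COLLECTION_NAME = "langchain_demo"
TEXT_KEY = "description"
EMBEDDING_KEY = "description_embedding"


def make_rockset_client(api_key=None):
    """Create a Rockset client for the Oregon (us-west-2) region."""
    import rockset  # imported here so the constants are usable without the SDK

    api_key = api_key or os.environ["ROCKSET_API_KEY"]
    return rockset.RocksetClient(host=rockset.Regions.usw2a1, api_key=api_key)
```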



### 2. Prepare Documents
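
One way to prepare documents is to load a text file and split it into chunks. The sketch below assumes a recent LangChain release with the `langchain-community` and `langchain-text-splitters` packages installed; `load_and_split` is a hypothetical helper, and the chunk size is an arbitrary choice.

```python
def load_and_split(path, chunk_size=1000, chunk_overlap=0):
    """Load a text file and split it into LangChain Document chunks."""
    # Imported lazily so the helper can be defined without LangChain installed.
    from langchain_community.document_loaders import TextLoader
    from langchain_text_splitters import CharacterTextSplitter

    documents = TextLoader(path).load()
    splitter = CharacterTextSplitter(
        chunk_size=chunk_size, chunk_overlap=chunk_overlap
    )
    return splitter.split_documents(documents)
```

Call it with the path to your own text file, for example `docs = load_and_split("my_notes.txt")`, where `my_notes.txt` is a placeholder.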



### 3. Embed and Insert Documents
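
The following sketch creates the vector store and inserts the prepared chunks. It assumes the `Rockset` vector store class from `langchain-community` and `OpenAIEmbeddings` from `langchain-openai` (older releases exposed these under the `langchain` package); `build_vector_store` and `insert_documents` are hypothetical helper names.

```python
def build_vector_store(
    client,
    collection_name="langchain_demo",
    text_key="description",
    embedding_key="description_embedding",
):
    """Wrap a Rockset collection as a LangChain vector store."""
    from langchain_community.vectorstores import Rockset
    from langchain_openai import OpenAIEmbeddings

    embeddings = OpenAIEmbeddings()  # reads OPENAI_API_KEY from the environment
    return Rockset(
        client=client,
        embeddings=embeddings,
        collection_name=collection_name,
        text_key=text_key,
        embedding_key=embedding_key,
    )


def insert_documents(store, docs):
    """Embed and insert documents; returns the IDs of the inserted documents."""
    return store.add_texts(
        texts=[d.page_content for d in docs],
        metadatas=[d.metadata for d in docs],
    )
```

Keep the returned IDs if you plan to delete the documents later.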



### 4. Search for Similar Documents
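
A minimal search sketch, assuming a store object like the one created in the previous step. `find_similar` is a hypothetical helper; it relies on the integration's default distance function, which the LangChain integration documents as cosine similarity (treat that as an assumption for your version).

```python
def find_similar(store, query, k=4):
    """Return (score, metadata, text preview) for the k most similar documents."""
    results = store.similarity_search_with_relevance_scores(query, k)
    return [
        (score, doc.metadata, doc.page_content[:40])
        for doc, score in results
    ]
```

Each result pairs a document with its relevance score, so you can print or rank the matches as needed.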




### 5. Search for Similar Documents with Metadata Filtering
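
Metadata filtering can be sketched by passing a SQL predicate through the `where_str` parameter of the search call. The helper name `find_similar_filtered`, the `description` field, and the excluded term are assumptions for illustration.

```python
def find_similar_filtered(
    store, query, text_key="description", excluded_term="citizens", k=4
):
    """Search for similar documents, skipping those whose text contains a term."""
    # The predicate is interpolated directly into SQL, so only pass trusted values.
    where_str = f"{text_key} NOT LIKE '%{excluded_term}%'"
    return store.similarity_search_with_relevance_scores(
        query, k, where_str=where_str
    )
```

Any field in the collection can appear in the predicate, not only the text field.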




### 6. Delete Inserted Documents [Optional]

To delete documents from your collection, you need the unique ID of each document. You can define IDs when inserting documents with `Rockset.add_texts()`; otherwise, Rockset generates a unique ID for each document. Either way, `Rockset.add_texts()` returns the IDs of the inserted documents.

To delete these documents, simply use the `Rockset.delete_texts()` function.
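
A short sketch of both halves of this step, assuming a store object like the one created earlier. `insert_with_ids` and `delete_documents` are hypothetical helper names.

```python
def insert_with_ids(store, texts, ids):
    """Insert texts under caller-chosen IDs; returns the IDs that were used."""
    return store.add_texts(texts=texts, ids=ids)


def delete_documents(store, ids):
    """Delete previously inserted documents by ID."""
    store.delete_texts(ids)
```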



## Using Rockset as a Data Source

LangChain document loaders expose a `load` method for loading data as documents from a source, and Rockset can be configured as a data source. The following sections demonstrate how to use Rockset as a document loader in LangChain.

### Executing Queries

The [`RocksetLoader`](🔗) class allows you to create LangChain documents from Rockset collections through SQL queries.

Start by initializing a `RocksetLoader` with the following sample code:
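
A minimal sketch of the initialization, assuming `RocksetLoader` from `langchain-community` and the `models` module from the Rockset Python client. `make_loader` is a hypothetical helper, and the default column names follow the description in this section.

```python
def make_loader(
    client, query, content_columns=("text",), metadata_keys=("author", "date")
):
    """Create a RocksetLoader that turns SQL query results into Documents."""
    from langchain_community.document_loaders import RocksetLoader
    from rockset import models

    return RocksetLoader(
        client,
        models.QueryRequestSql(query=query),  # the SQL query to run
        list(content_columns),  # columns used as page content
        metadata_keys=list(metadata_keys),  # columns used as metadata
    )
```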



Here, you can see that the following query is run:
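
The original query text is not preserved here. A plausible stand-in, assuming the `langchain_demo` collection from earlier, is shown below as the string you would pass to the loader.

```python
# Assumed sample query: fetch three rows from the demo collection.
SAMPLE_QUERY = "SELECT * FROM langchain_demo LIMIT 3"

print(SAMPLE_QUERY)
```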



The `text` column in the collection is used as the page content, and the `author` and `date` columns are used as metadata. If you do not specify `metadata_keys`, the whole Rockset document is used as metadata.

To execute the query and access an iterator over the resulting `Document`s, run the following:
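
A sketch of lazy iteration using the loader's `lazy_load` method. `preview_documents` is a hypothetical helper that stops after a few documents instead of consuming the whole result set.

```python
def preview_documents(loader, limit=3):
    """Iterate lazily over query results and collect short previews."""
    previews = []
    for index, document in enumerate(loader.lazy_load()):
        if index >= limit:
            break
        previews.append((document.page_content[:40], document.metadata))
    return previews
```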



To execute the query and access all resulting `Document`s at once, run the following:
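
A sketch of eager loading with the loader's `load` method. `load_all` is a hypothetical wrapper that also reports how many documents came back.

```python
def load_all(loader):
    """Run the query and return every resulting Document as a list."""
    documents = loader.load()
    print(f"loaded {len(documents)} documents")
    return documents
```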



Here is an example response of `loader.load()`:



### Content Columns

You can choose to use multiple columns as content:
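
A sketch of a loader with two content columns, assuming the same `RocksetLoader` and `models` imports as before. The helper name `make_two_column_loader` and the default query are assumptions.

```python
def make_two_column_loader(client, query="SELECT * FROM langchain_demo LIMIT 1"):
    """Create a loader whose page content combines two columns."""
    from langchain_community.document_loaders import RocksetLoader
    from rockset import models

    return RocksetLoader(
        client,
        models.QueryRequestSql(query=query),
        ["sentence1", "sentence2"],  # two content columns
    )
```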



If the "sentence1" field is "This is the first sentence." and the "sentence2" field is "This is the second sentence.", the `page_content` of the resulting `Document` would be:
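
The snippet below reproduces that result in plain Python. `join_with_newline` is a stand-in that mimics the documented default behavior (joining values with a newline), not the library's own function.

```python
columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]


def join_with_newline(columns):
    """Stand-in for the default joiner: one column value per line."""
    return "\n".join(str(value) for _, value in columns)


page_content = join_with_newline(columns)
print(page_content)
# This is the first sentence.
# This is the second sentence.
```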



You can define your own function to join content columns by setting the `content_columns_joiner` argument in the `RocksetLoader` constructor. `content_columns_joiner` is a function that takes a `List[Tuple[str, Any]]` as its argument, which represents a list of (column name, column value) tuples. By default, it joins the column values with a newline.

For example, if you wanted to join sentence1 and sentence2 with a space instead of a new line, you could set `content_columns_joiner` like so:
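
A sketch of a space-based joiner and how it could be wired into the loader. `join_with_space` and `make_space_joined_loader` are hypothetical names; the constructor call assumes the same `RocksetLoader` and `models` imports used earlier.

```python
def join_with_space(columns):
    """Join column values with a space; columns is a list of (name, value)."""
    return " ".join(str(value) for _, value in columns)


def make_space_joined_loader(client, query):
    """Create a loader that joins sentence1 and sentence2 with a space."""
    from langchain_community.document_loaders import RocksetLoader
    from rockset import models

    return RocksetLoader(
        client,
        models.QueryRequestSql(query=query),
        ["sentence1", "sentence2"],
        content_columns_joiner=join_with_space,
    )
```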



The `page_content` of the resulting `Document` would be:
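
Applying a space joiner to the two sample sentences in plain Python shows the expected content:

```python
columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]

# Join the column values with a single space.
page_content = " ".join(value for _, value in columns)
print(page_content)
# This is the first sentence. This is the second sentence.
```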



Often you will want to include the column name in the `page_content`. You can do this as follows:
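
A sketch of a joiner that prefixes each value with its column name. The `name: value` layout and the function name `join_with_column_names` are assumptions; pass the function as `content_columns_joiner` to the `RocksetLoader` constructor.

```python
def join_with_column_names(columns):
    """Render each (name, value) pair as 'name: value' on its own line."""
    return "\n".join(f"{name}: {value}" for name, value in columns)
```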



This would result in the following `page_content`:
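
Assuming a joiner that renders each column as `name: value` on its own line (an illustrative format, not confirmed by the source), the two sample sentences would produce:

```python
columns = [
    ("sentence1", "This is the first sentence."),
    ("sentence2", "This is the second sentence."),
]

# Prefix each value with its column name, one column per line.
page_content = "\n".join(f"{name}: {value}" for name, value in columns)
print(page_content)
# sentence1: This is the first sentence.
# sentence2: This is the second sentence.
```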



## Using Rockset for Chat History

Rockset can be used to store chat history. LangChain's [`RocksetChatMessageHistory`](🔗) class is responsible for remembering chat interactions that can be passed into a model.

Construct a `RocksetChatMessageHistory` object:
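
A minimal construction sketch, assuming `RocksetChatMessageHistory` from `langchain-community` (older releases exposed it under the `langchain` package). `make_history` is a hypothetical helper, and the session ID `MySession` is a placeholder.

```python
def make_history(client, session_id="MySession", collection="langchain_demo"):
    """Create a chat history backed by a Rockset collection."""
    from langchain_community.chat_message_histories import (
        RocksetChatMessageHistory,
    )

    return RocksetChatMessageHistory(
        session_id=session_id,
        client=client,
        collection=collection,
        sync=True,  # wait for writes to become queryable before returning
    )
```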



If the collection `langchain_demo` does not exist in the `commons` workspace, LangChain creates it.

Add chat messages:
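
A sketch that records one exchange, using the `add_user_message` and `add_ai_message` methods that LangChain chat histories provide. `record_exchange` is a hypothetical helper.

```python
def record_exchange(history, user_text, ai_text):
    """Store one user message and the model's reply in the chat history."""
    history.add_user_message(user_text)
    history.add_ai_message(ai_text)
```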



Get message history:
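
A sketch that reads the stored messages through the history's `messages` attribute. `summarize_messages` is a hypothetical helper; it assumes each message exposes `type` and `content`, as LangChain message objects do.

```python
def summarize_messages(history):
    """Return (message type, content) pairs for every stored message."""
    return [(message.type, message.content) for message in history.messages]
```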



Clear chat history:
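
A sketch that clears the session with the history's `clear` method. `reset_session` is a hypothetical helper that also returns how many messages remain, so you can confirm the history is empty.

```python
def reset_session(history):
    """Remove all messages for this session and report how many remain."""
    history.clear()
    return len(history.messages)
```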