CREATE [ [ SIMILARITY | DISTANCE ] INDEX ] <name>
[ ON [ FIELD <collection_name>:<field_name> ] [DIMENSION <dimension> ] ]
[ AS <factory_string> [ METRIC (metric_type) ]
[ TRAIN SHARDED ON TABLESAMPLE sample_type(<percent>) ] ]
DDL command that creates a resource, such as an index, on another resource, such as a collection field. The unique RRN of the created resource is returned in the rrn field.
name
Name of the newly created resource.
collection_name
Collection that this resource is being built against.
field_name
Field that this resource is being built against.
dimension
Dimension of the vectors being indexed when creating a vector index.
factory_string
Input string used to create a vector index. The string has the form <Provider>:<Parameters>:<ProviderConfig>.
- faiss is the only supported Provider.
- A default nprobe for the index may be set as a parameter. Parameters have the form parameter_name=value[,...].
- The provider config string for the faiss provider is the specified FAISS factory string. Only Inverted File Index (IVF) formats are supported.
metric_type
Distance metric type used for constructing a vector index. May be l2 or inner_product. Defaults to inner_product for a similarity index and l2 for a distance index.
sample_type
Sampler used during creation. Currently only BERNOULLI is supported.
percent
Percent of the collection sampled as part of creation. May be a floating point number in [0.0, 1.0] representing the fraction, or an integer in [0, 100] representing the percentage.
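The three-part layout of factory_string can be illustrated with a small client-side parser. This is only a sketch in Python: the names FactoryString and parse_factory_string are hypothetical helpers, not part of any API, and the sketch assumes that the Provider and Parameters parts never contain a colon. It does not check that the provider config is an IVF format.

```python
from dataclasses import dataclass


@dataclass
class FactoryString:
    """Hypothetical container for the three parts of a factory string."""
    provider: str
    parameters: dict
    provider_config: str


def parse_factory_string(text: str) -> FactoryString:
    """Split '<Provider>:<Parameters>:<ProviderConfig>' into its parts.

    Assumption: only the first two colons are separators, so the
    provider config is passed through unchanged.
    """
    parts = text.split(":", 2)
    if len(parts) != 3:
        raise ValueError("expected <Provider>:<Parameters>:<ProviderConfig>")
    provider, raw_params, provider_config = parts

    # The reference text names faiss as the only supported provider.
    if provider != "faiss":
        raise ValueError(f"unsupported provider: {provider!r}")

    # Parameters have the form parameter_name=value[,...] and may be empty.
    parameters = {}
    if raw_params:
        for item in raw_params.split(","):
            name, sep, value = item.partition("=")
            if not sep or not name.strip():
                raise ValueError(f"malformed parameter: {item!r}")
            parameters[name.strip()] = value.strip()

    return FactoryString(provider, parameters, provider_config)


if __name__ == "__main__":
    print(parse_factory_string("faiss:nprobe=3:IVF1,Flat"))
    print(parse_factory_string("faiss::IVF1,Flat"))
```

Parameter values are kept as strings because the reference text does not state their types.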
This query creates a basic similarity index on the field embedding_field in collection vector_search. The metric used for index creation will be inner_product, and the vectors in the field must be of length 100.
CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
'faiss::IVF1,Flat'
Similar to the above, this query creates a basic distance index on the field embedding_field in collection vector_search. The metric used for index creation will be l2 (Euclidean distance). It expects vectors to be of length 100.
CREATE DISTANCE INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
'faiss::IVF1,Flat'
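To make the difference between the two metric types concrete, the following Python sketch computes both for a pair of small vectors. The function names are illustrative only. The sketch uses the plain Euclidean distance for l2; whether the service reports the distance or its square is not stated here. A larger inner product indicates more similar vectors, while a smaller l2 distance indicates closer vectors.

```python
import math


def inner_product(a, b):
    """Sum of element-wise products; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean distance; smaller means closer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


if __name__ == "__main__":
    u = [1.0, 2.0, 3.0]
    v = [4.0, 5.0, 6.0]
    print(inner_product(u, v))
    print(l2_distance(u, v))
```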
This query creates a similarity index on the field embedding_field in collection vector_search. It sets a default nprobe of 3 for the similarity index and specifies that the index should be trained using at most 5 percent of the data.
CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
'faiss:nprobe=3:IVF1,Flat' TRAIN SHARDED ON TABLESAMPLE BERNOULLI (5)
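As a rough illustration of what a BERNOULLI sampler does, the Python sketch below keeps each row independently with the requested probability. It assumes the usual SQL TABLESAMPLE BERNOULLI meaning; how the service actually draws or bounds the training sample is not described here, and bernoulli_sample is a hypothetical name.

```python
import random


def bernoulli_sample(rows, percent, seed=None):
    """Keep each row independently with probability percent / 100.

    Assumption: percent is given as a percentage in [0, 100].
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    rng = random.Random(seed)
    probability = percent / 100.0
    # rng.random() is uniform on [0.0, 1.0), so probability 1.0 keeps every row.
    return [row for row in rows if rng.random() < probability]


if __name__ == "__main__":
    sample = bernoulli_sample(range(10_000), 5, seed=42)
    print(len(sample))
```

Because each row is decided independently, the sample size under this assumption varies from run to run around the requested percentage rather than matching it exactly.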
Similar to the above, except that here the index is trained on 90% of the data.
CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
'faiss:nprobe=3:IVF1,Flat' TRAIN SHARDED ON TABLESAMPLE BERNOULLI (0.9)
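The last two examples pass the sample size in two different ways: the integer 5 means 5 percent, while the floating point 0.9 means 90 percent. The Python sketch below shows one way a client could normalize both forms to a fraction before building the statement. The helper sample_fraction is hypothetical, and the rule that the literal's type decides the interpretation is an assumption drawn from the parameter description, not a documented behavior.

```python
def sample_fraction(percent):
    """Normalize a TABLESAMPLE argument to a fraction in [0.0, 1.0].

    Assumption: an int is a percentage in [0, 100] and a float is a
    fraction in [0.0, 1.0]. Under this reading 1 means 1 percent and
    1.0 means 100 percent.
    """
    if isinstance(percent, bool):
        raise TypeError("percent must be an int or a float")
    if isinstance(percent, int):
        if not 0 <= percent <= 100:
            raise ValueError("integer percent must be in [0, 100]")
        return percent / 100.0
    if isinstance(percent, float):
        if not 0.0 <= percent <= 1.0:
            raise ValueError("float percent must be in [0.0, 1.0]")
        return percent
    raise TypeError("percent must be an int or a float")


if __name__ == "__main__":
    print(sample_fraction(5))
    print(sample_fraction(0.9))
```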