CREATE

CREATE [ [ SIMILARITY | DISTANCE ] INDEX ] <name>
[ ON [ FIELD <collection_name>:<field_name> ] [ DIMENSION <dimension> ] ]
[ AS <factory_string> [ METRIC (<metric_type>) ]
     [ TRAIN SHARDED ON TABLESAMPLE <sample_type>(<percent>) ] ]

DDL command that creates a resource, such as an index, on another resource, such as a collection field. The unique RRN of the created resource is returned in the field rrn.

name Name of the newly created resource.
collection_name Collection that this resource is being built against.
field_name Field that this resource is being built against.
dimension Dimension of vectors being indexed when creating a vector index.
factory_string Input string used to create a vector index. String is of the form <Provider>:<Parameters>:<ProviderConfig>.
- faiss is the only supported Provider
- Default nprobe for the index may be set as a parameter. Parameters have the form parameter_name=value[,...].
- The provider config string for the faiss provider is a FAISS factory string. Only Inverted File Index (IVF) formats are supported.
metric_type Distance metric type that can be used for constructing a vector index. May be l2 or inner_product. Defaults to inner_product for a similarity index and l2 for a distance index.
sample_type Sampler used during creation. Currently only BERNOULLI is supported.
percent Percent of the collection sampled as part of creation. May be a floating point number in the range [0.0, 1.0] representing the fraction of the collection, or an integer in the range [0, 100] representing the percentage.
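
The <Provider>:<Parameters>:<ProviderConfig> layout described above can be sketched in a few lines of Python. This is only an illustration of the string format, not part of the product: the helper names build_factory_string and parse_factory_string are hypothetical, and the IVF check is a rough assumption (it only looks for a comma-separated component that starts with "IVF").

```python
def build_factory_string(provider_config, provider="faiss", **parameters):
    """Hypothetical helper: assemble '<Provider>:<Parameters>:<ProviderConfig>'."""
    if provider != "faiss":
        raise ValueError("faiss is the only supported provider")
    # Assumption: a config counts as IVF if any component starts with "IVF".
    if not any(part.strip().startswith("IVF") for part in provider_config.split(",")):
        raise ValueError("only IVF formats are supported")
    # Parameters have the form parameter_name=value[,...]; may be empty.
    params = ",".join(f"{name}={value}" for name, value in parameters.items())
    return f"{provider}:{params}:{provider_config}"


def parse_factory_string(factory_string):
    """Hypothetical helper: split a factory string into its three parts."""
    provider, params, provider_config = factory_string.split(":", 2)
    # Parameter values are kept as strings; an empty section yields {}.
    parameters = dict(item.split("=", 1) for item in params.split(",") if item)
    return provider, parameters, provider_config


print(build_factory_string("IVF1,Flat"))            # empty parameter section
print(build_factory_string("IVF1,Flat", nprobe=3))  # default nprobe of 3
print(parse_factory_string("faiss:nprobe=3:IVF1,Flat"))
```

Note that the parameter section may be empty, which is why a factory string with no parameters contains two consecutive colons.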

This query creates a basic similarity index on the field embedding_field in collection vector_search. The metric used for index creation will be inner_product and the vector fields must be of length 100.

 CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
      'faiss::IVF1,Flat'

+--------------------------------------+
| rrn                                  |
+--------------------------------------+
| cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 |
+--------------------------------------+

Similar to the example above, this query creates a basic distance index on the field embedding_field in collection vector_search. The metric used for index creation will be l2 (Euclidean distance). It expects vectors to be of length 100.

 CREATE DISTANCE INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
      'faiss::IVF1,Flat'

+--------------------------------------+
| rrn                                  |
+--------------------------------------+
| cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 |
+--------------------------------------+
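
To make the difference between the two metric types concrete, here is a minimal Python sketch of what inner_product and l2 compute for a pair of vectors. The function names are illustrative only and say nothing about how the index evaluates the metric internally; some libraries, FAISS among them, report the squared L2 distance rather than its square root.

```python
import math


def inner_product(a, b):
    """Dot product of two equal-length vectors; larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two equal-length vectors; smaller means closer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(inner_product([1, 2], [3, 4]))  # 1*3 + 2*4
print(l2([0, 0], [3, 4]))             # length of the 3-4-5 triangle's hypotenuse
```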

This query creates a similarity index on the field embedding_field in collection vector_search. It sets the default nprobe for the index to 3 and specifies that the index should be trained using at most 5 percent of the data.

 CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
      'faiss:nprobe=3:IVF1,Flat' TRAIN SHARDED ON TABLESAMPLE BERNOULLI (5)

+--------------------------------------+
| rrn                                  |
+--------------------------------------+
| cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 |
+--------------------------------------+

Similar to the example above, except that here the index is trained on 90% of the data.

 CREATE SIMILARITY INDEX foo ON FIELD vector_search:embedding_field dimension 100 as
      'faiss:nprobe=3:IVF1,Flat' TRAIN SHARDED ON TABLESAMPLE BERNOULLI (0.9)

+--------------------------------------+
| rrn                                  |
+--------------------------------------+
| cf0e193d-9ee0-4c6a-9f2b-cab9893e97e3 |
+--------------------------------------+
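
The last two examples pass the sample size in two different forms: the integer 5 (a percentage) and the floating point number 0.9 (a fraction). The Python sketch below shows one way to read both forms as a fraction of the collection. The helper sample_fraction is hypothetical, and using the value's type to choose the interpretation is an assumption made for illustration; the actual parsing rules are not spelled out here.

```python
def sample_fraction(percent):
    """Hypothetical helper: interpret a TABLESAMPLE argument as a fraction.

    Assumption: integers are percentages in [0, 100] and floating point
    numbers are fractions in [0.0, 1.0].
    """
    if isinstance(percent, bool):
        raise TypeError("percent must be an int or a float")
    if isinstance(percent, int):
        if not 0 <= percent <= 100:
            raise ValueError("integer percent must be in [0, 100]")
        return percent / 100
    if isinstance(percent, float):
        if not 0.0 <= percent <= 1.0:
            raise ValueError("floating point percent must be in [0.0, 1.0]")
        return percent
    raise TypeError("percent must be an int or a float")


print(sample_fraction(5))    # BERNOULLI (5): about 5 percent of the data
print(sample_fraction(0.9))  # BERNOULLI (0.9): about 90 percent of the data
```

Under this reading, the integer 1 and the floating point number 1.0 mean different things (1 percent versus the whole collection), so it is worth writing the value in the form that matches the intent.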