This page describes how to create a collection with field mappings in Rockset.
A field mapping lets you specify transformations that are applied to every document inserted into a collection. Field mappings can be used for type coercion, anonymization, tokenization, and similar tasks.
The transformations are defined in a YAML file specified at collection creation time, such as the one below.
    type: COLLECTION
    name: c1
    field_mappings:
      - name: anonymize_name  # name of the mapping
        input_fields:
          - field_name: 'name'  # SQL qualified name
            # Behaviour if field is NULL or missing:
            # - skip: skip the update, drop fields are still dropped (default)
            # - pass: pass NULL to the update function
            if_missing: 'SKIP'
            is_drop: true  # drop this field from the doc
            param: 'name'  # exported name/alias for the field. This can be referred to in the SQL transformation
        output_field:
          field_name: 'name_anon'
          value:
            sql: 'TO_HEX(SHA256(:name))'  # Any SQL expression
          # Error behavior:
          # - skip: Skip this output field (default)
          # - fail: Fail the update
          # Note that fields with "is_drop: true" are always dropped
          # Error behavior must be "fail" for special output fields (_id, _event_time)
          on_error: "FAIL"
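To make the effect of the anonymize_name mapping concrete, the following is a minimal local sketch in plain Python. It is not Rockset code: the function name apply_anonymize_name is hypothetical, hashlib stands in for TO_HEX(SHA256(:name)), and UTF-8 encoding of the input string is an assumption. The letter case of the hex digits produced by TO_HEX may also differ from hashlib's lowercase output.

```python
import hashlib


def apply_anonymize_name(doc):
    """Imitate the anonymize_name mapping on one document (illustrative only).

    Models the settings from the YAML above: if_missing 'SKIP' and is_drop true.
    """
    out = dict(doc)  # work on a copy so the caller's document is untouched
    value = out.get("name")  # None when the field is NULL or missing

    # is_drop: true -> the input field is always removed from the document
    out.pop("name", None)

    # if_missing: SKIP -> no output field is written, but the drop still applies
    if value is None:
        return out

    # Stand-in for TO_HEX(SHA256(:name)); UTF-8 encoding is an assumption
    out["name_anon"] = hashlib.sha256(value.encode("utf-8")).hexdigest()
    return out


if __name__ == "__main__":
    print(apply_anonymize_name({"name": "Ada Lovelace", "age": 36}))
    print(apply_anonymize_name({"age": 41}))
```

In this sketch the first document loses its name field and gains name_anon, while the second document passes through unchanged because there is nothing to hash.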
Any field may be used as the output of a transformation. Special fields require the result of the transformation to be of a specific type:
_event_time: The transformation must return a timestamp.
_id: The transformation must return a string that is used as a primary key (so it must be unique across all documents in a collection).
If you are using SHA256 or any other hashing function, be aware that these return bytes rather than string. If you need the output field to be a string (for example, when the output field is _id), you can convert the result to a hex string using TO_HEX (see string functions).
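The same bytes-versus-string distinction can be seen locally with Python's standard hashlib module. This is only an analogy for the SQL functions, not Rockset's implementation: digest() plays the role of SHA256 and .hex() plays the role of TO_HEX. The input value "alice" is an arbitrary example.

```python
import hashlib

# Raw SHA-256 output is binary data, analogous to the bytes returned by SHA256 in SQL
digest = hashlib.sha256("alice".encode("utf-8")).digest()

# Hex-encoding the digest yields a plain string, analogous to wrapping the call in TO_HEX
hex_id = digest.hex()

print(type(digest).__name__)      # bytes
print(type(hex_id).__name__)      # str
print(len(digest), len(hex_id))   # 32 64
```

A SHA-256 digest is always 32 bytes, so its hex form is always 64 characters, which gives a fixed-length string suitable for a key such as _id.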
To create the collection based on the YAML specification, use this command:
    $ rock create -f mappings.yaml
    Collection "c1" was created successfully.
You can specify field mappings in Python as shown below.
    from rockset import Client

    rs = Client()

    field_mappings = [
        rs.FieldMapping.mapping(
            name="anonymize_name",
            input_fields=[
                rs.FieldMapping.input_field(
                    field_name="name",
                    if_missing="SKIP",
                    is_drop=True,
                    param="name",
                )
            ],
            output_field=rs.FieldMapping.output_field(
                field_name="name_anon",
                # TO_HEX matches the YAML example; SHA256 alone returns bytes
                sql_expression="TO_HEX(SHA256(:name))",
                on_error="FAIL",
            ),
        )
    ]

    # Pass the mappings when creating the collection so that they take effect
    fm_collection = rs.Collection.create("c1", field_mappings=field_mappings)
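Before creating a collection, a mapping definition can be sanity-checked locally. The sketch below is a hypothetical helper, not part of the Rockset client: it works on plain dictionaries shaped like the YAML specification and checks only the rule stated there, that the error behavior must be "fail" when the output field is _id or _event_time, plus the two documented if_missing values. The names check_mapping and SPECIAL_OUTPUT_FIELDS are illustrative.

```python
SPECIAL_OUTPUT_FIELDS = {"_id", "_event_time"}


def check_mapping(mapping):
    """Return a list of problems found in one field-mapping dict (hypothetical helper)."""
    problems = []

    output = mapping.get("output_field", {})
    field_name = output.get("field_name")
    # "skip" is the documented default error behavior
    on_error = str(output.get("on_error", "SKIP")).upper()
    if field_name in SPECIAL_OUTPUT_FIELDS and on_error != "FAIL":
        problems.append(f"{field_name}: on_error must be FAIL for special output fields")

    for field in mapping.get("input_fields", []):
        # "skip" is the documented default for NULL or missing input fields
        if_missing = str(field.get("if_missing", "SKIP")).upper()
        if if_missing not in {"SKIP", "PASS"}:
            problems.append(
                f"{field.get('field_name')}: unknown if_missing value {if_missing!r}"
            )

    return problems


if __name__ == "__main__":
    mapping = {
        "name": "make_id",
        "input_fields": [{"field_name": "email", "if_missing": "SKIP", "param": "email"}],
        "output_field": {
            "field_name": "_id",
            "value": {"sql": "TO_HEX(SHA256(:email))"},
            "on_error": "SKIP",  # invalid for _id, so one problem is reported
        },
    }
    for problem in check_mapping(mapping):
        print(problem)
```

A check like this only catches configuration mistakes early; the service remains the authority on whether a mapping is accepted.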