Field Mappings

This page describes how to create a collection with field mappings in Rockset.

A field mapping lets you specify transformations that are applied to every document inserted into a collection. Typical uses include type coercion, anonymization, and tokenization.

Using CLI

The transformations are defined in a YAML file specified at collection creation time, such as the one below.

type: COLLECTION
name: c1
field_mappings:
- name: anonymize_name        # name of the mapping
  input_fields:
  - field_name: 'name'        # SQL qualified name

    # Behavior if the field is NULL or missing:
    # - skip: skip the update; fields marked "is_drop: true" are still dropped (default)
    # - pass: pass NULL to the update function
    if_missing: 'SKIP'
    is_drop: true             # drop this field from the doc
    param: 'name'             # exported name/alias for the field; the SQL transformation refers to it as :name
  output_field:
    field_name: 'name_anon'
    value:
      sql: 'TO_HEX(SHA256(:name))'    # Any SQL expression

    # Error behavior:
    # - skip: Skip this output field (default)
    # - fail: Fail the update
    # Note that fields with "is_drop: true" are always dropped
    # Error behavior must be "fail" for special output fields (_id, _event_time)
    on_error: "FAIL"
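
To make the options in the YAML above concrete, here is a minimal plain-Python sketch of what the anonymize_name mapping does to a single document. It is a model for illustration only, not Rockset's implementation: apply_mapping and sha256_hex are hypothetical helper names, hashlib stands in for the SQL expression, and the letter case of the hex output may differ from what TO_HEX produces.

```python
import hashlib


def sha256_hex(value):
    # Local stand-in for the SQL expression TO_HEX(SHA256(:name)).
    # How Rockset's SQL functions treat NULL is not modeled here.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def apply_mapping(doc, if_missing="SKIP", on_error="FAIL", transform=sha256_hex):
    """Model the anonymize_name mapping on one document (a dict)."""
    out = dict(doc)
    # is_drop: true -> the input field is removed in every case.
    value = out.pop("name", None)
    if value is None and if_missing == "SKIP":
        # NULL or missing input: skip the update, the drop above still applies.
        return out
    try:
        out["name_anon"] = transform(value)
    except Exception:
        if on_error == "FAIL":
            raise  # fail the whole update
        # on_error "SKIP": leave the output field out of the document
    return out


print(apply_mapping({"name": "Ada", "age": 36}))  # "name" replaced by "name_anon"
print(apply_mapping({"age": 36}))                 # prints {'age': 36}
```

With if_missing set to "PASS", the transform receives None instead of being skipped, so the outcome then depends on how the transformation handles NULL.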

Any field may be used as the output of a transformation. Special fields require the result of the transformation to be of a specific type:

  • _event_time: The transformation must return a timestamp.
  • _id: The transformation must return a string that is used as a primary key (so must be unique across all documents in a collection).
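
The rule for special output fields can be checked before a specification is submitted. The sketch below assumes the mapping has already been parsed into a Python dict with the same keys as the YAML example; check_special_output is a hypothetical helper, not part of any Rockset tool, and it only checks the on_error setting, not the type returned by the SQL expression.

```python
SPECIAL_OUTPUT_FIELDS = {"_id", "_event_time"}


def check_special_output(mapping):
    """Return a list of problems found in one parsed field mapping (a dict)."""
    problems = []
    output = mapping.get("output_field", {})
    field = output.get("field_name")
    # The documented default for on_error is skip.
    on_error = str(output.get("on_error", "SKIP")).upper()
    if field in SPECIAL_OUTPUT_FIELDS and on_error != "FAIL":
        problems.append(
            f"output field {field!r} requires on_error FAIL, found {on_error}"
        )
    return problems


id_mapping = {
    "name": "hash_id",
    "output_field": {"field_name": "_id", "on_error": "SKIP"},
}
print(check_special_output(id_mapping))
```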

SHA256 and the other hashing functions return bytes, not a string. If the output field must be a string (for example, when the output field is _id), convert the result to a hex string with TO_HEX (see string functions).
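
The difference between a raw digest and its hex encoding can be seen locally with Python's hashlib. This is only an analogue of the SQL functions, shown to illustrate the bytes-versus-string distinction; the input value is made up.

```python
import hashlib

# Raw SHA-256 digest: a bytes object of 32 bytes.
digest = hashlib.sha256("Ada Lovelace".encode("utf-8")).digest()

# Hex encoding of the same digest: a str of 64 characters.
hex_digest = digest.hex()

print(type(digest).__name__, len(digest))          # bytes 32
print(type(hex_digest).__name__, len(hex_digest))  # str 64
```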

To create the collection based on the YAML specification, use this command:

$ rock create -f mappings.yaml

Collection "c1" was created successfully.

Using Python

You can specify field mappings in Python as shown below.

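from rockset import Client

rs = Client()

# Same mapping as the YAML example: hash "name" into "name_anon" and drop the original.
field_mappings = [
    rs.FieldMapping.mapping(
        name="anonymize_name",
        input_fields=[
            rs.FieldMapping.input_field(
                field_name="name",
                if_missing="SKIP",
                is_drop=True,
                param="name",
            )
        ],
        output_field=rs.FieldMapping.output_field(
            field_name="name_anon",
            sql_expression="TO_HEX(SHA256(:name))",
            on_error="FAIL",
        ),
    )
]

# Pass the mappings when creating the collection; otherwise they are never applied.
fm_collection = rs.Collection.create("c1", field_mappings=field_mappings)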