Whitelisting Fields

This page describes how to implement field whitelisting using field mappings.


Field whitelisting is used to retain certain fields in a document, and drop all the other fields that do not match. This is useful for ingesting data where the schema is unknown, but where only certain fields are of interest.

The syntax for whitelisting is very similar to the field mapping syntax, except for the addition of a new property: is_drop_all_fields.

is_drop_all_fields will drop all fields in each ingested document, except for the fields we specify in output fields.

By default, all fields will be dropped if is_drop_all_fields is true, except for the _id and _event_time fields.

#Whitelisting via the CLI

Field whitelisting can be defined in a YAML file specified at collection creation time, such as the one below.

name: users
  - name: drop_all
    # `is_drop_all_fields` should be specified only once.
    # Additionally, the mapping specifying `is_drop_all_fields` should not have an output or input field.
    is_drop_all_fields: true

  - name: whitelist_name
      - field_name: 'name'
        param: 'name'
      field_name: 'name'
        # An "identity" SQL expression that retains the `name` field.
        sql: ':name'
  - name: whitelist_age
      - field_name: 'age'
        param: 'age'
      field_name: 'age'
        # A SQL expression that retains the `age` field while simultaneously casting it to an integer.
        sql: 'CAST(:age as INTEGER)'

In the example above, we whitelist the name and age field, while also applying a SQL expression on the latter. Now, any document that comes in will have all its fields dropped, except for the two whitelisted fields.

Join us on Slack!
Building on Rockset? Come chat with us!