## Anonymization and Casting

This query keeps all input fields by default, except for `extra_data`, `more_extra_data`, and `email_address`. It adds two new fields to the final document: `last_updated_at` and `email_address_hash`. If there are any input fields named `last_updated_at` or `email_address_hash` in the input document, their values in the final document will be the values of the specified expressions, not their values in the input document. It also filters out all input documents where `email_address` is `null`.

## Data Clustering

This query drops all input fields by default since there is no `*` in the `SELECT` clause, and instead it explicitly renames several input fields to their cleaner final names `country`, `continent`, `capital`, `gross_domestic_product` and computes `pouplation_millions` from the input field `population`. The `WHERE` clause filters out any documents corresponding to countries with a population less than 1M. And [data clustering](🔗) is configured on the [<<glossary:Collection>>](🔗) with clustering key `(continent, country)` which will speed up queries that filter on `continent`, or `continent` and `country` significantly.

## CDC Control Records and \_op

This query maps `user_id` from the source documents to [`_id`](🔗) in Rockset and interprets [`_op`](🔗) based on whether a field is set in the header. In the case of a new record coming in, `header.delete` will be false (or null) and both `IF` statements will execute their second branch which will set `_op = 'UPSERT'` and will parse [`_event_time`](🔗) from `_created_at_ts`. All fields except the mapped ones and the header will then be inserted into Rockset. When a delete comes through and `header.delete` is true, then the first branch of the `IF` statements will execute which will set `_op = DELETE` and leave `_event_time` as null. This will issue a delete for the document corresponding to `_id`.

## Precomputing Fields

This query drops all input fields by default and only keeps the input fields `twitter_handle` and `follower_count`. It also adds three new fields to our final document: `tweet_text_tokens`, `email_address_domain`, and `geo`, each of which can be used to speed up queries that would otherwise have to run expensive operations at query time on the raw input fields.


This query keeps all input fields by default and adds one new field to the final document: `email_address`.

If `user` is `null`, the `email_address` field will be set to `undefined` in the final document.

Setting a field’s value to `undefined` in an ingest transformation means that the field will not be present in the final document. Thus, `email_address` will only be present in the final document if the input document’s `user` field is not `null`. See [documentation on undefined](🔗) for more details.