Regular Expression Functions

Rockset uses RE2 syntax for specifying regular expressions. Named capturing groups is not supported, only numbered groups.

REGEXP_EXTRACT

REGEXP_EXTRACT(string, pattern[, group]) Returns the first match of pattern in string, or null if the pattern does not match. If group is specified and greater than zero, returns the groupth capturing group; if group is not specified or is zero, returns the full match.

 SELECT REGEXP_EXTRACT('foo bar baz', '[a-z]+')
'foo'
 SELECT REGEXP_EXTRACT('foo bar baz', '.*bar')
'foo bar'
 SELECT REGEXP_EXTRACT('foo bar baz', ' [a-z]+ ')
' bar '
 SELECT REGEXP_EXTRACT('foo bar baz', '(.*) bar (.*)', 1)
'foo'
 SELECT REGEXP_EXTRACT('foo bar baz', '(.*) bar (.*)', 2)
'baz'

REGEXP_EXTRACT_ALL

REGEXP_EXTRACT_ALL(string, pattern[, group]) Returns an array containing all matches of pattern in string. If group is specified and greater than zero, extracts the groupth capturing group from each match; if group is not specified or is zero, returns the full matches.

 SELECT REGEXP_EXTRACT_ALL('foo bar baz', '[a-z]+')
["foo", "bar", "baz"]
 SELECT REGEXP_EXTRACT_ALL('foo bar baz', 'ba([a-z])', 1)
["r", "z"]
 SELECT REGEXP_EXTRACT_ALL('foo bar baz', '([a-z])([a-z]+)', 2)
["oo", "ar", "az"]

REGEXP_LIKE

REGEXP_LIKE(string, pattern) Returns true if string matches the regular expression pattern, false otherwise.

 SELECT REGEXP_LIKE('foo bar baz', '.*')
true
 SELECT REGEXP_LIKE('foo bar baz', '^$')
false
 SELECT REGEXP_LIKE('foo bar baz', '^foo')
true
 SELECT REGEXP_LIKE('foo bar baz', '[a-zA-Z0-9 ]+')
true
 SELECT REGEXP_LIKE('foo bar baz', '[a-z0-9]@[a-z0-9]+\.[a-z]+')
false
 SELECT REGEXP_LIKE('[email protected]', '[a-z0-9]@[a-z0-9]+\.[a-z]+')
true

REGEXP_REPLACE

REGEXP_REPLACE(string, pattern[, replacement]) Returns string with all places where pattern matches replaced with replacement (or erased if replacement is not specified). replacement may use \1 .. \9 escape sequences to refer to capturing groups, or \0 to refer to the entire match.

 SELECT REGEXP_REPLACE('foo bar baz', ' ', '')
'foobarbaz'
 SELECT REGEXP_REPLACE('foo bar baz', 'b[a-z]+ ', '')
'foo baz'
 SELECT REGEXP_REPLACE('foo bar baz', '.* b', 'b')
'baz'

  SELECT REGEXP_REPLACE('John A. Doe', '([A-Z][a-z]+) ([A-Z]\.?) ([A-Z][a-z]+)', '\3, \1 \2')
'Doe, John A.'

REGEXP_SPLIT

REGEXP_SPLIT(string, pattern) Returns an array with the components of string when split by pattern.

 SELECT REGEXP_SPLIT('foo bar baz', ' ')
["foo", "bar", "baz"]
 SELECT REGEXP_SPLIT('foo bar baz', ' +')
["foo", "bar", "baz"]
 SELECT REGEXP_SPLIT('foo123bar456baz', '[0-9]+')
["foo", "bar", "baz"]

SPLIT_PART

SPLIT_PART(string, delimiter, index) Splits string on delimiter and returns the field index. Field indexes start with 1. If the index is larger than than the number of fields, then null is returned.

 SELECT SPLIT_PART('a:b:c:d', ':', 2)
'b'
 SELECT SPLIT_PART('a:b:c', ',', 3)
null
 SELECT SPLIT_PART('a:b:c:d', ':', 10)
null
 SELECT SPLIT_PART('a::c:d', ':', 2)
''
 SELECT SPLIT_PART('a:b:c', ',', 1)
'a:b:c'

CHR

CHR(n) Returns a single character string made up of the character whose Unicode (in decimal) points to n. This is also known as the ASCII value.

 SELECT CHR(42)
'*'

ASCII

ASCII(string) Returns the Unicode code point value of the first character in string, or zero if the string is empty.

 SELECT ASCII('*')
42

REVERSE

REVERSE(string) Returns string with its characters in reverse order.

 SELECT REVERSE('abcd')
'dcba'