SQL Overview
The SQL grammar of HStreamDB (opens new window) is based on a subset of SQL-92 with some extensions to support stream operations.
SQL inputs are made up of a series of statements. Each statement is made up of a series of tokens and ends in a semicolon ().
A token can be a keyword argument, an identifier, a literal, an operator, or a special character. The details of the rules can be found in the BNFC grammar file (opens new window). Normally, tokens are separated by whitespace.
The following examples are syntactically valid SQL statements:
Keywords
Some tokens such as SELECT
, INSERT
and WHERE
are reserved keywords, which have specific meanings in SQL syntax. Keywords are case insensitive, which means that SELECT
and select
are equivalent. A keyword can not be used as an identifier.
For a complete list of keywords, see the appendix.
Identifiers are tokens that represent user-defined objects such as streams, fields, and other ones. For example, my_stream
can be used as a stream name, and temperature
can represent a field in the stream.
By now, identifiers are case sensitive, which means that my_stream
and MY_STREAM
are different identifiers.
Literals (Constants)
Literals are objects with known values before being executed. There are six types of constants: integers, floats, strings, dates, time, and intervals so far.
Integers are in the form of digits
, where digits
are one or more single-digit integers (0 through 9). Negatives such as -1
are also supported. Note that scientific notation is not supported yet.
Floats
Floats are in the form of digits . digits
. Negative floats such as -11.514
are supported. Note that
- Forms such as
1.
and.99
are not supported yet.
Strings are arbitrary character series surrounded by double quotes ("
), such as "JSON"
.
Dates
Dates represent a date exact to a day in the form of DATE <year>-<month>-<day>
, where <year>
, <month>
and are all integer constants. Note that the leading DATE
should not be omitted.
Example: DATE 2021-01-02
Example: TIME 11:45:14
Intervals
Intervals represent a time section in the form of INTERVAL <num> <time_unit>
, where <num>
is an integer constant and <time_unit>
is one of YEAR
, MONTH
, WEEK
, DAY
, MINUTE
and SECOND
. Note that the leading INTERVAL
should not be omitted.
Example: INTERVAL 5 SECOND
Functions are special keywords that mean some computation, such as SUM
and MIN
. And operators are infix functions composed of special characters, such as and <>
.
For a complete list of functions and operators, see the appendix.
Special Characters
There are some special characters in the SQL syntax with particular meanings:
- Parentheses (
()
) are used outside an expression for controlling the order of evaluation or specifying a function application. - Brackets (
[]
) are used with maps and arrays for accessing their substructures, such assome_map[temp]
andsome_array[1]
. Note that it is not supported yet. - Commas (
,
) are used for delineating a list of objects. - The semicolons (
;
) represent the end of a SQL statement. - The asterisk (
*
) represents “all fields”, such asSELECT * FROM my_stream;
. - The back quote (
`
) represents an “raw column name” in theSELECT
clause to distinguish a column name with functions from actual function applications. For example,SELECT SUM(a) FROM s EMIT CHANGES;
means applyingSSUM
function on the columna
from streams
. However if the streams
actually contains a column calledSUM(a)
and you want to take it out, you can use back quotes likeSELECT `SUM(a)` FROM s EMIT CHANGES;
.
A single-line comment begins with //
: