Geo search
CrateDB can be used to store and query geographical information of many kinds using the and geo_shape types. With these it is possible to store geographical locations, ways, shapes, areas and other entities. These can be queried for distance, containment, intersection and so on, making it possible to create apps and services with rich geographical features.
are stored using special indices. Geographic points are represented by their coordinates. They are represented as columns of the respective datatypes.
Geographic indices for columns are used in order to speed up geographic searches even on complex shapes. This indexing process results in a representation that is not exact (See geo_shape for details). CrateDB does not operate on vector shapes but on a kind of a grid with the given precision as resolution.
Creating tables containing geographic information is straightforward:
This table will contain the shape of a country and the location of its capital alongside with other metadata. The shape is indexed with a maximum precision of 100 meters using a geohash
index (For more information, see ).
Let’s insert Austria:
cr> INSERT INTO country (name, country_code, shape, capital, capital_location)
... VALUES (
... 'Austria',
... 'at',
... {type='Polygon', coordinates=[
... [[16.979667, 48.123497], [16.903754, 47.714866],
... [16.340584, 47.712902], [16.534268, 47.496171],
... [16.202298, 46.852386], [16.011664, 46.683611],
... [15.137092, 46.658703], [14.632472, 46.431817],
... [13.806475, 46.509306], [12.376485, 46.767559],
... [12.153088, 47.115393], [11.164828, 46.941579],
... [11.048556, 46.751359], [10.442701, 46.893546],
... [9.932448, 46.920728], [9.47997, 47.10281],
... [9.632932, 47.347601], [9.594226, 47.525058],
... [9.896068, 47.580197], [10.402084, 47.302488],
... [10.544504, 47.566399], [11.426414, 47.523766],
... [12.932627, 47.467646], [13.025851, 47.637584],
... [12.884103, 48.289146], [13.243357, 48.416115],
... [13.595946, 48.877172], [14.338898, 48.555305],
... [14.901447, 48.964402], [15.253416, 49.039074],
... [16.029647, 48.733899], [16.499283, 48.785808],
... [16.960288, 48.596982], [16.879983, 48.470013],
... [16.979667, 48.123497]]
... ]},
... [16.372778, 48.209206]
... );
INSERT OK, 1 row affected (... sec)
Caution
Geoshapes has to be fully valid by ISO 19107. If you have problems importing geo data, they may not be fully valid. In most cases they could be repaired using this tool:
Note
When using a polygon shape that resembles a rectangle, and that rectangle is wider than 180 degrees, the CrateDB geoshape validator will convert it into a multipolygon consisting of 2 rectangular shapes covering the narrower area between the 4 original points split by the dateline (+/- 180deg).
This is due to CrateDB operating in the geospatial context of the earth.
Geographic shapes can be inserted as object literal or parameter as seen above and as string.
When it comes to get some meaningful insights into your geographical data CrateDB supports different kinds of geographic queries.
Fast queries that leverage the geographic index are done using the MATCH predicate:
The MATCH
predicate can be used to perform multiple kinds of searches on indices or indexed columns. While it can be used to perform on analyzed indices of type text, it is also handy for operating on geographic indices, querying for relations between geographical shapes and points.
MATCH (column_ident, query_term) [ using match_type ]
The MATCH
predicate for geographical search supports a single column_ident
of a geo_shape
indexed column as first argument.
The second argument, the query_term
is taken to match against the indexed geo_shape
.
The matching operation is determined by the match_type
which determines the spatial relation we want to match. Available match_types
are:
intersects
(Default) If the two shapes share some points and/or area, they are intersecting and considered matching using this match_type
. This also precludes containment or complete equality.
disjoint
If the two shapes share no single point or area, they are disjoint. This is the opposite of intersects
.
within
If the indexed column_ident
shape is completely inside the query_term
shape, they are considered matching using this match_type
.
Note
One MATCH
predicate cannot combine columns of both relations of a join.
Additionally, MATCH
predicates cannot be used on columns of both relations of a join if they cannot be logically applied to each of them separately. For example:
This is allowed:
But this is not:
FROM t1, t2
WHERE match(t1.shape, 'POINT(1.1 2.2)')
OR match(t2.shape, 'POINT(3.3 4.4)')``
Having a table countries
with a column geo
, indexed using geohash
, you can query that column using the MATCH
predicate with different match types as described above:
cr> SELECT name from countries
... WHERE match("geo",
... 'LINESTRING (13.3813 52.5229, 11.1840 51.5497, 8.6132 50.0782, 8.3715 47.9457, 8.5034 47.3685)'
... );
+---------+
| name |
| Germany |
+---------+
SELECT 1 row in set (... sec)
Exact queries are done using the following :
They are exact, but this comes at the price of performance.
They do not make use of the index but work on the GeoJSON that was inserted to compute the shape vector. This access is quite expensive and may significantly slow down your queries.
For fast querying, use the .
But executed on a limited result set, they will help you get precise insights into your geographic data:
cr> SELECT within(capital_location, shape) AS capital_in_country
... FROM country;
+--------------------+
| capital_in_country |
+--------------------+
| TRUE |
+--------------------+
SELECT 1 row in set (... sec)
cr> SELECT distance(capital_location, 'POINT(0.0 90.0)') as from_northpole
... FROM country ORDER BY country_code;
+-------------------+
| from_northpole |
+-------------------+
| 4646930.675034644 |
SELECT 1 row in set (... sec)
Nonetheless these scalars can be used everywhere in a SQL query where scalar functions are allowed.