Fulltext queries

    ArangoDB allows running queries on text contained in document attributes. To use this, a fulltext index must be defined for the attribute of the collection that contains the text. Creating the index will parse the text in the specified attribute for all documents of the collection. Only documents that contain a textual value in the indexed attribute will be indexed. For such documents, the text value will be parsed, and the individual words will be inserted into the fulltext index.

    When a fulltext index exists, it can be queried using a fulltext query.

    queries the fulltext index

    The fulltext simple query function performs a fulltext search on the specified attribute with the specified query.

    Details about the fulltext query syntax can be found below.

    Examples
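
    The following is a minimal sketch of how a fulltext index might be created and queried from arangosh. The collection name emails, the attribute content, and the sample documents are hypothetical. The calls are wrapped in a function that receives the db object so that the snippet stays self-contained.

```javascript
// Sketch for arangosh; `db` is the database object that arangosh provides.
// Collection name, attribute name, and documents are made up for illustration.
function fulltextExample(db) {
  const emails = db._create("emails");

  // Define a fulltext index on the "content" attribute.
  emails.ensureIndex({ type: "fulltext", fields: ["content"] });

  emails.save({ content: "Hello Alice, the banana bread was great" });
  emails.save({ content: "Hello Bob, I prefer apple pie" });
  emails.save({ content: 42 }); // not a textual value, so it is not indexed

  // Query the index: documents whose "content" contains the word "banana".
  return emails.fulltext("content", "banana").toArray();
}
```

    In arangosh, calling fulltextExample(db) would be expected to return only the first document, since it is the only one whose content attribute contains the word "banana".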


    Syntax

    In the simplest form, a fulltext query contains just the sought word. If multiple search words are given in a query, they should be separated by commas. All search words will be combined with a logical AND by default, and only documents that contain all search words will be returned. This default behavior can be changed by providing extra control characters in the fulltext query, which are:

    • +: logical AND (intersection)
    • |: logical OR (union)
    • -: negation (exclusion)

    Examples:

    • "banana": searches for documents containing “banana”
    • "banana,apple": searches for documents containing both “banana” AND “apple”
    • "banana,-apple": searches for documents that contain “banana” but NOT “apple”
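
    To make the matching rules concrete, here is a small in-memory sketch. It is a model of the semantics described above, not the server's implementation, and it covers only the AND and negation cases. The function names parseQuery and matches are illustrative, and each document is represented simply as an array of its words.

```javascript
// Parse a query string into terms, each with an "exclude" flag.
function parseQuery(query) {
  return query
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      if (part.startsWith("-")) return { word: part.slice(1), exclude: true };
      if (part.startsWith("+")) return { word: part.slice(1), exclude: false };
      return { word: part, exclude: false }; // no operator: AND by default
    });
}

// A document matches if it contains every required word and no excluded word.
function matches(documentWords, query) {
  const words = new Set(documentWords);
  return parseQuery(query).every((term) =>
    term.exclude ? !words.has(term.word) : words.has(term.word)
  );
}

const fruitDocA = ["banana", "apple"];
const fruitDocB = ["banana", "cherry"];
console.log(matches(fruitDocA, "banana,apple"));  // true
console.log(matches(fruitDocB, "banana,apple"));  // false
console.log(matches(fruitDocA, "banana,-apple")); // false
console.log(matches(fruitDocB, "banana,-apple")); // true
```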

    Each search word can optionally be prefixed with complete: or prefix:, with complete: being the default. This allows searching for complete words or for word prefixes. Suffix searches or any other forms of partial-word matching are currently not supported.

    Examples:

    • "complete:banana": searches for documents containing the exact word “banana”
    • "prefix:head": searches for documents with words that start with the prefix “head”
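
    The difference between the two match modes can be sketched in a few lines. This is an illustrative model only; parseTerm and termMatches are hypothetical helper names, and a document is again represented as an array of its words.

```javascript
// Split a search term into a match mode and a word; "complete" is the default.
function parseTerm(term) {
  if (term.startsWith("prefix:")) {
    return { mode: "prefix", word: term.slice("prefix:".length) };
  }
  if (term.startsWith("complete:")) {
    return { mode: "complete", word: term.slice("complete:".length) };
  }
  return { mode: "complete", word: term };
}

// True if any word of the document satisfies the term.
function termMatches(documentWords, term) {
  const { mode, word } = parseTerm(term);
  return documentWords.some((w) =>
    mode === "prefix" ? w.startsWith(word) : w === word
  );
}

const sampleWords = ["headache", "banana"];
console.log(termMatches(sampleWords, "prefix:head"));   // true
console.log(termMatches(sampleWords, "complete:head")); // false
console.log(termMatches(sampleWords, "banana"));        // true
console.log(termMatches(sampleWords, "prefix:nana"));   // false, suffixes do not match
```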

    Complete match and prefix search options can be combined with the logical operators.

    Please note that only words with a minimum length will get indexed. This minimum length can be defined when creating the fulltext index. For word tokenization, the libicu text boundary analysis is used, which takes into account the default language as defined at server startup (--server.default-language startup option). Generally, the word boundary analysis will filter out punctuation but will not do much more.
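
    As a rough illustration of word boundary analysis combined with a minimum word length, the sketch below uses JavaScript's Intl.Segmenter, which in Node.js is backed by ICU. It is not the server's tokenizer, and its results may differ from what the server indexes. The function name tokenize and its parameters are assumptions: locale stands in for the server's default language and minLength for the minimum length configured on the index.

```javascript
// Split text into words using ICU word-boundary rules and drop short words.
function tokenize(text, locale, minLength) {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  const result = [];
  for (const { segment, isWordLike } of segmenter.segment(text)) {
    // isWordLike is false for punctuation and whitespace segments.
    // Spreading the string counts code points rather than UTF-16 units.
    if (isWordLike && [...segment].length >= minLength) {
      result.push(segment);
    }
  }
  return result;
}

console.log(tokenize("Hello, big world!", "en", 4)); // [ 'Hello', 'world' ]
```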

    In particular, no word normalization, stemming, or similarity analysis will be performed when indexing or searching. If any of these features is required, it is suggested that the user does the text normalization on the client side and provides, for each document, an extra attribute containing just a comma-separated list of normalized words. This attribute can then be indexed with a fulltext index, and the user can send fulltext queries for this index, with the fulltext queries also containing the stemmed or normalized versions of words as required by the user.
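
    A client-side normalization step of this kind might look like the sketch below. The normalization chosen here (lowercasing and stripping diacritics) is only one possibility and does no stemming. The helper names normalizeWord and withNormalizedWords, as well as the attribute names text and textNormalized, are hypothetical.

```javascript
// Hypothetical normalization: strip diacritics and lowercase.
function normalizeWord(word) {
  return word
    .normalize("NFD")          // split letters from their combining marks
    .replace(/\p{M}/gu, "")    // drop the combining marks
    .toLowerCase();
}

// Return a copy of the document with an extra attribute that holds a
// comma-separated list of the unique normalized words of `sourceAttr`.
function withNormalizedWords(doc, sourceAttr, targetAttr) {
  const words = doc[sourceAttr]
    .split(/[^\p{L}\p{N}\p{M}]+/u) // split on anything that is not part of a word
    .filter(Boolean);
  const unique = [...new Set(words.map(normalizeWord))];
  return { ...doc, [targetAttr]: unique.join(",") };
}

const stored = withNormalizedWords({ text: "Café CAFE crème" }, "text", "textNormalized");
console.log(stored.textNormalized); // cafe,creme
```

    The extra attribute (textNormalized in this sketch) is the one to index. Search words should be passed through the same normalizeWord function before they are placed in the fulltext query, so that a search for “Café” is sent as "cafe".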