Different data models and scalability

    The key/value store data model is the easiest to scale. In ArangoDB, this is implemented in the sense that a document collection always has a primary key attribute, and in the absence of further secondary indexes the document collection behaves like a simple key/value store.
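    To illustrate why this model scales so easily, here is a minimal sketch of a collection sharded by its primary key. The class name, the SHA-1 based shard function and the document layout are hypothetical assumptions for illustration, not ArangoDB's actual implementation. The point is that every write or lookup by key touches exactly one shard, so adding shards adds capacity without adding coordination.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy model of a collection sharded by primary key (hypothetical, not ArangoDB code)."""

    def __init__(self, num_shards: int) -> None:
        # Each shard is modelled as a plain dict that would live on its own server.
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # A stable hash of the primary key selects exactly one shard.
        digest = hashlib.sha1(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, document: dict) -> int:
        shard = self.shard_for(key)
        self.shards[shard][key] = document
        return shard  # only this one shard was touched

    def get(self, key: str):
        # A primary-key lookup consults a single shard, however many shards exist.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedKeyValueStore(num_shards=4)
for i in range(1000):
    store.put(f"user{i}", {"_key": f"user{i}", "visits": i})

print(store.get("user42"))  # {'_key': 'user42', 'visits': 42}
```

    Because the shard is derived from the key alone, no shard ever needs to ask another shard anything to answer a single-key request.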

    For the document store case, essentially the same arguments apply even in the presence of secondary indexes, since an index for a sharded collection is simply the same as a local index for each shard. Therefore, single document operations still scale linearly with the size of the cluster, unless a special sharding configuration makes lookups or write operations more expensive.
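    The following sketch models a secondary index on a sharded collection as nothing more than one local index per shard. The `city` attribute, the class names and the hash function are hypothetical assumptions chosen for illustration. Inserting a document updates the data and the local index of a single shard, whereas a lookup by an indexed attribute that is not the shard key has to consult the local index of every shard, which is one way lookups can become more expensive.

```python
import hashlib
from collections import defaultdict


def shard_for(key: str, num_shards: int) -> int:
    # Hypothetical stable hash of the primary key.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class Shard:
    def __init__(self) -> None:
        self.documents = {}                 # primary key -> document
        self.city_index = defaultdict(set)  # local secondary index: city -> keys

    def insert(self, doc: dict) -> None:
        self.documents[doc["_key"]] = doc
        self.city_index[doc["city"]].add(doc["_key"])

    def lookup_city(self, city: str) -> list:
        # .get() does not create empty entries in the defaultdict.
        return [self.documents[k] for k in self.city_index.get(city, ())]


class ShardedCollection:
    def __init__(self, num_shards: int) -> None:
        self.shards = [Shard() for _ in range(num_shards)]

    def insert(self, doc: dict) -> int:
        # A single-document write touches one shard: its data and its local index.
        i = shard_for(doc["_key"], len(self.shards))
        self.shards[i].insert(doc)
        return i

    def find_by_city(self, city: str) -> list:
        # The "global" index is the union of the local indexes, so a lookup
        # on a non-shard-key attribute is scattered to every shard and gathered.
        results = []
        for shard in self.shards:
            results.extend(shard.lookup_city(city))
        return results


coll = ShardedCollection(num_shards=3)
coll.insert({"_key": "a", "city": "Cologne"})
coll.insert({"_key": "b", "city": "Berlin"})
coll.insert({"_key": "c", "city": "Cologne"})
print(sorted(d["_key"] for d in coll.find_by_city("Cologne")))  # ['a', 'c']
```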

    Nevertheless, for certain complicated joins, there are limits as to what can be achieved.

    For graph data, however, if the vertices and edges along the paths a query traverses are distributed across the cluster, a lot of communication is necessary between nodes, and performance suffers. To achieve good performance at scale, it is therefore necessary to get the distribution of the graph data across the shards in the cluster right. Most of the time, the application developers and users of ArangoDB know best how their graphs are structured. Therefore, ArangoDB allows users to specify which attributes are used to shard the graph data. A useful first step is usually to make sure that the edges originating at a vertex reside on the same cluster node as the vertex.
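    To make the effect of the sharding attribute concrete, this sketch counts how many edges end up on a different shard than the vertex they originate from, under two strategies: sharding edges by their own `_key`, and sharding them by the key of their `_from` vertex. The ring-shaped graph, the `persons` collection name and the hash function are hypothetical assumptions; only the "collection/key" form of `_from` handles follows ArangoDB's convention. Each remote edge stands for a network hop when a traversal expands that vertex.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    # Hypothetical stable hash; vertices are assumed to be sharded by their key.
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def vertex_key(handle: str) -> str:
    # Edge endpoints are "collection/key" handles, e.g. "persons/p1".
    return handle.split("/", 1)[1]


def remote_hops(edges, num_shards, edge_shard_key):
    """Count edges stored on a different shard than their _from vertex."""
    remote = 0
    for edge in edges:
        vertex_shard = shard_for(vertex_key(edge["_from"]), num_shards)
        if shard_for(edge_shard_key(edge), num_shards) != vertex_shard:
            remote += 1
    return remote


# Hypothetical graph: a ring of 200 vertices, each with one edge to its successor.
edges = [
    {"_key": f"e{i}", "_from": f"persons/p{i}", "_to": f"persons/p{(i + 1) % 200}"}
    for i in range(200)
]

by_edge_key = remote_hops(edges, 8, lambda e: e["_key"])
by_from_vertex = remote_hops(edges, 8, lambda e: vertex_key(e["_from"]))
print(by_edge_key, by_from_vertex)
```

    With edges sharded by their `_from` vertex, `by_from_vertex` is always 0: every outgoing edge sits next to its vertex. With edges sharded by their own key, most edges land elsewhere. Note that this only co-locates outgoing edges; following `_to` can still cross shards, which is why the text calls it a first step.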