Set Up System Cube

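To better support self-monitoring, a set of system Cubes is created under a system project named "KYLIN_SYSTEM". Currently there are five Cubes. Three are for query metrics: "METRICS_QUERY", "METRICS_QUERY_CUBE", and "METRICS_QUERY_RPC". The other two are for job metrics: "METRICS_JOB" and "METRICS_JOB_EXCEPTION".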

Create a configuration file named SCSinkTools.json in the KYLIN_HOME directory.

For example:

1. Generate Metadata

Run the following command in the KYLIN_HOME folder to generate the related metadata:


-inputConfig SCSinkTools.json \
-output <output_folder>

This command generates the related metadata under <output_folder>. The details are as follows, where system_cube is our <output_folder>:

2. Set Up the Data Source

Run the following command to create the Hive source tables:

3. Upload Metadata for System Cubes

Then upload the metadata to HBase with the following command:

4. Reload Metadata

Finally, reload the metadata in the Kylin web UI.

A set of system Cubes is then created under the system project named "KYLIN_SYSTEM".

5. Build the System Cubes

Once the system Cubes are created, they need to be built regularly.

For example, add cron jobs like the following:

0 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_QA 3600000 1200000
20 */2 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_QUERY_CUBE_QA 3600000 1200000
30 */4 * * * sh ${KYLIN_HOME}/bin/system_cube_build.sh KYLIN_HIVE_METRICS_JOB_QA 3600000 1200000
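In these entries, `0 */2 * * *` fires at minute 0 of every second hour, `20 */2 * * *` at minute 20 of every second hour, and `30 */4 * * *` at minute 30 of every fourth hour. The text above does not explain the two numeric arguments. The following is a minimal sketch only, assuming they are an interval and a delay in milliseconds (3600000 ms = 1 hour, 1200000 ms = 20 minutes), and that the build end time is the current time minus the delay, rounded down to a whole interval. `build_end_time` is a hypothetical helper for illustration, not part of Kylin.

```shell
# Hypothetical helper (not part of Kylin): compute an interval-aligned end time.
# Assumed arguments: $1 = current time (epoch ms), $2 = interval (ms), $3 = delay (ms).
build_end_time() {
  bet_end_ms=$(( $1 - $3 ))                     # step back by the delay
  echo $(( bet_end_ms - bet_end_ms % $2 ))      # round down to a whole interval
}

# Illustrative call: "now" is 9000000 ms, interval 1 hour, delay 20 minutes.
build_end_time 9000000 3600000 1200000
```

Under these assumptions, the delay gives late-arriving metrics time to land in the source table before the segment that covers them is built.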

Common Dimension

For these Cubes, admins can query at four time granularities. From the highest level to the lowest, they are as follows:

METRICS_QUERY

This Cube collects query metrics at the highest level. Details are as follows:

Dimension
HOST	the host of the query engine server
PROJECT
REALIZATION	in Kylin, there are two OLAP realizations: Cube, or Hybrid of Cubes
REALIZATION_TYPE
QUERY_TYPE	users can query different data sources: CACHE, OLAP, LOOKUP_TABLE, HIVE
EXCEPTION	exceptions may occur during a query; this dimension classifies the exception types

Measure
COUNT
MIN, MAX, SUM of QUERY_TIME_COST	the time cost for the whole query
MAX, SUM of CALCITE_SIZE_RETURN	the row count of the result Calcite returns
MAX, SUM of STORAGE_SIZE_RETURN	the row count of the input to Calcite
MAX, SUM of CALCITE_SIZE_AGGREGATE_FILTER	the row count of Calcite aggregates and filters
COUNT DISTINCT of QUERY_HASH_CODE	the number of different queries
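No average measure is listed, but an average query time can be derived from the SUM of QUERY_TIME_COST and COUNT. The following is a minimal sketch of that arithmetic. `avg_query_time` is a hypothetical helper and the input values are illustrative only, not real measurements.

```shell
# Hypothetical helper: average = SUM of QUERY_TIME_COST / COUNT.
# $1 = sum of query time cost, $2 = query count (illustrative inputs).
avg_query_time() {
  awk -v sum="$1" -v count="$2" 'BEGIN {
    if (count == 0) { print "n/a" }             # no queries, no average
    else { printf "%.2f\n", sum / count }
  }'
}

# Illustrative call: 12 queries with a total time cost of 4500.
avg_query_time 4500 12
```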

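METRICS_QUERY_RPC

This Cube collects query metrics at the lowest level. For a query, the related aggregation and filter can be pushed down to each RPC target server. The robustness of the RPC target servers is the foundation of better query performance. Details are as follows: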

Measure
COUNT
MAX, SUM of CALL_TIME	the time cost of an RPC call
MAX, SUM of COUNT_SKIP	some rows are skipped, for example because of fuzzy filters; this is the skipped row count
MAX, SUM of SIZE_SCAN	the row count actually scanned
MAX, SUM of SIZE_RETURN	the row count actually returned
MAX, SUM of SIZE_AGGREGATE	the row count actually aggregated
MAX, SUM of SIZE_AGGREGATE_FILTER	the row count actually aggregated and filtered, = SIZE_SCAN - SIZE_RETURN
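The relation SIZE_AGGREGATE_FILTER = SIZE_SCAN - SIZE_RETURN from the table can be worked through with a small example. This is a sketch with made-up counters for a single RPC call. `aggregate_filter_rows` is a hypothetical helper name.

```shell
# Hypothetical helper: rows removed by aggregation and filtering on the RPC server.
# $1 = SIZE_SCAN (rows scanned), $2 = SIZE_RETURN (rows returned).
aggregate_filter_rows() {
  echo $(( $1 - $2 ))
}

# Illustrative call: 1000 rows scanned, 150 rows returned.
aggregate_filter_rows 1000 150
```

A large difference relative to SIZE_SCAN means most scanned rows were aggregated or filtered away on the RPC server before being returned.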

METRICS_QUERY_CUBE

This Cube collects query metrics at the Cube level. The most important ones are cuboid-related, which serve the Cube planner. Details are as follows:

Dimension
CUBE_NAME
CUBOID_SOURCE	source cuboid parsed based on query and Cube design
CUBOID_TARGET	target cuboid already precalculated and served for source cuboid
IF_MATCH	whether source cuboid and target cuboid are equal
IF_SUCCESS	whether a query on this Cube is successful or not

METRICS_JOB

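In Kylin, there are mainly three types of jobs:
- "BUILD", for building Cube segments from Hive.
- "MERGE", for merging Cube segments in HBase.
- "OPTIMIZE", for dynamically adjusting the precalculated cuboid tree in HBase based on the base cuboid.

This Cube collects job metrics. Details are as follows: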

Dimension
PROJECT
CUBE_NAME
JOB_TYPE
CUBING_TYPE	in Kylin, there are two cubing algorithms: Layered and Fast (InMemory)

Measure
COUNT
MIN, MAX, SUM of DURATION	the duration from a job start to finish
MIN, MAX, SUM of TABLE_SIZE	the size of data source in bytes
MIN, MAX, SUM of CUBE_SIZE	the size of created Cube segment in bytes
MIN, MAX, SUM of PER_BYTES_TIME_COST	= DURATION / TABLE_SIZE
MIN, MAX, SUM of WAIT_RESOURCE_TIME	a job may include several MR (MapReduce) jobs. Those MR jobs may wait because of a lack of Hadoop resources.
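The definition PER_BYTES_TIME_COST = DURATION / TABLE_SIZE can be illustrated with a short calculation. This is a sketch with made-up inputs. The text above does not state the unit of DURATION, so the result is in that unit per byte. `per_bytes_time_cost` is a hypothetical helper name.

```shell
# Hypothetical helper: PER_BYTES_TIME_COST = DURATION / TABLE_SIZE.
# $1 = DURATION (unit not stated in the source), $2 = TABLE_SIZE in bytes.
per_bytes_time_cost() {
  awk -v duration="$1" -v table_size="$2" 'BEGIN {
    if (table_size == 0) { print "n/a" }        # avoid division by zero
    else { printf "%.6f\n", duration / table_size }
  }'
}

# Illustrative call: duration 600000 over a 2000000000-byte source table.
per_bytes_time_cost 600000 2000000000
```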

METRICS_JOB_EXCEPTION

Measure
COUNT