HBase and Schema Design

    The Cloud Bigtable documentation, Designing Your Schema, is pertinent and well done, and the lessons there apply equally to HBase. Just divide any quoted values by ~10 to get what works for HBase. For example, where it says individual values can be ~10MBs in size, HBase can do similar (though it is perhaps best to go smaller if you can), and where it says a maximum of 100 column families in Cloud Bigtable, think ~10 when modeling on HBase.

    When modifying column families, the table must first be disabled, for example:
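
    A minimal sketch of that disable, modify, enable sequence follows. It is a toy in-memory model in Python, not the HBase client API: `ToyTable`, `modify_family`, and `disabled` are hypothetical names introduced only for illustration. It enforces the rule stated above and makes sure the table is re-enabled even if the change fails.

```python
from contextlib import contextmanager


class ToyTable:
    """In-memory stand-in for a table; NOT part of any HBase client library."""

    def __init__(self, name, families):
        self.name = name
        self.enabled = True
        self.families = dict(families)  # family name -> settings dict

    def modify_family(self, family, **settings):
        # Model the rule from the text: schema changes need a disabled table.
        if self.enabled:
            raise RuntimeError(f"table {self.name!r} must be disabled first")
        # Creates the family if it is new, otherwise updates its settings.
        self.families.setdefault(family, {}).update(settings)


@contextmanager
def disabled(table):
    """Disable the table for the duration of the block, then re-enable it."""
    table.enabled = False
    try:
        yield table
    finally:
        table.enabled = True  # re-enable even if the change raised


table = ToyTable("myTable", {"f1": {"versions": 1}})
with disabled(table) as t:
    t.modify_family("f1", versions=3)  # modify an existing column family
    t.modify_family("f2", versions=1)  # add a new column family
```

    Wrapping the change in a context manager is one possible design choice: it keeps the window in which the table is unavailable as short as the block itself.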

    33. Table Schema Rules Of Thumb

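    There are many different data sets, with different access patterns and service-level expectations. These rules of thumb are therefore only an overview. Read the rest of this chapter to get more details after you have gone through this list.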

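    • Aim to have regions sized between 10 and 50 GB.
    • Aim to have cells no larger than 10 MB, or 50 MB if you use MOB (Storing Medium-sized Objects). Otherwise, consider storing the cell data in HDFS and keeping a pointer to it in HBase.
    • Around 50-100 regions is a good number for a table with 1 or 2 column families. Remember that a region is a contiguous segment of a column family.
    • Keep column family names as short as possible, because the column family name is stored with every value. They should not be self-documenting and descriptive as in a typical RDBMS.
    • If only one column family is busy with writes, only that column family accumulates memory. Be aware of write patterns when allocating resources.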
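
    The size-related rules above can be turned into a quick back-of-the-envelope check. The sketch below is only an illustration under stated assumptions: data is assumed to be spread evenly across regions, the function names and the "hbase" / "hdfs_pointer" labels are made up for this example, and the thresholds are simply the 10-50 GB and 10 MB / 50 MB figures from the list.

```python
from typing import Tuple

GB = 1024 ** 3
MB = 1024 ** 2

# Thresholds taken from the rules of thumb in the list above.
MIN_REGION_BYTES = 10 * GB
MAX_REGION_BYTES = 50 * GB
MAX_CELL_BYTES = 10 * MB
MAX_MOB_CELL_BYTES = 50 * MB


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding."""
    return -(-a // b)


def region_count_range(table_bytes: int) -> Tuple[int, int]:
    """Return (fewest, most) regions that keep each region within 10-50 GB.

    Assumes evenly distributed data. A table under 10 GB yields (1, 1),
    i.e. a single region that is smaller than the lower bound.
    """
    fewest = max(1, ceil_div(table_bytes, MAX_REGION_BYTES))
    most = max(fewest, table_bytes // MIN_REGION_BYTES)
    return fewest, most


def cell_placement(cell_bytes: int, use_mob: bool = False) -> str:
    """Suggest where a value of this size belongs (hypothetical labels)."""
    limit = MAX_MOB_CELL_BYTES if use_mob else MAX_CELL_BYTES
    return "hbase" if cell_bytes <= limit else "hdfs_pointer"


if __name__ == "__main__":
    print(region_count_range(1000 * GB))          # (20, 100)
    print(cell_placement(30 * MB, use_mob=True))  # hbase
```

    For a table of 1000 GB this gives a range of 20 to 100 regions, which overlaps the 50-100 regions suggested for a table with 1 or 2 column families.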