数据库内核月报－ 2018/03 - MySQL · 源码分析 · 原子DDL的实现过程 - 《数据库内核月报》

上面情况出现的主要原因就是MySQL不支持原子的DDL。从图1可以看出，MySQL8.0以前，metadata在Server layer是存储在MyISAM引擎的系统表里，对于事务型引擎Innodb则自己存储一份metadata。这也导致MySQL存在如下一些弊端：

两份系统表存储的信息有所不同，访问Server layer以及存储引擎需要使用不同API，这种设计导致了不能很好的统一对系统metadata的访问。另外两份API，也同时增加了代码的维护量。
由于Server layer的metadata存储在非事务引擎（MyISAM)里，所以在进行crash recovery的时候就不能维持原子性。
DDL的非原子性使得Replication处理异常情况变得更加复杂。比如DROP TABLE t1, t2; 如果DROP t1成功，但是DROP t2失败，Replication就无法保证主备一致性了。

图1： MySQL Data Dictionary before MySQL8.0

图2: MySQL Data Dictionary in MySQL8.0

图2我们可以看到，Server layer(后面简称SL）以及Storage Engine（后面简称SE）使用同一份data dictionary(后面简称DD）用来存储metadata。SL和SE将各自需要的metadata存入DD中。由于DD使用Innodb作为存储引擎，所以crash recovery的时候，DD可以安全的进行事务回滚。

接下来我们以CREATE TABLE为例从源码上简单看一下MYSQL8.0是如何实现原子DDL的。

CREATE TABLE实现的流程图如下：


  该函数将会为Innodb存储引擎创建它自己需要的系统列。实际上就是把原来Innodb自己的系统表统一到DD中。
*/
int
ha_innobase::get_extra_columns_and_keys(
  const HA_CREATE_INFO*,
  const List<Create_field>*,
  const KEY*,
  uint,
  dd::Table*  dd_table)
{
  DBUG_ENTER("ha_innobase::get_extra_columns_and_keys");
  THD*      thd     = ha_thd();
  dd::Index*    primary     = nullptr;
  bool      has_fulltext    = false;
  const dd::Index*  fts_doc_id_index  = nullptr;
  /* 检查各个定义的索引是否合法。*/
  for (dd::Index* i : *dd_table->indexes()) {
    /* The name "PRIMARY" is reserved for the PRIMARY KEY */
    ut_ad((i->type() == dd::Index::IT_PRIMARY)
          == !my_strcasecmp(system_charset_info, i->name().c_str(),
          primary_key_name));
    if (!my_strcasecmp(system_charset_info,
           i->name().c_str(), FTS_DOC_ID_INDEX_NAME)) {
      ut_ad(!fts_doc_id_index);
      ut_ad(i->type() != dd::Index::IT_PRIMARY);
      fts_doc_id_index = i;
    }
    /* 验证索引算法是否有效 */
    switch (i->algorithm()) {
    ...
  }
  /* 验证并处理全文索引 */
  if (has_fulltext) {
    ...
  }
  /* 如果当前没有定义主键，Innodb将自动增加DB_ROW_ID作为主键。 */
  if (primary == nullptr) {
    dd::Column* db_row_id = dd_add_hidden_column(
      dd_table, "DB_ROW_ID", DATA_ROW_ID_LEN,
      dd::enum_column_types::INT24);
    if (db_row_id == nullptr) {
      DBUG_RETURN(ER_WRONG_COLUMN_NAME);
    }
    primary = dd_set_hidden_unique_index(
      dd_table->add_first_index(),
      primary_key_name,
      db_row_id);
  }
  /* 为二级索引增加主键列 */
        ut_allocator<const dd::Index_element*>> pk_elements;
    if (index == primary) {
      continue;
    }
    pk_elements.clear();
    for (const dd::Index_element* e : primary->elements()) {
      if (e->is_prefix() ||
         std::search_n(index->elements().begin(),
               index->elements().end(), 1, e,
               [](const dd::Index_element* ie,
            const dd::Index_element* e) {
                 return(&ie->column()
                  == &e->column());
               }) == index->elements().end()) {
        pk_elements.push_back(e);
      }
    }
    for (const dd::Index_element* e : pk_elements) {
      auto ie = index->add_element(
        const_cast<dd::Column*>(&e->column()));
      ie->set_hidden(true);
      ie->set_order(e->order());
    }
  }
  /* 增加系统列 DB_TRX_ID, DB_ROLL_PTR. */
  dd::Column* db_trx_id = dd_add_hidden_column(
    dd_table, "DB_TRX_ID", DATA_TRX_ID_LEN,
    dd::enum_column_types::INT24);
  if (db_trx_id == nullptr) {
    DBUG_RETURN(ER_WRONG_COLUMN_NAME);
  }
  dd::Column* db_roll_ptr = dd_add_hidden_column(
    dd_table, "DB_ROLL_PTR", DATA_ROLL_PTR_LEN,
    dd::enum_column_types::LONGLONG);
  if (db_roll_ptr == nullptr) {
    DBUG_RETURN(ER_WRONG_COLUMN_NAME);
  }
  dd_add_hidden_element(primary, db_trx_id);
  dd_add_hidden_element(primary, db_roll_ptr);
  /* Add all non-virtual columns to the clustered index,
  unless they already part of the PRIMARY KEY. */
  for (const dd::Column* c : const_cast<const dd::Table*>(dd_table)->columns()) {
    if (c->is_hidden() || c->is_virtual()) {
      continue;
    }
    if (std::search_n(primary->elements().begin(),
          primary->elements().end(), 1,
          c, [](const dd::Index_element* e,
          const dd::Column* c)
          {
            return(!e->is_prefix()
             && &e->column() == c);
        == primary->elements().end()) {
    }
  }
  DBUG_RETURN(0);
}
template <typename T>
Dictionary_client::store(T* object)
{
    ...
    /* 调用下面函数完成存储 */
    if (Storage_adapter::store(m_thd, object))
       return true;
    ...
}
/* 该函数负责将DD对象写入对应的系统表中。 */
template <typename T>
bool Storage_adapter::store(THD *thd, T *object)
{
  // 如果是测试或者未到真正需要建表的阶段，只存入缓存，不进行持久化存储。
  if (s_use_fake_storage ||
      bootstrap::DD_bootstrap_ctx::instance().get_stage() <
          bootstrap::Stage::CREATED_TABLES)
  {
    instance()->core_store(thd, object);
    return false;
  }
  // 这里会验证DD对象的有效性
  if (object->impl()->validate())
  {
    DBUG_ASSERT(thd->is_system_thread() || thd->killed || thd->is_error());
    return true;
  }
  // 切换上下文，包括更新系统表的时候关闭binlog、修改auto_increament_increament增量、设置一些相关变量等与修改DD相关的上下文。
  Update_dictionary_tables_ctx ctx(thd);
  ctx.otx.register_tables<T>();
  DEBUG_SYNC(thd, "before_storing_dd_object");
  // object->impl()->store 这里会将DD对象存入相关的系统表。具体比如表，列， 表空间是如何持久化到系统表中的，由于篇幅有限，我们将在以后的月报中继续剖析。
  if (ctx.otx.open_tables() || object->impl()->store(&ctx.otx))
  {
    DBUG_ASSERT(thd->is_system_thread() || thd->killed || thd->is_error());
    return true;
  }
  // Do not create SDIs for tablespaces and tables while creating
 // dictionary entry during upgrade.
  if (bootstrap::DD_bootstrap_ctx::instance().get_stage() >
          bootstrap::Stage::CREATED_TABLES &&
      dd::upgrade_57::allow_sdi_creation() &&
      sdi::store(thd, object))
    return true;
  return false;

综上篇章简要的描述了MySQL8.0实现原子DDL的背景以及一些重点的数据结构，并对CREATE TABLE过程，以及创建过程中用到的几个重要函数进行了分析。但是原子DDL的实现是一个非常大的工程，本篇月报由于篇幅问题，只是挖了冰山一角。以后的月报会继续对原子DDL的实现进行分析，希望大家持续关注。