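• The first two fields, INDEX_MAGIC_HEADER and INDEX_FORMAT_VERSION, are the magic number and the index format version.
    • The third field, USE_64BIT, indicates whether 64-bit document and word ids are used (they are by default).
    • Next, docinfo is written; this is the docinfo setting from the index block of the configuration.
    • Then comes the schema, i.e. the index's schema information, such as the names of its fields and of the attributes to be built.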
        {
            // schema: field count, then one column record per full-text field
            fdInfo.PutDword ( tSchema.m_dFields.GetLength() );
            ARRAY_FOREACH ( i, tSchema.m_dFields )
                WriteSchemaColumn ( fdInfo, tSchema.m_dFields[i] );

            // attribute count, then one column record per attribute
            fdInfo.PutDword ( tSchema.GetAttrsCount() );
            for ( int i=0; i<tSchema.GetAttrsCount(); i++ )
                WriteSchemaColumn ( fdInfo, tSchema.GetAttr(i) );
        }
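The order of the fields described so far can be sketched as a small standalone program. This is only an illustration, not the Sphinx implementation: `ByteWriter`, `Column`, `Schema`, `WriteColumn` and `WriteHeaderPrefix` are hypothetical names, the little-endian DWORD and length-prefixed string encodings are assumptions, and the column record is simplified to a name plus a type code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the writer used in the excerpt above.
struct ByteWriter {
    std::vector<uint8_t> buf;

    void PutByte(uint8_t v) { buf.push_back(v); }

    // Assumption: DWORDs are stored little-endian.
    void PutDword(uint32_t v) {
        for (int i = 0; i < 4; ++i)
            PutByte(static_cast<uint8_t>(v >> (8 * i)));
    }

    // Assumption: strings are stored as a DWORD length followed by the bytes.
    void PutString(const std::string& s) {
        assert(s.size() <= UINT32_MAX);
        PutDword(static_cast<uint32_t>(s.size()));
        buf.insert(buf.end(), s.begin(), s.end());
    }
};

// Simplified, hypothetical column record: a name and a type code.
struct Column {
    std::string name;
    uint32_t type;
};

struct Schema {
    std::vector<Column> fields;
    std::vector<Column> attrs;
};

void WriteColumn(ByteWriter& w, const Column& c) {
    w.PutString(c.name);
    w.PutDword(c.type);
}

// Writes magic, version, 64-bit flag, docinfo mode, then the schema,
// in the order the list above describes.
void WriteHeaderPrefix(ByteWriter& w, uint32_t magic, uint32_t version,
                       bool use64bit, uint32_t docinfo, const Schema& s) {
    w.PutDword(magic);
    w.PutDword(version);
    w.PutDword(use64bit ? 1u : 0u);
    w.PutDword(docinfo);

    w.PutDword(static_cast<uint32_t>(s.fields.size()));
    for (const Column& c : s.fields)
        WriteColumn(w, c);

    w.PutDword(static_cast<uint32_t>(s.attrs.size()));
    for (const Column& c : s.attrs)
        WriteColumn(w, c);
}
```

Writing the counts before the records is what lets a reader allocate the schema up front and then loop exactly that many times, which is the same pattern the excerpt uses for fields and attributes.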
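    • Then the minimum doc id of the current index (m_uMinDocid) is written.
    • Next, depending on the docinfo setting (i.e. how attributes are stored), row information is optionally written (when docinfo is inline, attribute values are stored in the .spd file).
    • Then the wordlist checkpoint is written.
    • Next, the index settings are written.
    • The tokenizer settings are written: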
        void SaveTokenizerSettings ( CSphWriter & tWriter, ISphTokenizer * pTokenizer, int iEmbeddedLimit )
        {
            const CSphTokenizerSettings & tSettings = pTokenizer->GetSettings ();
            tWriter.PutByte ( tSettings.m_iType );
            tWriter.PutString ( tSettings.m_sCaseFolding.cstr () );
            tWriter.PutDword ( tSettings.m_iMinWordLen );

            // embed the synonyms file only if it fits within the embedding limit
            bool bEmbedSynonyms = pTokenizer->GetSynFileInfo ().m_uSize<=(SphOffset_t)iEmbeddedLimit;
            tWriter.PutByte ( bEmbedSynonyms ? 1 : 0 );
            if ( bEmbedSynonyms )
                pTokenizer->WriteSynonyms ( tWriter );

            // the file name is written whether or not the contents were embedded
            tWriter.PutString ( tSettings.m_sSynonymsFile.cstr () );
            tWriter.PutString ( tSettings.m_sIgnoreChars.cstr () );
            tWriter.PutDword ( tSettings.m_iNgramLen );
            tWriter.PutString ( tSettings.m_sNgramChars.cstr () );
            tWriter.PutString ( tSettings.m_sBlendChars.cstr () );
            tWriter.PutString ( tSettings.m_sBlendMode.cstr () );
        }
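The synonyms handling above follows a "flag byte, optional payload, then path" layout. A minimal sketch of that pattern, under stated assumptions: `SynonymsFile`, `ShouldEmbed`, `PutString` and `SaveSynonyms` are hypothetical names, the payload is modelled as the raw file contents, and strings are assumed to be stored as a little-endian DWORD length followed by the bytes.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for a synonyms file: its path plus its raw contents.
struct SynonymsFile {
    std::string path;
    std::string contents;
};

// Mirrors the size test in the excerpt: embed only when the file
// is no larger than the embedding limit.
bool ShouldEmbed(const SynonymsFile& f, int embeddedLimit) {
    return static_cast<int64_t>(f.contents.size()) <= static_cast<int64_t>(embeddedLimit);
}

// Assumption: strings are a little-endian DWORD length followed by the bytes.
void PutString(std::vector<uint8_t>& out, const std::string& s) {
    assert(s.size() <= UINT32_MAX);
    const uint32_t n = static_cast<uint32_t>(s.size());
    for (int i = 0; i < 4; ++i)
        out.push_back(static_cast<uint8_t>(n >> (8 * i)));
    out.insert(out.end(), s.begin(), s.end());
}

// Writes the flag byte, the embedded contents if they fit, and then the
// path, which is stored in both cases.
void SaveSynonyms(std::vector<uint8_t>& out, const SynonymsFile& f, int embeddedLimit) {
    const bool embed = ShouldEmbed(f, embeddedLimit);
    out.push_back(embed ? 1 : 0);
    if (embed)
        PutString(out, f.contents);
    PutString(out, f.path);
}
```

Because the flag precedes the payload, a reader can decide whether to parse embedded contents or fall back to the external file named by the path.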
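    • The dictionary settings are written (stop words, for example).
    • Then the kill-list size (m_uKillListSize) is written.
    • m_iMinMaxIndex is written. It reflects the size of the docinfo data: as the snippet below shows, it is the number of docinfo rows multiplied by the row stride.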
        CSphFixedVector<CSphRowitem> dMinRow ( tNewSchema.GetRowSize() );
        ...
        // stride of one docinfo row: the doc id plus the attribute row
        int iNewStride = DOCINFO_IDSIZE + tNewSchema.GetRowSize();
        int64_t iNewMinMaxIndex = m_iDocinfo * iNewStride;
        tBuildHeader.m_iMinMaxIndex = iNewMinMaxIndex;
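The arithmetic in this snippet can be checked with a small worked example. This is a sketch under assumptions: `DocinfoIdSize` and `MinMaxIndexOffset` are hypothetical helpers, and the doc id is assumed to occupy 2 row items with 64-bit ids and 1 with 32-bit ids.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: a doc id takes 2 row items with 64-bit ids, 1 with 32-bit ids.
int DocinfoIdSize(bool use64bit) {
    return use64bit ? 2 : 1;
}

// Number of row items covered by all docinfo rows: rows * stride.
// The result is kept in 64 bits because it can exceed the 32-bit range
// on large indexes.
int64_t MinMaxIndexOffset(int64_t docCount, int rowSize, bool use64bit) {
    assert(docCount >= 0 && rowSize >= 0);
    const int stride = DocinfoIdSize(use64bit) + rowSize;
    return docCount * stride;
}
```

For example, 1000 documents with a 3-item attribute row and 64-bit ids give a stride of 5 and a value of 5000; with 32-bit ids the stride is 4 and the value is 4000.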
    • The regexp settings are written (regexp_filter).
    • Finally, the schema field lengths are written.