InnoDB basic intro

Posted on
mysql database


  • default engine when creating table, SHOW ENGINE INNODB STATUS

  • pro:

    • DML operations follows ACID, can commit or rollback
    • and has crash-recovery
    • Row-level locking, consistent read increase performance
    • primary key: fast lookups
    • foreign key constraints
  • crash recovery?

    • The cleanup activities that occur when MySQL is started again after a crash. For InnoDB tables, changes from incomplete transactions are replayed using data from the redo log. Changes that were committed before the crash, but not yet written into the data files, are reconstructed from the doublewrite buffer. When the database is shut down normally, this type of activity is performed during shutdown by the purge operation.

      ​ During normal operation, committed data can be stored in the change buffer for a period of time before being written to the data files. There is always a tradeoff between keeping the data files up-to-date, which introduces performance overhead during normal operation, and buffering the data, which can make shutdown and crash recovery take longer.

    • automatically finalizes any changes that were committed before the time of the crash, and undoes any changes that were inprocess but not committed.

    • for uncommitted: replay with redo log

    • for committed but not written to data file: reconstructed from doublewrite buffer

    • purge when normal shutdown

  • use buffer pool to cache index and table data in memory when data is accessed. buffer pool is divided into pages, and a linked list of pages. LRU algorithm.

  • checksum mechanism helps when reading corrupted file from disk to buffer pool.

  • primary key is faster in where, order by, group by, and join(need more info)

  • insert, update, delete are optimized using changing buff, for the low random I/O operation.一起写,降低随机磁盘IO的低效率

  • Adaptive hash index: speed lookups = and in operations. In buffer pool

  • compress data and index?

  • truncating file-per-table is faster and can reuse disk usage for operation systems

  • dynamic row format. BLOB and text is stored outside the page of row data, and it will not go into buffer pool so often. default options

best practice

  • set primary key
  • use join and foreign keys. note the data type in foreign keys need to be same type
  • turn off autocommit?
  • group set of related DML statements to transactions
  • not using lock table, use select ... for update to mannual lock one row intend to update.
  • enable innodb_file_per_table.(this is default)


A: transaction, rollback, commit

C: protect data from crash. double write buffer and crash recovery

I: four isolation level.RU, RC , RR , SERIEL

D: 持久性(数据是否落盘),复杂


query rows that are being updated by other transactions, and see the values from before those updates occurred.


rollback segments(include undo logs): 1. for transaction rollback 2 . for consistent read

InnoDB add three fields to each row

  1. ​ DB_TRX_ID: 6 bytes. latest transaction for insert or update’s ID. Delete is looked as update too, although there is a mark for deletion
  2. DB_ROLL_PTR: 7 bytes. roll pointer point to undo log in rollback segments.
  3. DB_ROW_ID: 6 bytes. row id. when not specify primary id, this is the index.

undo log has update undo log and insert undo log. Insert undo log only need when rollback for insert and can be discard when committed. Update undo log also used in consistent read, only after there is no transaction present for which InnoDB has assigned a snapshot that in a consistent read could need the information in the update undo log to build an earlier version of a database row.

commit regularly, prevent the big rollback segments files.

undo log use less space than inserted or updated row, which can be used to calculate rollback segments space.

so when you delete a row, it not physically removed immediately. Only after InnoDB discards the update undo log for the deletion, the row and it’s index records is removed. removed by purge, 可以理解为一个在后台定时GC的任务,主要是把标记删除的index 都移除掉。

已相同的较小速率add and delete , 会导致 Mysql 变慢。因为 删除会有一个过程,且会写到undo log, purge 过程一直在积压

Secondary index in MVCC

​ Records in a clustered index are updated in-place, and their hidden system columns point undo log entries from which earlier versions of records can be reconstructed. Unlike clustered index records, secondary index records do not contain hidden system columns nor are they updated in-place. ????? 主键索引会带上 DB_ROLL_PTR, 次级索引则不会


File System

IN-MOMORY structure

Buffer Pool

  • in main memory, cash data and index after it is accessed. Get once from disk, many from memory.
  • divided into page, and made up of linked list of pages.
  • 使用的时候需要注意,否则在没有使用 where 的情况下查询会更新掉 buffer pool 里面的, read-ahead 也会如此
  • set as large a value as possible of memory.


  • special data structure cached secondary index changed when those pages are not in buffer pool. changes merged later when loaded to buffer poll by read operations.
  • In memory, the chan


  • special data structure cached secondary index changed when those pages are not in buffer pool.merged later to buffer pool when loadsed.
  • In memory, the change buffer occupies part of the buffer pool. On disk, the change buffer is part of the system tablespace, where index changes are buffered when the database server is shut down. ? In momory and disk at the same time?
  • Change buffering is not supported for a secondary index if the index contains a descending index column or if the primary key includes a descending index column.



an area in memory hold data to be saved to disk log file. the content is periodically written to disk. mainly for redo logs, reduce disk io

ON-DISK structure


when using file_per_table, mysql create .ibd file under data directory.

  • row format: dynamic, compressed.


b+ or b tree






mvcc 和 行锁如何支持高性能查询