Distributed Transactions: Basics of 2PC

Distributed Transaction

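Requirement: either all parts commit or none do. Each local part of the transaction is atomic.

Assumptions:

  1. Every node keeps its own log, but no node can see a global log.
  2. A coordinator process is allowed. It may be the node that initiated the transaction; all other nodes are treated as participants.
  3. Sending a message may require writing a log record as well.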

2PC has two phases:

  • The commit-request phase (or voting phase)
    • In this phase the coordinator asks every participant to prepare. Each participant answers with one of two votes:
    • “Yes”: commit (if the transaction participant’s local portion execution has ended properly)
    • “No”: abort (if a problem has been detected with the local portion)
  • The commit phase: based on the votes collected, the coordinator decides whether to commit or abort, and broadcasts that decision.

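That is the first impression of 2PC, but on closer inspection it leaves quite a few questions open:

  • What exactly is the state of a participant that “can commit”?
  • On abort, how should 2PC handle the log?
  • What does 2PC look like when transactions run in parallel? How does that affect the Lock Manager?

The details follow.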

Structure of 2PC


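  1. coordinator
    1. Writes <prepare TXN> to its WAL
    2. Broadcasts prepare
  2. Each participant, on receiving prepare
    1. Executes the transaction, writing undo log and redo log records
    2. If the operations succeed, the transaction enters a pre-commit state:
      1. The state is durable: even after a crash, the participant can recover it from its log
      2. The participant writes a <prepare TXN> log record (the recovery section below refers to this record as <ready>)
    3. If the operations fail, it writes <don't commit TXN> and aborts
    4. If everything above succeeded, it replies with agreement; otherwise it replies with abort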
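The participant side of the prepare phase can be sketched as follows. This is a minimal illustration, not a real implementation: the `Participant` class, the `work` callable, and the record names are hypothetical, and an in-memory list stands in for a durable WAL.

```python
class Participant:
    """Toy participant; `log` is an in-memory list standing in for a durable WAL."""

    def __init__(self, work):
        self.work = work      # hypothetical callable for the local part; raises on failure
        self.log = []
        self.state = "INIT"

    def on_prepare(self, txn):
        try:
            # Write undo/redo information before applying the changes.
            self.log.append(("undo_redo", txn))
            self.work()
        except Exception:
            # Local failure: record the "no" vote and abort locally.
            self.log.append(("dont_commit", txn))
            self.state = "ABORTED"
            return "abort"
        # In a real system this record must be flushed to disk before the vote is sent.
        self.log.append(("prepare", txn))
        self.state = "READY"
        return "agreement"
```

The important ordering is that the prepare record is logged before the vote leaves the node, so a participant that voted yes can still find out after a crash that it promised to commit.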

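The commit phase works as follows:

  1. coordinator
    1. If every reply is agreement, writes <Commit Txn> to its log; otherwise writes <Abort Txn>
    2. Broadcasts the decision
  2. Each participant, on receiving the decision, carries it out and replies with ACK once done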
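Putting both phases together from the coordinator's point of view, a rough sketch might look like this. `run_two_phase_commit` and `StubParticipant` are hypothetical names, messages are modelled as plain method calls, and failures and timeouts are ignored.

```python
def run_two_phase_commit(txn, participants, log):
    """Drive one transaction through both phases; `log` is the coordinator's log (a list)."""
    # Phase 1: log <prepare TXN>, then ask everyone to vote.
    log.append(("prepare", txn))
    votes = [p.prepare(txn) for p in participants]

    # Phase 2: the decision is logged before it is broadcast.
    decision = "commit" if all(v == "agreement" for v in votes) else "abort"
    log.append((decision, txn))
    acks = [p.finish(txn, decision) for p in participants]
    return decision, acks


class StubParticipant:
    """Stand-in participant that always casts a fixed vote (for illustration only)."""

    def __init__(self, vote):
        self.vote = vote
        self.outcome = None

    def prepare(self, txn):
        return self.vote

    def finish(self, txn, decision):
        self.outcome = decision
        return "ACK"
```

A single abort vote is enough to make every participant, including those that voted agreement, end up with the abort outcome.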


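Recovery in 2PC

  1. If the log contains <don't commit TXN>, <commit TXN>, or <abort TXN> for the Txn, the outcome is obvious.
  2. If the log contains only <ready>, the node has to work out the outcome together with the other nodes (this means the coordinator is in the WAIT state and every participant is READY).
  3. If there is no relevant record at all, aborting is at least safe.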
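The three recovery cases can be written as a small decision function. This is a sketch under assumptions: the log is a list of `(kind, txn)` tuples, and the record names `"commit"`, `"abort"`, `"dont_commit"`, and `"ready"` are illustrative rather than taken from any real system.

```python
def recover(log_records, txn):
    """Decide what a restarting node should do with `txn`, based on its own log."""
    kinds = {kind for kind, t in log_records if t == txn}
    if "commit" in kinds:
        return "commit"   # outcome already known: redo the commit
    if "abort" in kinds or "dont_commit" in kinds:
        return "abort"    # outcome already known: undo
    if "ready" in kinds:
        return "ask"      # voted yes but never saw the decision: ask the other nodes
    return "abort"        # no record at all: aborting is safe
```

The last branch is what makes it unnecessary to find an abort record before aborting.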

Wikipedia describes the same idea:

Presumed abort or Presumed commit are common such optimizations.[2][3][5] An assumption about the outcome of transactions, either commit, or abort, can save both messages and logging operations by the participants during the 2PC protocol’s execution. For example, when presumed abort, if during system recovery from failure no logged evidence for commit of some transaction is found by the recovery procedure, then it assumes that the transaction has been aborted, and acts accordingly. This means that it does not matter if aborts are logged at all, and such logging can be saved under this assumption. Typically a penalty of additional operations is paid during recovery from failure, depending on optimization type. Thus the best variant of optimization, if any, is chosen according to failure and transaction outcome statistics.

Timeout

The greatest disadvantage of the two-phase commit protocol is that it is a blocking protocol. If the coordinator fails permanently, some participants will never resolve their transactions: After a participant has sent an agreement message to the coordinator, it will block until a commit or rollback is received.

If the coordinator receives an abort, that is actually the happy case: everyone simply aborts together. But if some machine goes silent, well, you are stuck blocking.

So 2PC needs timeouts to some degree:

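For the coordinator: if it does not receive replies from all participants within the allotted time, it can leave the WAIT state on its own and send a rollback notification to every participant. For a participant: if it is in the READY state and does not receive the coordinator's phase-two notification in time, it must not roll back unilaterally, because the coordinator may have sent a commit notification, and rolling back would then leave the data inconsistent.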
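The coordinator-side timeout can be sketched with a standard-library queue acting as the inbox for votes. `collect_votes` is a hypothetical helper, and the single overall deadline is an assumption, not something the protocol prescribes.

```python
import queue
import time


def collect_votes(inbox, expected, timeout_s):
    """Wait for `expected` votes on `inbox`; a missing or negative vote means abort."""
    deadline = time.monotonic() + timeout_s
    received = 0
    while received < expected:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return "abort"            # timed out waiting: leave WAIT and roll back
        try:
            vote = inbox.get(timeout=remaining)
        except queue.Empty:
            return "abort"            # nothing arrived before the deadline
        if vote != "agreement":
            return "abort"            # one "no" is enough, no need to keep waiting
        received += 1
    return "commit"
```

Only the coordinator can time out this way. A READY participant has no equivalent shortcut, because it cannot tell a lost commit message from a lost abort message.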

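At this point we can bring in a peer-query mechanism and let participant A ask another participant B about its status. If B has already rolled back or committed, A can safely do the same. If B has not yet reached the READY state, A can infer that the coordinator's decision must be rollback. If B is also in the READY state, A goes on to ask another participant. Only when every participant is in the READY state can two-phase commit not resolve the situation, and it then blocks for a long time.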
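The peer-query rule reduces to a short function over the states reported by the other participants. This is an illustrative sketch: the state names and `resolve_by_asking_peers` are hypothetical, and peers that cannot be reached are simply left out of the input.

```python
def resolve_by_asking_peers(peer_states):
    """Return 'commit', 'abort', or 'block' for a READY participant that timed out."""
    for state in peer_states:
        if state == "COMMITTED":
            return "commit"   # a peer already saw the commit decision
        if state == "ABORTED":
            return "abort"    # a peer already saw the abort decision
        if state == "INIT":
            return "abort"    # a peer never voted, so commit cannot have been decided
        # state == "READY": this peer knows no more than we do, ask the next one
    return "block"            # every reachable peer is READY: nothing can be inferred
```

For example, `resolve_by_asking_peers(["READY", "INIT"])` yields `"abort"`, while a list containing only `"READY"` yields `"block"`.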

3PC and Pre-Commit

As noted earlier, the case where every node has only a ready log is an awkward state. This is where 3PC can help to some extent:

https://en.wikipedia.org/wiki/Three-phase_commit_protocol

A two-phase commit protocol cannot dependably recover from a failure of both the coordinator and a cohort member during the Commit phase. If only the coordinator had failed, and no cohort members had received a commit message, it could safely be inferred that no commit had happened. If, however, both the coordinator and a cohort member failed, it is possible that the failed cohort member was the first to be notified, and had actually done the commit. Even if a new coordinator is selected, it cannot confidently proceed with the operation until it has received an agreement from all cohort members, and hence must block until all cohort members respond.

The three-phase commit protocol eliminates this problem by introducing the Prepared to commit state. If the coordinator fails before sending preCommit messages, the cohort will unanimously agree that the operation was aborted. The coordinator will not send out a doCommit message until all cohort members have ACKed that they are Prepared to commit. This eliminates the possibility that any cohort member actually completed the transaction before all cohort members were aware of the decision to do so (an ambiguity that necessitated indefinite blocking in the two-phase commit protocol).

The Prepared to commit state means: once preCommit has succeeded, the only way forward is commit; otherwise the transaction should abort.
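One way to read this rule is as a lookup from a participant's state to the action it takes when the coordinator goes silent. The sketch below is a simplification with illustrative state names, and it assumes nodes fail by crashing and the network is not partitioned.

```python
# What a 3PC participant does if the coordinator stops responding (illustrative).
TIMEOUT_ACTION = {
    "INIT": "abort",           # never voted
    "READY": "abort",          # voted yes, but no preCommit was seen
    "PRECOMMITTED": "commit",  # preCommit seen, so every participant voted yes
}


def on_coordinator_timeout(state):
    """Return the action for a participant in `state` whose coordinator timed out."""
    return TIMEOUT_ACTION[state]
```

Compared with 2PC, the READY state is no longer a dead end: a participant in READY may abort, because no peer can have committed before every participant had acknowledged preCommit.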

Reference