Paxos Made Simple 论文

这是 Paxos Made Simple 的主要部分，描述一致性问题并讲解为了保证一致性而应该遵循约束，进而提出算法的主要流程以及实现中可以优化的点。The next section shows that this consensus algorithm follows almost unavoidably from the properties we want it to satisfy.

2.1 The Problem

假设一组进程可以 propose value。

A consensus algorithm ensures that a single one among the proposed values is chosen. 一致性算法保证其中唯一一个值会被选中。

如果没有 propose value ，也没有值会被选中。如果一个值被选中，所有的进程都会接收到这个值。一致性的最基本要求如下：

Only a value that has been proposed may be chosen
Only a single value is chosen, and
A process never learns that a value has been chosen unless it actually has been. 如果没有被选中，进程不能获知一个值已经被选中。

We wont try to specify precise liveness requirements. 我们不会试图去精确的指定节点存活的要求。

However, the goal is to ensure that some proposed value is eventually chosen and, if a value has been chosen, then a process can eventually learn the value.

We let the three roles in the consensus algorithm be performed by three classes of agents: proposers, acceptors, and learners.

In an implementation, a single process may act as more than one agent, but the mapping from agents to processes does not concern us here. 在算法的实现中，一个进程可以扮演多个角色，但是角色与进程的映射关系在这里不需要我们关心。

Assume that agents can communicate with one another by sending messages.

We use the customary asynchronous, non-Byzantine model, in which:

Agents operate at arbitrary speed, may fail by stopping, and may restart. 角色的操作可以是任意速率，也可能中断，可能重置。

Since all agents may fail after a value is chosen and then restart, a solution is impossible unless some information can be remembered by an agent that has failed and restarted. 由于所有的角色可能在选中值之后失效重启，一些信息在角色失效或者重启之后必须仍然存在。
Messages can take arbitrarily long to be delivered, can be duplicated, and can be lost, but they are not corrupted.

2.2 Choosing a Value

The easiest way to choose a value is to have a single acceptor agent. A proposer sends a proposal to the acceptor, who chooses the first proposed value that it receives. Although simple, this solution is unsatisfactory because the failure of the acceptor makes any further progress impossible.

So, let’s try another way of choosing a value. Instead of a single acceptor, let’s use multiple acceptor agents. A proposer sends a proposed value to a set of acceptors. An acceptor may accept the proposed value. The value is chosen when a large enough set of acceptors have accepted it. How large is large enough?

To ensure that only a single value is chosen, we can let a large enough set consist of any majority of the agents. Because any two majorities have at least one acceptor in common, this works if an acceptor can accept at most one value. 为了保证只有一个值被选中，那么我们选择 majority 的角色。因为在一个 acceptor 只能接收至多一个值的情况下，两个 majorities 至少有一个 acceptor 是共有的。

(There is an obvious generalization of a majority that has been observed in numerous papers, apparently starting with [3].)

In the absence of failure or message loss, we want a value to be chosen even if only one value is proposed by a single proposer. 在故障节点缺失和消息丢失的情况下，我们希望只有一个值会被选中，就算是只有一个 Proposer 提议了一个值。

This suggests the requirement:

P1. An acceptor must accept the first proposal that it receives. 必须接收第一个 Proposal。

But this requirement raises a problem. Several values could be proposed by different proposers at about the same time, leading to a situation in which every acceptor has accepted a value, but no single value is accepted by a majority of them. 多个值同时被多个 Proposer 提出，没有一个达到 majority。

Even with just two proposed values, if each is accepted by about half the acceptors, failure of a single acceptor could make it impossible to learn which of the values was chosen.

P1 and the requirement that a value is chosen only when it is accepted by a majority of acceptors imply that an acceptor must be allowed to accept more than one proposal. P1 和一个选定的值必须被 majority 接收表明一个 acceptor 允许接收多个 Proposal。

We keep track of the different proposals that an acceptor may accept by assigning a (natural) number to each proposal, so a proposal consists of a proposal number and a value.

To prevent confusion, we require that different proposals have different numbers.

How this is achieved depends on the implementation, so for now we just assume it. 怎样达到这个效果取决于实现，我们只是假设可以这样。

A value is chosen when a single proposal with that value has been accepted by a majority of the acceptors. In that case, we say that the proposal (as well as its value) has been chosen. 一个 Proposal 被 acceptor 中的大多数 accept，那么可以说这个Proposal（也就是 value）被 chosen。

We can allow multiple proposals to be chosen, but we must guarantee that all chosen proposals have the same value. 我们允许多个 Proposal 被选中，但是必须保证多个 Proposal 中的值是一样的。

By induction on the proposal number, it suffices to guarantee: 基于对 Proposal 序数的归纳，容易保证以下条件

P2. If a proposal with value v is chosen, then every higher-numbered proposal that is chosen has value v. 如果一个值为 v 的 Proposal 被选中，那么更高序号的被选中的 Proposal 都有值 v.

Since numbers are totally ordered, condition P2 guarantees the crucial safety property that only a single value is chosen.

To be chosen, a proposal must be accepted by at least one acceptor. So, we can satisfy P2 by satisfying:

P2a . If a proposal with value v is chosen, then every higher-numbered proposal accepted by any acceptor has value v.

We still maintain P1 to ensure that some proposal is chosen.

Because communication is asynchronous, a proposal could be chosen with some particular acceptor c never having received any proposal. Suppose a new proposer “wakes up” and issues a higher-numbered proposal with a different value.

P1 requires c to accept this proposal, violating P2a . Maintaining both P1 and P2a requires strengthening P2a to:

P2b . If a proposal with value v is chosen, then every higher-numbered proposal issued by any proposer has value v.

Since a proposal must be issued by a proposer before it can be accepted by an acceptor, P2b implies P2a , which in turn implies P2. 因为一个 Proposal 在被 Acceptor 接收前必须由一个 Proposor 提出，P2b 隐含 P2a，P2a 隐含 P2.

To discover how to satisfy P2b , let’s consider how we would prove that it holds. 为了找到如何满足 P2b，我们来证明它是成立的。

We would assume that some proposal with number m and value v is chosen and show that any proposal issued with number n > m also has value v. 我们假设序号为 m，值为 v 的 Proposal 已经被 chosen，并且任何序号 n > m 的 Proposal 同样包含值 v。

We would make the proof easier by using induction on n, so we can prove that proposal number n has value v under the additional assumption that every proposal issued with a number in m . .(n − 1) has value v, where i . . j denotes the set of numbers from i through j. 对 n 归纳可以简化证明，根据条件：每个提出的议案（编号从 m 到 n-1）的决议都是 v，我们可以证明编号为 n 的议案的决议是v。

For the proposal numbered m to be chosen, there must be some set C consisting of a majority of acceptors such that every acceptor in C accepted it. 对于选择的议案（编号为 m），必定存在一个集合 C（acceptor的多数派），C 中的每个 acceptor 都批准了该议案。

Combining this with the induction assumption, the hypothesis that m is chosen implies: 结合归纳假设，m 被选择这一前提意味着：

Every acceptor in C has accepted a proposal with number in m . .(n − 1), and every proposal with number in m . .(n − 1) accepted by any acceptor has value v.

Since any set S consisting of a majority of acceptors contains at least one member of C , we can conclude that a proposal numbered n has value v by ensuring that the following invariant is maintained:

P2c . For any v and n, if a proposal with value v and number n is issued, then there is a set S consisting of a majority of acceptors such that either (a) no acceptor in S has accepted any proposal numbered less than n, or (b) v is the value of the highest-numbered proposal among all proposals numbered less than n accepted by the acceptors in S. (a) 要么 S 中没有 acceptor 接收过序数小于 n 的 proposal，(b) 要么 v 是在 S 中的 acceptor 接收的所有小于 n 的序数最大的 proposal 的值。

We can therefore satisfy P2b by maintaining the invariance of P2c .

To maintain the invariance of P2c , a proposer that wants to issue a proposal numbered n must learn the highest-numbered proposal with number less than n, ,if any, that has been or will be accepted by each acceptor in some majority of acceptors. 为了维护 P2c，Proposer 想要提出一个序数为 n 的 Proposal 时，必须知道小于 n 的最大序号的 proposal，如果有，那么这个 Proposal 已经或者将要被 chosen。

Learning about proposals already accepted is easy enough; predicting future acceptances is hard.

Instead of trying to predict the future, the proposer controls it by extracting a promise that there won`t be any such acceptances. Proposer 通过获得一个 Promise 保证后面不会出现那种获得多数 accept 的情况（也就是保证 P2c 中的 a）.

In other words, the proposer requests that the acceptors not accept any more proposals numbered less than n. This leads to the following algorithm for issuing proposals. 换句话说，Proposer 要求 Acceptors 不会接收序号小于 n 的 Proposal。

This leads to the following algorithm for issuing proposals.

A proposer chooses a new proposal number n and sends a request to each member of some set of acceptors, asking it to respond with: Proposer 选择一个新的 Proposal 并将请求发送给 acceptor 的时候，要求 acceptor 响应如下：

(a) A promise never again to accept a proposal numbered less than n, and 不在接受小于该 Proposal 序数的请求
(b) The proposal with the highest number less than n that it has accepted, if any. 如果曾经 accept 任意序数大于当前请求的 Proposal，就将其返回。

I will call such a request a prepare request with number n 这个可以称为 prepare request

If the proposer receives the requested responses from a majority of the acceptors, then it can issue a proposal with number n and value v, where v is the value of the highest-numbered proposal among the responses, or is any value selected by the proposer if the responders reported no proposals. 如果 Proposer 接收到多数 acceptor 的响应，那么它就可以提出一个序数 n 以及值为 v 的 Proposal，v 可能是所有响应里序数最大的那一个的值，如果没有响应之前的 Proposal，也可以是自己提出的值。

A proposer issues a proposal by sending, to some set of acceptors, a request that the proposal be accepted. (This need not be the same set of acceptors that responded to the initial requests.) Let’s call this an accept request. Proposer 发送 Proposal 到一组 acceptor（不一定是响应初始请求的那一组），请求他们接受 Proposal。我们称之为 accept request.

This describes a proposer’s algorithm. What about an acceptor? It can receive two kinds of requests from proposers: prepare requests and accept requests.

An acceptor can ignore any request without compromising safety. acceptor 可以忽略任何请求，不必让步于 safety（safety 可以认为是保障达成一致性，不太好翻译）。

So, we need to say only when it is allowed to respond to a request. It can always respond to a prepare request. It can respond to an accept request, accepting the proposal, iff it has not promised not to. In other words:

P1a . An acceptor can accept a proposal numbered n iff it has not responded to a prepare request having a number greater than n. acceptor 可以接受序数为 n 的 proposal，如果它之前没有回复任何序数大于 n 的 prepare request.

Observe that P1a subsumes P1

We now have a complete algorithm for choosing a value that satisfies the required safety properties—assuming unique proposal numbers. 假设我们实现了不重复的 Proposal 序数，现在已经有一个可以获取一个值的完整算法。

The final algorithm is obtained by making one small optimization. 最终的算法只是在这基础之上做了一个小小的优化。

Suppose an acceptor receives a prepare request numbered n, but it has already responded to a prepare request numbered greater than n, thereby promising not to accept any new proposal numbered n. There is then no reason for the acceptor to respond to the new prepare request, since it will not accept the proposal numbered n that the proposer wants to issue. So we have the acceptor ignore such a prepare request. We also have it ignore a prepare request for a proposal it has already accepted.

With this optimization, an acceptor needs to remember only the highestnumbered proposal that it has ever accepted and the number of the highestnumbered prepare request to which it has responded. Because P2c must be kept invariant regardless of failures, an acceptor must remember this information even if it fails and then restarts. Note that the proposer can always abandon a proposal and forget all about it—as long as it never tries to issue another proposal with the same number.

Putting the actions of the proposer and acceptor together, we see that the algorithm operates in the following two phases.

Phase 1. (a) A proposer selects a proposal number n and sends a prepare request with number n to a majority of acceptors. (b) If an acceptor receives a prepare request with number n greater than that of any prepare request to which it has already responded, then it responds to the request with a promise not to accept any more proposals numbered less than n and with the highest-numbered proposal (if any) that it has accepted.

Phase 2. (a) If the proposer receives a response to its prepare requests (numbered n) from a majority of acceptors, then it sends an accept request to each of those acceptors for a proposal numbered n with a value v, where v is the value of the highest-numbered proposal among the responses, or is any value if the responses reported no proposals. (b) If an acceptor receives an accept request for a proposal numbered n, it accepts the proposal unless it has already responded to a prepare request having a number greater than n.

A proposer can make multiple proposals, so long as it follows the algorithm for each one. It can abandon a proposal in the middle of the protocol at any time. (Correctness is maintained, even though requests and/or responses for the proposal may arrive at their destinations long after the proposal was abandoned.) It is probably a good idea to abandon a proposal if some proposer has begun trying to issue a higher-numbered one. Therefore, if an acceptor ignores a prepare or accept request because it has already received a prepare request with a higher number, then it should probably inform the proposer, who should then abandon its proposal. This is a performance optimization that does not affect correctness. 一个 proposer 可以生成多个 proposal，只要它能满足算法的要求。它可以在协议的任意时刻放弃 proposal。（正确性依然是得到满足的，即使请求或者回复在对应的 proposal 被放弃了很久之后才到达目的地）当其他的 proposer 已经开始发出更高 numbe r的 proposal 时，最好放弃当前的 proposal。因此。如果 acceptor 因为它已经接受了更高 number 的 prepare request 而忽略了其他的 prepare 或者 accept request，那么它应该通知对应的 proposer 放弃该 proposal。这是对性能优化，并不会影响正确性。

2.3 Learning a Chosen Value

To learn that a value has been chosen, a learner must find out that a proposal has been accepted by a majority of acceptors. The obvious algorithm is to have each acceptor, whenever it accepts a proposal, respond to all learners, sending them the proposal. This allows learners to find out about a chosen value as soon as possible, but it requires each acceptor to respond to each learner—a number of responses equal to the product of the number of acceptors and the number of learners. 为了获取一个已经被选中的 value，learner 必须要确定已经有一个 proposal 被 majority 接受了。最显而易见的算法就是让每个acceptor 在接受了一个 proposal 之后向所有的 learner 发送这个 proposal。这能让 learner 尽快地找到被选中的 value，但这需要 acceptor 对每个 learner 进行回复——回复的数量为 acceptor 和 learner 数量的乘积。

The assumption of non-Byzantine failures makes it easy for one learner to find out from another learner that a value has been accepted. 在 non-Byzantine 的假设，一个 learner 可以从另外一个 learn 得知一个值已经被接收。

We can have the acceptors respond with their acceptances to a distinguished learner, which in turn informs the other learners when a value has been chosen. This approach requires an extra round for all the learners to discover the chosen value. It is also less reliable, since the distinguished learner could fail. But it requires a number of responses equal only to the sum of the number of acceptors and the number of learners. 我们可以让 acceptor 将批准状况返回给一个主 learner，然后再由此 learner 通知其他 learner 有一个值已经被选定。这个方案要求额外的一轮通讯。这个过程同样不可靠的，因为主 learner 可能会挂掉。但是这样只要求等同于 acceptors 和 learner 数之和的相应。

More generally, the acceptors could respond with their acceptances to some set of distinguished learners, each of which can then inform all the learners when a value has been chosen. Using a larger set of distinguished learners provides greater reliability at the cost of greater communication complexity. 更通用的情况，acceptor 可以与一组主 learner 通讯，每一个都可以通知其他 learner 值被选中的消息。主 learner 的数量越多，通讯越复杂，过程越可靠。

Because of message loss, a value could be chosen with no learner ever finding out. The learner could ask the acceptors what proposals they have accepted, but failure of an acceptor could make it impossible to know whether or not a majority had accepted a particular proposal. In that case, learners will find out what value is chosen only when a new proposal is chosen. If a learner needs to know whether a value has been chosen, it can have a proposer issue a proposal, using the algorithm described above. 由于消息丢失，可能没有 learner 知道选择了一个决议。Learner 可以向 acceptor 询问批准的议案，但是由于 acceptor 的失效，可能难以得知多数派是否批准了一个议案。这样，learner 只能在新的议案被选择时才能知道 acceptor 选择的决议。如果 learner 需要知道是否已经选择了一个决议，它可以让 proposer 根据上面的算法提出一个议案【提出请求就有回应，并且新的提案的决议就是当前选择的决议】。

2.4 Progress

It’s easy to construct a scenario in which two proposers each keep issuing a sequence of proposals with increasing numbers, none of which are ever chosen. Proposer p completes phase 1 for a proposal number n1. Another proposer q then completes phase 1 for a proposal number n2 > n1. Proposer p’s phase 2 accept requests for a proposal numbered n1 are ignored because the acceptors have all promised not to accept any new proposal numbered less than n2. So, proposer p then begins and completes phase 1 for a new proposal number n3 > n2, causing the second phase 2 accept requests of proposer q to be ignored. And so on.

To guarantee progress, a distinguished proposer must be selected as the only one to try issuing proposals. If the distinguished proposer can communicate successfully with a majority of acceptors, and if it uses a proposal with number greater than any already used, then it will succeed in issuing a proposal that is accepted. By abandoning a proposal and trying again if it learns about some request with a higher proposal number, the distinguished proposer will eventually choose a high enough proposal number.

If enough of the system (proposer, acceptors, and communication network) is working properly, liveness can therefore be achieved by electing a single distinguished proposer. The famous result of Fischer, Lynch, and Patterson [1] implies that a reliable algorithm for electing a proposer must use either randomness or real time—for example, by using timeouts. However, safety is ensured regardless of the success or failure of the election.

2.5 The Implementation

The Paxos algorithm [5] assumes a network of processes. In its consensus algorithm, each process plays the role of proposer, acceptor, and learner. The algorithm chooses a leader, which plays the roles of the distinguished proposer and the distinguished learner. The Paxos consensus algorithm is precisely the one described above, where requests and responses are sent as ordinary messages. (Response messages are tagged with the corresponding proposal number to prevent confusion.) Stable storage, preserved during failures, is used to maintain the information that the acceptor must remember. An acceptor records its intended response in stable storage before actually sending the response.

All that remains is to describe the mechanism for guaranteeing that no two proposals are ever issued with the same number. Different proposers choose their numbers from disjoint sets of numbers, so two different proposers never issue a proposal with the same number. Each proposer remembers (in stable storage) the highest-numbered proposal it has tried to issue, and begins phase 1 with a higher proposal number than any it has already used.

Blog

Paxos Made Simple 论文