https://cloud.google.com/blog/products/databases/strict-serializability-and-external-consistency-in-spanner
> To be externally consistent, a transaction must see the effects of all the transactions that complete before it and none of the effects of transactions that complete after it, in the global serial order
My understanding of why distributed transactions need synchronized clocks to achieve external consistency is the following:
Assume NTP max drift is 250ms
real-world time = 100ms
node A local time = 100ms
node B local time = 0ms
The following transactions will run into a conflicting sequence:
transactionA arrives at nodeA at (real-world time 100ms, A time 100ms, B time 0ms)
transactionA is committed by nodeA at (real-world time 110ms, A time 110ms, B time 10ms)
nodeB receives the replicated transactionA at (real-world time 111ms, A time 111ms, B time 11ms), BUT with timestamp 110ms
transactionB arrives at nodeB at (real-world time 120ms, A time 120ms, B time 20ms)
transactionB is committed by nodeB at (real-world time 130ms, A time 130ms, B time 30ms)
Now the commit timestamps of transactionA and transactionB conflict:
transactionA commit time = 110ms (local time of A)
transactionB commit time = 30ms (local time of B)
Transaction A was processed before transaction B, but because node B lags node A by 100ms, transaction B now appears to have been processed before transaction A.
Spanner has atomic clocks that bound the maximum clock drift to 7ms, and it waits out this maximum uncertainty of 7ms before committing. But how does that fundamentally solve the issue?
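For concreteness, here is the timeline above as a tiny Python sketch; the `local_time` helper and the skew constant are made up just to mirror the numbers in the example:

```python
# A sketch of the scenario above (no commit-wait): clock skew alone
# makes the commit timestamps disagree with the real-world order.
# The helper and constants below are made up to mirror the example.

SKEW_B_MS = -100  # node B's clock runs 100ms behind real-world time

def local_time(real_ms, skew_ms):
    """Local clock reading of a node at a given real-world time."""
    return real_ms + skew_ms

# transactionA is committed by nodeA at real-world time 110ms,
# stamped with nodeA's local clock.
ts_a = local_time(110, 0)          # 110ms
# transactionB is committed by nodeB at real-world time 130ms,
# stamped with nodeB's local clock.
ts_b = local_time(130, SKEW_B_MS)  # 30ms

# transactionA finished before transactionB even arrived (real time 110 < 120),
# yet its timestamp is the larger one, so the timestamp order is inverted.
assert ts_b < ts_a
print(f"trxA ts = {ts_a}ms, trxB ts = {ts_b}ms -> order inverted")
```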
For example https://www.cockroachlabs.com/blog/living-without-atomic-clocks/
> So how does Spanner use TrueTime to provide linearizability given that there are still inaccuracies between clocks? It’s actually surprisingly simple. It waits. Before a node is allowed to report that a transaction has committed, it must wait 7ms. Because all clocks in the system are within 7ms of each other, waiting 7ms means that no subsequent transaction may commit at an earlier timestamp, even if the earlier transaction was committed on a node with a clock which was fast by the maximum 7ms. Pretty clever.
My interpretation of the articles is the following:
real-world time = 7ms
node A local time = 7ms
node B local time = 0ms
trxA arrives at nodeA at (real-world time 7ms, A time 7ms, B time 0ms)
wait 7ms before commit
trxA is committed by nodeA at (real-world time 14ms, A time 14ms, B time 7ms)
trxB arrives at nodeB at (real-world time 9ms, A time 9ms, B time 2ms)
wait 7ms before commit
trxB is committed by nodeB at (real-world time 16ms, A time 16ms, B time 9ms)
In this example the 7ms wait doesn't seem to solve the problem: trxB is still committed at 9ms (local nodeB time) while trxA is committed at 14ms (local nodeA time). The conflict of trxB appearing to commit before trxA still exists, time(trxB) < time(trxA), even though trxB occurs after trxA in real-world time.
What am I missing here?
In your second example, trxA and trxB are concurrent because at real-world time 9ms, trxA hasn't yet committed.
And therein lies the reason why Spanner waited 7ms. By making trxA wait, Spanner forced those transactions to become concurrent, and concurrent transactions may be ordered either way. So trxA ending up with a greater commit timestamp than trxB, due to clock skew between nodeA and nodeB, is correct as per the semantic guarantees of linearizability (or what Spanner calls "external consistency").
Had it not waited, trxA's timestamp (7ms nodeA time) would've been greater than trxB's timestamp (2ms nodeB time) even though trxA was acked back to the client before trxB was invoked by the client. This would be incorrect behavior.
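To make the wait rule concrete, here is a minimal sketch of commit-wait under the assumption from the quoted article that any two clocks are within 7ms of each other and that a transaction is stamped with its node's local clock. The `Node` class, the helper names, and the skew values are hypothetical, chosen only to mirror the 7ms example; this is not Spanner's actual API:

```python
# Minimal commit-wait sketch. Assumption (from the quoted article): any two
# clocks differ by at most EPS_MS, and a commit timestamp is read from the
# committing node's local clock. Node, skews, and names here are hypothetical.

EPS_MS = 7  # assumed bound on the skew between any two clocks

class Node:
    def __init__(self, skew_ms):
        self.skew_ms = skew_ms              # this node's clock offset vs real time

    def clock(self, real_ms):
        return real_ms + self.skew_ms       # local clock reading at a real-world time

    def commit(self, real_ms):
        """Pick a commit timestamp from the local clock, then delay the ack by
        EPS_MS: once this node's clock has passed ts + EPS_MS, every other
        clock in the system already reads at least ts."""
        ts = self.clock(real_ms)
        ack_real_ms = real_ms + EPS_MS      # clocks tick at real-time rate here
        return ts, ack_real_ms

node_a = Node(skew_ms=0)    # node A's clock matches real-world time
node_b = Node(skew_ms=-7)   # node B's clock runs 7ms behind

# trxA arrives at node A at real-world time 7ms and commits with commit-wait.
ts_a, ack_a = node_a.commit(real_ms=7)   # ts_a = 7ms, ack goes out at real time 14ms

# For trxB to be "subsequent" (not concurrent), it must start after trxA's ack,
# i.e. at real-world time >= 14ms, at which point node B's clock reads >= 7ms.
ts_b, _ = node_b.commit(real_ms=ack_a)   # ts_b = 14 - 7 = 7ms

assert ts_b >= ts_a   # a subsequent transaction can never get an earlier timestamp
print(f"trxA ts = {ts_a}ms (acked at real time {ack_a}ms), trxB ts = {ts_b}ms")
```

In your second example trxB starts at real-world time 9ms, before trxA's ack at 14ms, so it falls into the "concurrent" case and either timestamp order is acceptable. Only transactions that start after the ack are constrained, and for those the wait guarantees they cannot be stamped earlier than trxA.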