Author: Zhenchi Zhong (Software Engineer Intern at PingCAP)
Transcreator: Charlotte Liu; Editor: Tom Dewan
TiKV is a distributed key-value storage engine based on the designs of Google Spanner, F1, and HBase. However, TiKV is much simpler to manage because it does not depend on a distributed file system.
As introduced in A Deep Dive into TiKV and How TiKV Reads and Writes, TiKV applies a 2-phase commit (2PC) algorithm inspired by Google Percolator to support distributed transactions. These two phases are Prewrite and Commit.
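To make the two phases concrete before we dive in, here is a deliberately simplified, in-memory sketch of the protocol. All names (`ToyStore`, `prewrite`, `commit`) are illustrative inventions, not TiKV's API; real TiKV keeps locks and data in RocksDB column families and replicates every change through Raft:

```rust
use std::collections::HashMap;

// A toy, single-threaded stand-in for a 2PC key-value store.
#[derive(Default)]
struct ToyStore {
    locks: HashMap<Vec<u8>, u64>,                // key -> start_ts of the lock holder
    staged: HashMap<Vec<u8>, Vec<u8>>,           // key -> value staged by prewrite
    data: HashMap<Vec<u8>, Vec<(u64, Vec<u8>)>>, // key -> committed (commit_ts, value)
}

impl ToyStore {
    // Phase 1 (Prewrite): check constraints, then lock the key and stage the write.
    fn prewrite(&mut self, key: &[u8], value: &[u8], start_ts: u64) -> Result<(), String> {
        if let Some(versions) = self.data.get(key) {
            // A version committed after our start_ts means another
            // transaction raced ahead of us: a write conflict.
            if versions.iter().any(|&(ts, _)| ts > start_ts) {
                return Err("write conflict".into());
            }
        }
        if self.locks.contains_key(key) {
            return Err("key is locked by another transaction".into());
        }
        self.locks.insert(key.to_vec(), start_ts);
        self.staged.insert(key.to_vec(), value.to_vec());
        Ok(())
    }

    // Phase 2 (Commit): release the lock and make the staged value visible.
    fn commit(&mut self, key: &[u8], start_ts: u64, commit_ts: u64) -> Result<(), String> {
        if self.locks.get(key) != Some(&start_ts) {
            return Err("lock missing or owned by another transaction".into());
        }
        self.locks.remove(key);
        if let Some(value) = self.staged.remove(key) {
            self.data.entry(key.to_vec()).or_default().push((commit_ts, value));
        }
        Ok(())
    }
}

fn main() {
    let mut store = ToyStore::default();
    store.prewrite(b"balance", b"100", 10).unwrap();
    store.commit(b"balance", 10, 11).unwrap();
    assert_eq!(store.data[b"balance".as_slice()], vec![(11, b"100".to_vec())]);
}
```

The key property this sketch preserves is that nothing becomes visible to readers until Commit, while Prewrite detects conflicting writers up front.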
In this article, I'll explore the execution workflow of a TiKV request in the prewrite phase and give a top-down description of how the prewrite request of an optimistic transaction is executed within the modules of the Region leader. This information will help you understand the resource usage of TiKV requests and learn about the related source code in TiKV.
When TiKV is initialized, it creates different types of worker threads based on its configuration. Once these worker threads are created, they continuously fetch and execute tasks in a loop. These worker threads are generally paired with associated task queues. Therefore, by submitting various tasks to different task queues, you can execute some processes asynchronously. The following diagram provides a simple illustration of this work model:
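To complement the diagram, here is a minimal sketch of that work model, using a plain channel as the task queue. The names are mine for illustration; TiKV's real worker abstractions are more elaborate:

```rust
use std::sync::mpsc;
use std::thread;

// A task submitted to the worker's queue.
enum Task {
    Compute(u64),
    Shutdown,
}

fn main() {
    let (tx, rx) = mpsc::channel::<Task>();

    // The worker thread fetches and executes tasks in a loop,
    // just like TiKV's worker threads paired with task queues.
    let worker = thread::spawn(move || {
        while let Ok(task) = rx.recv() {
            match task {
                Task::Compute(n) => println!("worker handled task {n}"),
                Task::Shutdown => break,
            }
        }
    });

    // Any thread holding the sender can submit work asynchronously.
    for n in 0..3 {
        tx.send(Task::Compute(n)).unwrap();
    }
    tx.send(Task::Shutdown).unwrap();
    worker.join().unwrap();
}
```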
A TiKV prewrite request begins with a gRPC prewrite request from the network appearing on the gRPC server thread. The following figure shows the workflow in this phase.
It may not be easy to understand just by looking at the figure, so below I will describe exactly what the gRPC thread does step by step:

1. The gRPC thread turns the prewrite request into Mutations, the form in which the transaction layer processes writes.
2. The gRPC thread allocates a `cid` that can be used to index the gRPC notify callback. The `cid` is globally unique throughout the execution process of the prewrite request.
3. The gRPC thread builds a task from the `cid`, the Mutations, and the transaction scheduler, and submits it to the scheduler worker pool. To serve the prewrite, the transaction scheduler needs a consistent KV snapshot, so a Raft command carrying a Read Index request is sent to the raftstore.

The gRPC thread has completed its mission. The rest of the story will continue in the raftstore thread.
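First, though, it helps to picture how a `cid` indexes a callback. The following is a hypothetical simplification (the real bookkeeping lives in TiKV's transaction scheduler): a global counter hands out unique `cid`s, and a shared map parks each gRPC notify callback until the result is ready:

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Mutex;

type NotifyCallback = Box<dyn FnOnce(Result<(), String>) + Send>;

// Allocates globally unique cids and maps each one to the gRPC
// notify callback that will eventually answer the client.
struct CallbackRegistry {
    next_cid: AtomicU64,
    callbacks: Mutex<HashMap<u64, NotifyCallback>>,
}

impl CallbackRegistry {
    fn new() -> Self {
        Self {
            next_cid: AtomicU64::new(0),
            callbacks: Mutex::new(HashMap::new()),
        }
    }

    // Called on the gRPC thread: allocate a cid and park the callback.
    fn register(&self, cb: NotifyCallback) -> u64 {
        let cid = self.next_cid.fetch_add(1, Ordering::Relaxed);
        self.callbacks.lock().unwrap().insert(cid, cb);
        cid
    }

    // Called much later, on a transaction worker: look the callback
    // up by cid and fire it with the execution result.
    fn notify(&self, cid: u64, result: Result<(), String>) {
        if let Some(cb) = self.callbacks.lock().unwrap().remove(&cid) {
            cb(result);
        }
    }
}

fn main() {
    let registry = CallbackRegistry::new();
    let cid = registry.register(Box::new(|res| println!("prewrite finished: {res:?}")));
    // ... the request travels through raftstore, apply, and back ...
    registry.notify(cid, Ok(()));
}
```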
Before I introduce the next phase, Read Propose, I'd like to talk about the batch system. It is the cornerstone of TiKV's multi-raft implementation.
In TiKV, the raftstore thread and the apply thread are instances of the batch system. These two worker threads also operate in a fixed loop pattern, which is in line with the work model mentioned above.
The raftstore thread and the apply thread go through three phases in a loop: the collect messages phase, the handle messages phase, and the process I/O phase. The following sections describe these phases in detail.
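The following sketch compresses one iteration of such a loop into plain Rust. The types are stand-ins I made up for illustration; the real batch system batches peers across many Regions and threads:

```rust
use std::collections::VecDeque;

// A Raft peer with a mailbox, heavily simplified.
struct Peer {
    id: u64,
    mailbox: VecDeque<String>, // inbound Raft messages
}

impl Peer {
    // Handle one message; any resulting network traffic is only
    // buffered here and flushed later, in the process I/O phase.
    fn handle(&mut self, msg: String, out_buffer: &mut Vec<String>) {
        out_buffer.push(format!("peer {} replies to '{}'", self.id, msg));
    }
}

// One iteration of the batch-system loop: collect, handle, process I/O.
fn run_one_loop(peers: &mut [Peer]) {
    // Phase 1 (collect messages): pick the peers whose mailboxes are non-empty.
    let busy: Vec<usize> = peers
        .iter()
        .enumerate()
        .filter(|(_, p)| !p.mailbox.is_empty())
        .map(|(i, _)| i)
        .collect();

    // Phase 2 (handle messages): drain each collected mailbox,
    // buffering outbound messages instead of sending them one by one.
    let mut out_buffer = Vec::new();
    for i in busy {
        while let Some(msg) = peers[i].mailbox.pop_front() {
            peers[i].handle(msg, &mut out_buffer);
        }
    }

    // Phase 3 (process I/O): flush the buffered messages to the network.
    for msg in out_buffer {
        println!("sending over network: {msg}");
    }
}

fn main() {
    let mut peers = vec![
        Peer { id: 1, mailbox: VecDeque::from(vec!["ReadIndex".to_string()]) },
        Peer { id: 2, mailbox: VecDeque::new() },
    ];
    run_one_loop(&mut peers);
}
```

Deferring all network sends to a dedicated phase is what lets one raftstore thread serve many peers without interleaving I/O into every message handler.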
Now, let's come back to the story.
When the Raft command is sent to the peer it belongs to, the command is stored in the peer's mailbox. In the collect messages phase (the green part of the circle above), the raftstore thread collects several peers with messages in their mailboxes and processes them together in the handle messages phase (the purple part of the circle).
In this prewrite request example, the Raft command with the Raft Read Index request is stored in the peer's mailbox. After a raftstore thread collects the peer at step 1, the raftstore thread enters the handle messages phase.
The following are the corresponding steps that the raftstore thread performs in the handle messages phase: the Raft command is fed into the peer's Raft state machine through the `step` function, and the network messages it produces (which confirm the leader's authority with the followers before the read can be served) are stored in a message buffer.

After the raftstore thread processes all the peers' messages, it comes to the last phase of a single loop: the process I/O phase (the blue part of the circle). In this phase, the raftstore thread sends the network messages stored in the message buffer to the other TiKV nodes of the cluster via the network interface at step 8.
This concludes the Read Propose phase. Before the prewrite request can make progress, it must wait for other TiKV nodes to respond.
After a “long” wait (a few milliseconds is a long time for a computer), the TiKV node that sent the network messages finally receives responses from the follower nodes and saves the reply messages in the peer's mailbox. Now the prewrite request enters the Read Apply phase. The following figure shows the workflow in this phase:
The hardworking raftstore thread notices that there is a message waiting to be processed in this peer's mailbox, so the thread's behavior in this phase is as follows: it feeds the follower responses into the Raft state machine through the `step` function; once the Read Index is confirmed, it obtains a KV snapshot and packs it into a task together with the `cid` and the Mutations. Then, the raftstore thread sends the tasks to the transaction worker threads according to the information recorded by the transaction scheduler.

This concludes the Read Apply phase. Next, it's the transaction worker's turn.
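The handoff itself can be pictured as a plain bundle of three ingredients sent over a queue. A hypothetical sketch (the names `SnapshotTask` and `Snapshot` are mine; TiKV's real snapshot is a RocksDB snapshot, not a map):

```rust
use std::collections::HashMap;
use std::sync::mpsc;
use std::thread;

// A hypothetical stand-in for a KV snapshot: a frozen view of the data.
type Snapshot = HashMap<Vec<u8>, Vec<u8>>;

// What the raftstore thread hands to the schedule worker pool.
struct SnapshotTask {
    cid: u64,
    mutations: Vec<(Vec<u8>, Vec<u8>)>, // (key, value) pairs to prewrite
    snapshot: Snapshot,
}

fn main() {
    let (tx, rx) = mpsc::channel::<SnapshotTask>();

    // A transaction worker: receives the task and splits it apart.
    let worker = thread::spawn(move || {
        if let Ok(task) = rx.recv() {
            let SnapshotTask { cid, mutations, snapshot } = task;
            println!(
                "worker got cid {cid}: {} mutations against a snapshot of {} keys",
                mutations.len(),
                snapshot.len()
            );
        }
    });

    // The raftstore thread side: pack the snapshot, Mutations, and cid.
    tx.send(SnapshotTask {
        cid: 42,
        mutations: vec![(b"k".to_vec(), b"v".to_vec())],
        snapshot: HashMap::new(),
    })
    .unwrap();
    worker.join().unwrap();
}
```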
In this phase, when a transaction worker in the schedule worker pool receives the task sent by the raftstore thread at step 1, the worker starts processing the task at step 2 by splitting the task into the KV snapshot, the Mutations, and the `cid`.
The main logic of the transaction layer now comes into play. It includes the following steps performed by the transaction worker thread:

1. Check each Mutation against the KV snapshot for write conflicts and for locks left by other transactions.
2. Turn the Mutations that pass the checks into write operations, and send them to the raftstore as a new Raft command, together with the `cid`.

The transaction layer logic ends here. This Raft command contains write operations. If the command runs successfully, the prewrite request is successful.
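A hedged sketch of step 1's constraint checks, under the assumption of a simplified snapshot holding only the newest commit timestamp and any lock per key (real TiKV reads these from the lock and write column families):

```rust
use std::collections::HashMap;

// Hypothetical snapshot views of TiKV's lock and write column families.
struct SnapshotView {
    locks: HashMap<Vec<u8>, u64>,         // key -> start_ts of an existing lock
    latest_commit: HashMap<Vec<u8>, u64>, // key -> commit_ts of the newest version
}

// The write operations the worker emits on success: one lock record
// and one data record per mutation (a simplification of CF_LOCK/CF_DEFAULT).
#[derive(Debug)]
enum WriteOp {
    PutLock { key: Vec<u8>, start_ts: u64 },
    PutData { key: Vec<u8>, value: Vec<u8>, start_ts: u64 },
}

// Check every Mutation against the snapshot; only if all checks pass
// does the worker build the write operations for a new Raft command.
fn prewrite_checks(
    snapshot: &SnapshotView,
    mutations: &[(Vec<u8>, Vec<u8>)],
    start_ts: u64,
) -> Result<Vec<WriteOp>, String> {
    let mut ops = Vec::new();
    for (key, value) in mutations {
        // Write-conflict check: a version committed after start_ts
        // means another transaction raced ahead of us.
        if let Some(&commit_ts) = snapshot.latest_commit.get(key) {
            if commit_ts > start_ts {
                return Err(format!("write conflict on {key:?}"));
            }
        }
        // Lock check: an existing lock from a different transaction blocks us.
        if let Some(&lock_ts) = snapshot.locks.get(key) {
            if lock_ts != start_ts {
                return Err(format!("key {key:?} is locked at ts {lock_ts}"));
            }
        }
        ops.push(WriteOp::PutLock { key: key.clone(), start_ts });
        ops.push(WriteOp::PutData { key: key.clone(), value: value.clone(), start_ts });
    }
    Ok(ops)
}

fn main() {
    let snapshot = SnapshotView { locks: HashMap::new(), latest_commit: HashMap::new() };
    let ops = prewrite_checks(&snapshot, &[(b"k".to_vec(), b"v".to_vec())], 10).unwrap();
    println!("{ops:?}");
}
```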
Now it's time for the raftstore thread to propose the write operations. The following figure shows how the raftstore thread processes the Raft command in this phase.
The first three steps in this phase are the same as those in the previous sections, so I won't repeat them here. In the remaining steps, the raftstore thread proposes the write command to the peer's Raft state machine; the new log entries are persisted, and the append messages for the followers are stored in the message buffer and sent out during the process I/O phase.
The Write Propose phase is over. Now, as with the end of the Read Propose phase, the leader node must wait for responses from other TiKV nodes before it moves on to the next phase.
After another “long” wait, the follower nodes respond to the leader node and bring the prewrite request to the Write Commit phase.
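What happens during this wait is standard Raft bookkeeping: the leader advances its commit index to the highest log index that a majority of peers, itself included, have persisted. A minimal sketch of just that rule (my own helper, not raft-rs code):

```rust
// Given the highest log index each peer (leader included) is known
// to have persisted, return the largest index stored on a majority:
// that is the new commit index.
fn commit_index(mut matched: Vec<u64>) -> u64 {
    matched.sort_unstable();
    let majority = matched.len() / 2 + 1;
    // After the ascending sort, the entry at position (n - majority)
    // is held by at least `majority` peers.
    matched[matched.len() - majority]
}

fn main() {
    // Leader persisted index 7; followers acknowledged 7 and 5.
    // Two of three peers hold index 7, so entry 7 is committed.
    assert_eq!(commit_index(vec![7, 7, 5]), 7);
    // With followers at 5 and 4, only entry 5 has a majority.
    assert_eq!(commit_index(vec![7, 5, 4]), 5);
}
```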
With the Write Commit phase coming to an end, the raftstore thread completes all its tasks. Next, the baton is handed over to the apply thread.
This is the most critical phase for a prewrite request, in which the apply thread actually writes to the KV engine.
After the apply thread receives the apply task sent by the raftstore thread at steps 1 and 2, it continues in the handle messages phase (the purple part of the circle): it executes the write operations carried by the committed Raft command and accumulates them into a write batch.

Then, in the next phase (process I/O), the apply thread writes the batch into the KV engine and invokes the callbacks of the commands it has applied. When a callback is invoked, the transaction scheduler sends the task with the `cid` to the transaction worker at step 9, bringing us to the final part of the story.
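A sketch of that final write, with a hash map standing in for the real KV engine (RocksDB) and a boxed closure standing in for the stored callback; all names here are illustrative:

```rust
use std::collections::HashMap;

// A stand-in for the KV engine (RocksDB in real TiKV).
#[derive(Default)]
struct KvEngine {
    data: HashMap<Vec<u8>, Vec<u8>>,
}

// A batch of write operations from one committed Raft command.
struct WriteBatch {
    puts: Vec<(Vec<u8>, Vec<u8>)>,
}

impl KvEngine {
    // Apply the whole batch "atomically" (trivially so for a hash map;
    // RocksDB gives the same guarantee for a real write batch).
    fn write(&mut self, batch: WriteBatch) {
        for (key, value) in batch.puts {
            self.data.insert(key, value);
        }
    }
}

// What the apply thread does for one committed prewrite command:
// write to the engine, then hand the result back via the callback.
fn apply_command(
    engine: &mut KvEngine,
    batch: WriteBatch,
    on_applied: Box<dyn FnOnce() + Send>,
) {
    engine.write(batch); // the actual write to the KV engine
    on_applied();        // wakes the transaction scheduler (step 9)
}

fn main() {
    let mut engine = KvEngine::default();
    let batch = WriteBatch { puts: vec![(b"lock:k".to_vec(), b"ts=10".to_vec())] };
    apply_command(&mut engine, batch, Box::new(|| println!("callback: command applied")));
    assert!(engine.data.contains_key(b"lock:k".as_slice()));
}
```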
This is the final phase of the prewrite process. TiKV returns the execution result of the prewrite request to the client.
The workflow in this phase is mainly performed by the transaction worker:
1. The transaction worker receives the task with the `cid` sent by the transaction scheduler.
2. The transaction worker uses the `cid` to find the gRPC notify callback and invokes it with the execution result, which answers the client's prewrite request.

This article introduces the eight phases of a successful prewrite request and focuses on the workflow within each phase. I hope this post can help you understand the resource usage of TiKV requests and give you a deeper understanding of TiKV.
For more TiKV implementation details, see the TiKV documentation and deep dive. If you have any questions or ideas, feel free to join the TiKV Transaction SIG and share them with us!