Industry: E-commerce
Author: Xiao Huang, Senior DBA at Meituan, Leader of TiDB User Group in Beijing
Transcreator: Ran Huang; Editor: Tom Dewan
This article is based on a talk given by Xiao Huang at a TiDB User Group event.
Meituan is a world-leading online-to-offline (O2O) local life service platform, connecting more than 240 million consumers and five million local merchants through a comprehensive array of e-commerce services and products. To support our highly concurrent online services, we need a distributed database that can handle large volumes of data.
As more and more distributed databases enter the market, choosing the right one for our applications is not easy. After careful evaluation, we chose TiDB, a distributed, scale-out database that serves as a MySQL alternative.
In this post, I'll share the considerations behind our database selection and describe how we use TiDB at Meituan to solve our pain points.
One of the biggest challenges our business faces is the extremely high concurrency of online services. To meet this requirement, we focus on four major aspects: stability, performance, cost, and security.
Users may accept an online service being slightly slow, but they would never tolerate frequent downtime. Stability, therefore, is our first priority. In this aspect, the database should:
We want to improve our user experience as much as we can. As long as the system is stable, the faster the system, the better the user experience. So the database must be highly efficient and support:
After stability and performance, the next consideration is cost.
First, the application cost. The database should be easy to connect to our applications, so we can minimize the burden on the application team. Ideally, this process should require little communication or training.
Second, the cost of CPU, memory, and disks. We once evaluated a scale-up database that required a machine with 384 GB of memory. A high-configuration machine like that would dramatically increase our hardware cost.
Third, network bandwidth. Some network links between remote data centers are billed by bandwidth.
To prevent data breaches and data loss, the database should have the following three features:
Therefore, we want a database that is stable, efficient, secure, and low cost. We also hope it's open source, so that we can get community support when a problem occurs and potentially contribute to the community ourselves.
After comparing the distributed databases on the market, including Amazon Aurora, Spanner, and CockroachDB, we chose TiDB, an open source, MySQL-compatible database that provides horizontal scalability and ACID-compliant transactions.
Currently, Meituan has over 1,700 TiDB nodes and several hundred clusters. The largest cluster has more than 40 nodes, and the largest table stores over 100 billion records. The number of queries each day exceeds 10 billion, with peak QPS in a single cluster above 100,000.
We use TiDB mostly for three scenarios: horizontal scaling, financial-grade data consistency, and data hub.
We chose distributed databases because they can scale out and in easily, without manual sharding. Our traffic often rises and falls sharply. When traffic surges, if we don't create more shards, latency rises. When MySQL was still our main database, DBAs had to work with the application team to manually shard the database, which was time-consuming and troublesome. After the traffic subsides, merging the shards using Data Transformation Services (DTS) and validating the data are also daunting tasks.
The root cause of the sharding issues is that computation and storage are not separated. To solve this problem, we need a database that separates the computing layer from the storage layer, so we can scale storage capacity and computing resources independently as needed. That's where TiDB comes in.
TiDB has a multi-layered architecture with three major components: TiDB for computing, Placement Driver (PD) for storing metadata and allocating timestamps, and TiKV for distributed storage. In this architecture, storage and computing are separated, so each layer can scale independently to handle fluctuating traffic without affecting the other components.
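Because the TiDB computing layer speaks the MySQL wire protocol, applications can reach it with an ordinary MySQL client library. Below is a minimal sketch using pymysql; the host, user, and database names are placeholders, and port 4000 is TiDB's default SQL port.

```python
import pymysql

# Hypothetical connection details; TiDB is reachable through any
# MySQL-protocol client. Port 4000 is TiDB's default SQL port.
conn = pymysql.connect(
    host="tidb.example.internal",
    port=4000,
    user="app_user",
    password="app_password",
    database="orders_db",
    autocommit=False,
)

with conn.cursor() as cur:
    cur.execute("SELECT VERSION()")  # Returns a MySQL-compatible version string
    print(cur.fetchone())
conn.close()
```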
Financial services place different, and often more stringent, requirements on the database system. At Meituan, the database must support transactions with strong consistency.
Let's first look at a nagging problem we encountered in MySQL. MySQL 5.7 implements lossless semi-sync replication. In this implementation, MySQL first writes the transaction to the binary logs, which are then sent to a replica. After the replica returns an acknowledgement (ACK), the source database completes the transaction in the storage engine (InnoDB). In other words, the binary logs are already on a replica before the application knows the commit is complete. If the source database crashes at this moment, the transaction may already be applied on the replica even though the application was never told it committed, which is a risk.
Moreover, lossless semi-sync replication doesn't guarantee data consistency. Even if we set the semi-sync timeout to an infinite value, the data is not strongly consistent. For example, if the network goes down between the source database and its replicas, no replica can receive the binary logs and return an ACK.
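For reference, the lossless behavior and the timeout discussed above correspond to the MySQL 5.7 semisync plugin variables shown below. This is only a sketch, assuming the semisync plugin is installed on the source; the host and credentials are placeholders.

```python
import pymysql

# Hypothetical admin connection to the MySQL 5.7 source.
# AFTER_SYNC is the "lossless" wait point; a very large
# rpl_semi_sync_master_timeout (in milliseconds) approximates "infinite".
mysql_source = pymysql.connect(host="mysql-source.example.internal", port=3306,
                               user="admin", password="admin_password")

with mysql_source.cursor() as cur:
    cur.execute("SET GLOBAL rpl_semi_sync_master_enabled = ON")
    cur.execute("SET GLOBAL rpl_semi_sync_master_wait_point = 'AFTER_SYNC'")
    cur.execute("SET GLOBAL rpl_semi_sync_master_timeout = 4294967295")
mysql_source.close()
```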
To resolve this issue, MySQL later introduced MySQL Group Replication (MGR), but its scalability is poor: a group can have at most nine members. MGR is also sensitive to network jitter; a few seconds of jitter can cause the cluster to change the primary. Moreover, MGR's multi-primary mode has so many bugs that the MySQL community mostly uses the single-primary mode to avoid transaction conflicts.
MySQL's semi-sync doesn't solve the consistency problem, but TiDB's Multi-Raft protocol does. In the storage layer, data is divided into Regions (each a 96 MB range of data by default), and each Region has multiple replicas that form a Raft group. Each Raft group has a Leader that processes read and write requests. When a Leader crashes, the other replicas in the group elect a new Leader. Even if a single node crashes, the Raft group doesn't lose any data.
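To see how a table is split into Regions and where each Raft Leader lives, you can inspect the Region metadata through SQL. A minimal sketch, assuming a TiDB version that supports `SHOW TABLE ... REGIONS` and a hypothetical `orders` table:

```python
import pymysql

# Hypothetical connection; `orders` is a placeholder table name.
conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app_user", password="app_password",
                       database="orders_db")

with conn.cursor() as cur:
    # Lists each Region of the table along with the store that
    # currently holds its Raft Leader.
    cur.execute("SHOW TABLE orders REGIONS")
    for region in cur.fetchall():
        print(region)
conn.close()
```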
Next, I'll introduce how we use distributed transactions at Meituan for different financial scenarios.
When we perform cross-database data updates, we need cross-database transactions to guarantee data consistency. Take our food delivery order service as an example. Because the customer data and the restaurant data are stored in different databases, when the customer places an order, we need to update both databases. TiDB's cross-database transactions are helpful in this scenario.
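As a sketch of this scenario, the snippet below writes the order to two hypothetical schemas, `customer_db` and `restaurant_db`, inside one TiDB transaction. The table and column names are placeholders; the point is that both writes commit or roll back together.

```python
import pymysql

# Both schemas live in the same TiDB cluster, so one transaction can span them.
conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app_user", password="app_password", autocommit=False)

try:
    with conn.cursor() as cur:
        cur.execute(
            "INSERT INTO customer_db.orders (order_id, customer_id, status) "
            "VALUES (%s, %s, 'CREATED')", (10001, 42))
        cur.execute(
            "INSERT INTO restaurant_db.orders (order_id, restaurant_id, status) "
            "VALUES (%s, %s, 'PENDING')", (10001, 7))
    conn.commit()    # Both writes become visible atomically
except Exception:
    conn.rollback()  # Neither database sees a half-created order
    raise
finally:
    conn.close()
```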
In the money transfer service, the sender's and the receiver's data are likely to be located on different shards. When the sender transfers $100 to the receiver, if the transaction succeeds on one shard but fails on another, money is debited from one account but never credited to the other. Therefore, we need atomic transactions to guarantee data consistency.
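A minimal sketch of such a transfer, assuming a hypothetical `accounts` table: the debit and credit either both commit or both roll back, even when the two rows live in different Regions.

```python
import pymysql

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app_user", password="app_password",
                       database="payments", autocommit=False)

def transfer(sender_id, receiver_id, amount):
    try:
        with conn.cursor() as cur:
            cur.execute("UPDATE accounts SET balance = balance - %s "
                        "WHERE id = %s AND balance >= %s",
                        (amount, sender_id, amount))
            if cur.rowcount != 1:
                raise ValueError("insufficient funds or unknown sender")
            cur.execute("UPDATE accounts SET balance = balance + %s WHERE id = %s",
                        (amount, receiver_id))
        conn.commit()    # Debit and credit succeed or fail together
    except Exception:
        conn.rollback()  # No money is lost or created
        raise

transfer(sender_id=1, receiver_id=2, amount=100)
```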
Another scenario is service-oriented architecture (SOA). In a microservice architecture, we need distributed transactions to keep data consistent across different services.
Without distributed transactions, keeping data consistent across different services is complicated: a single operation may have to be written into different modules, resulting in multiple writes. In food delivery, if the customer-side cluster goes down but the restaurant-side cluster survives, a validating service detects that an order exists only in the restaurant database and not in the customer database, and then adds the order data to the customer database. Such validate-and-add logic adds complexity to the application.
Also, for account services, distributed transactions are essential because:
Thankfully, TiDB solves these difficulties with its distributed transaction model.
Percolator is a system built by Google for incremental processing on large data sets. Inspired by Percolator, TiDB's transaction model is an optimized two-phase commit protocol, and it also supports pessimistic transactions. With TiDB's distributed transactions, our R&D engineers are relieved of the complicated sharding and validation logic and can focus on their own development.
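The sketch below shows what a pessimistic transaction looks like from the client side, reusing the hypothetical `accounts` table from earlier. `BEGIN PESSIMISTIC` is TiDB's syntax for starting a transaction in pessimistic mode; everything else is standard MySQL-protocol SQL.

```python
import pymysql

conn = pymysql.connect(host="tidb.example.internal", port=4000,
                       user="app_user", password="app_password",
                       database="payments")

with conn.cursor() as cur:
    cur.execute("BEGIN PESSIMISTIC")
    # The row is locked immediately, so concurrent writers block here
    # instead of failing at commit time as they would in optimistic mode.
    cur.execute("SELECT balance FROM accounts WHERE id = %s FOR UPDATE", (1,))
    (balance,) = cur.fetchone()
    cur.execute("UPDATE accounts SET balance = %s WHERE id = %s", (balance - 100, 1))
    cur.execute("COMMIT")
conn.close()
```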
If you plan to use TiDB's transaction model, we have two general tips:
We also use TiDB as a data hub. As data volumes grow, the ways we use the data become more diverse. Some data that used to be processed by OLAP databases or data warehouses now needs to be stored in TiDB, with the results fetched in real time.
In our hotel booking application, we need to fetch a lot of data to calculate whether a hotel's room pricing is competitive. The results must be delivered in real time, but the calculation can't affect the online service. If both workloads share one database, the analytical engine continuously reads massive amounts of data, which can cause OLTP transactions to respond slowly.
To address this problem, we use TiDB Binlog to replicate data to another TiDB cluster. Because TiDB's underlying storage engine, RocksDB, uses a log-structured merge-tree (LSM-tree), which is highly efficient for writes, we can quickly sync data to a TiDB replica. On this replica, we can perform massive calculations and frequent queries, generate reports, and even build a search engine.
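Once TiDB Binlog keeps the downstream cluster in sync, heavy reporting queries can be pointed at it instead of the online cluster. A minimal sketch, assuming the downstream cluster is reachable at a separate, hypothetical address and contains a placeholder `room_rates` table:

```python
import pymysql

# Hypothetical address of the downstream (replica) TiDB cluster kept in sync
# by TiDB Binlog; heavy analytical reads run here, not on the online cluster.
replica = pymysql.connect(host="tidb-replica.example.internal", port=4000,
                          user="report_user", password="report_password",
                          database="hotel")

with replica.cursor() as cur:
    # Example reporting query: average nightly price per city over the last 30 days.
    cur.execute("""
        SELECT city, AVG(price) AS avg_price
        FROM room_rates
        WHERE rate_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
        GROUP BY city
    """)
    for row in cur.fetchall():
        print(row)
replica.close()
```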
We also have a lot of related data scattered across different systems. If we extract this data and pool it into a single data hub, we can serve different applications, such as operational reporting and data subscription.
We may also consider TiDB for the following scenarios:
At Meituan, TiDB has helped us build better applications and reduce costs. So far, we have integrated TiDB into only a small part of our system, but we look forward to exploring more scenarios and tapping into its full potential.