![Page 1: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/1.jpg)
Conceptual Modeling on Tencent’s Distributed Database Systems
Pan Anqun, Wang Xiaoyu, Li Haixiang
Tencent Inc.
![Page 2: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/2.jpg)
Outline
• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion
![Page 4: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/4.jpg)
About Tencent
Monthly active user accounts (“MAU”) of QQ reached 803.2 million
Combined MAU of Weixin and WeChat reached 1,057.7 million
More than 20,000 Applications
More than 1 million Database instances
Tencent
• Social Platform: QQ, WeChat/Weixin
• Digital Content: Video, Games, Film, Music, News
• Financial Service: WeChat Pay, QQWallet, WeBank
• Tencent Cloud
![Page 5: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/5.jpg)
Tencent Billing System
The number of digital accounts has increased by 30%-50% year-on-year.
![Page 6: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/6.jpg)
Challenges
• Data volume is growing by 30-50% per year
• The system handles money: ACID and zero data loss are required
• It is a core system that runs 7*24 and processes more than 800 million Yuan per day
• High cost: more than X0,000 database servers
![Page 7: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/7.jpg)
What database does Tencent need?
• High availability
• Scalability
• ACID Transactions
• Low cost
• High performance
• Resource utilization
• Multi-model
  • SQL
  • Key-Value
![Page 8: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/8.jpg)
History of TDSQL
2007: 7*24; distributed transactions in the application layer
2009: Key-Value; HA; Scalability (SN)
2012: Key-Value; SQL (Limited); HA; Scalability (SN)
2014: Key-Value; SQL (Limited); HA; Scalability (SN); used by WeBank
2016: Key-Value; SQL; HA; ACID Transactions; Scalability (SN)
2018: Key-Value; SQL; HA; Scalability (SN, SD)
![Page 9: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/9.jpg)
Customers of TDSQL
![Page 10: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/10.jpg)
Outline
• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion
![Page 11: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/11.jpg)
Architecture of TDSQL
Compute Layer: SQL engine (MySQL compatible), KV engine (such as Redis), Document engine (such as MongoDB).
The compute layer is in charge of the following operations:
1. receiving the request;
2. processing the SQL-related logic;
3. locating, through the scheduler, the storage address used for storing and computing data;
4. exchanging data with the storage layer;
5. returning the result.
The engine is stateless.
![Page 12: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/12.jpg)
Architecture of TDSQL
Storage Layer: data is organized by replica set.
Replica Set: 3 replicas by default, replicated either with strong consistency (based on Raft) or asynchronously.
Multiple replica sets can be located on one node.
Table: distributed across multiple replica sets.
![Page 13: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/13.jpg)
Architecture of TDSQL
Scheduler:
• Manages the metadata of the cluster, such as the replica set location of a specific key.
• Schedules replica sets in the storage cluster, for example data migration, scale-out, and backup.
• The scheduler is itself a cluster and needs to be deployed on an odd number of nodes.
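The slide does not say why the node count must be odd. A common reason, assuming the scheduler cluster makes decisions by majority quorum (an assumption, not stated in the source), is that an even-sized cluster tolerates no more failures than the next smaller odd one. The helper names below are hypothetical.

```python
def majority(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes can fail while a majority is still reachable."""
    return nodes - majority(nodes)


# Assumption: decisions need a majority quorum.
# Going from 3 to 4 nodes adds cost but no extra fault tolerance.
for n in (3, 4, 5):
    print(n, majority(n), tolerated_failures(n))
```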
![Page 14: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/14.jpg)
Outline
• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion
![Page 15: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/15.jpg)
High Availability
![Page 16: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/16.jpg)
Raft-based Replication
Log Replication
1. Client sends messages;
2. Process messages;
3. Replicate messages;
4. Acknowledge messages;
5. Commit messages;
6. Reply.
[Figure: one Master and two Slaves, with arrows numbered 1 to 6 matching the steps above]
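The six steps can be sketched as a toy in-memory model. This is only an illustration under simplifying assumptions: the `Replica` class and `replicate` function are hypothetical names, and real Raft also involves terms, leader election, and log-consistency checks, all omitted here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    log: list = field(default_factory=list)
    commit_index: int = 0  # number of log entries known to be committed
    alive: bool = True

    def append(self, entry) -> bool:
        """Accept a replicated entry; returns True as the acknowledgement."""
        if not self.alive:
            return False
        self.log.append(entry)
        return True


def replicate(master: Replica, slaves: list, message) -> bool:
    """Steps 2 to 6: returns True if the message was committed."""
    master.log.append(message)                         # step 2: process
    acked = [s for s in slaves if s.append(message)]   # steps 3-4: replicate, ack
    cluster_size = 1 + len(slaves)
    if 1 + len(acked) < cluster_size // 2 + 1:         # master counts as one vote
        return False                                   # no majority, not committed
    for replica in [master, *acked]:                   # step 5: commit
        replica.commit_index = len(replica.log)
    return True                                        # step 6: reply to client


master = Replica("master")
slaves = [Replica("slave-1"), Replica("slave-2", alive=False)]
print(replicate(master, slaves, "UPDATE account ..."))  # majority of 3 reached
```

With three replicas, one failed slave still leaves a majority, which is why the write in the usage example commits.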
![Page 17: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/17.jpg)
Multi-master or Single-master?
[Figure: a multi-master cluster (three Masters, each with its own Storage) next to a single-master cluster (one Master and two Slaves, each with its own Storage)]
Multi-master Cluster:
• Weak consistency
• High concurrency
• Low latency
Single-master Cluster:
• Strong consistency
• Higher latency
• Lower concurrency
![Page 18: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/18.jpg)
Global Distribution
[Figure: a global deployment with Region 1 and Region 2, each containing Zone 1, Zone 2, and Zone 3]
To meet customers' requirements, the data distribution policy can be customized.
Normally, replication between zones is strongly consistent;
replication between regions is weakly consistent.
![Page 19: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/19.jpg)
Horizontal Scalability
![Page 20: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/20.jpg)
Auto-Sharding
[Figure: auto-sharding diagram; labels not recoverable]
1. Partitioning is hash-based.
2. The maximum throughput and disk space per set are configured.
3. If one set cannot serve the requirements of the application, it can be split into two sets.
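A minimal sketch of hash-based partitioning with set splitting follows. The slot count, the use of CRC32, the row-count threshold standing in for the configured disk-space limit, and all class names are assumptions made for illustration; the slides do not describe TDSQL's actual scheme.

```python
import zlib

SLOTS = 1024  # assumed size of the hash space


def slot_of(key: str) -> int:
    """Map a key to a slot with a stable hash (CRC32 chosen for illustration)."""
    return zlib.crc32(key.encode()) % SLOTS


class ReplicaSet:
    def __init__(self, lo: int, hi: int):
        self.lo, self.hi = lo, hi  # owns slots in [lo, hi)
        self.rows = {}


class Cluster:
    def __init__(self, max_rows_per_set: int):
        self.max_rows = max_rows_per_set  # stand-in for the configured limit
        self.sets = [ReplicaSet(0, SLOTS)]

    def locate(self, key: str) -> ReplicaSet:
        slot = slot_of(key)
        return next(s for s in self.sets if s.lo <= slot < s.hi)

    def put(self, key: str, value) -> None:
        target = self.locate(key)
        target.rows[key] = value
        if len(target.rows) > self.max_rows and target.hi - target.lo > 1:
            self.split(target)

    def get(self, key: str):
        return self.locate(key).rows[key]

    def split(self, old: ReplicaSet) -> None:
        """Halve the slot range and move the upper half's rows to a new set."""
        mid = (old.lo + old.hi) // 2
        new = ReplicaSet(mid, old.hi)
        old.hi = mid
        for key in [k for k in old.rows if slot_of(k) >= mid]:
            new.rows[key] = old.rows.pop(key)
        self.sets.append(new)


cluster = Cluster(max_rows_per_set=10)
for i in range(100):
    cluster.put(f"user:{i}", i)
print(len(cluster.sets), cluster.get("user:42"))
```

Splitting by slot range means only the rows of the overloaded set move; keys in other sets keep their location.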
![Page 21: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/21.jpg)
Distributed Transactions
Transaction Manager (TM):
1. manages the lifecycle of transactions;
2. is stateless, so transactions can be routed to any node.
Resource Managers (RM):
1. execute the queries received from the TM;
2. return results to the TM.
Commit Log:
1. the global transaction log.
Local Transaction (LT):
1. a transaction that involves only a single RM;
2. a distributed transaction may include more than one LT.
![Page 22: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/22.jpg)
Distributed Transactions
Basic algorithm: 2PC
The following steps outline the lifecycle of a distributed transaction in the TM:
1. Receive the request
2. Assign an XID
3. Send XA START to all sets
4. Send XA PREPARE to all sets
5. Write the transaction to the Commit Log
6. Send XA COMMIT to all sets
7. Send the response
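The lifecycle above can be sketched with in-memory stand-ins. `ReplicaSetRM` and `TransactionManager` are hypothetical classes, not TDSQL APIs, and the rollback path on a failed prepare is an assumption based on standard two-phase commit, since the slide only lists the success path.

```python
import itertools


class ReplicaSetRM:
    """In-memory stand-in for a resource manager (one replica set)."""

    def __init__(self, name: str, fail_prepare: bool = False):
        self.name = name
        self.fail_prepare = fail_prepare  # simulate a set that cannot prepare
        self.data = {}
        self.pending = {}

    def xa_start(self, xid):
        self.pending[xid] = {}

    def write(self, xid, key, value):
        self.pending[xid][key] = value

    def xa_prepare(self, xid) -> bool:
        return not self.fail_prepare

    def xa_commit(self, xid):
        self.data.update(self.pending.pop(xid))

    def xa_rollback(self, xid):
        self.pending.pop(xid, None)


class TransactionManager:
    def __init__(self):
        self._ids = itertools.count(1)
        self.commit_log = []  # global transaction log

    def execute(self, writes) -> bool:
        """writes: list of (rm, key, value). Returns True on commit."""
        xid = f"xid-{next(self._ids)}"                    # step 2: assign XID
        rms = list({id(rm): rm for rm, _, _ in writes}.values())
        for rm in rms:                                    # step 3: XA START
            rm.xa_start(xid)
        for rm, key, value in writes:
            rm.write(xid, key, value)
        if not all(rm.xa_prepare(xid) for rm in rms):     # step 4: XA PREPARE
            for rm in rms:                                # assumed abort path
                rm.xa_rollback(xid)
            return False
        self.commit_log.append(xid)                       # step 5: commit log
        for rm in rms:                                    # step 6: XA COMMIT
            rm.xa_commit(xid)
        return True                                       # step 7: respond


tm = TransactionManager()
set_a, set_b = ReplicaSetRM("set-a"), ReplicaSetRM("set-b")
print(tm.execute([(set_a, "alice", 90), (set_b, "bob", 110)]))
```

Writing the commit log before sending XA COMMIT is what lets a restarted TM decide the outcome of a transaction whose sets have all prepared.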
![Page 23: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/23.jpg)
Distributed Transactions
Auto-detect and optimize single-set transactions vs. distributed transactions.
SINGLE-SET TRANSACTIONS: a transaction whose affected rows are all located in a single replica set.
DISTRIBUTED TRANSACTIONS: a transaction that affects rows distributed across replica sets on multiple nodes.
Distributed deadlock detection:
• Global wait-for graphs are created.
• Difficulty: deadlocks are hard to spot because transactions wait for resources across the network.
• Timers: if a transaction does not finish within the timer's period, the timer goes off, indicating a possible deadlock.
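A global wait-for graph can be sketched by merging the per-set graphs and searching for a cycle. The function names and the dictionary representation (transaction mapped to the set of transactions it waits for) are assumptions for illustration; the slides do not describe how TDSQL builds or traverses the graph.

```python
def merge_wait_for(local_graphs):
    """Union per-set wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, blockers in graph.items():
            merged.setdefault(txn, set()).update(blockers)
    return merged


def find_deadlock(wait_for):
    """Return the transactions on one cycle, or None if there is no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for blocker in wait_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GREY:                 # back edge: cycle found
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


# Each set alone sees no cycle; the deadlock only appears globally.
set_a = {"T1": {"T2"}}  # on set A, T1 waits for T2
set_b = {"T2": {"T1"}}  # on set B, T2 waits for T1
print(find_deadlock(set_a), find_deadlock(merge_wait_for([set_a, set_b])))
```

The usage example shows why local detection is not enough: neither set sees a cycle on its own, which matches the difficulty noted in the slide.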
![Page 24: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/24.jpg)
Low-cost
![Page 25: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/25.jpg)
Resource Scheduling
• Resource utilization
  • CPU
  • Memory
  • Disk
• Shared-nothing vs. shared-disk
• Multi-level storage
![Page 26: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/26.jpg)
Shared-nothing vs Shared-Disk
[Figure: shared-nothing (three Compute Engines, each with its own Storage) next to shared-disk (three Compute Engines sharing one Storage)]
Shared-Nothing:
• Very large applications
• High-throughput writes
• Needs the right partitioning strategy
• Write-limited (2-phase commit)
Shared-Disk:
• Disaggregated storage and compute architecture
• Medium and small applications
• Write-limited (lock coordination)
• Higher resource utilization
![Page 27: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/27.jpg)
Resource Scheduling
• 3-level storage strategy
Hot: Standard Storage. For online systems that require high availability and high-frequency access, with 7*24 availability ensured, e.g. account data.
Warm: Infrequent Access Storage. Ensures medium-level availability, e.g. log data.
Cold: Archive. For planned access, e.g. database backups.
![Page 28: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/28.jpg)
Resource Scheduling
• 3-level storage strategy
| Storage | Number of copies | Hardware |
| --- | --- | --- |
| Standard storage | 3 copies | SSD |
| Infrequent access storage | 2 copies | SATA |
| Archive | 1.3 copies | SATA |

Hardware customization:
1. customized high-density storage cabinets and low-speed disks;
2. programmable power-off disks with a low-power board.
Automatic conversion: data is transferred between storage levels at different stages of its lifecycle.
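The copy counts translate directly into raw capacity, and automatic conversion can be pictured as a lifecycle rule. In the sketch below, the copy counts come from the slide, while the age thresholds (30 and 365 days) and all function names are hypothetical values chosen only to make the example concrete.

```python
# Copies per tier, taken from the table above.
COPIES = {"standard": 3.0, "infrequent": 2.0, "archive": 1.3}


def raw_capacity_tb(logical_tb: float, tier: str) -> float:
    """Raw disk capacity needed to hold logical_tb of data in a tier."""
    return logical_tb * COPIES[tier]


def tier_for(days_since_last_access: int,
             warm_after_days: int = 30,    # hypothetical threshold
             cold_after_days: int = 365) -> str:  # hypothetical threshold
    """Pick a storage tier from data age (assumed lifecycle policy)."""
    if days_since_last_access >= cold_after_days:
        return "archive"
    if days_since_last_access >= warm_after_days:
        return "infrequent"
    return "standard"


for tier in COPIES:
    print(tier, raw_capacity_tb(100, tier))
```

Under these numbers, moving 100 TB from standard storage to archive reduces the raw footprint from 300 TB to about 130 TB, in addition to the move from SSD to SATA.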
![Page 29: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/29.jpg)
Multi-model
![Page 30: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/30.jpg)
Why do we need multi-model?
• Hot key problems
  • Corporate account or hot SKU: very high concurrency
  • Relational database: row lock contention, very low TPS
  • Key-value: customization, queuing
![Page 31: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/31.jpg)
Multiple Data Model
[Figure: MySQL API, Redis API, and MongoDB API on top of the SQL Engine, KV Engine, and Doc Engine, with a Data Conversion component]
SQL: standard
KV: highest performance
Doc: schemaless
![Page 32: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/32.jpg)
AI+Database
![Page 33: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/33.jpg)
Data Security
[Figure: data security diagram; the only recoverable label is "Process control"]
![Page 34: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/34.jpg)
Why do we need AI + Security?
• More than 1,000,000 individual developers and enterprise customers.
• Too many kinds of applications, such as financial technology, smart retail, smart cities, smart transport, mobile payment, etc.
• Rule-based security policy
![Page 35: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/35.jpg)
AI + Security
Risk: SQL injection, data security, etc.
Firewall: restricts access according to the specified query type, SQL model, user, client IP, etc.
AI Training:
• collect all query logs online;
• tag all security events in the whole history;
• model training by AI teams.
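A rule-based firewall of the kind listed above can be sketched as a list of deny rules matched on query type, user, and client IP. The `DenyRule` class, its fields, and the first-keyword classification of query type are hypothetical simplifications, not the TDSQL firewall; a real system would parse SQL rather than inspect the first word.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DenyRule:
    """Hypothetical rule: every field that is set must match to deny."""
    query_type: Optional[str] = None   # e.g. "DROP", "DELETE"
    user: Optional[str] = None
    client_net: Optional[str] = None   # CIDR block, e.g. "10.0.9.0/24"

    def matches(self, sql: str, user: str, client_ip: str) -> bool:
        stripped = sql.strip()
        # Simplification: the first keyword stands in for the query type.
        qtype = stripped.split(None, 1)[0].upper() if stripped else ""
        if self.query_type is not None and qtype != self.query_type:
            return False
        if self.user is not None and user != self.user:
            return False
        if self.client_net is not None:
            net = ipaddress.ip_network(self.client_net)
            if ipaddress.ip_address(client_ip) not in net:
                return False
        return True


def is_allowed(rules, sql: str, user: str, client_ip: str) -> bool:
    """A request is allowed unless some deny rule matches it."""
    return not any(rule.matches(sql, user, client_ip) for rule in rules)


rules = [
    DenyRule(query_type="DROP"),
    DenyRule(query_type="DELETE", user="report"),
    DenyRule(client_net="10.0.9.0/24"),
]
print(is_allowed(rules, "SELECT 1", "app", "10.0.1.5"))
print(is_allowed(rules, "drop table t", "app", "10.0.1.5"))
```

Static rules like these must be maintained by hand for every application type, which is the limitation the AI training steps are meant to address.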
![Page 36: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/36.jpg)
Why do we need AI + DBA?
• Growth of the business vs. growth of the DBA team
• Fast troubleshooting
• Rapidly changing business vs. policy-based monitoring
• Complex business scenarios vs. volume and performance estimation
• Bad SQL vs. performance assurance
![Page 37: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/37.jpg)
AI-DBA
[Figure: AI-DBA pipeline with three layers: Data Collection, Compute Layer, and Diagnosis Layer. Labeled components: DB Instances, Elastic Search, Spark, Prediction, Online Diagnostics.]
Data collected from each DB instance:
• Resource status: CPU, memory, network, disk, etc.
• Running status: active threads, slow queries, lock info, SQL time consumption, SQL digest, replication delay, etc.
• DB parameters: buffer pool, etc.
• DB information: table structure, index info, etc.
![Page 38: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/38.jpg)
Outline
• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion
![Page 39: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/39.jpg)
Deployment
[Figure: trade-off between Data Safety, Latency, and Cost]
Payment: Data Safety, Latency, Cost
WeChat Moment Feeds: Latency, Cost, Data Safety
Game: Latency, Cost, Data Safety
![Page 40: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/40.jpg)
Deployment
[Figure: deployment across Region #1 and Region #2]
Failure scenarios:
1. When the master data center is out of service, the cluster is automatically switched to the standby data center. Recovery Point Objective = 0; Recovery Time Objective less than 40 s.
2. When only one server is out of service; the failed server may be the a) master, b) slave, or c) watcher.
3. When the slave data center is out of service.
![Page 41: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/41.jpg)
Deployment
[Figure: deployment across Region #1 and Region #2]
1. Triple DZ in the same city simplifies the synchronization policy and provides high data availability and consistency.
2. The failure of one data center does not affect data services.
3. The production cluster with triple DZ provides multi-active services.
4. The entire city can be switched manually.
![Page 42: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/42.jpg)
Resource scheduling
Order module: trading orders, data volume can be controlled, need high performance
Settlement Data: all historical data, very large volume, low cost, rarely accessed
Account Data: small volume, very high performance, randomly accessed
![Page 43: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/43.jpg)
Hybrid data analysis
[Figure: the Account Module and the Risk Control Module access SQL and Key-Value data through a Hybrid Data Access Protocol (SQL)]
Personal Account Data (PAD): relational model
Corporate Account Data (CAD): key-value model
Risk Control Module: analyzes both PAD and CAD
![Page 44: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/44.jpg)
Outline
• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion
![Page 45: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/45.jpg)
Conclusion
• One Size fits None
• HTAP, OLTP, OLAP
• Multi-model
• AI + Database
![Page 46: Conceptual Modeling on Tencent's Distributed Database ... · Very large applications High throughput writes Need right partitioning strategy Write-limited(2-phase commit) Shared-Disk](https://reader030.vdocument.in/reader030/viewer/2022041100/5ed7af5086e8a75e3f298f09/html5/thumbnails/46.jpg)
Thanks