Conceptual Modeling on Tencent’s Distributed Database Systems

Pan Anqun, Wang Xiaoyu, Li Haixiang
Tencent Inc.


Post on 30-May-2020


Outline

• Introduction
• System overview of TDSQL
• Conceptual Modeling on TDSQL
• Applications
• Conclusion


About Tencent

Monthly active user accounts (“MAU”) of QQ reached 803.2 million

Combined MAU of Weixin and WeChat reached 1,057.7 million

More than 20,000 Applications

More than 1 million Database instances

Tencent businesses:

• Social Platform: QQ, WeChat/Weixin
• Digital Content: Video, Games, Film, Music, News
• Financial Service: WeChat Pay, QQ Wallet, WeBank
• Tencent Cloud


Tencent Billing System

The number of digital accounts increases by 30%-50% year-on-year.


Challenges

• Data volume keeps increasing, with a growth rate of 30~50% per year

• Dealing with money: ACID, zero data loss

• Core system: 7*24 availability, more than 800 million Yuan per day

• High cost, with more than X0,000 database servers


What database does Tencent need?

• High availability
• Scalability
• ACID Transactions
• Low cost
• High performance
• Resource utilization
• Multi-model
  • SQL
  • Key-Value


History of TDSQL

2007: 7*24; distributed transactions in the application layer
2009: Key-Value; HA; Scalability (SN)
2012: Key-Value; SQL (limited); HA; Scalability (SN)
2014: Key-Value; SQL (limited); HA; Scalability (SN); used by WeBank
2016: Key-Value; SQL; HA; ACID Transactions; Scalability (SN)
2018: Key-Value; SQL; HA; Scalability (SN, SD)

SN = shared-nothing, SD = shared-disk.


Customers of TDSQL


Architecture of TDSQL

Compute Layer: SQL engine (MySQL compatible), KV engine (such as Redis), document engine (such as MongoDB).

The compute layer is in charge of the following operations:

1. receiving the request;
2. processing the SQL-related logic;
3. locating, through the scheduler, the storage address for storing and computing data;
4. exchanging data with the storage layer;
5. returning the result.

The engine is stateless.


Architecture of TDSQL

Storage Layer: data is organized into replica sets.

Replica Set: 3 replicas by default, kept in sync by either Raft-based strongly consistent replication or asynchronous replication.

Multiple replica sets can be located on one node.

Table: distributed across multiple replica sets.


Architecture of TDSQL

Scheduler:

• Manages the metadata of the cluster, such as the replica set location of a specific key.
• Schedules replica sets in the storage cluster, such as data migration, scale-out, and backup.
• The scheduler is itself a cluster; it needs to be deployed on an odd number of nodes.
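
The odd-node requirement is easy to see with a little arithmetic. The sketch below assumes the scheduler cluster decides by simple majority, which the slides do not state explicitly; the function names are hypothetical and only illustrate the reasoning.

```python
def quorum_size(nodes: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return nodes // 2 + 1


def tolerated_failures(nodes: int) -> int:
    """How many nodes may fail while a majority can still be formed."""
    return nodes - quorum_size(nodes)


# Assumption: decisions need a strict majority of the scheduler nodes.
for n in (3, 4, 5):
    print(f"{n} nodes: quorum={quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
```

Under this assumption a 4-node cluster tolerates no more failures than a 3-node one, so the even node adds cost without adding fault tolerance.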


High Availability


Raft-based Replication

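Log Replication

1. The client sends a message;
2. The master processes the message;
3. The master replicates the message to the slaves;
4. The slaves acknowledge the message;
5. The message is committed;
6. The master replies to the client.

[Figure: one master and two slaves, with arrows numbered 1 to 6 matching the steps above.]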
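
The six steps can be sketched in a few lines. This is a deliberately simplified illustration, not TDSQL's implementation and not full Raft: it has no terms, elections, or retries, and the class and method names (`Replica`, `Master`, `handle`) are hypothetical. It only shows that a message is committed once a majority of the replica set has stored it.

```python
class Replica:
    def __init__(self, name, reachable=True):
        self.name = name
        self.reachable = reachable  # assumption: fixed for the whole run
        self.log = []               # replicated messages
        self.commit_index = 0       # number of committed log entries

    def append(self, entry):
        """Steps 3-4: store the entry and acknowledge, unless unreachable."""
        if not self.reachable:
            return False
        self.log.append(entry)
        return True


class Master(Replica):
    def __init__(self, name, slaves):
        super().__init__(name)
        self.slaves = slaves

    def handle(self, entry):
        """Run steps 1-6 for one client message; True means committed."""
        self.log.append(entry)                                # step 2
        acks = 1 + sum(s.append(entry) for s in self.slaves)  # steps 3-4
        majority = (1 + len(self.slaves)) // 2 + 1
        if acks < majority:
            return False          # no majority; a real system would retry
        self.commit_index = len(self.log)                     # step 5
        for s in self.slaves:
            if s.reachable:
                s.commit_index = len(s.log)                   # step 5
        return True                                           # step 6


s1, s2 = Replica("slave-1"), Replica("slave-2", reachable=False)
master = Master("master", [s1, s2])
print(master.handle("UPDATE account ..."))  # committed with 2 of 3 replicas
```

With the default of 3 replicas, one unreachable slave does not block the commit; losing both does.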


Multi-master or Single-master?

Multi-master Cluster (Master, Master, Master; each node has its own storage):

• Weak consistency
• High concurrency
• Low latency

Single-master Cluster (Master, Slave, Slave; each node has its own storage):

• Strong consistency
• Higher latency
• Lower concurrency


Global Distribution

[Figure: a global deployment with Region 1 and Region 2, each containing Zone 1, Zone 2, and Zone 3.]

To meet the requirements of customers, we can customize the data distribution policy.

Normally, replication between zones is strongly consistent;

replication between regions is weakly consistent.


Horizontal Scalability


Auto-Sharding


1. Hash-based partitioning.

2. The maximum throughput and disk space per set are configured.

3. If one set cannot serve the requirements of the application, it can be split into two sets.
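
One way to picture the three points above is a fixed hash space in which every set owns a contiguous range of slots. The sketch below rests on several assumptions that are not in the slides: the slot count, the use of MD5 as a stable hash, a row-count limit standing in for the configured throughput and disk limits, and all class and function names.

```python
import hashlib

SLOTS = 1024  # assumed size of the hash space


def slot_of(key: str) -> int:
    """Stable hash of a key into the slot space."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % SLOTS


class ReplicaSet:
    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi  # owns slots in [lo, hi)
        self.rows = {}


class Cluster:
    def __init__(self, max_rows_per_set):
        # Row count is a stand-in for the configured per-set limits.
        self.max_rows = max_rows_per_set
        self.sets = [ReplicaSet(0, SLOTS)]

    def locate(self, key):
        s = slot_of(key)
        return next(rs for rs in self.sets if rs.lo <= s < rs.hi)

    def put(self, key, value):
        rs = self.locate(key)
        rs.rows[key] = value
        if len(rs.rows) > self.max_rows and rs.hi - rs.lo > 1:
            self.split(rs)

    def split(self, rs):
        """Split one set into two by halving its slot range."""
        mid = (rs.lo + rs.hi) // 2
        new = ReplicaSet(mid, rs.hi)
        rs.hi = mid
        for k in [k for k in rs.rows if slot_of(k) >= mid]:
            new.rows[k] = rs.rows.pop(k)
        self.sets.append(new)


cluster = Cluster(max_rows_per_set=10)
for i in range(100):
    cluster.put(f"user:{i}", i)
print(len(cluster.sets), "sets after 100 inserts")
```

Because a split only halves a slot range, keys never need to be rehashed; only the rows in the upper half of the range move to the new set.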


Distributed Transactions

Transaction Manager (TM):
1. manages the lifecycle of transactions;
2. is stateless, so transactions can be routed to any node.

Resource Managers (RM):
1. execute the queries received from the TM;
2. return results to the TM.

Commit Log:
1. global transaction log.

Local Transaction (LT):
1. a transaction that involves only a single RM;
2. a distributed transaction may include more than one LT.


Distributed Transactions

Basic algorithm: 2PC

The following steps outline the lifecycle of a distributed transaction, as driven by the TM:

1. Receive the request

2. Assign an XID

3. Send XA START to all sets

4. Send XA PREPARE to all sets

5. Write the transaction commit log

6. Send XA COMMIT to all sets

7. Send the response
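
The lifecycle above can be sketched with in-memory stand-ins for the sets. This is an illustration of generic two-phase commit under stated assumptions, not TDSQL's code: `ResourceManager` and `TransactionManager` are hypothetical classes, the SQL statements executed between steps 3 and 4 are omitted, and a failed prepare is assumed to roll the transaction back on every set.

```python
class ResourceManager:
    """Stands in for one replica set speaking an XA-like protocol."""

    def __init__(self, name, can_prepare=True):
        self.name = name
        self.can_prepare = can_prepare
        self.state = {}  # xid -> started | prepared | committed | rolled_back

    def start(self, xid):
        self.state[xid] = "started"

    def prepare(self, xid):
        if self.can_prepare:
            self.state[xid] = "prepared"
        return self.can_prepare

    def commit(self, xid):
        self.state[xid] = "committed"

    def rollback(self, xid):
        self.state[xid] = "rolled_back"


class TransactionManager:
    def __init__(self, rms):
        self.rms = rms
        self.commit_log = []  # global transaction log
        self.next_xid = 1

    def run(self):
        xid = self.next_xid                  # step 2: assign XID
        self.next_xid += 1
        for rm in self.rms:                  # step 3: XA START
            rm.start(xid)
        # (the transaction's statements would execute here)
        votes = [rm.prepare(xid) for rm in self.rms]  # step 4: XA PREPARE
        if not all(votes):
            for rm in self.rms:
                rm.rollback(xid)
            return xid, "aborted"
        self.commit_log.append(xid)          # step 5: write commit log
        for rm in self.rms:                  # step 6: XA COMMIT
            rm.commit(xid)
        return xid, "committed"              # step 7: respond


tm = TransactionManager([ResourceManager("set-1"), ResourceManager("set-2")])
print(tm.run())
```

Writing the commit log before sending XA COMMIT is what makes the decision durable: after step 5 the outcome is fixed even if a set has not yet been told.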


Distributed Transactions

Auto-detect and optimize single-set transactions vs distributed transactions.

SINGLE-SET TRANSACTIONS: a transaction in which all affected rows are located in a single replica set.

DISTRIBUTED TRANSACTIONS: a transaction that affects a set of rows distributed across replica sets on multiple nodes.

Distributed deadlock detection:
• Global wait-for graphs are created.
• Difficulty: deadlocks are hard to spot because transactions wait for resources across the network.
• Timers: if a transaction does not finish within the configured time period, the timer goes off, indicating a possible deadlock.
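
To illustrate why the wait-for graph has to be global, the sketch below merges per-node graphs and searches the result for a cycle. The graph representation (a dict from a transaction to the set of transactions it waits for) and the function names are assumptions made for this example; the slides do not describe how TDSQL builds or searches its graphs.

```python
def merge(local_graphs):
    """Union per-node wait-for graphs into one global graph."""
    merged = {}
    for graph in local_graphs:
        for txn, waits_for in graph.items():
            merged.setdefault(txn, set()).update(waits_for)
    return merged


def find_deadlock(wait_for):
    """Return the transactions on one cycle, or None if there is no cycle."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for nxt in wait_for.get(txn, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:               # back edge: cycle found
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle:
                return cycle
    return None


node_a = {"T1": {"T2"}}  # on node A, T1 waits for a lock held by T2
node_b = {"T2": {"T1"}}  # on node B, T2 waits for a lock held by T1
print(find_deadlock(node_a), find_deadlock(merge([node_a, node_b])))
```

Neither node sees a cycle on its own; the deadlock only appears once the graphs are merged. The timers mentioned above act as a fallback for cases where graph information is slow or incomplete.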


Low-cost


Resource Scheduling

• Resource utilization
  • CPU
  • Memory
  • Disk

• Shared-nothing VS Shared-disk.

• Multi-level Storage



Shared-nothing vs Shared-Disk

Shared-Nothing (each compute engine has its own storage):

• Very large applications
• High throughput writes
• Needs the right partitioning strategy
• Write-limited (2-phase commit)

Shared-Disk (all compute engines share one storage): Disaggregated Storage and Compute Architecture

• Medium and small applications
• Write-limited (coordinate lock)
• Higher resource utilization


Resource Scheduling

• 3-level storage strategy

Hot: Standard storage. For online systems that require high availability and high-frequency access; ensures 7*24 availability. E.g. account data.

Warm: Infrequent access storage. Ensures medium-level availability. E.g. log data.

Cold: Archive. For planned access, e.g. database backups.


Resource Scheduling

• 3-level storage strategy

Storage                     Number of copies   Hardware
Standard storage            3 copies           SSD
Infrequent access storage   2 copies           SATA
Archive                     1.3 copies         SATA

Hardware customization:
1. customized high-density storage cabinets and low-speed disks;
2. programmable power-off disks with low-power boards.

Automatic conversion: data is transferred between storage levels as it moves through its lifecycle.
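
The copy counts translate directly into raw capacity, and automatic conversion can be pictured as a rule on how recently data was accessed. In the sketch below the copy counts and hardware come from the table; the age thresholds, tier keys, and function names are assumptions made only for illustration.

```python
TIERS = {
    "standard":   {"copies": 3.0, "hardware": "SSD"},
    "infrequent": {"copies": 2.0, "hardware": "SATA"},
    "archive":    {"copies": 1.3, "hardware": "SATA"},
}


def raw_capacity_tb(logical_tb: float, tier: str) -> float:
    """Raw disk needed to hold `logical_tb` of data in the given tier."""
    return logical_tb * TIERS[tier]["copies"]


def tier_for(days_since_last_access: int,
             warm_after: int = 30, cold_after: int = 365) -> str:
    """Pick a tier by access recency; both thresholds are assumed values."""
    if days_since_last_access >= cold_after:
        return "archive"
    if days_since_last_access >= warm_after:
        return "infrequent"
    return "standard"


for tier in TIERS:
    print(tier, raw_capacity_tb(100, tier), "TB raw for 100 TB logical")
```

Moving 100 TB from standard storage to archive cuts the raw footprint from 300 TB to about 130 TB, which is where most of the cost saving comes from.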


Multi-model


Why do we need multi-model?

• Hot key problems
  • Corporate account or hot SKU: very high concurrency
  • Relational database: row lock contention, very low TPS
  • Key-value: customization, queuing


Multiple Data Model

APIs: MySQL API, Redis API, MongoDB API
Engines: SQL Engine, KV Engine, Doc Engine
Data Conversion

• SQL: standard
• KV: highest performance
• Doc: schemaless


AI+Database


Data Security

[Figure: data security diagram; the only legible label is Process control.]


Why do we need AI + Security?

• More than 1,000,000 personal developers and enterprise customers.

• Too many kinds of applications, such as financial technology, smart retail, smart cities, smart transport, mobile payment, etc.

• Rule-based security policy


AI + Security

Risk: SQL injection, data security, etc.

Firewall: restricts access according to the specified type of query, SQL model, user, client IP, etc.

AI Training:
• collect all query logs online;
• tag all security events in the whole history;
• model training by AI teams.
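
The rule-based part of the firewall can be sketched as an allow-list check over user, client IP, and query type. Everything here is hypothetical: the `Rule` fields, the matching semantics, and the use of the first SQL keyword as the query type are assumptions for illustration, and the AI-trained model described above is not represented.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    user: str               # "*" matches any user
    network: str            # CIDR block of allowed client IPs
    query_types: frozenset  # allowed statement types, e.g. {"SELECT"}


def query_type(sql: str) -> str:
    """Crude classification: the first keyword of the statement."""
    parts = sql.split()
    return parts[0].upper() if parts else ""


def allowed(rules, user: str, client_ip: str, sql: str) -> bool:
    """Allow the query only if at least one rule matches all three fields."""
    ip = ipaddress.ip_address(client_ip)
    qtype = query_type(sql)
    return any(
        rule.user in ("*", user)
        and ip in ipaddress.ip_network(rule.network)
        and qtype in rule.query_types
        for rule in rules
    )


rules = [Rule("app", "10.0.0.0/8", frozenset({"SELECT", "INSERT"}))]
print(allowed(rules, "app", "10.1.2.3", "SELECT * FROM orders"))
print(allowed(rules, "app", "10.1.2.3", "DROP TABLE orders"))
```

A static check like this cannot recognise an injected predicate inside an otherwise permitted SELECT, which is one reason to complement fixed rules with a model trained on tagged query logs.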


Why do we need AI + DBA?

• The growth of business VS the growth of DBA team

• fast troubleshooting

• Rapidly changing business VS Policy-based Monitoring

• Complex business scenarios VS volume performance estimating

• Bad SQL VS performance assurance


AI-DBA

Data collected from each DB instance:

• Resource status: CPU, memory, network, disk, etc.
• Running status: active threads, slow queries, lock info, SQL time consumption, SQL digest, replication delay, etc.
• DB parameters: buffer pool, etc.
• DB information: table structure, index info, etc.

Layers: Data Collection, Compute Layer, Diagnosis Layer.

Components shown: Elastic Search, Spark, Prediction, Online Diagnostics.


Deployment

[Figure: trade-off among Data Safety, Latency, and Cost.]

Payment: Data Safety, Latency, Cost

WeChat Moment Feeds: Latency, Cost, Data Safety

Game: Latency, Cost, Data Safety


Deployment

1. When the master data center is out of service, the cluster is automatically switched to the standby data center.
   Recovery Point Objective = 0
   Recovery Time Objective: less than 40s

2. When only one server is out of service:
   a) master
   b) slave
   c) watcher

3. When the slave data center is out of service

[Figure: deployment across Region #1 and Region #2.]


Deployment

1. Triple DZ in the same city simplifies the synchronization policy and provides high data availability and consistency.

2. The failure of one data center does not affect data services.

3. The production cluster with triple-DZ provides multi-active services.

4. The entire city can be manually switched.

[Figure: deployment across Region #1 and Region #2.]


Resource scheduling

Order module: trading orders, data volume can be controlled, need high performance

Settlement Data: all historical data, very large volume, low cost, rarely accessed

Account Data: small volume, very high performance, randomly accessed


Hybrid data analysis

Account Module

Risk Control Module

SQL Key-Value

Hybrid Data Access Protocol (SQL)

Personal Account Data (PAD): relational model

Corporate Account Data (CAD): key-value model

Risk Control Module: analyzes both PAD and CAD


Conclusion

• One size fits none
• HTAP, OLTP, OLAP
• Multi-model
• AI + Database


Thanks