PhxSQL is a high-availability and strong-consistency MySQL cluster built on Paxos and Percona.
Authors: Junchao Chen, Haochuan Cui, Duokai Huang, Ming Chen and Sifan Liu
Contact us: phxteam@tencent.com
#PhxSQL features:
This project includes
Projects on which this project depends are also published by Tencent( phxpaxos, phxrpc, libco ). You can download or clone them with --recurse-submodule.
phxpaxos: http://github.com/Tencent/phxpaxos
phxrpc: http://github.com/Tencent/phxrpc
libco: http://github.com/Tencent/libco
If you prefer pre-compiled binaries, just skip this part.
Name | Introduction |
---|---|
phxsqlproxy | surrogate between MySQL client and PhxSql |
phxbinlogsvr | server for global binlog synchronization and storage, as well as management of mastership and membership |
percona | Source code of percona5.6.31-77.0 |
phx_percona/plugin/phxsync_phxrpc | A plugin running in MySql that intercepts MySQL binlogs and forwards them to phxbinlogsvr |
phx_percona/plugin/semisync | A semisync compatible with our modified plugin APIs of MySQL |
third_party/glog | GLOG library |
third_party/leveldb | LevelDB library |
third_party/protobuf | Google Protobuf 3.0+ library |
third_party/phxpaxos | PhxPaxos library |
third_party/colib | Libco library |
third_party/phxrpc | Phxrpc library |
PhxSQL needs 6 third party libs(glog, leveldb, protobuf, phxpaxos, colib, phxrpc). Please install them in phxsql/third_party directory or just link to third_party.
NOTE: Please make sure -fPIC is added while executing configure in GLOG and Protobuf as well as specifying --prefix=/the/current/absolute/path.
For example: ./configure CXXFLAGS=-fPIC --prefix=/home/root/phxsql/third_party/glog
.
Then download percona-server-5.6.31-77.0.tar.gz
Move percona-server-5.6\_5.6.31-77.0
to PhxSQL directory, rename or link as 'percona' (NOTE: Only percona-server-5.6_5.6.31-77.0 is available)
./autoinstall.sh && make && make install
(NOTE: We put the binaries in install_package/sbin, configuration files in install_package/tools/etc_template, install scripts in install_package/tools. The 'make package' command will pack 'install_package' into 'phxsql-$version.tar.gz'. Please specify -prefix=/the/path/you/want/to/install while executing ./autoinstall.sh)
PhxSQL needs to run on more than 2 hosts. We suggest N >= 3 and N is an odd number, where N means the number of hosts.
Transfer phxsql.tar.gz to all of the hosts you want to install. Then do as the following steps:
Execute tar -xvf phxsq.tar.gz .
Enter phxsql/tools, Execute python install.py --help
to get the help of installation.
(For example:python2.7 install.py -i"your_inner_ip" -p 54321 -g 6000 -y 11111 -P 17000 -a 8001 -f/tmp/data/
)
After executing 'install.py' on all the hosts, Execute './phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000' in any one hosts. 17000 should be replaced with the port on which phxbinlogsvr is listening.
The cluster is active while a message shows master initialization is finished.
You can execute some SQLs to check the status of cluster through mysql -uroot -h"your_inner_ip" -P$phxsqlproxy_port
test_phxsql.sh phxsqlproxy_port ip1 ip2 ip3
PhxSQL have 3 configuration files in total.
NOTE:Modify tools/etc_temlate/my.cnf
before installation, Modify etc/my.cnf
after installation
Section name | Key name | comment |
---|---|---|
AgentOption | AgentPort | Port for the connection of binlogsvr and MySQL |
EventDataDir | Directory where to store the binlogsvr data | |
MaxFileSize | File size per data of phxbinlogsvr, the unit is B | |
MasterLease | Lease length of master, the unit is second | |
CheckPointTime | The data before CheckPointTime will be deleted by phxbinlogsvr, but it will not be deleted if some other PhxSQL nodes have not learned yet, the unit is minute | |
MaxDeleteCheckPointFileNum | The maximum number of files deleted each time by phxbinlogsvr | |
FollowIP | Enabled if it is a follower node and will learn binlog from this FollowIP , this node will not vote |
|
PaxosOption | PaxosLogPath | Directory where to store paxos data |
PaxosPort | Port for paxos to connect each other | |
PacketMode | The maximum size of paxos log for PhxPaxos,1 means 100M, but the network timeout will be 1 minute, 0 means 50M and network timeout is 2s(changed in dynamic). | |
UDPMaxSize | Our default network use udp and tcp combination, a message we use udp or tcp to send decide by a threshold. Message size under UDPMaxSize we use udp to send. | |
Server | IP | IP for phxbinlogsvr to listen |
Port | Port for phxbinlogsvr to listen | |
LogFilePath | Directory to store log | |
LogLevel | Log level of phxbinlogsvr |
Section name | Key name | comment |
---|---|---|
Server | IP | IP for phxsqlproxy to listen |
Port | Port for phxsqlproxy to listen | |
LogFilePath | Directory to store log | |
LogLevel | Log level of phxbinlogsvr | |
MasterEnableReadPort | Enable readonly-port in master node. If set to 0, master will forwarding readonly-port requests to one of slaves. | |
TryBestIfBinlogsvrDead | After the local phxbinlogsvr is dead, phxsqlproxy will try to get master information from phxbinlogsvr on other machine, if this option set to 1. |
phxsqlproxy is the surrogate of PhxSQL, all requests will be sent to phxsqlproxy and then be forwarded to MySQL.
It is the port configured in phxsqlproxy.conf
. Every requests sent to this port will be forwarded to the master node to excute.
It is (MasterPort + 1). You can also specify it by setting SlavePort = xxxxx
in phxsqlproxy.conf
.
Every requests will be executed on the local MySQL. A master node will make a redirection to another slave nodes if MasterEnableReadPort = 0
(this will save the CPU/IO resource for write requests)
mysql -u$user -h$phxsqlproxyip -P$phxsqlproxyport -p$pwd
to connect to phxsqlproxy
$phxsqlproxyip
can be any one IP of hosts in a clusters and$phxsqlproxyport
can beMasterPort
orSlavePort
.
PhxSQL provides a tool phxbinlogsvr_tools_phxrpc
to help the mangerment of PhxSQL.
PhxSQL cluster needs 1 MySQL admin accounts and 1 synchronization account. The default admin account is (root
, ""
), the default synchronization account is ( replica
, replica123
), They can be modified( and only be modifyed ) via phxbinlogsvr_tools_phxrpc
. DON'T DO THIS MANUALLY.
Following is some commands you may used frequently.
phxbinlogsvr_tools -f GetMasterInfoFromGlobal -h <host> -p <port>
**Function:**Get the current master info from quorum nodes( IP and timeout ).
Arguments:
17000
phxbinlogsvr_tools -f SetMySqlAdminInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new admin username> -D <new admin pwd>
Function: Set the user and password of admin account.
Arguments:
17000
root
)""
)phxbinlogsvr_tools -f SetMySqlReplicaInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new replica username> -D <new replica pwd>
Function: Set the user and password of synchronization account.
Arguments:
17000
root
)""
)phxbinlogsvr_tools_phxrpc -f GetMemberList -h <host> -p <port>
Function: Membership of this cluster, all IPs and Ports included.
Arguments:
17000
Execute phxbinlogsvr_tools_phxrpc -f RemoveMember -h<host> -p<port> -m <ip_of_nodeA>
to delete node A. Once it is succesfully executed, A will not learn binlog after a small period.
phxbinlogsvr_tools -f AddMember -h<host> -p<port> -m <ip_of_nodeA>
to add node A into the membership.set global super_read_only = 0; set global read_only = 0
;Phxbinlogsvr
will pull the checkpoint in another node to reboot during reinstallation. It will self-kill after pulling is over(to make sure the consistency). You can reboot phxbinlogsvr
after a message like "All sm load state ok, start to exit"
appears.
phxbinlogsvr
will stop working if a data problem arise in MySQL. We suggest you to check the status of MySQL."err"
to check the abnormaly.CPU : Intel(R) Xeon(R) CPU E5-2420 0 @ 1.90GHz * 24
Memory : 32G
Disk : SSD Raid10
Master -> Slave : 3 ~ 4ms
Client -> Master : 4ms
sysbench --oltp-tables-count=10 --oltp-table-size=1000000 --num-threads=500 --max-requests=100000 --report-interval=1 --max-time=200
Client Threads | Clusters | Test sets | |||||
---|---|---|---|---|---|---|---|
insert.lua (100% write) | select.lua (0% write) | OLTP.lua (20% write) | |||||
QPS | Response time(MS) | QPS | Response time(MS) | QPS | Response time(MS) | ||
200 | PhxSQL | 5076 | 39.34/56.93 | 46334 | 4.21/5.12 | 25657 | 140.16/186.39 |
200 | MySQL semi-sync | 4055 | 49.27/66.64 | 47528 | 4.10/5.00 | 20391 | 176.39/226.76 |
500 | PhxSQL | 8260 | 60.41/83.14 | 105928 | 4.58/5.81 | 46543 | 192.93/242.85 |
500 | MySQL semi-sync | 7072 | 70.60/91.72 | 121535 | 4.17/5.08 | 33229 | 270.38/345.84 |
NOTE:The 2 Response times means average and 95% percentile