在CentOS7.2上部署Postgres-XL分散式資料庫
1. 下載安裝包
2. 節點分類及說明
【GTM】
全域性事務控制節點,保證叢集資料的一致性,與Coordinator節點和Datanode節點不斷通訊,是整個叢集的核心節點,只存在一個,可以存在一個GTM Standby節點,對GTM實時備份。GTM一旦故障,整個叢集立刻無法訪問,此時可以切換到GTM Standby節點上。如果部署了GTM Standby節點,就應該同時部署GTM Proxy,一般和Coordinator、Datanode部署在同一臺伺服器上。GTM Proxy的作用代理Coordinator和Datanode對GTM的訪問,起到減輕GTM負載的作用,另外一個重要的作用是幫助完成GTM的故障切換,當GTM節點發生故障後,GTM Standby成為新的GTM,此時Coordinator和Datanode節點並不需要重新指定GTM地址,只需要GTM Proxy重新連線到新的GTM地址即可。
【Coordinator】
接收資料訪問請求的節點,本質上是由PG後臺程序組成。接收的一條查詢後,Coordinator節點執行查詢計劃,然後會根據查詢資料涉及的資料節點將查詢分發給相關的資料節點。寫入資料時,也會根據不同的資料分佈策略將資料寫入相關的節點。可以說Coordinator節點上儲存著叢集的全域性資料位置。Coordinator節點可以任意擴充套件,各個節點之間除了訪問地址不同以外是完全對等的,通過一個節點更新的資料可以在另一個節點上立刻看到。每個Coordinator節點可以配置一個對應的standby節點,避免單點故障。
【Datanode】
實際存取資料的節點,接收Coordinator的請求並執行SQL語句存取資料,節點之間也會互相通訊。一般的,一個節點上的資料並不是全域性的,資料節點不直接對外提供資料訪問。一個表的資料在資料節點上的分佈存在兩種模式:複製模式和分片模式,複製模式下,一個表的資料在指定的節點上存在多個副本;分片模式下,一個表的資料按照一定的規則分佈在多個數據節點上,這些節點共同儲存一份完整的資料。這兩種模式的選擇是在建立表的時候執行CREATE TABLE語句指定的,具體語法如下:
CREATE TABLE table_name(...)
DISTRIBUTE BY
HASH(col)|MODULO(col)|ROUNDROBIN|REPLICATION
TO NODE(nodename1,nodename2...)
可以看到,如果DISTRIBUTE BY 後面是REPLICATION,則是複製模式,其餘則是分片模式,HASH指的是按照指定列的雜湊值分佈資料,MODULO指的是按照指定列的取摩運算分佈資料,ROUNDROBIN指的是按照輪詢的方式分佈資料。TO NODE指定了資料分佈的節點範圍,如果沒有指定則預設所有資料節點參與資料分佈。如果沒有指定分佈模式,即使用普通的CREATE TABLE語句,PGXL會預設採用分片模式將資料分佈到所有資料節點。
3. 主機規劃
GTM: server0: 192.168.51.140
Coordinator: server1: 192.168.51.141
Datanode1 Master: server2: 192.168.51.142
Datanode2 Master: server3: 192.168.51.143
Datanode1 Slave: server4: 192.168.51.144
Datanode2 Slave: server5: 192.168.51.145
—–Datanode3用於後面演示動態增刪節點—–
Datanode3 Master: server6: 192.168.51.146
【下面的步驟4-13,在所有節點上都要執行】
4. 修改主機hosts檔案
# vi /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.51.140 server0
192.168.51.141 server1
192.168.51.142 server2
192.168.51.143 server3
192.168.51.144 server4
192.168.51.145 server5
192.168.51.146 server6
5. 安裝依賴的軟體包
openssh-clients, flex, bison, readline-devel, zlib-devel, openjade, docbook-style-dsssl, gcc
用rpm命令檢視是否已安裝
# rpm -qa |grep xxx
如果沒裝可以用yum安裝
6. 解除安裝作業系統自帶的PostgreSQL
# rpm -qa |grep postgresql
# rpm -qa |grep postgresql | xargs rpm -e --nodeps
# rpm -qa |grep postgresql
7. 新增postgres使用者
建立組:
# groupadd postgres
建立使用者:
# useradd -m -d /home/postgres postgres -g postgres
初始化密碼:
# passwd postgres
輸入密碼:12345678(舉例)
注:如果需要刪除postgres使用者,可以以root使用者執行命令:
# userdel -r postgres
8. 配置免密登陸
root使用者
# ssh-keygen
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
# ssh-copy-id -i ~/.ssh/id_rsa.pub [email protected]
postgres使用者
$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server0
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server3
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server4
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server5
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server6
如果配置完成後,發現ssh並不免密,可按下面操作進行完善:
chmod 755 /home/postgres
chmod 700 /home/postgres/.ssh
chmod 644 /home/postgres/.ssh/authorized_keys
9. 修改核心引數
在/etc/sysctl.conf中新增引數kernel.sem
# vi /etc/sysctl.conf
kernel.sem = 50100 128256000 50100 2560
使引數生效
# sysctl -p
驗證引數是否生效。
# ipcs -ls
—— Semaphore Limits ——–
max number of arrays = 2560
max semaphores per array = 50100
max semaphores system wide = 128256000
max ops per semop call = 50100
semaphore max value = 32767
10. 配置防火牆
關閉防火牆
# systemctl stop firewalld
# systemctl disable firewalld
或在防火牆中開放埠
遇到防護牆不能關閉的情況,可使用該步驟!
具體需要開放的埠參加(主機規劃部分),以5432為例:
使用這些命令來永久開啟一個新埠(如TCP/5432)。
# sudo firewall-cmd --zone=public --add-port=5432/tcp --permanent
# sudo firewall-cmd --reload
注:檢視防火牆狀態命令:systemctl status firewalld
注:開啟防火牆:systemctl start firewalld
注:關閉防火牆:systemctl stop firewalld
11. 關閉SELinux
檢視SELinux狀態,執行:
# /usr/sbin/sestatus -v
如果SELinux status引數為enabled即為開啟狀態
永久關閉SELinux,執行:
# vi /etc/selinux/config
將 SELINUX=enforcing 改為 SELINUX=disabled
重啟後生效
12. 安裝Postgres-XL軟體
解壓
# tar -jxvf postgres-xl-9.5r1.6.tar.bz2
# chown -R postgres:postgres postgres-xl-9.5r1.6
切換使用者
# su - postgres
安裝
$ cd postgres-xl-9.5r1.6
$ ./configure --prefix=/home/postgres/pgxl9.5
$ make
$ make install
安裝擴充套件
$ cd contrib
$ make
$ make install
13. 配置環境變數
在檔案.bash_profile中新增如下內容:
# su - postgres
$ vi .bash_profile
export PGHOME=/home/postgres/pgxl9.5
export PGUSER=postgres
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH
使環境變數生效:
$ source ~/.bashrc
驗證環境變數是否生效:
pg_ctl --version
顯示:pg_ctl (PostgreSQL) 9.5.8 (Postgres-XL 9.5r1.6)
14. 配置Postgres-XL叢集
在GTM節點上以postgres使用者執行:
$ pgxc_ctl ---初次執行,會提示Error說沒有配置檔案,忽略即可
PGXC prepare ---執行該命令將會生成一份配置檔案模板
PGXC exit --退出 pgxc_ctl互動窗
執行完成後,在postgres使用者根目錄下,會生成一個pgxc_ctl目錄,編輯其中的pgxc_ctl.conf檔案
$ vi pgxc_ctl.conf
修改為:
#!/usr/bin/env bash
# pgxcInstallDir variable is needed if you invoke "deploy" command from pgxc_ctl utility.
# If don't you don't need this variable.
pgxcInstallDir=$PGHOME
pgxlDATA=$PGHOME/data
#---- OVERALL -----------------------------------------------------------------------------
#
pgxcOwner=postgres # owner of the Postgres-XC databaseo cluster. Here, we use this
# both as linus user and database user. This must be
# the super user of each coordinator and datanode.
pgxcUser=$pgxcOwner # OS user of Postgres-XC owner
tmpDir=/tmp # temporary dir used in XC servers
localTmpDir=$tmpDir # temporary dir used here locally
configBackup=n # If you want config file backup, specify y to this value.
configBackupHost=pgxc-linker # host to backup config file
configBackupDir=$HOME/pgxc # Backup directory
configBackupFile=pgxc_ctl.bak # Backup file name --> Need to synchronize when original changed.
#---- GTM ------------------------------------------------------------------------------------
#---- GTM Master -----------------------------------------------
#---- Overall ----
gtmName=gtm
gtmMasterServer=server0
gtmMasterPort=6666
gtmMasterDir=$pgxlDATA/nodes/gtm
#---- Configuration ---
gtmExtraConfig=none # Will be added gtm.conf for both Master and Slave (done at initilization only)
gtmMasterSpecificExtraConfig=none # Will be added to Master's gtm.conf (done at initialization only)
#---- Coordinators ----------------------------------------------------------------------------------------------------
#---- shortcuts ----------
coordMasterDir=$pgxlDATA/nodes/coord
coordSlaveDir=$pgxlDATA/nodes/coord_slave
coordArchLogDir=$pgxlDATA/nodes/coord_archlog
#---- Overall ------------
coordNames=(coord1) # Master and slave use the same name
coordPorts=(5432) # Master ports
poolerPorts=(20004) # Master pooler ports
coordPgHbaEntries=(0.0.0.0/0) # Assumes that all the coordinator (master/slave) accepts
# the same connection
# This entry allows only $pgxcOwner to connect.
# If you'd like to setup another connection, you should
# supply these entries through files specified below.
#---- Master -------------
coordMasterServers=server1 # none means this master is not available
coordMasterDirs=$coordMasterDir
coordMaxWALsernder=10 # max_wal_senders: needed to configure slave. If zero value is specified,
# it is expected to supply this parameter explicitly by external files
# specified in the following. If you don't configure slaves, leave this value to zero.
coordMaxWALSenders=$coordMaxWALsernder
# max_wal_senders configuration for each coordinator.
#---- Configuration files---
coordExtraConfig=coordExtraConfig # Extra configuration file for coordinators.
# This file will be added to all the coordinators'
# postgresql.conf
# Pleae note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 512
EOF
# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none)
coordExtraPgHba=none # Extra entry for pg_hba.conf. This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none)
#---- Datanodes -------------------------------------------------------------------------------------------------------
#---- Shortcuts --------------
datanodeMasterDir=$pgxlDATA/nodes/dn_master
datanodeSlaveDir=$pgxlDATA/nodes/dn_slave
datanodeArchLogDir=$pgxlDATA/nodes/datanode_archlog
#---- Overall ---------------
#primaryDatanode=datanode1 # Primary Node.
# At present, xc has a priblem to issue ALTER NODE against the primay node. Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1 # Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(5433 5433) # Master ports
datanodePoolerPorts=(20005 20005) # Master pooler ports
datanodePgHbaEntries=(0.0.0.0/0) # Assumes that all the coordinator (master/slave) accepts
# the same connection
# This list sets up pg_hba.conf for $pgxcOwner user.
# If you'd like to setup other entries, supply them
# through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust". If you don't want
# such setups, specify the value () to this variable and suplly what you want using datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.
#datanodePgHbaEntries=(::1/128) # Same as above but for IPv6 addresses
#---- Master ----------------
datanodeMasterServers=(server2 server3) # none means this master is not available.
# This means that there should be the master but is down.
# The cluster is not operational until the master is
# recovered and ready to run.
datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
datanodeMaxWalSender=10 # max_wal_senders: needed to configure slave. If zero value is
# specified, it is expected this parameter is explicitly supplied
# by external configuration files.
# If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
# max_wal_senders configuration for each datanode
#---- Slave -----------------
datanodeSlave=y # Specify y if you configure at least one coordiantor slave. Otherwise, the following
# configuration parameters will be set to empty values.
# If no effective server names are found (that is, every servers are specified as none),
# then datanodeSlave value will be set to n and all the following values will be set to
# empty values.
datanodeSlaveServers=(server4 server5) # value none means this slave is not available
datanodeSlavePorts=(15433 15433) # value none means this slave is not available
datanodeSlavePoolerPorts=(20015 20015) # value none means this slave is not available
datanodeSlaveSync=y # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)
# ---- Configuration files ---
# You may supply your bash script to setup extra config lines and extra pg_hba.conf entries here.
# These files will go to corresponding files for the master.
# Or you may supply these files manually.
datanodeExtraConfig=none # Extra configuration file for datanodes. This file will be added to all the
# datanodes' postgresql.conf
datanodeSpecificExtraConfig=(none none)
datanodeExtraPgHba=none # Extra entry for pg_hba.conf. This file will be added to all the datanodes' postgresql.conf
datanodeSpecificExtraPgHba=(none none)
15.初始化叢集
在GTM節點上以postgres使用者執行:
初始化叢集:
$ pgxc_ctl -c ~/pgxc_ctl/pgxc_ctl.conf init all
啟動叢集:
$ pgxc_ctl -c /home/postgres/pgxc_ctl/pgxc_ctl.conf start all
停止叢集
$ pgxc_ctl -c /home/postgres/pgxc_ctl/pgxc_ctl.conf stop all
16.刪除叢集
在所有節點上以postgres使用者執行:
$ rm /home/postgres/pgxl9.5/data/nodes