Deploying the Postgres-XL Distributed Database on CentOS 7.2

1. Download the installation package

2. Node types and their roles

【GTM】
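The Global Transaction Manager keeps the cluster's data consistent and communicates continuously with the Coordinator and Datanode nodes. It is the core node of the cluster and only one GTM can be active. A GTM Standby node can be added to back up the GTM in real time. If the GTM fails, the whole cluster becomes inaccessible immediately, and you can then switch over to the GTM Standby. If you deploy a GTM Standby, you should also deploy GTM Proxy, which usually runs on the same servers as the Coordinators and Datanodes. GTM Proxy relays Coordinator and Datanode requests to the GTM, which reduces the load on the GTM. It also helps with GTM failover: when the GTM fails and the GTM Standby becomes the new GTM, the Coordinators and Datanodes do not need to be given a new GTM address; only the GTM Proxy has to reconnect to the new GTM.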

【Coordinator】
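The node that accepts data access requests; it is essentially a set of PostgreSQL backend processes. When it receives a query, the Coordinator builds a query plan and dispatches the query to the Datanodes that hold the relevant data. On writes, it routes rows to the appropriate nodes according to the table's distribution strategy. In other words, the Coordinator stores the cluster's global data-location information. Coordinators can be scaled out freely. Apart from their access addresses they are fully equivalent peers, and data updated through one Coordinator is immediately visible through another. Each Coordinator can be paired with a standby node to avoid a single point of failure.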

【Datanode】
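The node that actually stores and retrieves data. It receives requests from the Coordinator and executes SQL statements against its local data; Datanodes also communicate with each other. In general, a single Datanode holds only part of the data, and Datanodes do not serve client requests directly. A table's data can be distributed across Datanodes in two modes: replicated and sharded. In replicated mode, each of the specified nodes holds a full copy of the table. In sharded mode, the table's rows are spread across multiple Datanodes according to a rule, and those nodes together hold one complete copy. The mode is chosen in the CREATE TABLE statement, with the following syntax: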

CREATE TABLE table_name(...)
DISTRIBUTE BY 
HASH(col)|MODULO(col)|ROUNDROBIN|REPLICATION
TO NODE(nodename1,nodename2...)

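As shown, DISTRIBUTE BY REPLICATION selects replicated mode and the other options select sharded mode. HASH distributes rows by the hash of the specified column, MODULO distributes rows by a modulo operation on the specified column's value, and ROUNDROBIN distributes rows to the nodes in turn. TO NODE limits the set of nodes that hold the table; if it is omitted, all Datanodes participate. If no distribution mode is given, that is, with a plain CREATE TABLE statement, PGXL defaults to sharded mode across all Datanodes.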
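
As a concrete illustration of the syntax, the sketch below prints one example statement per distribution mode. The table names (t_hash, t_mod, t_rr, t_rep), the columns, and the helper name example_ddl are made up for illustration; datanode1 and datanode2 are the node names used in the configuration in step 14.

```shell
#!/bin/sh
# Print example DDL, one statement per distribution mode.
# Table and column names are hypothetical; node names come from step 14.
example_ddl() {
    cat <<'SQL'
CREATE TABLE t_hash (id int, val text) DISTRIBUTE BY HASH(id);
CREATE TABLE t_mod (id int, val text) DISTRIBUTE BY MODULO(id) TO NODE (datanode1, datanode2);
CREATE TABLE t_rr (id int, val text) DISTRIBUTE BY ROUNDROBIN;
CREATE TABLE t_rep (id int, val text) DISTRIBUTE BY REPLICATION;
SQL
}

example_ddl
# To run the statements once the cluster is up (assumes the Coordinator
# on server1:5432 from the host plan):
#   example_ddl | psql -h server1 -p 5432 -d postgres
```

The first three tables are sharded and the last one is replicated; only t_mod restricts the node list explicitly.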

3. Host plan

GTM: server0: 192.168.51.140
Coordinator: server1: 192.168.51.141
Datanode1 Master: server2: 192.168.51.142
Datanode2 Master: server3: 192.168.51.143
Datanode1 Slave: server4: 192.168.51.144
Datanode2 Slave: server5: 192.168.51.145

----- Datanode3 is used later to demonstrate adding and removing nodes dynamically -----
Datanode3 Master: server6: 192.168.51.146

【Steps 4-13 below must be performed on every node】

4. Edit the hosts file

# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.51.140    server0
192.168.51.141    server1
192.168.51.142    server2
192.168.51.143    server3
192.168.51.144    server4
192.168.51.145    server5
192.168.51.146    server6
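
Because the same seven entries have to be added on every node, it may be easier to generate them than to type them. The following is a minimal sketch; the function name hosts_entries is hypothetical, and the address range is the one from the host plan above.

```shell
#!/bin/sh
# Print the /etc/hosts entries for the host plan:
# server0..server6 map to 192.168.51.140..192.168.51.146.
hosts_entries() {
    i=0
    while [ "$i" -le 6 ]; do
        printf '192.168.51.%d    server%d\n' $(( 140 + i )) "$i"
        i=$(( i + 1 ))
    done
}

hosts_entries
# To append the entries on a node, as root (review the output first):
#   hosts_entries >> /etc/hosts
```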

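5. Install the required packages

openssh-clients, flex, bison, readline-devel, zlib-devel, openjade, docbook-style-dsssl, gcc
Use rpm to check whether each package is already installed (replace xxx with the package name):

# rpm -qa |grep xxx

Install any missing package with yum.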
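
Checking eight packages by hand on seven hosts is tedious, so the check can be scripted. This is a sketch under the assumption that rpm -q exits with a non-zero status for a package that is not installed; the helper name missing_packages is hypothetical.

```shell
#!/bin/sh
# Print every package for which the query command fails.
# $1 is the query command ("rpm -q" on CentOS); the rest are package names.
missing_packages() {
    query_cmd=$1
    shift
    for pkg in "$@"; do
        # $query_cmd is left unquoted on purpose so "rpm -q" splits into two words.
        $query_cmd "$pkg" >/dev/null 2>&1 || printf '%s\n' "$pkg"
    done
}

PKGS="openssh-clients flex bison readline-devel zlib-devel openjade docbook-style-dsssl gcc"
# On each node, as root:
#   missing=$(missing_packages "rpm -q" $PKGS)
#   [ -n "$missing" ] && yum install -y $missing
```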

6. Remove the PostgreSQL packages shipped with the operating system

# rpm -qa |grep postgresql
# rpm -qa |grep postgresql | xargs rpm -e --nodeps
# rpm -qa |grep postgresql

7. Add the postgres user

Create the group:
# groupadd  postgres
Create the user:
# useradd  -m  -d   /home/postgres   postgres  -g  postgres
Set the password:
# passwd postgres
Enter a password: 12345678 (example only)
Note: to delete the postgres user, run the following as root:
# userdel -r postgres

8. Configure passwordless SSH login

As the root user:

# ssh-keygen
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server0
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server1
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server2
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server3
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server4
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server5
# ssh-copy-id -i ~/.ssh/id_rsa.pub root@server6

As the postgres user:

$ ssh-keygen
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server0
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server1
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server2
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server3
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server4
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server5
$ ssh-copy-id -i ~/.ssh/id_rsa.pub postgres@server6
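
The per-host commands follow one pattern, so they can be produced by a loop. The sketch below only prints the commands, which makes it safe to try; the function name copy_id_commands is hypothetical, and the host names are server0 to server6 from the hosts file.

```shell
#!/bin/sh
# Print one ssh-copy-id command per host in the plan.
# $1 = login user on the remote hosts (root or postgres).
copy_id_commands() {
    for i in 0 1 2 3 4 5 6; do
        echo "ssh-copy-id -i ~/.ssh/id_rsa.pub $1@server$i"
    done
}

copy_id_commands postgres
# To run the commands for real, remove the echo and the surrounding
# quotes inside the loop; each host then prompts for the password once.
```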

If SSH still asks for a password after this, tighten the permissions as follows:

chmod 755 /home/postgres
chmod 700 /home/postgres/.ssh
chmod 644 /home/postgres/.ssh/authorized_keys

9. Set kernel parameters

Add the kernel.sem parameter to /etc/sysctl.conf:

# vi /etc/sysctl.conf
kernel.sem = 50100 128256000 50100 2560

Apply the setting:

# sysctl -p

Verify that the setting took effect:

# ipcs -ls

------ Semaphore Limits --------
max number of arrays = 2560
max semaphores per array = 50100
max semaphores system wide = 128256000
max ops per semop call = 50100
semaphore max value = 32767
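
The four kernel.sem fields are, in order, SEMMSL (semaphores per array), SEMMNS (semaphores system wide), SEMOPM (operations per semop call) and SEMMNI (number of arrays), which is why ipcs prints them in a different order. The values used here satisfy SEMMNS = SEMMSL * SEMMNI, a common convention rather than something this article requires. The sketch below checks that relation; the function name sem_consistent is hypothetical.

```shell
#!/bin/sh
# kernel.sem holds four fields: SEMMSL SEMMNS SEMOPM SEMMNI.
# Succeed when SEMMNS (system-wide total) equals SEMMSL * SEMMNI.
sem_consistent() {
    # $1..$4 = SEMMSL SEMMNS SEMOPM SEMMNI
    [ "$2" -eq $(( $1 * $4 )) ]
}

if sem_consistent 50100 128256000 50100 2560; then
    echo "SEMMNS = 50100 * 2560 = $(( 50100 * 2560 ))"
fi
# On a live system the current values can be checked with:
#   sem_consistent $(cat /proc/sys/kernel/sem)
```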

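10. Configure the firewall

Stop and disable the firewall:

# systemctl stop firewalld
# systemctl disable firewalld

Alternatively, open the required ports in the firewall.
Use this approach when the firewall cannot be turned off.
The ports to open depend on your host plan (the port numbers are set in pgxc_ctl.conf in step 14). Taking 5432 as an example,
the following commands permanently open a new port (here TCP/5432):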

# sudo firewall-cmd --zone=public --add-port=5432/tcp --permanent
# sudo firewall-cmd --reload

Note: to check the firewall status: systemctl status firewalld
Note: to start the firewall: systemctl start firewalld
Note: to stop the firewall: systemctl stop firewalld
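
If you keep the firewall on, more than one port has to be opened. The sketch below prints the firewall-cmd calls for every port that appears in the pgxc_ctl.conf of step 14. The port list is an assumption derived from that file: it is a superset applied to every node, and not every port may be needed on every host. The function name firewall_commands is hypothetical.

```shell
#!/bin/sh
# Ports taken from pgxc_ctl.conf in step 14 (assumption: adjust to your plan):
#   6666  GTM                 5432  Coordinator       20004 Coordinator pooler
#   5433  Datanode masters    20005 Datanode pooler
#   15433 Datanode slaves     20015 Datanode slave pooler
PORTS="6666 5432 20004 5433 20005 15433 20015"

# Print one firewall-cmd call per port, followed by a reload.
firewall_commands() {
    for port in $PORTS; do
        echo "firewall-cmd --zone=public --add-port=$port/tcp --permanent"
    done
    echo "firewall-cmd --reload"
}

firewall_commands
```

Review the printed commands, then run them as root on each node.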

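11. Disable SELinux

Check the SELinux status:

# /usr/sbin/sestatus -v  

If "SELinux status" shows enabled, SELinux is on.
To disable SELinux permanently, edit its configuration file:

# vi /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled.
The change takes effect after a reboot.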
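
The same edit can be made without an editor, which helps when repeating it on every node. This is a minimal sketch: the function name disable_selinux_in is hypothetical, and it keeps a .bak copy of the file it changes.

```shell
#!/bin/sh
# Switch SELINUX=enforcing to SELINUX=disabled in the given config file,
# keeping a backup copy next to it as <file>.bak.
# $1 = path to the config file (/etc/selinux/config on CentOS).
disable_selinux_in() {
    cp "$1" "$1.bak" &&
        sed 's/^SELINUX=enforcing$/SELINUX=disabled/' "$1.bak" > "$1"
}

# As root on each node:
#   disable_selinux_in /etc/selinux/config
#   setenforce 0    # permissive mode until the reboot applies "disabled"
```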

12. Install the Postgres-XL software

Extract the archive:

# tar -jxvf postgres-xl-9.5r1.6.tar.bz2
# chown -R postgres:postgres postgres-xl-9.5r1.6

Switch to the postgres user:

# su - postgres

Build and install:

$ cd postgres-xl-9.5r1.6
$ ./configure --prefix=/home/postgres/pgxl9.5
$ make
$ make install

Install the extensions (contrib):

$ cd contrib
$ make
$ make install

13. Configure environment variables

Add the following to the postgres user's .bash_profile:

# su - postgres
$ vi .bash_profile

export PGHOME=/home/postgres/pgxl9.5
export PGUSER=postgres
export LD_LIBRARY_PATH=$PGHOME/lib:$LD_LIBRARY_PATH
export PATH=$PGHOME/bin:$PATH

Load the new environment variables:

$ source ~/.bash_profile

Verify that they took effect:

$ pg_ctl --version

Output: pg_ctl (PostgreSQL) 9.5.8 (Postgres-XL 9.5r1.6)
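
Since a stock PostgreSQL may have been present on the host (step 6), it is worth confirming that the pg_ctl found on PATH is the Postgres-XL build. A small sketch follows; the function name is_pgxl is hypothetical.

```shell
#!/bin/sh
# Succeed when a version string identifies a Postgres-XL build.
# $1 = output of "pg_ctl --version"
is_pgxl() {
    case "$1" in
        *Postgres-XL*) return 0 ;;
        *) return 1 ;;
    esac
}

# On each node, as the postgres user:
#   is_pgxl "$(pg_ctl --version)" || echo "PATH does not point at Postgres-XL"
```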

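14. Configure the Postgres-XL cluster

On the GTM node, as the postgres user, run:

$ pgxc_ctl              --- on first run it reports an error about a missing configuration file; this can be ignored
PGXC prepare            --- generates a configuration file template
PGXC exit               --- leaves the pgxc_ctl interactive shell

This creates a pgxc_ctl directory in the postgres user's home directory. Edit the pgxc_ctl.conf file in it:

$ vi ~/pgxc_ctl/pgxc_ctl.conf

Change it to: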

#!/usr/bin/env bash

# pgxcInstallDir variable is needed if you invoke "deploy" command from pgxc_ctl utility.
# If you don't, you don't need this variable.
pgxcInstallDir=$PGHOME
pgxlDATA=$PGHOME/data
#---- OVERALL -----------------------------------------------------------------------------
#
pgxcOwner=postgres      # owner of the Postgres-XC database cluster.  Here, we use this
                        # both as linux user and database user.  This must be
                        # the super user of each coordinator and datanode.
pgxcUser=$pgxcOwner        # OS user of Postgres-XC owner

tmpDir=/tmp                 # temporary dir used in XC servers
localTmpDir=$tmpDir            # temporary dir used here locally

configBackup=n                  # If you want config file backup, specify y to this value.
configBackupHost=pgxc-linker    # host to backup config file
configBackupDir=$HOME/pgxc     # Backup directory
configBackupFile=pgxc_ctl.bak   # Backup file name --> Need to synchronize when original changed.

#---- GTM ------------------------------------------------------------------------------------

#---- GTM Master -----------------------------------------------

#---- Overall ----
gtmName=gtm
gtmMasterServer=server0
gtmMasterPort=6666
gtmMasterDir=$pgxlDATA/nodes/gtm

#---- Configuration ---
gtmExtraConfig=none         # Will be added to gtm.conf for both Master and Slave (done at initialization only)
gtmMasterSpecificExtraConfig=none   # Will be added to Master's gtm.conf (done at initialization only)

#---- Coordinators ----------------------------------------------------------------------------------------------------

#---- shortcuts ----------
coordMasterDir=$pgxlDATA/nodes/coord
coordSlaveDir=$pgxlDATA/nodes/coord_slave
coordArchLogDir=$pgxlDATA/nodes/coord_archlog

#---- Overall ------------
coordNames=(coord1)     # Master and slave use the same name
coordPorts=(5432)           # Master ports
poolerPorts=(20004)         # Master pooler ports
coordPgHbaEntries=(0.0.0.0/0)               # Assumes that all the coordinator (master/slave) accepts
                                                # the same connection
                                                # This entry allows only $pgxcOwner to connect.
                                                # If you'd like to setup another connection, you should
                                                # supply these entries through files specified below.

#---- Master -------------
coordMasterServers=server1      # none means this master is not available
coordMasterDirs=$coordMasterDir
coordMaxWALsernder=10   # max_wal_senders: needed to configure slave. If zero value is specified,
                        # it is expected to supply this parameter explicitly by external files
                        # specified in the following.   If you don't configure slaves, leave this value to zero.
coordMaxWALSenders=$coordMaxWALsernder
                        # max_wal_senders configuration for each coordinator.

#---- Configuration files---
coordExtraConfig=coordExtraConfig   # Extra configuration file for coordinators.  
                        # This file will be added to all the coordinators'
                        # postgresql.conf
# Please note that the following sets up minimum parameters which you may want to change.
# You can put your postgresql.conf lines here.
cat > $coordExtraConfig <<EOF
#================================================
# Added to all the coordinator postgresql.conf
# Original: $coordExtraConfig
log_destination = 'stderr'
logging_collector = on
log_directory = 'pg_log'
listen_addresses = '*'
max_connections = 512
EOF

# Additional Configuration file for specific coordinator master.
# You can define each setting by similar means as above.
coordSpecificExtraConfig=(none none)
coordExtraPgHba=none    # Extra entry for pg_hba.conf.  This file will be added to all the coordinators' pg_hba.conf
coordSpecificExtraPgHba=(none none)

#---- Datanodes -------------------------------------------------------------------------------------------------------

#---- Shortcuts --------------
datanodeMasterDir=$pgxlDATA/nodes/dn_master
datanodeSlaveDir=$pgxlDATA/nodes/dn_slave
datanodeArchLogDir=$pgxlDATA/nodes/datanode_archlog

#---- Overall ---------------
#primaryDatanode=datanode1              # Primary Node.
# At present, xc has a problem to issue ALTER NODE against the primary node.  Until it is fixed, the test will be done
# without this feature.
primaryDatanode=datanode1               # Primary Node.
datanodeNames=(datanode1 datanode2)
datanodePorts=(5433 5433)   # Master ports
datanodePoolerPorts=(20005 20005)   # Master pooler ports
datanodePgHbaEntries=(0.0.0.0/0)    # Assumes that all the coordinator (master/slave) accepts
                                        # the same connection
                                        # This list sets up pg_hba.conf for $pgxcOwner user.
                                        # If you'd like to setup other entries, supply them
                                        # through extra configuration files specified below.
# Note: The above parameter is extracted as "host all all 0.0.0.0/0 trust".   If you don't want
# such setups, specify the value () to this variable and supply what you want using datanodeExtraPgHba
# and/or datanodeSpecificExtraPgHba variables.
#datanodePgHbaEntries=(::1/128) # Same as above but for IPv6 addresses

#---- Master ----------------
datanodeMasterServers=(server2 server3) # none means this master is not available.
                                                    # This means that there should be the master but is down.
                                                    # The cluster is not operational until the master is
                                                    # recovered and ready to run.   
datanodeMasterDirs=($datanodeMasterDir/dn1 $datanodeMasterDir/dn2)
datanodeMaxWalSender=10                             # max_wal_senders: needed to configure slave. If zero value is 
                                                    # specified, it is expected this parameter is explicitly supplied
                                                    # by external configuration files.
                                                    # If you don't configure slaves, leave this value zero.
datanodeMaxWALSenders=($datanodeMaxWalSender $datanodeMaxWalSender)
                        # max_wal_senders configuration for each datanode

#---- Slave -----------------
datanodeSlave=y         # Specify y if you configure at least one datanode slave.  Otherwise, the following
                        # configuration parameters will be set to empty values.
                        # If no effective server names are found (that is, every servers are specified as none),
                        # then datanodeSlave value will be set to n and all the following values will be set to
                        # empty values.
datanodeSlaveServers=(server4 server5)  # value none means this slave is not available
datanodeSlavePorts=(15433 15433)    # value none means this slave is not available
datanodeSlavePoolerPorts=(20015 20015)  # value none means this slave is not available
datanodeSlaveSync=y     # If datanode slave is connected in synchronized mode
datanodeSlaveDirs=($datanodeSlaveDir $datanodeSlaveDir)
datanodeArchLogDirs=( $datanodeArchLogDir $datanodeArchLogDir)

# ---- Configuration files ---
# You may supply your bash script to setup extra config lines and extra pg_hba.conf entries here.
# These files will go to corresponding files for the master.
# Or you may supply these files manually.
datanodeExtraConfig=none    # Extra configuration file for datanodes.  This file will be added to all the 
                            # datanodes' postgresql.conf
datanodeSpecificExtraConfig=(none none)
datanodeExtraPgHba=none     # Extra entry for pg_hba.conf.  This file will be added to all the datanodes' pg_hba.conf
datanodeSpecificExtraPgHba=(none none)
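
In this file, each per-node array (ports, servers, directories) needs one entry per name in datanodeNames, and a missing entry is easy to overlook. Because pgxc_ctl.conf is itself a bash script, the lengths can be compared with a few lines of bash. This is a sketch, not part of pgxc_ctl: the helper check_array_lengths is hypothetical, and the arrays are copied here from the configuration above instead of sourcing the file (sourcing it would also run its cat > $coordExtraConfig step in the current directory).

```shell
#!/usr/bin/env bash
# Succeed when every named array has the expected number of elements.
# $1 = expected length; remaining arguments = array variable names.
check_array_lengths() {
    local expected=$1 name ref status=0
    shift
    for name in "$@"; do
        ref="${name}[@]"
        local values=("${!ref}")   # indirect expansion of the named array
        if [ "${#values[@]}" -ne "$expected" ]; then
            echo "$name has ${#values[@]} entries, expected $expected"
            status=1
        fi
    done
    return "$status"
}

# Values copied from the configuration above.
datanodeNames=(datanode1 datanode2)
datanodePorts=(5433 5433)
datanodeMasterServers=(server2 server3)
datanodeSlaveServers=(server4 server5)

check_array_lengths "${#datanodeNames[@]}" \
    datanodePorts datanodeMasterServers datanodeSlaveServers &&
    echo "array lengths match"
```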

15. Initialize the cluster

On the GTM node, as the postgres user:
Initialize the cluster:

$ pgxc_ctl  -c  ~/pgxc_ctl/pgxc_ctl.conf init all

Start the cluster:

$ pgxc_ctl  -c /home/postgres/pgxc_ctl/pgxc_ctl.conf start all

Stop the cluster:

$ pgxc_ctl -c /home/postgres/pgxc_ctl/pgxc_ctl.conf stop all
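
All three commands share the same configuration path, so a small wrapper keeps them consistent. The sketch below only prints the command line; the function name pgxl_command is hypothetical. The checks in the trailing comments assume the Coordinator on server1:5432 from the host plan and use the pgxc_ctl monitor command and the pgxc_node catalog.

```shell
#!/bin/sh
# Print the pgxc_ctl command line for an action such as "start all".
CONF=/home/postgres/pgxc_ctl/pgxc_ctl.conf

pgxl_command() {
    echo "pgxc_ctl -c $CONF $*"
}

pgxl_command monitor all
# After "start all", two checks worth running on the GTM node as postgres:
#   pgxc_ctl -c "$CONF" monitor all
#   psql -h server1 -p 5432 -d postgres \
#        -c "SELECT node_name, node_type, node_host, node_port FROM pgxc_node;"
```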

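16. Delete the cluster

After stopping the cluster, run the following on every node as the postgres user (nodes is a directory, so rm needs -r):

$ rm -rf /home/postgres/pgxl9.5/data/nodes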