[ClickHouse Database] Building a Cluster with Docker (Multiple Shards, Single Replica)

Create the containers

  1. Create the network

    docker network create -d bridge iot-net
    
  2. Start three database instances

    docker run -d --name chdb1 --ulimit nofile=262144:262144 --volume=/root/iot/chdb1:/var/lib/clickhouse --publish 9001:9000 --network iot-net yandex/clickhouse-server
    
    docker run -d --name chdb2 --ulimit nofile=262144:262144 --volume=/root/iot/chdb2:/var/lib/clickhouse --publish 9002:9000 --network iot-net yandex/clickhouse-server
    
    docker run -d --name chdb3 --ulimit nofile=262144:262144 --volume=/root/iot/chdb3:/var/lib/clickhouse --publish 9003:9000 --network iot-net yandex/clickhouse-server
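
The three `docker run` commands above differ only in the instance number, so they can be generated from a loop. The following is a minimal sketch, not part of the original setup: the function name `chdb_run_cmd` is hypothetical, and the script only prints the commands (a dry run) so they can be reviewed before anything is started.

```shell
#!/bin/sh
# Print the `docker run` command for instance number $1,
# using the same flags as the hand-written commands above.
# chdb_run_cmd is a hypothetical helper name.
chdb_run_cmd() {
    n="$1"
    printf 'docker run -d --name chdb%s --ulimit nofile=262144:262144 --volume=/root/iot/chdb%s:/var/lib/clickhouse --publish 900%s:9000 --network iot-net yandex/clickhouse-server\n' "$n" "$n" "$n"
}

# Dry run: print one command per instance.
for i in 1 2 3; do
    chdb_run_cmd "$i"
done
```

Piping the output to `sh` would actually start the containers. The `900N` host-port pattern assumes fewer than ten instances.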
    

Configure cluster mode

  1. Get the cluster configuration file: first copy the configuration file out of one of the containers

    docker cp chdb1:/etc/clickhouse-server/config.xml ./
    
  2. In the `remote_servers` section of config.xml, define a cluster with three shards and one replica per shard:

    <remote_servers>
        <cluster_3shards_1replicas>
            <shard>
                <replica>
                    <host>chdb1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>chdb2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <replica>
                    <host>chdb3</host>
                    <port>9000</port>
                </replica>
            </shard>
        </cluster_3shards_1replicas>
    </remote_servers>
    

    Then copy config.xml back into all three instances and restart them:

    docker cp ./config.xml chdb1:/etc/clickhouse-server && docker cp ./config.xml chdb2:/etc/clickhouse-server && docker cp ./config.xml chdb3:/etc/clickhouse-server
    docker restart chdb1 chdb2 chdb3
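
The shard entries in the `remote_servers` section follow a regular pattern (one `<shard>` with a single `<replica>` per host), so the fragment can be generated from a host list instead of being typed by hand. This is only a sketch under that assumption; `remote_servers_xml` is a hypothetical helper name, and the output still has to be placed into config.xml manually.

```shell
#!/bin/sh
# Emit a <remote_servers> fragment with one single-replica shard per host.
# Usage: remote_servers_xml CLUSTER_NAME HOST...
# remote_servers_xml is a hypothetical helper, not a ClickHouse tool.
remote_servers_xml() {
    cluster="$1"
    shift
    printf '<remote_servers>\n'
    printf '    <%s>\n' "$cluster"
    for host in "$@"; do
        printf '        <shard>\n'
        printf '            <replica>\n'
        printf '                <host>%s</host>\n' "$host"
        printf '                <port>9000</port>\n'
        printf '            </replica>\n'
        printf '        </shard>\n'
    done
    printf '    </%s>\n' "$cluster"
    printf '</remote_servers>\n'
}

# Fragment for the three containers used in this walkthrough.
remote_servers_xml cluster_3shards_1replicas chdb1 chdb2 chdb3
```

The cluster name passed as the first argument is the name that later appears in `system.clusters` and that the distributed table refers to, so the two must match.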
    

Verify the cluster

  1. Connect to any instance:

    clickhouse-client --port 9001
    

    Run the following query to see the current cluster information:

    e16ff05d1ca6 :) select * from system.clusters;
    
    SELECT *
    FROM system.clusters
    
    ┌─cluster───────────────────┬─shard_num─┬─shard_weight─┬─replica_num─┬─host_name─┬─host_address─┬─port─┬─is_local─┬─user────┬─default_database─┬─errors_count─┬─estimated_recovery_time─┐
    │ cluster_3shards_1replicas │         1 │            1 │           1 │ chdb1     │ 172.25.0.2   │ 9000 │        1 │ default │                  │            0 │                       0 │
    │ cluster_3shards_1replicas │         2 │            1 │           1 │ chdb2     │ 172.25.0.3   │ 9000 │        0 │ default │                  │            0 │                       0 │
    │ cluster_3shards_1replicas │         3 │            1 │           1 │ chdb3     │ 172.25.0.4   │ 9000 │        0 │ default │                  │            0 │                       0 │
    └───────────────────────────┴───────────┴──────────────┴─────────────┴───────────┴──────────────┴──────┴──────────┴─────────┴──────────────────┴──────────────┴─────────────────────────┘
    
    3 rows in set. Elapsed: 0.006 sec. 
    
  2. Create the test table on all three instances:

    create table population (
        `ozone` Int8,
        `particullate_matter` Int8,
        `carbon_monoxide` Int8,
        `sulfure_dioxide` Int8,
        `nitrogen_dioxide` Int8,
        `longitude` Float64,
        `latitude` Float64,
        `timestamp` DateTime
     ) ENGINE = MergeTree()
     ORDER BY `timestamp`
     PRIMARY KEY `timestamp`
    

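    You can also create the table directly with clickhouse-client instead of opening an interactive session on each instance and running the SQL there (repeat the command with --port 9002 and --port 9003 for the other two instances):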

    clickhouse-client --port 9001 \
    --query 'CREATE TABLE population (
        `ozone` Int8,
        `particullate_matter` Int8,
        `carbon_monoxide` Int8,
        `sulfure_dioxide` Int8,
        `nitrogen_dioxide` Int8,
        `longitude` Float64,
        `latitude` Float64,
        `timestamp` DateTime
     ) ENGINE = MergeTree()
     ORDER BY `timestamp`
     PRIMARY KEY `timestamp`'
    
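  3. Create the distributed table. A distributed table stores no data of its own; it acts as a router that determines which instance in the cluster each row is sent to (here the rand() sharding key picks a shard at random):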

    CREATE TABLE population_all AS population
    ENGINE = Distributed(cluster_3shards_1replicas, default, population, rand())
    
  4. Import the data through the distributed table on this instance:

    root@mq-227 ~/i/db_file# cat pollutionData204273.csv | wc -l
    17568
    
    clickhouse-client --port 9001 --query "INSERT INTO population_all FORMAT CSV" < ./pollutionData204273.csv  
    

    Query the tables to get the current row counts:

    root@mq-227 ~/i/db_file# clickhouse-client --port 9001 --query "select count(*) from population_all"
    17568
    root@mq-227 ~/i/db_file# clickhouse-client --port 9001 --query "select count(*) from population"    
    5955
    root@mq-227 ~/i/db_file# clickhouse-client --port 9002 --query "select count(*) from population"
    5690
    root@mq-227 ~/i/db_file# clickhouse-client --port 9003 --query "select count(*) from population"
    5923
    

    As the counts show, the data has been distributed across the three shards.
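
As a sanity check, the per-shard counts should add up to the count reported by the distributed table (5955 + 5690 + 5923 = 17568). A small sketch of that check follows; `check_shard_counts` is a hypothetical helper, and the numbers are the ones from the output above rather than live queries.

```shell
#!/bin/sh
# Compare the distributed table's row count with the sum of the local counts.
# Usage: check_shard_counts TOTAL SHARD_COUNT...
# check_shard_counts is a hypothetical helper name.
check_shard_counts() {
    total="$1"
    shift
    sum=0
    for n in "$@"; do
        sum=$((sum + n))
    done
    if [ "$sum" -eq "$total" ]; then
        echo "OK: $sum rows across $# shards"
    else
        echo "MISMATCH: shards hold $sum rows, distributed table reports $total"
        return 1
    fi
}

# Counts copied from the query output above.
check_shard_counts 17568 5955 5690 5923
```

Against a running cluster, the arguments could instead be filled in from `clickhouse-client --port 900N --query "select count(*) from ..."` calls like the ones shown above. With `rand()` as the sharding key the split is only approximately even, so the check compares the sum, not the individual shares.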