Understand Redshift Cluster Storage Space

阿新 • Published: 2019-01-12

The amount of disk storage space allocated to two tables that are in different Amazon Redshift clusters can vary significantly, even if the tables are created using the same data definition language (DDL) statements and contain the same number of rows. In the following scenario, the difference in disk storage space consumed by each table is determined by:

The number of populated slices on each Amazon Redshift cluster
The number of table segments used by each table

The minimum disk space is the smallest data footprint that a table can have on an Amazon Redshift cluster. You can check the minimal table size when analyzing the cluster storage use or when resizing an Amazon Redshift cluster. You can calculate the minimum disk space using the following formula:

For tables created using the KEY or EVEN distribution style:
Minimum table size = block_size (1 MB) * (number_of_user_columns + 3 system columns) * number_of_populated_slices * number_of_table_segments.

For tables created using the ALL distribution style:
Minimum table size = block_size (1 MB) * (number_of_user_columns + 3 system columns) * number_of_cluster_nodes * number_of_table_segments.
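The two formulas above can be written as one small Python function. This is only a sketch: the name `min_table_size_mb` and its parameters are illustrative and not part of any Amazon Redshift API, and the four-node cluster in the second call is a made-up input used to exercise the ALL branch.

```python
BLOCK_SIZE_MB = 1    # block size used in the formulas above
SYSTEM_COLUMNS = 3   # system columns counted in the formulas above


def min_table_size_mb(user_columns, segments, dist_style,
                      populated_slices=0, nodes=0):
    """Hypothetical helper: minimum table size in MB."""
    style = dist_style.upper()
    if style in ("KEY", "EVEN"):
        # KEY and EVEN tables allocate blocks per populated slice.
        multiplier = populated_slices
    elif style == "ALL":
        # ALL tables allocate blocks per cluster node.
        multiplier = nodes
    else:
        raise ValueError(f"unsupported distribution style: {dist_style}")
    return BLOCK_SIZE_MB * (user_columns + SYSTEM_COLUMNS) * multiplier * segments


# 125 user columns, 2 segments, KEY distribution, 16 populated slices
print(min_table_size_mb(125, 2, "KEY", populated_slices=16))  # 4096

# Same table with ALL distribution on an assumed 4-node cluster
print(min_table_size_mb(125, 2, "ALL", nodes=4))  # 1024
```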

If two Amazon Redshift tables share the following attributes:

Created with identical DDL statements
Contain the same number of rows
Haven't been manually modified

Then the table disk storage space allocation can vary depending on:

The number of cluster slices populated by the table, for the KEY and EVEN distribution styles
The number of nodes in the cluster, for the ALL distribution style
The number of segments in the table

If an Amazon Redshift table has a sort key, the table has two segments—one sorted segment and one unsorted segment. If an Amazon Redshift table has no sort key, all data is unsorted, and therefore the table has one unsorted segment.

When data is added to an existing table with a sort key, the new data is maintained in a separate segment that contains unsorted data—the data is not inserted into the original sorted key segment until a VACUUM operation is performed. For more information, see Managing the Volume of Merged Rows.

Note: The VACUUM operation merges the unsorted data into the sorted segment. However, the table still keeps an unsorted segment for future loads.

The variable number_of_table_segments is one of three values that represent the number of table segments to allocate for Amazon Redshift tables:

0: A table has never been loaded; allocate disk space for zero table segments.

1: A table without a sort key has been loaded one or more times.

2: A table with a sort key has been loaded one or more times.
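The three cases above reduce to a two-input decision. The following sketch shows that mapping; the function name and its boolean parameters are hypothetical and chosen only for illustration.

```python
def number_of_table_segments(has_sort_key, has_been_loaded):
    """Hypothetical helper mirroring the three cases listed above."""
    if not has_been_loaded:
        # Never loaded: no segments are allocated.
        return 0
    # Loaded with a sort key: one sorted plus one unsorted segment.
    # Loaded without a sort key: a single unsorted segment.
    return 2 if has_sort_key else 1


print(number_of_table_segments(has_sort_key=True, has_been_loaded=False))   # 0
print(number_of_table_segments(has_sort_key=False, has_been_loaded=True))   # 1
print(number_of_table_segments(has_sort_key=True, has_been_loaded=True))    # 2
```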

Example minimum table size calculations:

If a table has 125 user columns with sort keys on a cluster with 16 slices, then the smallest size the table can have populating all 16 slices is calculated as follows:

1 MB * (125 + 3) * 16 * 2 = 4096 MB

If a table is created with the same DDL statement and resides on a two-slice cluster with both slices populated, then the minimum table size calculation dictates that the table uses significantly less disk storage:

1 MB * (125 + 3) * 2 * 2 = 512 MB

If a table is created with an identical DDL statement and the table resides on a cluster with 64 populated slices, then the following minimum table size calculation dictates that the table uses significantly more disk storage:

1 MB * (125 + 3) * 64 * 2 = 16384 MB

As these examples show, the minimum size of the same table grows or shrinks in proportion to the number of populated slices on the cluster.
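The three example calculations can be reproduced side by side to show how the minimum size scales with the slice count. This is a sketch with illustrative names (`size_for_slices`, `sizes`); the inputs are the ones used in the examples above.

```python
COLUMNS = 125 + 3   # user columns plus 3 system columns
SEGMENTS = 2        # table with a sort key that has been loaded
BLOCK_SIZE_MB = 1


def size_for_slices(populated_slices):
    """Minimum size in MB for the example table on a given slice count."""
    return BLOCK_SIZE_MB * COLUMNS * populated_slices * SEGMENTS


# Slice counts taken from the three examples above.
sizes = {slices: size_for_slices(slices) for slices in (2, 16, 64)}

for slices, size_mb in sizes.items():
    print(f"{slices} populated slices: {size_mb} MB minimum")
```

With the block size, column count, and segment count held fixed, the minimum size is linear in the number of populated slices, so going from 2 to 64 slices multiplies the footprint by 32.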
