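RGW Data Model Design
Ceph is an open-source, unified distributed storage system. RADOS provides its underlying object storage service and is made up of mon and osd daemons. The main entities RADOS operates on are pools, objects, and each object's xattrs and omap.
The RADOS gateway (RGW) is an object storage service built on top of RADOS. It exposes S3- and Swift-compatible RESTful APIs to clients.
Buckets and objects (keys) are the two main data models the RADOS gateway constructs. This article describes how the gateway designs them.
bucket: a container for keys. It can be thought of as a directory, except that buckets cannot be nested.
key: also called an object. It represents one complete piece of data uploaded to the storage service.
The rest of the article walks through a set of real operations to show how buckets and keys are designed.
The RADOS gateway also defines data structures such as account, zone, and region, but they are not the focus here and are not covered in detail.
To create a bucket and upload data through the gateway, you first need to create a user and obtain a pair of authentication keys (access_key, secret_key).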
gateway user
Create a user:
# radosgw-admin user create --uid=yankun --display-name=yankun
{
"user_id": "yankun",
"display_name" : "yankun",
"email": "",
"suspended": 0,
"max_buckets": 1000,
"auid": 0,
"subusers": [],
"keys": [
{
"user": "yankun",
"access_key": "FLNOEBKYFT7R0VA2ZH03",
"secret_key": "2a3O5epEHpnRw26Rb6tukdYJz6nQes6hCoO5fIM3"
}
],
"swift_keys" : [],
"caps": [],
"op_mask": "read, write, delete",
"default_placement": "",
"placement_tags": [],
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"user_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"temp_url_keys": []
}
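Creating the user returns an access_key and a secret_key. The s3cmd client is then used to create buckets and upload data.
In the s3cmd configuration file, set access_key, secret_key, and the service address.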
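As a rough illustration of that configuration step, the sketch below prints a minimal s3cmd config. The option names (access_key, secret_key, host_base, host_bucket, use_https) are standard s3cmd settings, but the endpoint rgw.example.com:7480 and the helper name render_s3cfg are assumptions made for this example and do not come from the setup described here. Setting host_bucket to the same value as host_base is a common way to keep s3cmd on path-style requests against RGW.

```shell
# Print a minimal s3cmd configuration for an RGW endpoint.
# $1 = access_key, $2 = secret_key, $3 = endpoint (host:port)
# The endpoint used in the example call is a placeholder (assumption).
render_s3cfg() {
    access_key=$1
    secret_key=$2
    endpoint=$3
    cat <<EOF
[default]
access_key = ${access_key}
secret_key = ${secret_key}
host_base = ${endpoint}
host_bucket = ${endpoint}
use_https = False
EOF
}

# Preview the config for the user created above.
render_s3cfg FLNOEBKYFT7R0VA2ZH03 2a3O5epEHpnRw26Rb6tukdYJz6nQes6hCoO5fIM3 rgw.example.com:7480
```

Redirecting the output to ~/.s3cfg would make s3cmd pick it up by default. Review the result before overwriting an existing file.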
Buckets in RGW
Create buckets:
# s3cmd mb s3://where_is_my_bucket
# s3cmd mb s3://where_is_my_bucket1
View bucket information:
# radosgw-admin bucket stats --bucket=where_is_my_bucket
{
"bucket": "where_is_my_bucket",
"pool": ".rgw.buckets",
"index_pool": ".rgw.buckets.index",
"id": "default.5762326.25",
"marker": "default.5762326.25",
"owner": "yankun",
"ver": "0#9",
"master_ver": "0#0",
"mtime": "2017-09-12 10:16:47.000000",
"max_marker": "0#",
"usage": {
"rgw.main": {
"size_kb": 4105961,
"size_kb_actual": 4105964,
"num_objects": 3
}
},
"bucket_quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
}
}
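Bucket object:
Buckets created by a user are recorded in the omap of the object yankun.buckets in the .users.uid pool. The key is the bucket name and the value is the bucket's information. For each user, the .users.uid pool holds two objects: {username} and {username}.buckets.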
# rados -p .users.uid listomapkeys yankun.buckets
where_is_my_bucket
where_is_my_bucket1
# rados -p .users.uid getomapval yankun.buckets where_is_my_bucket binary_where_is_my_bucket
Writing to binary_where_is_my_bucket
# ceph-dencoder type RGWBucketEnt import binary_where_is_my_bucket decode dump_json
{
"bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"size": 4204504056,
"size_rounded": 4204507136,
"mtime": 1505182607,
"count": 3
}
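The decoded entry can be cross-checked against the earlier `bucket stats` output with a little arithmetic. The sketch below is only a consistency check on the numbers printed in this article. It assumes the KB figures are byte counts rounded up to whole KiB, which matches the values shown but is not taken from the RGW source. bytes_to_kib is a helper name invented for this example.

```shell
# Values copied from the RGWBucketEnt dump above.
size=4204504056
size_rounded=4204507136
mtime=1505182607

# Round a byte count up to whole KiB (assumed rounding rule).
bytes_to_kib() {
    echo $(( ($1 + 1023) / 1024 ))
}

bytes_to_kib "$size"          # compare with "size_kb" in bucket stats
bytes_to_kib "$size_rounded"  # compare with "size_kb_actual" in bucket stats

# Time of day (UTC) encoded in the epoch mtime.
secs=$(( mtime % 86400 ))
printf '%02d:%02d:%02d UTC\n' $(( secs / 3600 )) $(( secs % 3600 / 60 )) $(( secs % 60 ))
```

This prints 4105961, 4105964 and 02:16:47 UTC. The first two equal size_kb and size_kb_actual from `bucket stats`. The mtime shown there (10:16:47) is eight hours ahead, presumably because it was rendered in the server's local time zone.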
The bucket's object in RADOS:
For every bucket, RGW creates an object in the .rgw.buckets.index pool, named .dir.{bucket_id}.
# rados -p .rgw.buckets.index ls > .rgw.buckets.index
# grep default.5762326.25 .rgw.buckets.index
.dir.default.5762326.25
Bucket metadata:
A bucket's metadata is stored as a separate RADOS object in the .rgw pool, named .bucket.meta.{bucket_name}:{marker}.
# rados -p .rgw ls
where_is_my_bucket1
.bucket.meta.where_is_my_bucket1:default.5762326.26
where_is_my_bucket
.bucket.meta.where_is_my_bucket:default.5762326.25
# rados -p .rgw get .bucket.meta.where_is_my_bucket:default.5762326.25 binary.bucket.meta.where_is_my_bucket:default.5762326.25
# ceph-dencoder type RGWBucketInfo import binary.bucket.meta.where_is_my_bucket:default.5762326.25 decode dump_json
{
"bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"creation_time": 1505182607,
"owner": "yankun",
"flags": 0,
"region": "default",
"placement_rule": "default-placement",
"has_instance_obj": "true",
"quota": {
"enabled": false,
"max_size_kb": -1,
"max_objects": -1
},
"num_shards": 0,
"bi_shard_hash_type": 0
}
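The naming rules for the per-bucket RADOS objects seen so far can be collected into a few small helpers. This is a sketch of the conventions observed in this walkthrough only. The function names are made up for the example, and other RGW releases or zone configurations may use different pools and names.

```shell
# Object listing a user's buckets (pool .users.uid). $1 = user id
user_buckets_obj() { printf '%s.buckets\n' "$1"; }

# Bucket index object (pool .rgw.buckets.index). $1 = bucket_id
bucket_index_obj() { printf '.dir.%s\n' "$1"; }

# Bucket metadata object (pool .rgw). $1 = bucket name, $2 = marker
bucket_meta_obj() { printf '.bucket.meta.%s:%s\n' "$1" "$2"; }

user_buckets_obj yankun
bucket_index_obj default.5762326.25
bucket_meta_obj where_is_my_bucket default.5762326.25
```

The three lines printed match the object names that appear in the `rados` listings above.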
The bucket's ACL is stored in an xattr of the .bucket.meta.{bucket_name}:{marker} object.
# rados -p .rgw getxattr .bucket.meta.where_is_my_bucket:default.5762326.25 user.rgw.acl > binary.bucket.acl
# ceph-dencoder type RGWAccessControlPolicy import binary.bucket.acl decode dump_json
{
"acl": {
"acl_user_map": [
{
"user": "yankun",
"acl": 15
}
],
"acl_group_map": [],
"grant_map": [
{
"id": "yankun",
"grant": {
"type": {
"type": 0
},
"id": "yankun",
"email": "",
"permission": {
"flags": 15
},
"name": "yankun",
"group": 0
}
}
]
},
"owner": {
"id": "yankun",
"display_name": "yankun"
}
}
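The value 15 in "acl" and "flags" is a bitmask. The sketch below decodes it, assuming the permission bits are READ=1, WRITE=2, READ_ACP=4 and WRITE_ACP=8, as recalled from RGW's ACL definitions. Verify these values against the source of your release before relying on them. decode_rgw_perm is a hypothetical helper name.

```shell
# Decode an RGW ACL permission bitmask into permission names.
# Bit values are an assumption (see the note above); $1 = flags
decode_rgw_perm() {
    flags=$1
    out=""
    if [ $(( flags & 1 )) -ne 0 ]; then out="$out READ"; fi
    if [ $(( flags & 2 )) -ne 0 ]; then out="$out WRITE"; fi
    if [ $(( flags & 4 )) -ne 0 ]; then out="$out READ_ACP"; fi
    if [ $(( flags & 8 )) -ne 0 ]; then out="$out WRITE_ACP"; fi
    # Drop the leading space before printing.
    echo "${out# }"
}

decode_rgw_perm 15
```

Under that assumption, 15 sets all four bits, which corresponds to full control for the owner yankun.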
Objects in RGW
An object can only be stored in a bucket. Here a large file, where_is_my_object.txt, is created for uploading to the bucket.
Create the large file:
# dd if=/dev/zero of=./where_is_my_object.txt bs=2M count=1000
# du where_is_my_object.txt -h
2.0G where_is_my_object.txt
Upload the large file to the bucket:
# s3cmd put where_is_my_object.txt s3://where_is_my_bucket
upload: 'where_is_my_object.txt' -> 's3://where_is_my_bucket/where_is_my_object.txt' [1 of 1]
2097152000 of 2097152000 100% in 123s 16.24 MB/s done
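Mapping between object and bucket
The file is uploaded to the bucket where_is_my_bucket, whose id is default.5762326.25. The relationship between the object and the bucket is kept in the omap of the .dir.{bucket_id} object.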
# rados -p .rgw.buckets.index listomapkeys .dir.default.5762326.25
where_is_my_object.txt
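Object naming format:
An uploaded object is stored in RADOS as one object or as several, depending on its size.
Object data is stored in the .rgw.buckets pool. If the uploaded data is larger than 512KB, it is stored as several RADOS objects: one head object (512KB) and one or more tail objects (4MB each by default). The head object is named {bucket_id}_{object_name}. For example, the object where_is_my_object.txt in the bucket where_is_my_bucket has the following name in .rgw.buckets:
default.5762326.25_where_is_my_object.txt. Tail objects are named {bucket_id}__shadow_{prefix}{n}, where {prefix} is the prefix recorded in the head object's manifest and {n} is a sequence number starting from 1.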
# du default.5762326.25_where_is_my_object.txt
512 default.5762326.25_where_is_my_object.txt
# du default.5762326.25__shadow_.h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_99
4096 default.5762326.25__shadow_.h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_99
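The naming and splitting rules can be sketched as a few shell helpers. This is an illustration based only on the sizes quoted above (a 512KB head and 4MB tail stripes) and applies to a simple, non-multipart upload. The function names are invented for the example, and the prefix is copied from the tail object name shown above.

```shell
# Head object name: {bucket_id}_{object_name}
rgw_head_name() { printf '%s_%s\n' "$1" "$2"; }

# Tail object name: {bucket_id}__shadow_{prefix}{n}, with n starting at 1
rgw_tail_name() { printf '%s__shadow_%s%s\n' "$1" "$2" "$3"; }

# Number of tail objects.
# $1 = object size, $2 = head size, $3 = stripe size (all in bytes)
rgw_tail_count() {
    if [ "$1" -le "$2" ]; then
        echo 0
    else
        # Ceiling division of the bytes that do not fit in the head.
        echo $(( ($1 - $2 + $3 - 1) / $3 ))
    fi
}

rgw_head_name default.5762326.25 where_is_my_object.txt
rgw_tail_name default.5762326.25 .h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_ 99
rgw_tail_count 2097152000 524288 4194304
```

For the 2097152000-byte upload this yields 500 tail objects, so a full 4096KB tail object number 99 is what one would expect to see.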
Object metadata:
The object's metadata is stored in xattrs of the head object.
# rados -p .rgw.buckets listxattr default.5762326.25_where_is_my_object.txt
user.rgw.acl
user.rgw.content_type
user.rgw.etag
user.rgw.idtag
user.rgw.manifest
user.rgw.x-amz-date
user.rgw.x-amz-meta-s3cmd-attrs
user.rgw.x-amz-storage-class
The object's user.rgw.manifest attribute:
# rados -p .rgw.buckets getxattr default.5762326.25_where_is_my_object.txt user.rgw.manifest > ./binary.default.5762326.25_where_is_my_object.txt.user.rgw.manifest
# ceph-dencoder type RGWObjManifest import binary.default.5762326.25_where_is_my_object.txt.user.rgw.manifest decode dump_json
{
"objs": [],
"obj_size": 2097152000,
"explicit_objs": "false",
"head_obj": {
"bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"key": "",
"ns": "",
"object": "where_is_my_object.txt",
"instance": ""
},
"head_size": 524288,
"max_head_size": 524288,
"prefix": ".h_oQhOgqDTmDZx2FUSm8zMTOlbhDQsq_",
"tail_bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"rules": [
{
"key": 0,
"val": {
"start_part_num": 0,
"start_ofs": 524288,
"part_size": 0,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
]
}
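The manifest fields head_size, start_ofs and stripe_max_size are enough to work out which RADOS object holds a given byte of the logical object. The sketch below shows that mapping for a single-rule manifest like this one, where start_ofs equals head_size. It is a reading of the dumped values, not the RGW implementation, and rgw_locate is a hypothetical name.

```shell
# Map a logical byte offset to "<piece> <offset inside piece>".
# Piece 0 is the head object; piece n >= 1 is tail object n.
# $1 = offset, $2 = head_size, $3 = stripe_max_size (all in bytes)
rgw_locate() {
    if [ "$1" -lt "$2" ]; then
        echo "0 $1"
    else
        rest=$(( $1 - $2 ))
        echo "$(( rest / $3 + 1 )) $(( rest % $3 ))"
    fi
}

rgw_locate 0 524288 4194304           # first byte: in the head
rgw_locate 524288 524288 4194304      # first byte after the head
rgw_locate 2097151999 524288 4194304  # last byte of the object
```

The last call reports piece 500, which agrees with an object of obj_size 2097152000 needing 500 tail stripes.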
Object ACL:
# rados -p .rgw.buckets getxattr default.5762326.25_where_is_my_object.txt user.rgw.acl > binary.object.acl
# ceph-dencoder type RGWAccessControlPolicy import binary.object.acl decode dump_json
{
"acl": {
"acl_user_map": [
{
"user": "yankun",
"acl": 15
}
],
"acl_group_map": [],
"grant_map": [
{
"id": "yankun",
"grant": {
"type": {
"type": 0
},
"id": "yankun",
"email": "",
"permission": {
"flags": 15
},
"name": "yankun",
"group": 0
}
}
]
},
"owner": {
"id": "yankun",
"display_name": "yankun"
}
}
Restoring data manually
Following the object model described above, fetch a complete object without going through the RADOS gateway.
Create an object:
location_object
# du -h location_object
9.8M location_object
md5 of the local object:
# md5sum location_object
24796d54d73d694168170135091f7eba location_object
Upload the object to where_is_my_bucket:
# s3cmd put location_object s3://where_is_my_bucket
upload: 'location_object' -> 's3://where_is_my_bucket/location_object' [1 of 1]
10200056 of 10200056 100% in 0s 77.72 MB/s
10200056 of 10200056 100% in 4s 2.18 MB/s done
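Object splitting:
By the object design, this object exists in RADOS as 4 objects: one head object and 3 tail objects.
Head object: default.5762326.25_location_object
Tail objects: default.5762326.25__shadow_{prefix}{1,2,3}, where {prefix} is the prefix from the head object's manifest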
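The count of 4 follows from the 512KB head and 4MB stripe sizes described earlier. A short sketch (with a made-up helper name, and assuming those two sizes apply to this upload) lists how the 10200056 bytes would be spread across the pieces:

```shell
# Print "<piece> <bytes>" for the head (piece 0) and each tail object.
# $1 = object size, $2 = head size, $3 = stripe size (all in bytes)
rgw_piece_sizes() {
    remaining=$1
    piece=0
    limit=$2
    while [ "$remaining" -gt 0 ]; do
        if [ "$remaining" -lt "$limit" ]; then n=$remaining; else n=$limit; fi
        echo "$piece $n"
        remaining=$(( remaining - n ))
        piece=$(( piece + 1 ))
        # Every piece after the head is limited by the stripe size.
        limit=$3
    done
}

rgw_piece_sizes 10200056 524288 4194304
```

This prints four lines: a 524288-byte head, two full 4194304-byte tails, and a final tail of 1287160 bytes.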
Head object:
# rados -p .rgw.buckets ls | grep location
default.5762326.25_location_object
The object's prefix:
# rados -p .rgw.buckets getxattr default.5762326.25_location_object user.rgw.manifest > ./binary.default.5762326.25_location_object.user.rgw.manifest
# ceph-dencoder type RGWObjManifest import binary.default.5762326.25_location_object.user.rgw.manifest decode dump_json
{
"objs": [],
"obj_size": 10200056,
"explicit_objs": "false",
"head_obj": {
"bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"key": "",
"ns": "",
"object": "location_object",
"instance": ""
},
"head_size": 524288,
"max_head_size": 524288,
"prefix": ".Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_",
"tail_bucket": {
"name": "where_is_my_bucket",
"pool": ".rgw.buckets",
"data_extra_pool": ".rgw.buckets.extra",
"index_pool": ".rgw.buckets.index",
"marker": "default.5762326.25",
"bucket_id": "default.5762326.25"
},
"rules": [
{
"key": 0,
"val": {
"start_part_num": 0,
"start_ofs": 524288,
"part_size": 0,
"stripe_max_size": 4194304,
"override_prefix": ""
}
}
]
}
The tail objects are: default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_{1,2,3}
Fetch the split objects:
Use rados to fetch the split objects:
# rados -p .rgw.buckets get default.5762326.25_location_object ./location_head
# rados -p .rgw.buckets get default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1
# rados -p .rgw.buckets get default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2
# rados -p .rgw.buckets get default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3 ./default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3
Concatenate the object:
# cat location_head > new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_1 >> new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_2 >> new_location_object
# cat default.5762326.25__shadow_.Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_3 >> new_location_object
md5 of new_location_object:
# md5sum new_location_object
24796d54d73d694168170135091f7eba new_location_object
Note: the object fetched and concatenated this way has the same md5 as the original, so its content is unchanged.
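The manual fetch-and-concatenate steps could be wrapped in a small script. The following is a minimal sketch under the assumptions of this walkthrough: a simple (non-multipart) upload, a known number of tail objects, and object names without whitespace. The function names are hypothetical, and rgw_reassemble needs access to a live cluster, so only the name listing is run here.

```shell
# List the RADOS objects that make up an RGW object, head first.
# $1 = bucket_id, $2 = object name, $3 = manifest prefix, $4 = tail count
rgw_object_pieces() {
    echo "${1}_${2}"
    i=1
    while [ "$i" -le "$4" ]; do
        echo "${1}__shadow_${3}${i}"
        i=$(( i + 1 ))
    done
}

# Fetch every piece with `rados get` and append it to the output file.
# $1 = data pool, $2 = output file, remaining args as for rgw_object_pieces
# Assumes object names contain no whitespace or glob characters.
rgw_reassemble() {
    pool=$1
    out=$2
    shift 2
    : > "$out"
    for name in $(rgw_object_pieces "$@"); do
        rados -p "$pool" get "$name" "$out.part" || return 1
        cat "$out.part" >> "$out"
    done
    rm -f "$out.part"
}

# Show which objects would be fetched for location_object.
rgw_object_pieces default.5762326.25 location_object .Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_ 3
```

On a host with cluster access, `rgw_reassemble .rgw.buckets new_location_object default.5762326.25 location_object .Ux77sSsCN2UdioL5XxO0Hx8Ph9oXb35_ 3` followed by `md5sum new_location_object` would repeat the check above.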