How to Upload/Download MaxCompute Complex Type Data with the Tunnel SDK
Complex Data Types
MaxCompute uses a SQL engine based on ODPS 2.0, which extends support for complex data types. MaxCompute supports the ARRAY, MAP, and STRUCT types, which can be nested arbitrarily, and provides a matching set of built-in functions.
Complex type construction and operation functions
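Before looking at the SDK calls, it helps to know how these types are represented on the Java side, as the examples later in this article show: ARRAY maps to a java.util.List, MAP to a java.util.Map, and a STRUCT's field values are passed as an ordered List&lt;Object&gt;. A minimal, stdlib-only sketch of this mapping (the class name is illustrative):

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ComplexTypeMapping {
    public static void main(String[] args) {
        // ARRAY<BIGINT> is represented as a java.util.List of Long values
        List<Long> array = Arrays.asList(1L, 2L, 3L);

        // MAP<STRING, BIGINT> is represented as a java.util.Map
        Map<String, Long> map = new HashMap<String, Long>();
        map.put("a", 1L);

        // STRUCT<name:STRING, age:BIGINT> field values are passed
        // as an ordered List<Object>, one entry per struct field
        List<Object> structValues = Arrays.asList("Lily", 18L);

        // Complex types can be nested arbitrarily,
        // e.g. ARRAY<MAP<STRING, BIGINT>>
        List<Map<String, Long>> nested = Arrays.asList(map);

        System.out.println(nested.get(0).get("a")); // prints 1
        System.out.println(structValues.get(0));    // prints Lily
    }
}
```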
Introduction to the Tunnel SDK
Tunnel is the data channel of ODPS; users upload data to and download data from ODPS through Tunnel.
TableTunnel is the entry class for accessing the ODPS Tunnel service. It supports uploading and downloading table data only (not views).
The process of uploading to or downloading from a table or partition is called a session. A session consists of one or more HTTP requests to the Tunnel RESTful API.
Uploads and downloads are handled by the TableTunnel.UploadSession and TableTunnel.DownloadSession sessions, respectively.
TableTunnel provides the methods for creating UploadSession and DownloadSession objects.
A typical table upload flow:
1) Create a TableTunnel
2) Create an UploadSession
3) Create a RecordWriter and write Records
4) Commit the upload
A typical table download flow:
1) Create a TableTunnel
2) Create a DownloadSession
3) Create a RecordReader and read Records
Constructing complex type data with the Tunnel SDK
Code example:
RecordWriter recordWriter = uploadSession.openRecordWriter(0);
ArrayRecord record = (ArrayRecord) uploadSession.newRecord();
// prepare data: the ARRAY<BIGINT> column expects Long elements
List<Long> arrayData = Arrays.asList(1L, 2L, 3L);
Map<String, Long> mapData = new HashMap<String, Long>();
mapData.put("a", 1L);
mapData.put("c", 2L);
// struct field values, in field order: name (STRING), age (BIGINT)
List<Object> structData = new ArrayList<Object>();
structData.add("Lily");
structData.add(18L);
// set data to record
record.setArray(0, arrayData);
record.setMap(1, mapData);
record.setStruct(2, new SimpleStruct((StructTypeInfo) schema.getColumn(2).getTypeInfo(), structData));
// write the record
recordWriter.write(record);
Downloading complex type data from MaxCompute
Code example:
RecordReader recordReader = downloadSession.openRecordReader(0, 1);
// read the record
ArrayRecord record1 = (ArrayRecord)recordReader.read();
// get array field data
List field0 = record1.getArray(0);
List<Long> longField0 = record1.getArray(Long.class, 0);
// get map field data
Map field1 = record1.getMap(1);
Map<String, Long> typedField1 = record1.getMap(String.class, Long.class, 1);
// get struct field data
Struct field2 = record1.getStruct(2);
Running the sample
The complete code is as follows:
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import com.aliyun.odps.Odps;
import com.aliyun.odps.PartitionSpec;
import com.aliyun.odps.TableSchema;
import com.aliyun.odps.account.Account;
import com.aliyun.odps.account.AliyunAccount;
import com.aliyun.odps.data.ArrayRecord;
import com.aliyun.odps.data.RecordReader;
import com.aliyun.odps.data.RecordWriter;
import com.aliyun.odps.data.SimpleStruct;
import com.aliyun.odps.data.Struct;
import com.aliyun.odps.tunnel.TableTunnel;
import com.aliyun.odps.tunnel.TableTunnel.UploadSession;
import com.aliyun.odps.tunnel.TableTunnel.DownloadSession;
import com.aliyun.odps.tunnel.TunnelException;
import com.aliyun.odps.type.StructTypeInfo;
public class TunnelComplexTypeSample {
private static String accessId = "<your access id>";
private static String accessKey = "<your access Key>";
private static String odpsUrl = "<your odps endpoint>";
private static String project = "<your project>";
private static String table = "<your table name>";
// partition spec of a partitioned table, e.g. "pt='1',ds='2'"
// not needed if the table is not partitioned
private static String partition = "<your partition spec>";
public static void main(String args[]) {
Account account = new AliyunAccount(accessId, accessKey);
Odps odps = new Odps(account);
odps.setEndpoint(odpsUrl);
odps.setDefaultProject(project);
try {
TableTunnel tunnel = new TableTunnel(odps);
PartitionSpec partitionSpec = new PartitionSpec(partition);
// ---------- Upload Data ---------------
// create upload session for table
// the table schema is {"col0": ARRAY<BIGINT>, "col1": MAP<STRING, BIGINT>, "col2": STRUCT<name:STRING,age:BIGINT>}
UploadSession uploadSession = tunnel.createUploadSession(project, table, partitionSpec);
// get table schema
TableSchema schema = uploadSession.getSchema();
// open record writer
RecordWriter recordWriter = uploadSession.openRecordWriter(0);
ArrayRecord record = (ArrayRecord) uploadSession.newRecord();
// prepare data
// the ARRAY<BIGINT> column expects Long elements
List<Long> arrayData = Arrays.asList(1L, 2L, 3L);
Map<String, Long> mapData = new HashMap<String, Long>();
mapData.put("a", 1L);
mapData.put("c", 2L);
List<Object> structData = new ArrayList<Object>();
structData.add("Lily");
structData.add(18L);
// set data to record
record.setArray(0, arrayData);
record.setMap(1, mapData);
record.setStruct(2, new SimpleStruct((StructTypeInfo) schema.getColumn(2).getTypeInfo(),
structData));
// write the record
recordWriter.write(record);
// close writer
recordWriter.close();
// commit uploadSession, the upload finish
uploadSession.commit(new Long[]{0L});
System.out.println("upload success!");
// ---------- Download Data ---------------
// create download session for table
// the table schema is {"col0": ARRAY<BIGINT>, "col1": MAP<STRING, BIGINT>, "col2": STRUCT<name:STRING,age:BIGINT>}
DownloadSession downloadSession = tunnel.createDownloadSession(project, table, partitionSpec);
schema = downloadSession.getSchema();
// open record reader, read one record here for example
RecordReader recordReader = downloadSession.openRecordReader(0, 1);
// read the record
ArrayRecord record1 = (ArrayRecord)recordReader.read();
// get array field data
List field0 = record1.getArray(0);
List<Long> longField0 = record1.getArray(Long.class, 0);
// get map field data
Map field1 = record1.getMap(1);
Map<String, Long> typedField1 = record1.getMap(String.class, Long.class, 1);
// get struct field data
Struct field2 = record1.getStruct(2);
System.out.println("download success!");
} catch (TunnelException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}