
Java - Hive - reading and writing ORC files

Reading an ORC file

    @Test
    public void readOrc() throws IOException {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path("/tmp/Orc.orc"),
                OrcFile.readerOptions(conf));
        RecordReader rows = reader.rows();
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();
        while (rows.nextBatch(batch)) {
            System.out.println(batch.toString());
        }
        rows.close();
    }
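
Printing batch.toString() is mostly useful as a quick sanity check. To actually consume the values, the usual pattern is to cast each entry of batch.cols to its concrete vector type and index it by row. A minimal sketch (a hypothetical readOrcTyped test, using the same imports and assuming /tmp/Orc.orc holds the struct<x:int,y:int> layout written below):

    @Test
    public void readOrcTyped() throws IOException {
        Configuration conf = new Configuration();
        Reader reader = OrcFile.createReader(new Path("/tmp/Orc.orc"),
                OrcFile.readerOptions(conf));
        RecordReader rows = reader.rows();
        VectorizedRowBatch batch = reader.getSchema().createRowBatch();
        while (rows.nextBatch(batch)) {
            // Cast the column vectors once per batch, then read them row by row
            LongColumnVector x = (LongColumnVector) batch.cols[0];
            LongColumnVector y = (LongColumnVector) batch.cols[1];
            for (int row = 0; row < batch.size; row++) {
                System.out.println(x.vector[row] + "," + y.vector[row]);
            }
        }
        rows.close();
    }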

Writing an ORC file - a single row

    @Test
    public void writeLine3() throws IOException {
        Configuration conf = new Configuration();
        TypeDescription schema = TypeDescription.fromString("struct<x:int,y:int>");
        Writer writer = OrcFile.createWriter(new Path("/tmp/Orc.orc"),
                OrcFile.writerOptions(conf)
                        .setSchema(schema));
        VectorizedRowBatch batch = schema.createRowBatch();
        LongColumnVector x = (LongColumnVector) batch.cols[0];
        LongColumnVector y = (LongColumnVector) batch.cols[1];
        int row = batch.size++;
        x.vector[row] = 2;
        y.vector[row] = 2 * 3;
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        writer.close();
    }
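
Note that schema.createRowBatch() allocates room for 1024 rows by default, and batch.size++ simply claims the next free slot; nothing reaches disk until addRowBatch is called. Also, if /tmp/Orc.orc already exists from an earlier run, OrcFile.createWriter will typically fail unless the writer options include .overwrite(true), as the multi-row example below does.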

Writing an ORC file - multiple rows

    @Test
    public void writeLine2() throws IOException {
        String[] lines = new String[]{"1,a,aa", "2,b,bb", "3,c,cc", "4,d,dd", "1,a,aa", "2,b,bb", "3,c,cc", "4,d,dd", "1,a,aa", "2,b,bb", "3,c,cc", "4,d,dd", "1,a,aa", "2,b,bb", "3,c,cc", "4,d,dd"};
//        String[] lines = new String[]{"1,2,4", "1,2,3", "1,2,3", "1,2,3", "1,2,3", "1,2,3", "1,2,3", "1,2,3"};


        Configuration conf = new Configuration();
        TypeDescription schema = TypeDescription.fromString("struct<field1:string,field2:string,field3:string>");
//        TypeDescription schema = TypeDescription.fromString("struct<field1:int,field2:int,field3:int>");
        Writer writer = OrcFile.createWriter(new Path("/tmp/Orc.orc"),
                OrcFile.writerOptions(conf)
                        .setSchema(schema).overwrite(true));
        VectorizedRowBatch batch = schema.createRowBatch();
        List<ColumnVector> columnVectors = new ArrayList<>();

        for (int i = 0; i < batch.numCols; i++) {
            columnVectors.add(batch.cols[i]);
        }

        for (String line : lines) {
            String[] columns = line.split(",");
            System.out.println(batch.size);
            int row = batch.size++;
            for (int i = 0; i < columns.length; i++) {
                switch (columnVectors.get(i).getClass().getName()) {
                    case "org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector":
                        BytesColumnVector bytesColumnVector = BytesColumnVector.class.cast(columnVectors.get(i));
                        bytesColumnVector.setVal(row, columns[i].getBytes(), 0, columns[i].getBytes().length);
                        break;
                    case "org.apache.hadoop.hive.ql.exec.vector.LongColumnVector":
                        LongColumnVector longColumnVector = LongColumnVector.class.cast(columnVectors.get(i));
                        longColumnVector.vector[row] = Long.parseLong(columns[i]);
                        break;
                }
            }
            // Flush the batch only after the whole row has been filled in,
            // otherwise a reset mid-row would drop the remaining columns
            if (batch.size == batch.getMaxSize()) {
                writer.addRowBatch(batch);
                batch.reset();
            }
        }
        if (batch.size != 0) {
            writer.addRowBatch(batch);
            batch.reset();
        }
        writer.close();

    }
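
Dispatching on the fully qualified class name works, but an instanceof check is less brittle if the vector classes are ever relocated or subclassed. A sketch of the same column-filling step, pulled into a hypothetical setColumn helper:

    private void setColumn(ColumnVector vector, int row, String value) {
        // Same two cases as the switch above, keyed on the runtime type
        if (vector instanceof BytesColumnVector) {
            byte[] bytes = value.getBytes();
            ((BytesColumnVector) vector).setVal(row, bytes, 0, bytes.length);
        } else if (vector instanceof LongColumnVector) {
            ((LongColumnVector) vector).vector[row] = Long.parseLong(value);
        }
    }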

Imports

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.ql.exec.vector.BytesColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.ColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.LongColumnVector;
import org.apache.hadoop.hive.ql.exec.vector.VectorizedRowBatch;
import org.apache.orc.*;
import org.junit.Test;

import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
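
These imports typically come from the org.apache.orc:orc-core artifact (the vector classes under org.apache.hadoop.hive.ql.exec.vector live in org.apache.hive:hive-storage-api, which orc-core pulls in transitively) plus org.apache.hadoop:hadoop-common for Configuration and Path, and JUnit for @Test; the exact versions depend on your Hadoop/Hive distribution.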