大資料匯入EXCEL

阿新 • • 發佈：2019-01-07

最近上頭給我派了一個活，oracle資料匯入excel，接任務的時候，我感覺比較輕鬆，心裡想，這很簡單，三下五除二，一個上午就可以搞定，因為之前實現過嘛！

但是在加上“大資料”烙印之後，就不是那麼簡單的一回事了，實現過程中，出現最常見的兩個問題：超出行數限制和記憶體溢位！

18天的資料，總共是500w條，如何將500w條記錄存入excel中，我當時想過兩種實現方式：PLSQL DEVELOPER和Java poi！

PLSQL DEVELOPER

有兩種實現方法：

1、在新建一個SQL WINDOW，執行你要匯出資料的查詢語句，查詢完之後，在結果顯示的地方點擊向下的箭頭，讓它全部顯示，這可能需要一點點時間，顯示結束後，右鍵點選顯示結果的地方，選中 copy to excel（xls和xlsx，前者是03及以前版本，每個sheet只能顯示65535條記錄；後者是07及以後版本每個sheet可以顯示1048576條記錄）。

2、在新建一個REPORT WINDOW，執行你要匯出資料的查詢語句，查詢完之後，點選螢幕右邊綠色圓餅狀圖示（export results），後面的操作就很簡單了，不再廢話。

兩種方式比較實現起來簡單，易操作，但是有很嚴重的弊端：其一，PLSQL DEVELOPER一次匯出excel資料有限，只有幾十萬條，超出範圍，則記憶體溢位；其二，如果分頁查詢或者條件查詢，則分批的資料又不能匯入同一個excel中。挺痛苦的~

在簡單方法行不通的時候，只能走向更加複雜的程式之路...

Java poi

在使用Java poi之前，嘗試過JXL，但是個人覺得Java poi更加順手，這並不是說JXL不好用，JXL更多地面向的是底層，比較麻煩點，但更加靈活；而Java poi封裝地更多，使用起來更加順手。

其實這些都不是重點！

重點是在實現過程中如何處理上面兩個最常見的問題：超出行數限制和記憶體溢位！

記憶體溢位：

一個經常處理大資料，公司硬體卻跟不上的軟肋，真心耗費時間！最常用的解決途徑就是分批處理，結合Java 虛擬機器觀察一次處理中在不導致記憶體溢位的前提下，最大能處理的資料量，以達到虛擬機器的充分利用。

在oracle查詢資料這一段，寫個分頁查詢，分頁查詢完後，都放入到一個集合中，具體實現過程，暫且不表！

超出行數限制：

如果將oracle中查詢出500w資料一股腦兒匯入excel，又會遇到另一個棘手問題：超出行數限制。

如果到的是xls格式，我就讓程式迴圈跑起來，迴圈一次，匯入65535條；xlsx格式的，就讓它迴圈一次，匯入1048576條，如此迴圈下去，直到程式跑完！

請看本人程式碼示例：

public class XlsDto2Excel {  
    @Autowired
    private ToDBDao toDBDao;
    /**
     *
     * @param xls
     * XlsDto實體類的一個物件
     * @throws Exception
     * 在匯入Excel的過程中丟擲異常
     */
    public void toExcel(String date, int count) {

        int PAGESIZE = 65535;
        // declare a new workbook 宣告一個工作簿
        HSSFWorkbook wb = new HSSFWorkbook();

        // declare a row object reference 宣告一個新行
        HSSFRow r = null;

        // declare a cell object reference 宣告一個單元格
        HSSFCell c0, c1, c2, c3, c4, c5 = null;

        HSSFCell[] firstcell = new HSSFCell[6];

        // create 2 cell styles 建立2個單元格樣式
        HSSFCellStyle cs = wb.createCellStyle();
        HSSFCellStyle cs2 = wb.createCellStyle();

        // create 2 fonts objects 建立2個單元格字型
        HSSFFont f = wb.createFont();
        HSSFFont f2 = wb.createFont();

        // Set font 1 to 12 point type, blue and bold 設定字型型別1到12號，藍色和粗體
        f.setFontHeightInPoints((short) 12);
        f.setColor(HSSFColor.RED.index);
        f.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);

        // Set font 2 to 10 point type, red and bold 設定字型型別2到10號，黑色和粗體
        f2.setFontHeightInPoints((short) 10);
        f2.setColor(HSSFColor.BLACK.index);
        f2.setBoldweight(HSSFFont.BOLDWEIGHT_BOLD);

        // Set cell style and formatting 設定單元格樣式和格式
        cs.setFont(f);
        // 水平佈局：居中
        cs.setAlignment(HSSFCellStyle.ALIGN_CENTER);
        // cs.setDataFormat(df.getFormat("#,##0.0"));

        // Set the other cell style and formatting 設定其他單元格樣式和格式
        cs2.setBorderBottom(cs2.BORDER_THIN);
        cs2.setDataFormat(HSSFDataFormat.getBuiltinFormat("text"));
        cs2.setFont(f2);
        // 水平佈局：居中
        cs2.setAlignment(HSSFCellStyle.ALIGN_CENTER);

        // 從資料庫，獲取總的集合大小
        List list = this.toDBDao.selectFatherData(date);

        // 獲取迴圈次數（在此結果上+1），一次迴圈下，子迴圈PAGESIZE次，最後一次迴圈，子迴圈mod次
        int circleCount = list.size() / PAGESIZE;

        int mod = list.size() % PAGESIZE;

        String firstOrderId = "";

        String orderTime = "";

        for (int i = 0; i < circleCount + 1; i++) {
          
            // create a new sheet 建立一個新工作表，但一個sheet載入滿65535條記錄後，自動生成一個新的sheet，以保證不會超出行數限制
            HSSFSheet sheet = wb.createSheet("第" + i+ "頁");
            /*
             * 設定表頭
             */
            HSSFRow firstrow = sheet.createRow(0); // 下標為0的行開始
            
            String[] names = new String[6];
            names[0] = "訪問編號";
            names[1] = "瀏覽數";
            names[2] = "平均訪問時長";
            names[3] = "訂單編號";
            names[4] = "下單時間";
            names[5] = "初始時間";

            for (int j = 0; j < 6; j++) {

                firstcell[j] = firstrow.createCell(j);
                firstcell[j].setCellValue(new HSSFRichTextString(names[j]));
                firstcell[j].setCellStyle(cs2);
            }
            //最後一次迴圈
            if (i == circleCount) {
                
                for (int rownum = 1; rownum < mod; rownum++) {

                    // 獲取行物件
                    r = sheet.createRow(rownum);

                    HashMap father = (HashMap) list.get(rownum + PAGESIZE * i);

                    for (int cellnum = 0; cellnum < 6; cellnum++) {

                        /*
                         * 獲取列物件
                         */
                        c0 = r.createCell(0);
                        c1 = r.createCell(1);
                        c2 = r.createCell(2);
                        c3 = r.createCell(3);
                        c4 = r.createCell(4);
                        c5 = r.createCell(5);
                        /*
                         * 給列物件賦值
                         */
                        c0.setCellValue(father.get("sessionId").toString());
                        c0.setCellStyle(cs2);

                        c1.setCellValue(father.get("visitPages").toString());
                        c1.setCellStyle(cs2);

                        c2.setCellValue(father.get("perVisitsTime").toString());
                        c2.setCellStyle(cs2);

                        if (null != father.get("firstOrderId")) {

                            firstOrderId = father.get("firstOrderId")
                                    .toString();

                        }
                        c3.setCellValue(firstOrderId);
                        c3.setCellStyle(cs2);

                        if (null != father.get("orderTime")) {

                            orderTime = father.get("orderTime").toString();

                        }
                        c4.setCellValue(orderTime);
                        c4.setCellStyle(cs2);

                        c5.setCellValue(father.get("initTime").toString());
                        c5.setCellStyle(cs2);

                    }

                }

            } else {

                for (int rownum = 1; rownum <= PAGESIZE; rownum++) {

                    // 獲取行物件
                    r = sheet.createRow(rownum);

                    HashMap father = (HashMap) list.get(rownum + PAGESIZE * i);

                    for (int cellnum = 0; cellnum < 6; cellnum++) {

                        /*
                         * 獲取列物件
                         */
                        c0 = r.createCell(0);
                        c1 = r.createCell(1);
                        c2 = r.createCell(2);
                        c3 = r.createCell(3);
                        c4 = r.createCell(4);
                        c5 = r.createCell(5);
                        /*
                         * 給列物件賦值
                         */
                        c0.setCellValue(father.get("sessionId").toString());
                        c0.setCellStyle(cs2);

                        c1.setCellValue(father.get("visitPages").toString());
                        c1.setCellStyle(cs2);

                        c2.setCellValue(father.get("perVisitsTime").toString());
                        c2.setCellStyle(cs2);

                        if (null != father.get("firstOrderId")) {

                            firstOrderId = father.get("firstOrderId")
                                    .toString();

                        }
                        c3.setCellValue(firstOrderId);
                        c3.setCellStyle(cs2);

                        if (null != father.get("orderTime")) {

                            orderTime = father.get("orderTime").toString();

                        }
                        c4.setCellValue(orderTime);
                        c4.setCellStyle(cs2);

                        c5.setCellValue(father.get("initTime").toString());
                        c5.setCellStyle(cs2);

                    }
                }

            }

            sheet.autoSizeColumn((short) 0); // 根據內容調整第一列寬度，不過不設定，預設情況下，按照表頭自動調整寬度
            sheet.autoSizeColumn((short) 4); // 根據內容調整第五列寬度，不過不設定，預設情況下，按照表頭自動調整寬度
            sheet.autoSizeColumn((short) 5); // 根據內容調整第六列寬度，不過不設定，預設情況下，按照表頭自動調整寬度
        }
        // Save儲存
        FileOutputStream out;
        try {
            out = new FileOutputStream("d://workbook.xls");
            wb.write(out);
            out.close();
        } catch (FileNotFoundException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (IOException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        }
        System.out.println("--執行完畢--");

    }
}

以上程式碼匯入的是xls格式。

說明：雖然，用xlsx格式方式可以一次匯入100w條記錄，但是程式碼執行要慢上許多，我也沒搞清楚具體原因。不過，大家可以在匯入同樣多資料情況下，用xls和xlsx兩種方式對比一下。

其實除了以上兩種方式之外，經本人查詢資料，還找到了另一個更加方便的方式，那就是搭建資料來源ODBC，連線excel，資料傳輸。

如何搭建oracle ODBC，網上資料一查一大把，這裡我就不再贅述！

搭建完之後，我們就來操作excel，新建一個excel，07版的更好，選中資料。

選擇來自資料連線嚮導—>其他/高階—>選擇帶有oracle的選項，接著輸入使用者名稱、密碼和連線地址，連線成功後，會顯示oracle中的所有表名，注意這些表明排列是沒有順序的，但是我們可以快速索引到我們想要匯出資料的表名，比如一個表名叫做father_user_behavior，我們可以這樣定位：先按f，再按a，再按t，這樣基本上可以直接定位到你想要的表名，接下來的操作就很簡單了，不多說！

需要注意的是，當資料超出1048576條，同樣會出現問題（它不會自動生成第二個sheet）：

所以這種方式，也有其弊端。

大資料匯入EXCEL

大資料匯入EXCEL

Java 讀取較大資料的excel檔案

oracle大資料匯入sqlldr與匯出sqluldr2

Java POI 大資料生成Excel

PHP phpspreadsheet 匯出大資料 csv excel

Revit二次開發_將資料匯入Excel

Python 實現MySQL資料匯入Excel

Java操作json資料匯入Excel

Oracle將資料匯入excel

datagridview資料匯入Excel

大資料匯入之Bulk Insert的一點總結

MVC 實現資料匯入Excel，並在客戶端下載。

爬蟲+資料匯入Excel

C#把txt的資料匯入excel,用逗號分隔

牛逼的OSQL----大資料匯入

使用php將資料匯入excel的簡單方法

使用Apache POI技術把資料庫中的資料匯入Excel表

Python實現資料庫資料匯入Excel

SQL Server大資料匯入匯出:將一張表的資料匯入到另一張表

使用JS程式碼把WEB頁面的資料匯入Excel,程式碼如下:

大資料匯入EXCEL

相關推薦