1. 程式人生 > 其它 > Hive-之Load data into table[partition](hdfs -> ods ,ods -> dw)

Hive-之Load data into table[partition](hdfs -> ods ,ods -> dw)

技術標籤:Hive

Hive-之Load data into table[partition]

1 從HDFS到ODS層

-- Create the ODS table: fix schema, field delimiter, and storage formats.
DROP TABLE IF EXISTS shufang.students;
CREATE TABLE IF NOT EXISTS shufang.students(
    id          int,
    name        string,
    create_time string
)
PARTITIONED BY (dt string)                -- partitioned table, one partition per day
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  -- tab-separated source files
STORED AS
    -- LZO-compressed text input; the "Deprecated" class is the mapred-API shim
    INPUTFORMAT  'com.hadoop.mapred.DeprecatedLzoTextInputFormat'
    OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/user/hive/warehouse/shufang.db/students';  -- explicit HDFS path for the table

-- Load data into a partition: no MapReduce job runs, Hive simply moves the
-- files from the source directory into the partition directory.
LOAD DATA INPATH '/origin_data/db/shufang/students/2021-01-18'
INTO TABLE shufang.students PARTITION(dt = '2021-01-18');

-- Log data collected by Flume is LZO-compressed but not yet splittable;
-- after loading, build an LZO index over the partition so MR can split it.
-- (Run in the shell, not in Hive.)
hadoop jar /opt/module/hadoop-2.7.7/share/hadoop/common/hadoop-lzo-0.4.21-SNAPSHOT.jar \
    com.hadoop.compression.lzo.DistributedLzoIndexer \
    /user/hive/warehouse/shufang.db/students/dt=2021-01-18

2 從ODS到DWD

-- DWD-layer table: Parquet is natively splittable, so no index step is needed.
CREATE TABLE IF NOT EXISTS student1(
    id          int,
    name        string,
    create_time string
)
COMMENT 'parquet store table,parquet is born to support split'
PARTITIONED BY (dt string)                            -- partition key
STORED AS parquet                                     -- still an input/output format pair under the hood
LOCATION '/user/hive/warehouse/shufang.db/student1'   -- explicit HDFS path for the table
TBLPROPERTIES('parquet.compression' = 'lzo');         -- compression codec for the Parquet files



-- Rewrite the target partition from the ODS table for the same day.
INSERT OVERWRITE TABLE student1 PARTITION(dt = '2021-01-18')
SELECT
    id,
    name,
    create_time
FROM students
WHERE dt = '2021-01-18';