Simple Example Use Cases (translated from the Hive GettingStarted guide)
1. MovieLens User Ratings
First, create a table with tab-delimited text file format:
CREATE TABLE u_data (
  userid INT,
  movieid INT,
  rating INT,
  unixtime STRING)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;
Then, download the data files from MovieLens 100k:
wget http://files.grouplens.org/datasets/movielens/ml-100k.zip
or:
curl --remote-name http://files.grouplens.org/datasets/movielens/ml-100k.zip
Note: If the link to the GroupLens datasets does not work, please report it on HIVE-5341 or send a message to [email protected].
Unzip the data files:
unzip ml-100k.zip
And load u.data into the table that was just created:
LOAD DATA LOCAL INPATH '<path>/u.data' OVERWRITE INTO TABLE u_data;
Count the number of rows in table u_data:
SELECT COUNT(*) FROM u_data;
Note that for older versions of Hive which don't include HIVE-287, you'll need to use COUNT(1) in place of COUNT(*).
Now we can do some complex data analysis on the table u_data:
Create weekday_mapper.py:

import sys
import datetime

for line in sys.stdin:
    line = line.strip()
    userid, movieid, rating, unixtime = line.split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    print('\t'.join([userid, movieid, rating, str(weekday)]))
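As a sanity check before registering the script with Hive, the per-line transform can be exercised locally in plain Python. This is a minimal sketch: the sample row merely follows u.data's tab-delimited format, and since fromtimestamp uses the local timezone, the exact weekday is not hard-coded.

```python
import datetime

def map_line(line):
    """Replicate weekday_mapper.py's transform for a single input line."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
    return '\t'.join([userid, movieid, rating, str(weekday)])

# A sample row in u.data's format: userid, movieid, rating, unixtime
print(map_line("196\t242\t3\t881250949"))
```

The first three fields pass through unchanged; only unixtime is replaced by an ISO weekday number (1 = Monday .. 7 = Sunday).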
Use the mapper script:

CREATE TABLE u_data_new (
  userid INT,
  movieid INT,
  rating INT,
  weekday INT)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY '\t';
add FILE weekday_mapper.py;
INSERT OVERWRITE TABLE u_data_new
SELECT
  TRANSFORM (userid, movieid, rating, unixtime)
  USING 'python weekday_mapper.py'
  AS (userid, movieid, rating, weekday)
FROM u_data;

Explanation: here the Python script cleans the data in u_data. TRANSFORM (userid, movieid, rating, unixtime) names the input columns, USING 'python weekday_mapper.py' pipes them through the script, and AS (userid, movieid, rating, weekday) names the output columns.
SELECT weekday, COUNT(*)
FROM u_data_new
GROUP BY weekday;
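Outside Hive, the effect of that GROUP BY can be sketched in plain Python with collections.Counter; the rows below are made-up stand-ins for u_data_new, not real MovieLens data.

```python
import collections

# Hypothetical (userid, movieid, rating, weekday) rows standing in for u_data_new
rows = [
    ("196", "242", "3", 5),
    ("186", "302", "3", 2),
    ("22",  "377", "1", 5),
]

# Equivalent of: SELECT weekday, COUNT(*) FROM u_data_new GROUP BY weekday
counts = collections.Counter(weekday for _, _, _, weekday in rows)
for weekday, n in sorted(counts.items()):
    print(weekday, n)
# prints:
# 2 1
# 5 2
```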
Note that if you're using Hive 0.5.0 or earlier you will need to use COUNT(1) in place of COUNT(*).
2. Apache Weblog Data
The format of Apache weblog is customizable, while most webmasters use the default.
For default Apache weblog, we can create a table with the following command.
More about RegexSerDe can be found in HIVE-662 and HIVE-1719.
CREATE TABLE apachelog (
  host STRING,
  identity STRING,
  user STRING,
  time STRING,
  request STRING,
  status STRING,
  size STRING,
  referer STRING,
  agent STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
  "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)(?: ([^ \"]*|\".*\") ([^ \"]*|\".*\"))?"
)
STORED AS TEXTFILE;
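To check that the pattern matches a default combined-format log line, it can be tried in Python. Two assumptions in this sketch: the Java string escapes in the SerDe property (\\[) collapse to single backslashes in a Python raw string, and since RegexSerDe matches the pattern against the entire line, fullmatch is used here to reproduce that. The log line itself is fabricated for illustration.

```python
import re

# The SerDe's regex with Java string escaping removed
pattern = re.compile(
    r'([^ ]*) ([^ ]*) ([^ ]*) (-|\[[^\]]*\]) ([^ "]*|"[^"]*") '
    r'(-|[0-9]*) (-|[0-9]*)(?: ([^ "]*|".*") ([^ "]*|".*"))?'
)

# A fabricated log line in the default combined format
line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" '
        '"Mozilla/4.08 [en] (Win98; I ;Nav)"')

m = pattern.fullmatch(line)
print(m.group(1))  # host
print(m.group(4))  # time
print(m.group(6))  # status
```

Each capture group corresponds, in order, to one column of the apachelog table: host, identity, user, time, request, status, size, referer, agent.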