Create a time dimension table in pure Hive SQL
Without further ado, here is the full SQL to create a table with one row per day, with the date, year, month, day, number and name of the day of the week, and day of the year. If you want the hours as well, look at the bottom of this post.
[...] , a.pos) as d
[...] 'D' ) as daynumber_of_year
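A minimal sketch of what such a statement can look like, built around the posexplode, split and repeat trick explained below. The date range, the hivevar names and the target table default.timeDim are assumptions chosen for illustration, and the 'u' pattern is assumed to follow Java SimpleDateFormat semantics (1 for Monday), which may differ on newer Hive versions.

```sql
-- Assumed parameters, adjust the date range and the target table name
set hivevar:start_day=2010-01-01;
set hivevar:end_day=2050-12-31;
set hivevar:timeDimTable=default.timeDim;

create table if not exists ${timeDimTable} as
with dates as (
  -- one row per day, from start_day to end_day inclusive
  select date_add("${start_day}", a.pos) as d
  from (
    select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o")) as (pos, val)
  ) a
)
select
    d
  -- backticks guard against keyword clashes on the column names
  , year(d) as `year`
  , month(d) as `month`
  , day(d) as `day`
  , date_format(d, 'u') as daynumber_of_week
  , date_format(d, 'EEEE') as dayname_of_week
  , date_format(d, 'D') as daynumber_of_year
from dates
sort by d
;
```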
Note that I use d as the name of the date column because date is a reserved keyword.
The biggest issue is generating one row per day. The trick here is to use a clever combination of posexplode, split and repeat. This is what the first CTE does:
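As a sketch, the CTE can be run on its own with a short range. The two hivevar values are example dates assumed here, and the trailing select is only there to make the snippet runnable.

```sql
-- Assumed example range, any two dates in yyyy-MM-dd form should work
set hivevar:start_day=2010-01-01;
set hivevar:end_day=2010-01-10;

with dates as (
  select date_add("${start_day}", a.pos) as d
  from (
    select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o")) as (pos, val)
  ) a
)
select d from dates;
```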
We can break it down into a few parts:
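The innermost call, as a minimal sketch. The two dates are example values assumed here so that the result is 9, the number used in the following steps.

```sql
select datediff("2010-01-10", "2010-01-01");
-- 9
```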
This just computes the number of days between the start day and the end day.
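Next comes repeat, sketched here with the literal 9 standing in for the datediff result:

```sql
select repeat("o", 9);
-- ooooooooo
```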
This outputs a string of 9 ‘o’ characters. The actual character does not matter at all.
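Then split, sketched on the literal string from the previous step. Hive keeps the trailing empty strings, so nine separators produce ten elements.

```sql
select split("ooooooooo", "o");
-- ["","","","","","","","","",""]
```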
This creates a Hive array of 10 empty strings: the 9 separators delimit 10 elements, which is why the indexes below run from 0 to 9.
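Finally posexplode, sketched with the same example dates as above (an assumption, any range works). The explicit column names pos and val match what Hive uses by default.

```sql
select posexplode(split(repeat("o", datediff("2010-01-10", "2010-01-01")), "o")) as (pos, val);
-- ten rows, pos going from 0 to 9, val always an empty string
```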
This actually creates one row per array element, with the index (0 to 9) and the value (an empty string) of each element.
That was the tricky part; the rest is easy. The first CTE creates one row per date by adding the array index (in days) to start_day:
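A sketch of that step with literal dates instead of variables, so the result is easy to check by hand. The range is an example chosen here, not a requirement.

```sql
with dates as (
  -- pos goes from 0 to 9, so d goes from the start day to the end day
  select date_add("2010-01-01", a.pos) as d
  from (
    select posexplode(split(repeat("o", datediff("2010-01-10", "2010-01-01")), "o")) as (pos, val)
  ) a
)
select d from dates;
-- ten rows, from 2010-01-01 through 2010-01-10
```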
From there on, you can create whatever column you feel like. Quarter column? ceil(month(d) / 3) as quarter. Long name of the day of the week? date_format(d, 'EEEE') as dayname_of_week_long.
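As a quick check of such extra columns, here is a sketch on a single hand-picked date. The quarter is computed as ceil(month(d) / 3), so that months 1 to 3 map to 1 and months 10 to 12 map to 4. The inline subquery and its alias t are only there to provide a date to work on.

```sql
select
    d
  , ceil(month(d) / 3) as quarter
  , date_format(d, 'EEEE') as dayname_of_week_long
from (select cast('2010-07-01' as date) as d) t;
-- quarter is 3 for July
-- dayname_of_week_long is Thursday, assuming an English locale
```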
As a bonus, I give you the same table but with hours added. The principles are exactly the same, with a cartesian join between dates and hours:
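A minimal sketch of the hourly variant. The table name default.timeDimHour, the hivevar names, the dtime column and the way the timestamp is assembled from strings are all assumptions made for illustration; only the idea of a cartesian join between a dates CTE and an hours CTE comes from the text above.

```sql
-- Assumed parameters, date range and a hypothetical target table name
set hivevar:start_day=2010-01-01;
set hivevar:end_day=2050-12-31;
set hivevar:timeDimTable=default.timeDimHour;

create table if not exists ${timeDimTable} as
with dates as (
  -- one row per day, exactly as in the daily table
  select date_add("${start_day}", a.pos) as d
  from (
    select posexplode(split(repeat("o", datediff("${end_day}", "${start_day}")), "o")) as (pos, val)
  ) a
),
hours as (
  -- 23 separators give 24 elements, so pos runs from 0 to 23
  select a.pos as h
  from (
    select posexplode(split(repeat("o", 23), "o")) as (pos, val)
  ) a
)
select
    -- timestamp at the start of each hour, built from the date and the hour
    cast(concat(cast(d as string), ' ', lpad(cast(h as string), 2, '0'), ':00:00') as timestamp) as dtime
  , d
  , year(d) as `year`
  , month(d) as `month`
  , day(d) as `day`
  , h as `hour`
from dates
cross join hours
sort by dtime
;
```

Depending on the cluster configuration, Hive may reject cartesian products in strict mode. On recent versions this is governed by hive.strict.checks.cartesian.product, which may need to be set to false for the session.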