6th China Software Cup: WiFi Probe Data Analysis
Hello Spark_WIFIProbe_Analyse
WiFi probe big-data analysis based on Hadoop and Spark.
Written in Scala, version 2.11.0.
Import the database, tanzhen.
Copy the contents of the lib package into /home/examples (the directory the spark-submit commands read the jars from).
// Analyse the data
spark-submit --master spark://master:7077 --name DataAnalyse --class DataAnalyse --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar /home/examples/WIFIAnalyse.jar
// Store the JSON data
spark-submit --master spark://master:7077 --name JsonTanZhen --class JsonTanZhen --executor-memory 1G --total-executor-cores 2 --jars /home/examples/mysql.jar /home/examples/WIFIAnalyse.jar hdfs://master:55555/input/data*.txt
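Probe data analysis: full version
1. Customer traffic: overall traffic and trend for a store or area
2. Store entries: traffic entering a store or area, and its trend
3. Entry rate: the share of total traffic that enters the store or area, and its trend
4. Dwell time: how long customers who enter the store stay inside
5. Bounce rate: customers who leave soon after entering, and their share of total traffic
6. Deep-visit rate: customers who visit the store in depth, and their share of total traffic (determined from location traces or dwell time)
7. New and returning customers: customers who enter the store for the first time, or two or more times, within a given period
8. Visit cycle: the interval between a customer's entry to the store or area and their previous visit
9. Customer activity: customers grouped by the interval since their last visit into activity levels (high, medium, low, dormant)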
Customer traffic
1. Yesterday's customer traffic for store id, sql="select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and id=?"
2. Seven-day customer traffic for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and id=?"
3. Monthly (last 30 days) customer traffic for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and id=?"
4. Previous calendar month's customer traffic for store id, sql="select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),'%Y%m') , date_format(time, '%Y%m' ) ) =1 and id=?"
Store entries
5. Yesterday's store entries for store id, sql="select count(distinct mac) as count from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=?"
6. Seven-day store entries for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 7 DAY) <= date(time) and ranges<=300 and id=?"
7. Monthly (last 30 days) store entries for store id, sql="select count(distinct mac) as count from data where DATE_SUB(CURDATE(), INTERVAL 30 DAY) <= date(time) and ranges<=300 and id=?"
8. Previous calendar month's store entries for store id, sql="select count(distinct mac) as count from data where PERIOD_DIFF(date_format(now(),'%Y%m') , date_format(time, '%Y%m' ) ) =1 and ranges<=300 and id=?"
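The traffic and entry queries differ only in the `ranges<=300` filter. As an illustration, the two "yesterday" counts can be sketched over in-memory records in Scala. `ProbeRecord` and `TrafficCounts` are hypothetical names whose fields mirror the `id`, `mac`, `time`, and `ranges` columns; this is not the project's Spark job.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical record type; fields mirror the columns used in the SQL.
final case class ProbeRecord(id: Int, mac: String, time: LocalDateTime, ranges: Int)

object TrafficCounts {
  // Same threshold as the `ranges<=300` filter in the entry queries.
  val EntryRange = 300

  // Records for one store seen on the day before `today`.
  private def yesterday(records: Seq[ProbeRecord], id: Int, today: LocalDate): Seq[ProbeRecord] =
    records.filter(r => r.id == id && r.time.toLocalDate == today.minusDays(1))

  // Distinct MACs seen yesterday (mirrors query 1).
  def trafficYesterday(records: Seq[ProbeRecord], id: Int, today: LocalDate): Int =
    yesterday(records, id, today).map(_.mac).distinct.size

  // Distinct MACs seen yesterday within the entry range (mirrors query 5).
  def entriesYesterday(records: Seq[ProbeRecord], id: Int, today: LocalDate): Int =
    yesterday(records, id, today).filter(_.ranges <= EntryRange).map(_.mac).distinct.size
}
```

Passing `today` explicitly instead of reading the clock keeps the sketch deterministic and easy to test.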
Entry rate
Computed on the client.
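Because the entry rate is left to the client, one plausible way to derive it is to divide an entry count by the traffic count for the same store and period. A minimal Scala sketch follows; `EntryRate.rate` is a hypothetical helper, and returning 0.0 when there is no traffic is an assumed convention.

```scala
object EntryRate {
  // Entry rate = entries / total traffic for the same store and period.
  // Returns 0.0 when there was no traffic (an assumed convention).
  def rate(entries: Long, traffic: Long): Double =
    if (traffic <= 0) 0.0 else entries.toDouble / traffic
}
```

For example, `EntryRate.rate(25, 100)` gives 0.25.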
Dwell time
9. Yesterday's dwell-time distribution for store id, measured in minutes between each MAC's first and last sighting and bucketed as a (under 15), b (15 to 29), c (30 to 44), d (45 to 59), e (60 or more), sql="select case when cha>=0 and cha<15 then 'a' when cha>=15 and cha<30 then 'b' when cha>=30 and cha<45 then 'c' when cha>=45 and cha<60 then 'd' when cha>=60 then 'e' end as type,count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac) as total group by (case when cha>=0 and cha<15 then 'a' when cha>=15 and cha<30 then 'b' when cha>=30 and cha<45 then 'c' when cha>=45 and cha<60 then 'd' when cha>=60 then 'e' end)"
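The bucketing done by the CASE expression can also be sketched in Scala, for instance when the per-MAC dwell times are already available on the client. `DwellBuckets` is a hypothetical helper; the labels and boundaries are the ones used in the query.

```scala
object DwellBuckets {
  // Labels follow the CASE expression: a <15, b <30, c <45, d <60, e >=60 minutes.
  // Negative durations fall outside every bucket and yield None.
  def bucket(minutes: Long): Option[Char] =
    if (minutes < 0) None
    else if (minutes < 15) Some('a')
    else if (minutes < 30) Some('b')
    else if (minutes < 45) Some('c')
    else if (minutes < 60) Some('d')
    else Some('e')

  // Count visitors per bucket, given one dwell time (in minutes) per MAC.
  def histogram(dwellMinutes: Seq[Long]): Map[Char, Int] =
    dwellMinutes.flatMap(m => bucket(m)).groupBy(identity).map { case (k, v) => k -> v.size }
}
```

Buckets with no visitors are absent from the resulting map, just as they are absent from the SQL result set.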
New and returning customers
10. Yesterday's returning customers for store id, sql="select count(*) as count from (select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) > 1 and ranges<=300 and id=? group by mac)as total group by mac having count>1)as totals"
11. Yesterday's new customers for store id (MACs that entered yesterday and on no earlier day), sql="select count(*) as count from (select mac from
(select mac, 1 as yesterday from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac, 0 as yesterday from data where to_days(now())-to_days(time) > 1 and ranges<=300 and id=? group by mac)as total group by mac having count(*)=1 and sum(yesterday)=1)as totals"
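The new/returning split can be illustrated with plain sets: a new customer entered yesterday and on no earlier day, while a returning customer entered yesterday and also before. The sketch below assumes the two MAC sets have already been fetched; `NewVsReturning.split` is a hypothetical helper.

```scala
object NewVsReturning {
  // `yesterday`: MACs that entered the store yesterday.
  // `earlier`: MACs that entered the store on any earlier day.
  // Returns (new customers, returning customers) among yesterday's MACs.
  def split(yesterday: Set[String], earlier: Set[String]): (Set[String], Set[String]) =
    yesterday.partition(mac => !earlier.contains(mac))
}
```

MACs that appear only in `earlier` belong to neither group, since they did not visit yesterday.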
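Bounce count
12. Yesterday's bounce count for store id (visitors who stayed under 5 minutes), used to derive the bounce rate, sql="select count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=0 and cha<5"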
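Deep-visit count
13. Yesterday's deep-visit count for store id (visitors who stayed 30 minutes or more), used to derive the deep-visit rate, sql="select count(*) as count from (SELECT TIMESTAMPDIFF(MINUTE, min(time), max(time)) as cha from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? GROUP by mac)as total where cha>=30"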
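Both rates are defined as a share of total traffic, so the two counts need a denominator. A minimal Scala sketch, assuming the per-MAC dwell times and the traffic count for the same period are already available; `VisitDepth` is a hypothetical helper, and returning zero rates for zero traffic is an assumed convention.

```scala
object VisitDepth {
  // Thresholds taken from the queries: bounce under 5 minutes, deep visit from 30 minutes.
  val BounceBelowMinutes = 5L
  val DeepFromMinutes = 30L

  final case class Rates(bounceRate: Double, deepVisitRate: Double)

  // `dwellMinutes`: one dwell time per MAC that entered the store.
  // `totalTraffic`: distinct MACs seen for the store in the same period (the denominator).
  def rates(dwellMinutes: Seq[Long], totalTraffic: Long): Rates =
    if (totalTraffic <= 0) Rates(0.0, 0.0)
    else {
      val bounced = dwellMinutes.count(m => m >= 0 && m < BounceBelowMinutes)
      val deep = dwellMinutes.count(_ >= DeepFromMinutes)
      Rates(bounced.toDouble / totalTraffic, deep.toDouble / totalTraffic)
    }
}
```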
Visit cycle
14. Visit cycle for store id,
activity over the previous seven days, number of visitors per day
sql="select case when cha>=1 and cha<2 then 'a' when cha>=2 and cha<3 then 'b' when cha>=3 and cha<4 then 'c' when cha>=4 and cha<7 then 'd' when cha>=7 then 'e' end type,count(*) as count from (select mac,count(*) as cha from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac)as totals group by (case when cha>=1 and cha<2 then 'a' when cha>=2 and cha<3 then 'b' when cha>=3 and cha<4 then 'c' when cha>=4 and cha<7 then 'd' when cha>=7 then 'e' end)"
sql="select mac,count(*) as count from
(select mac from data where to_days(now())-to_days(time) = 1 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 2 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 3 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 4 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 5 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 6 and ranges<=300 and id=? group by mac
union all
select mac from data where to_days(now())-to_days(time) = 7 and ranges<=300 and id=? group by mac)as total group by mac"
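The seven branches of the per-MAC query differ only in the day offset, so the statement text could be generated rather than written out by hand. The following Scala sketch shows one way to do that; `VisitCycleSql` is a hypothetical helper, not part of the project, and the generated statement still carries one `?` placeholder per branch, each to be bound to the same store id.

```scala
object VisitCycleSql {
  // One branch: MACs that entered the store exactly `daysAgo` days ago.
  private def branch(daysAgo: Int): String =
    s"select mac from data where to_days(now())-to_days(time) = $daysAgo " +
      "and ranges<=300 and id=? group by mac"

  // Per-MAC count of distinct days visited over the previous `days` days.
  // The result contains `days` placeholders, all for the same store id.
  def perMacDayCounts(days: Int): String = {
    require(days >= 1, "days must be at least 1")
    val branches = (1 to days).map(branch).mkString("\nunion all\n")
    s"select mac,count(*) as count from\n($branches)as total group by mac"
  }
}
```

Calling `VisitCycleSql.perMacDayCounts(7)` yields a statement with the same seven-branch shape as the query above.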
Customer activity
Computed on the client.
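One possible client-side classification maps the number of days since a customer's previous visit to the four activity levels (high, medium, low, dormant). The Scala sketch below is only an illustration: `ActivityLevel` is a hypothetical helper, and the day thresholds are assumptions, since the source does not specify them.

```scala
object ActivityLevel {
  sealed trait Level
  case object High extends Level
  case object Medium extends Level
  case object Low extends Level
  case object Dormant extends Level

  // Thresholds in days are illustrative assumptions; the source does not define them.
  final case class Thresholds(high: Int = 3, medium: Int = 7, low: Int = 30)

  // Classify a customer by the number of days since their previous visit.
  def classify(daysSinceLastVisit: Int, t: Thresholds = Thresholds()): Level =
    if (daysSinceLastVisit <= t.high) High
    else if (daysSinceLastVisit <= t.medium) Medium
    else if (daysSinceLastVisit <= t.low) Low
    else Dormant
}
```

Keeping the thresholds in a case class lets the client tune them per store without changing the classification logic.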