Finding the common device ID set
Background: one of our advertisers provided a list of device IDs (IDFAs) so that ads could be shown to a target audience. However, none of the ads were shown, so we wanted to know how many of those device IDs also appear in our traffic requests.
To find the common device IDs, we first need all the device IDs (IDFA / Google Ad ID) seen in one day.
Method 1: use MapReduce on Azkaban. However, it failed.
Method 2: use Hive tables: insert the advertiser's device ID list into one table and join it on device ID.
Method 3: select all distinct device IDs from the request log and write them to a file. This gives a list of about 0.2 billion device IDs, with a file size of 6 GB.
Then use a shell command like this, where the IDs in a.txt are the fixed-string patterns and b.txt is the file being searched:
grep -F -f a.txt b.txt > public_ids.txt
This gives us the common device IDs.
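As a side note, grep -F on its own matches substrings, so adding -x (whole-line match) is safer when each line holds exactly one device ID. With a very large pattern file, grep may also need a lot of memory, and a sort-based intersection with comm is one possible alternative. The sketch below is only an illustration on tiny made-up data: the sample IDs, the scratch directory, and the assumption that a.txt is the advertiser's list and b.txt is the ID list from our request log are all hypothetical.

```shell
# Work in a scratch directory so the sample files do not clobber real data.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical sample data (assumption): a.txt = advertiser list, b.txt = request-log IDs.
printf '%s\n' AAAA-1111 BBBB-2222 CCCC-3333 > a.txt
printf '%s\n' CCCC-3333 DDDD-4444 AAAA-1111 > b.txt

# Variant 1: fixed strings (-F), whole-line match (-x), patterns read from a.txt (-f).
grep -F -x -f a.txt b.txt > public_ids.txt

# Variant 2: sort and de-duplicate both files, then keep only lines present in both.
# LC_ALL=C makes sort and comm agree on a plain byte ordering.
LC_ALL=C sort -u a.txt > a.sorted
LC_ALL=C sort -u b.txt > b.sorted
LC_ALL=C comm -12 a.sorted b.sorted > public_ids_sorted.txt

# Count the common IDs.
wc -l < public_ids_sorted.txt
```

Both variants should yield the same set of IDs. The grep output keeps the order of b.txt, while the comm output is sorted.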
Reference: http://blog.csdn.net/autofei/article/details/6579320