A Scala script for generating test data for Spark
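Frustrated at having no data to test Spark with, I wrote a small Scala program that can generate 100 GB or more of data for Spark SQL testing. The layout follows the data structure from an interview question I was once given at a film and TV company. Each record has six fields: DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")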
Data preview:
1,Role97,16,MI,13,2016-11-21
2,Role42,30,Meizu,15,2016-5-12
3,Role87,41,Apple,14,2016-3-5
4,Role59,21,Oppo,2,2016-3-8
5,Role26,54,MI,3,2016-4-23
6,Role27,32,Huawei,2,2016-3-18
7,Role22,15,Oppo,10,2016-5-12
8,Role64,31,Samsung,11,2016-10-29
9,Role7,46,Lenovo,5,2016-10-7
10,Role50,37,Nokia,5,2016-10-30
11,Role30,64,Samsung,9,2016-10-7
12,Role27,54,Samsung,5,2016-3-8
13,Role3,37,Samsung,4,2016-5-9
14,Role84,66,Meizu,5,2016-6-11
15,Role48,25,Oppo,0,2016-8-0
16,Role92,29,Meizu,5,2016-2-17
17,Role77,85,Oppo,7,2016-8-13
18,Role67,85,Samsung,4,2016-10-27
19,Role41,16,Nokia,13,2016-6-12
20,Role0,42,Apple,5,2016-10-18
21,Role64,85,Oppo,4,2016-2-11
22,Role27,85,Samsung,6,2016-1-10
23,Role84,59,Apple,17,2016-8-15
24,Role26,52,Samsung,0,2016-7-19
25,Role27,59,Meizu,8,2016-12-3
26,Role52,56,Apple,2,2016-12-20
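Rows like these can be checked before they reach Spark. The sketch below is one possible way to do it, assuming the six-field layout described above; UserRecord and RowParser are hypothetical names, not part of the original script. It parses a line into a typed record and returns None for a row whose date does not exist, such as row 15 in the preview (2016-8-0).

```scala
import java.time.LocalDate
import java.time.format.{DateTimeFormatter, ResolverStyle}
import scala.util.Try

// Hypothetical typed view of one generated row (names are illustrative).
final case class UserRecord(id: Long, username: String, userage: Int,
                            phoneType: String, click: Int, loginTime: LocalDate)

object RowParser {
  // Months and days are not zero-padded in the file, hence "M" and "d".
  // STRICT makes the parser reject impossible dates instead of adjusting them.
  private val dateFormat =
    DateTimeFormatter.ofPattern("uuuu-M-d").withResolverStyle(ResolverStyle.STRICT)

  // Returns None when the field count is wrong, a number fails to parse,
  // or the date does not exist.
  def parse(line: String): Option[UserRecord] =
    line.split(",", -1) match {
      case Array(id, name, age, phone, click, date) =>
        Try(UserRecord(id.toLong, name, age.toInt, phone, click.toInt,
                       LocalDate.parse(date, dateFormat))).toOption
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    println(parse("1,Role97,16,MI,13,2016-11-21")) // a valid row
    println(parse("15,Role48,25,Oppo,0,2016-8-0")) // day 0 is not a real date: None
  }
}
```

Returning an Option keeps the caller in charge of what to do with bad rows: drop them, count them, or fail the run.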
The code follows:
package main.scala.CreateData

import java.io.FileWriter
import scala.util.Random

/**
 * Created by Zhao Qiang on 2016/12/8.
 */
object DataCreater {
  private val datapath = "E://platform.txt" // output file
  private val max_records = 100             // number of rows to generate
  private val age = 70                      // exclusive upper bound for the random age
  private val brand = Array("Huawei", "MI", "Apple", "Samsung", "Meizu", "Lenovo", "Oppo", "Nokia")

  // Generates max_records rows and appends them to datapath
  def Creater(): Unit = {
    val rand = new Random()
    // second argument true = append if the file already exists
    val writer = new FileWriter(datapath, true)
    try {
      for (i <- 1 to max_records) {
        // age: values below 15 are shifted up by 15, so the result is in 15..69
        var dataage = rand.nextInt(age)
        if (dataage < 15) dataage += 15
        // phone brand, picked at random
        val phonePlus = brand(rand.nextInt(brand.length))
        // click count, 0..19
        val clicks = rand.nextInt(20)
        // user name, Role0..Role99
        val name = "Role" + rand.nextInt(100)
        // login date: month 1..12, day 1..28 so the date exists in every month
        val months = rand.nextInt(12) + 1
        val logintime = s"2016-$months-${rand.nextInt(28) + 1}"
        // DataStructure("ID","Username","Userage","PhoneType","Click","LoginTime")
        writer.write(s"$i,$name,$dataage,$phonePlus,$clicks,$logintime")
        writer.write(System.getProperty("line.separator"))
      }
      writer.flush()
    } finally {
      // close the file even if a write fails
      writer.close()
    }
  }

  def main(args: Array[String]): Unit = {
    Creater()
  }
}
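For repeatable Spark tests it can help to get exactly the same rows on every run. Here is a minimal sketch of that idea, assuming a fixed seed is acceptable for your tests; SeededRows, makeRow, rows and the 15..69 age range are illustrative choices, not part of the script above.

```scala
import scala.util.Random

object SeededRows {
  private val brands =
    Array("Huawei", "MI", "Apple", "Samsung", "Meizu", "Lenovo", "Oppo", "Nokia")

  // Builds one CSV row from the given Random; no file I/O, so it is easy to test.
  def makeRow(id: Long, rand: Random): String = {
    val age    = 15 + rand.nextInt(55)      // 15..69 (assumed range)
    val brand  = brands(rand.nextInt(brands.length))
    val clicks = rand.nextInt(20)           // 0..19
    val name   = "Role" + rand.nextInt(100) // Role0..Role99
    val month  = rand.nextInt(12) + 1       // 1..12
    val day    = rand.nextInt(28) + 1       // 1..28, valid in every month
    s"$id,$name,$age,$brand,$clicks,2016-$month-$day"
  }

  // The same seed always yields the same sequence of rows.
  def rows(count: Int, seed: Long): Vector[String] = {
    val rand = new Random(seed)
    (1 to count).map(i => makeRow(i.toLong, rand)).toVector
  }

  def main(args: Array[String]): Unit =
    rows(5, seed = 42L).foreach(println)
}
```

Keeping row construction separate from file writing also makes it possible to check field counts and value ranges without touching the disk.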
Change max_records in DataCreater to the number of rows you want, and the script generates a data set of that size.
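Since max_records is a row count and not a byte count, hitting a target such as 100 GB takes some arithmetic. At roughly 30 to 40 bytes per row, 100 GB is on the order of 3 billion rows, which is more than an Int counter can hold (Int.MaxValue is 2,147,483,647). The sketch below is one way around both issues, assuming you would rather specify the size directly; SizedWriter and writeUntil are hypothetical names, and the value ranges are illustrative.

```scala
import java.io.{BufferedWriter, FileWriter}
import scala.util.Random

object SizedWriter {
  private val brands =
    Array("Huawei", "MI", "Apple", "Samsung", "Meizu", "Lenovo", "Oppo", "Nokia")

  // Writes rows to path until at least targetBytes have been written.
  // Returns the number of rows written.
  def writeUntil(path: String, targetBytes: Long, rand: Random = new Random()): Long = {
    val writer = new BufferedWriter(new FileWriter(path))
    var written = 0L
    var id = 0L // Long, because the row count can exceed Int.MaxValue
    try {
      while (written < targetBytes) {
        id += 1
        val line = s"$id,Role${rand.nextInt(100)},${15 + rand.nextInt(55)}," +
          s"${brands(rand.nextInt(brands.length))},${rand.nextInt(20)}," +
          s"2016-${rand.nextInt(12) + 1}-${rand.nextInt(28) + 1}\n"
        writer.write(line)
        // Every character is ASCII, so characters and bytes match
        // in ASCII-compatible encodings such as UTF-8.
        written += line.length
      }
    } finally {
      writer.close()
    }
    id
  }
}
```

For example, SizedWriter.writeUntil("platform.txt", 1024L * 1024L) would produce a file of about 1 MB. The BufferedWriter batches the many small writes, which matters once the file grows into gigabytes.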