Spark SQL CLI 執行
阿新 • • 發佈:2019-01-15
1:執行 ./bin/spark-sql
需要先把hive-site.xml 負責到spark的conf目錄下
提示編譯的時候要帶2個引數[[email protected] spark-1.2.0-bin-2.4.1]$ ./bin/spark-sql Spark assembly has been built with Hive, including Datanucleus jars on classpath java.lang.ClassNotFoundException: org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:425) at java.lang.ClassLoader.loadClass(ClassLoader.java:358) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:270) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:342) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Failed to load main class org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver. You need to build Spark with -Phive and -Phive-thriftserver. Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
重新編譯:./make-distribution.sh --tgz -Phadoop-2.4 -Pyarn -DskipTests -Dhadoop.version=2.4.1 -Phive -Phive-thriftserver
2:再次執行:
發現找不到mysql的jdbc包[[email protected] spark-1.2.0-bin-2.4.1]$ ./bin/spark-sql Spark assembly has been built with Hive, including Datanucleus jars on classpath log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell). log4j:WARN Please initialize the log4j system properly. log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info. Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:346) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:101) at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:358) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:75) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1412) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62) at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72) at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453) at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465) at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340) ... 9 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410) ... 14 more Caused by: javax.jdo.JDOFatalInternalException: Error creating transactional connection factory NestedThrowables: java.lang.reflect.InvocationTargetException at org.datanucleus.api.jdo.NucleusJDOHelper.getJDOExceptionForNucleusException(NucleusJDOHelper.java:587) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:788) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.createPersistenceManagerFactory(JDOPersistenceManagerFactory.java:333) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.getPersistenceManagerFactory(JDOPersistenceManagerFactory.java:202) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at javax.jdo.JDOHelper$16.run(JDOHelper.java:1965) at java.security.AccessController.doPrivileged(Native Method) at javax.jdo.JDOHelper.invoke(JDOHelper.java:1960) at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1166) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808) at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701) at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:310) at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:339) at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:248) at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:223) at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73) at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133) at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:58) at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:67) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:497) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:475) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:523) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:397) at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.<init>(HiveMetaStore.java:356) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:54) at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:59) at org.apache.hadoop.hive.metastore.HiveMetaStore.newHMSHandler(HiveMetaStore.java:4944) at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:171) ... 19 more Caused by: java.lang.reflect.InvocationTargetException at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:325) at org.datanucleus.store.AbstractStoreManager.registerConnectionFactory(AbstractStoreManager.java:282) at org.datanucleus.store.AbstractStoreManager.<init>(AbstractStoreManager.java:240) at org.datanucleus.store.rdbms.RDBMSStoreManager.<init>(RDBMSStoreManager.java:286) at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at java.lang.reflect.Constructor.newInstance(Constructor.java:526) at org.datanucleus.plugin.NonManagedPluginRegistry.createExecutableExtension(NonManagedPluginRegistry.java:631) at org.datanucleus.plugin.PluginManager.createExecutableExtension(PluginManager.java:301) at org.datanucleus.NucleusContext.createStoreManagerForProperties(NucleusContext.java:1187) at org.datanucleus.NucleusContext.initialise(NucleusContext.java:356) at org.datanucleus.api.jdo.JDOPersistenceManagerFactory.freezeConfiguration(JDOPersistenceManagerFactory.java:775) ... 48 more Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "dbcp-builtin" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver. at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:259) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.initialiseDataSources(ConnectionFactoryImpl.java:131) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.<init>(ConnectionFactoryImpl.java:85) ... 66 more Caused by: org.datanucleus.store.rdbms.connectionpool.DatastoreDriverNotFoundException: The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver. at org.datanucleus.store.rdbms.connectionpool.AbstractConnectionPoolFactory.loadDriver(AbstractConnectionPoolFactory.java:58) at org.datanucleus.store.rdbms.connectionpool.DBCPBuiltinConnectionPoolFactory.createConnectionPool(DBCPBuiltinConnectionPoolFactory.java:49) at org.datanucleus.store.rdbms.ConnectionFactoryImpl.generateDataSources(ConnectionFactoryImpl.java:238) ... 68 more Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
3:載入jdbc執行
[[email protected] spark-1.2.0-bin-2.4.1]$ ./bin/spark-sql --driver-class-path /home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/lib/mysql-connector-java-5.1.32-bin.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/04 14:30:01 INFO SecurityManager: Changing view acls to: jifeng
15/03/04 14:30:01 INFO SecurityManager: Changing modify acls to: jifeng
15/03/04 14:30:01 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/03/04 14:30:02 INFO Slf4jLogger: Slf4jLogger started
15/03/04 14:30:02 INFO Remoting: Starting remoting
15/03/04 14:30:02 INFO Remoting: Remoting started; listening on addresses :[akka.tcp:// [email protected]:53897]
15/03/04 14:30:02 INFO Utils: Successfully started service 'sparkDriver' on port 53897.
15/03/04 14:30:02 INFO SparkEnv: Registering MapOutputTracker
15/03/04 14:30:02 INFO SparkEnv: Registering BlockManagerMaster
15/03/04 14:30:02 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150304143002-0534
15/03/04 14:30:02 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/03/04 14:30:02 INFO HttpFileServer: HTTP File server directory is /tmp/spark-5172cadb-24ba-4eaa-b78b-cefb2e0ffb5b
15/03/04 14:30:02 INFO HttpServer: Starting HTTP Server
15/03/04 14:30:02 INFO Utils: Successfully started service 'HTTP file server' on port 55683.
15/03/04 14:30:03 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/04 14:30:03 INFO SparkUI: Started SparkUI at http://feng02:4040
15/03/04 14:30:03 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://[email protected]:53897/user/HeartbeatReceiver
15/03/04 14:30:03 INFO NettyBlockTransferService: Server created on 39062
15/03/04 14:30:03 INFO BlockManagerMaster: Trying to register BlockManager
15/03/04 14:30:03 INFO BlockManagerMasterActor: Registering block manager localhost:39062 with 267.3 MB RAM, BlockManagerId(<driver>, localhost, 39062)
15/03/04 14:30:03 INFO BlockManagerMaster: Registered BlockManager
SET spark.sql.hive.version=0.13.1
15/03/04 14:30:04 INFO HiveMetaStore: 0: get_all_databases
15/03/04 14:30:04 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_all_databases
15/03/04 14:30:04 INFO HiveMetaStore: 0: get_functions: db=default pat=*
15/03/04 14:30:04 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_functions: db=default pat=*
15/03/04 14:30:04 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
4:操作:show tables;
spark-sql> show tables;
15/03/04 14:39:23 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/04 14:39:23 INFO ParseDriver: Parsing command: show tables
15/03/04 14:39:23 INFO ParseDriver: Parse Completed
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO Driver: Concurrency mode is disabled, not creating a lock manager
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO ParseDriver: Parsing command: show tables
15/03/04 14:39:24 INFO ParseDriver: Parse Completed
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=parse start=1425451164380 end=1425451164381 duration=1 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO Driver: Semantic Analysis Completed
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=semanticAnalyze start=1425451164381 end=1425451164458 duration=77 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO ListSinkOperator: Initializing Self 0 OP
15/03/04 14:39:24 INFO ListSinkOperator: Operator 0 OP initialized
15/03/04 14:39:24 INFO ListSinkOperator: Initialization Done 0 OP
15/03/04 14:39:24 INFO Driver: Returning Hive schema: Schema(fieldSchemas:[FieldSchema(name:tab_name, type:string, comment:from deserializer)], properties:null)
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=compile start=1425451164361 end=1425451164622 duration=261 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO Driver: Starting command: show tables
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=TimeToSubmit start=1425451164358 end=1425451164625 duration=267 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO HiveMetaStore: 0: get_database: default
15/03/04 14:39:24 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_database: default
15/03/04 14:39:24 INFO HiveMetaStore: 0: get_tables: db=default pat=.*
15/03/04 14:39:24 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_tables: db=default pat=.*
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=runTasks start=1425451164625 end=1425451164660 duration=35 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=Driver.execute start=1425451164622 end=1425451164660 duration=38 from=org.apache.hadoop.hive.ql.Driver>
OK
15/03/04 14:39:24 INFO Driver: OK
15/03/04 14:39:24 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=releaseLocks start=1425451164660 end=1425451164660 duration=0 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO PerfLogger: </PERFLOG method=Driver.run start=1425451164358 end=1425451164660 duration=302 from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:24 INFO deprecation: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir
15/03/04 14:39:24 INFO FileInputFormat: Total input paths to process : 1
15/03/04 14:39:24 INFO ListSinkOperator: 0 finished. closing...
15/03/04 14:39:24 INFO ListSinkOperator: 0 Close done
15/03/04 14:39:24 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/03/04 14:39:24 INFO DAGScheduler: Got job 0 (collect at SparkPlan.scala:84) with 1 output partitions (allowLocal=false)
15/03/04 14:39:24 INFO DAGScheduler: Final stage: Stage 0(collect at SparkPlan.scala:84)
15/03/04 14:39:24 INFO DAGScheduler: Parents of final stage: List()
15/03/04 14:39:24 INFO DAGScheduler: Missing parents: List()
15/03/04 14:39:24 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[2] at map at SparkPlan.scala:84), which has no missing parents
15/03/04 14:39:25 INFO MemoryStore: ensureFreeSpace(2560) called with curMem=0, maxMem=280248975
15/03/04 14:39:25 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 2.5 KB, free 267.3 MB)
15/03/04 14:39:25 INFO MemoryStore: ensureFreeSpace(1561) called with curMem=2560, maxMem=280248975
15/03/04 14:39:25 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1561.0 B, free 267.3 MB)
15/03/04 14:39:25 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:39062 (size: 1561.0 B, free: 267.3 MB)
15/03/04 14:39:25 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/04 14:39:25 INFO SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:838
15/03/04 14:39:25 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (MappedRDD[2] at map at SparkPlan.scala:84)
15/03/04 14:39:25 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/04 14:39:25 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, PROCESS_LOCAL, 1378 bytes)
15/03/04 14:39:25 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/04 14:39:25 INFO Executor: Finished task 0.0 in stage 0.0 (TID 0). 744 bytes result sent to driver
15/03/04 14:39:25 INFO DAGScheduler: Stage 0 (collect at SparkPlan.scala:84) finished in 0.202 s
15/03/04 14:39:25 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 191 ms on localhost (1/1)
15/03/04 14:39:25 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/04 14:39:25 INFO DAGScheduler: Job 0 finished: collect at SparkPlan.scala:84, took 0.625512 s
15/03/04 14:39:25 INFO StatsReportListener: Finished stage: [email protected]
15/03/04 14:39:25 INFO StatsReportListener: task runtime:(count: 1, mean: 191.000000, stdev: 0.000000, max: 191.000000, min: 191.000000)
course
hbase_table_1
pokes
student
Time taken: 2.535 seconds
15/03/04 14:39:25 INFO CliDriver: Time taken: 2.535 seconds
15/03/04 14:39:25 INFO PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
15/03/04 14:39:25 INFO PerfLogger: </PERFLOG method=releaseLocks start=1425451165520 end=1425451165520 duration=0 from=org.apache.hadoop.hive.ql.Driver>
spark-sql> 15/03/04 14:39:25 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:39:25 INFO StatsReportListener: 191.0 ms 191.0 ms 191.0 ms 191.0 ms 191.0 ms191.0 ms 191.0 ms 191.0 ms 191.0 ms
15/03/04 14:39:25 INFO StatsReportListener: task result size:(count: 1, mean: 744.000000, stdev: 0.000000, max: 744.000000, min: 744.000000)
15/03/04 14:39:25 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:39:25 INFO StatsReportListener: 744.0 B 744.0 B 744.0 B 744.0 B 744.0 B 744.0 B 744.0 B 744.0 B 744.0 B
15/03/04 14:39:25 INFO StatsReportListener: executor (non-fetch) time pct: (count: 1, mean: 8.900524, stdev: 0.000000, max: 8.900524, min: 8.900524)
15/03/04 14:39:25 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:39:25 INFO StatsReportListener: 9 % 9 % 9 % 9 % 9 % 9 % 9 % 9 % 9 %
15/03/04 14:39:25 INFO StatsReportListener: other time pct: (count: 1, mean: 91.099476, stdev: 0.000000, max: 91.099476, min: 91.099476)
15/03/04 14:39:25 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:39:25 INFO StatsReportListener: 91 % 91 % 91 % 91 % 91 % 91 % 91 % 91 % 91 %
select * from student;
> select * from student;
15/03/04 14:42:05 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/04 14:42:05 INFO ParseDriver: Parsing command: select * from student
15/03/04 14:42:05 INFO ParseDriver: Parse Completed
15/03/04 14:42:05 INFO HiveMetaStore: 0: get_table : db=default tbl=student
15/03/04 14:42:05 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_table : db=default tbl=student
15/03/04 14:42:05 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/03/04 14:42:05 INFO MemoryStore: ensureFreeSpace(444571) called with curMem=4121, maxMem=280248975
15/03/04 14:42:05 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 434.2 KB, free 266.8 MB)
15/03/04 14:42:05 INFO MemoryStore: ensureFreeSpace(47070) called with curMem=448692, maxMem=280248975
15/03/04 14:42:05 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 46.0 KB, free 266.8 MB)
15/03/04 14:42:05 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on localhost:39062 (size: 46.0 KB, free: 267.2 MB)
15/03/04 14:42:05 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/04 14:42:05 INFO SparkContext: Created broadcast 1 from broadcast at TableReader.scala:68
15/03/04 14:42:06 INFO BlockManager: Removing broadcast 0
15/03/04 14:42:06 INFO BlockManager: Removing block broadcast_0
15/03/04 14:42:06 INFO MemoryStore: Block broadcast_0 of size 2560 dropped from memory (free 279755773)
15/03/04 14:42:06 INFO BlockManager: Removing block broadcast_0_piece0
15/03/04 14:42:06 INFO MemoryStore: Block broadcast_0_piece0 of size 1561 dropped from memory (free 279757334)
15/03/04 14:42:06 INFO BlockManagerInfo: Removed broadcast_0_piece0 on localhost:39062 in memory (size: 1561.0 B, free: 267.2 MB)
15/03/04 14:42:06 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/04 14:42:06 INFO ContextCleaner: Cleaned broadcast 0
15/03/04 14:42:06 INFO FileInputFormat: Total input paths to process : 1
15/03/04 14:42:06 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/03/04 14:42:06 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/03/04 14:42:06 INFO DAGScheduler: Final stage: Stage 1(collect at SparkPlan.scala:84)
15/03/04 14:42:06 INFO DAGScheduler: Parents of final stage: List()
15/03/04 14:42:06 INFO DAGScheduler: Missing parents: List()
15/03/04 14:42:06 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[7] at map at SparkPlan.scala:84), which has no missing parents
15/03/04 14:42:06 INFO MemoryStore: ensureFreeSpace(8616) called with curMem=491641, maxMem=280248975
15/03/04 14:42:06 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 8.4 KB, free 266.8 MB)
15/03/04 14:42:06 INFO MemoryStore: ensureFreeSpace(5184) called with curMem=500257, maxMem=280248975
15/03/04 14:42:06 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 5.1 KB, free 266.8 MB)
15/03/04 14:42:06 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on localhost:39062 (size: 5.1 KB, free: 267.2 MB)
15/03/04 14:42:06 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/04 14:42:06 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/03/04 14:42:06 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[7] at map at SparkPlan.scala:84)
15/03/04 14:42:06 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/03/04 14:42:06 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, localhost, ANY, 1318 bytes)
15/03/04 14:42:06 INFO Executor: Running task 0.0 in stage 1.0 (TID 1)
15/03/04 14:42:06 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/student/stu.txt:0+28
15/03/04 14:42:06 INFO deprecation: mapred.tip.id is deprecated. Instead, use mapreduce.task.id
15/03/04 14:42:06 INFO deprecation: mapred.task.id is deprecated. Instead, use mapreduce.task.attempt.id
15/03/04 14:42:06 INFO deprecation: mapred.task.is.map is deprecated. Instead, use mapreduce.task.ismap
15/03/04 14:42:06 INFO deprecation: mapred.task.partition is deprecated. Instead, use mapreduce.task.partition
15/03/04 14:42:06 INFO deprecation: mapred.job.id is deprecated. Instead, use mapreduce.job.id
15/03/04 14:42:06 INFO Executor: Finished task 0.0 in stage 1.0 (TID 1). 1843 bytes result sent to driver
15/03/04 14:42:06 INFO TaskSetManager: Starting task 1.0 in stage 1.0 (TID 2, localhost, ANY, 1318 bytes)
15/03/04 14:42:06 INFO Executor: Running task 1.0 in stage 1.0 (TID 2)
15/03/04 14:42:06 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/student/stu.txt:28+29
15/03/04 14:42:06 INFO TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 267 ms on localhost (1/2)
15/03/04 14:42:06 INFO Executor: Finished task 1.0 in stage 1.0 (TID 2). 1828 bytes result sent to driver
15/03/04 14:42:06 INFO DAGScheduler: Stage 1 (collect at SparkPlan.scala:84) finished in 0.313 s
15/03/04 14:42:06 INFO StatsReportListener: Finished stage: [email protected]
15/03/04 14:42:06 INFO DAGScheduler: Job 1 finished: collect at SparkPlan.scala:84, took 0.338182 s
15/03/04 14:42:06 INFO StatsReportListener: task runtime:(count: 2, mean: 165.000000, stdev: 102.000000, max: 267.000000, min: 63.000000)
15/03/04 14:42:06 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:42:06 INFO StatsReportListener: 63.0 ms 63.0 ms 63.0 ms 63.0 ms 267.0 ms 267.0 ms 267.0 ms267.0 ms 267.0 ms
1 nick 24
2 doping 25
3 caizhi 26
4 liaozhi 27
5 wind 30
Time taken: 1.442 seconds
15/03/04 14:42:06 INFO CliDriver: Time taken: 1.442 seconds
spark-sql> 15/03/04 14:42:06 INFO StatsReportListener: task result size:(count: 2, mean: 1835.500000, stdev: 7.500000, max: 1843.000000, min: 1828.000000)
15/03/04 14:42:06 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:42:06 INFO StatsReportListener: 1828.0 B 1828.0 B 1828.0 B 1828.0 B 1843.0 B1843.0 B 1843.0 B 1843.0 B 1843.0 B
15/03/04 14:42:06 INFO TaskSetManager: Finished task 1.0 in stage 1.0 (TID 2) in 63 ms on localhost (2/2)
15/03/04 14:42:06 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool
15/03/04 14:42:06 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 73.640093, stdev: 5.386124, max: 79.026217, min: 68.253968)
15/03/04 14:42:06 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:42:06 INFO StatsReportListener: 68 % 68 % 68 % 68 % 79 % 79 % 79 % 79 % 79 %
15/03/04 14:42:06 INFO StatsReportListener: other time pct: (count: 2, mean: 26.359907, stdev: 5.386124, max: 31.746032, min: 20.973783)
15/03/04 14:42:06 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:42:06 INFO StatsReportListener: 21 % 21 % 21 % 21 % 32 % 32 % 32 % 32 % 32 %
JOIN操作select a.*,b.* from student a join course b where a.id=b.id ;
select a.*,b.* from student a join course b where a.id=b.id ;
15/03/04 14:45:18 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/04 14:45:18 INFO ParseDriver: Parsing command: select a.*,b.* from student a join course b where a.id=b.id
15/03/04 14:45:18 INFO ParseDriver: Parse Completed
15/03/04 14:45:18 INFO HiveMetaStore: 0: get_table : db=default tbl=student
15/03/04 14:45:18 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_table : db=default tbl=student
15/03/04 14:45:18 INFO HiveMetaStore: 0: get_table : db=default tbl=course
15/03/04 14:45:18 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_table : db=default tbl=course
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(444571) called with curMem=1014900, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 434.2 KB, free 265.9 MB)
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(47070) called with curMem=1459471, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 46.0 KB, free 265.8 MB)
15/03/04 14:45:19 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on localhost:39062 (size: 46.0 KB, free: 267.1 MB)
15/03/04 14:45:19 INFO BlockManagerMaster: Updated info of block broadcast_8_piece0
15/03/04 14:45:19 INFO SparkContext: Created broadcast 8 from broadcast at TableReader.scala:68
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(444627) called with curMem=1506541, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 434.2 KB, free 265.4 MB)
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(47091) called with curMem=1951168, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 46.0 KB, free 265.4 MB)
15/03/04 14:45:19 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on localhost:39062 (size: 46.0 KB, free: 267.1 MB)
15/03/04 14:45:19 INFO BlockManagerMaster: Updated info of block broadcast_9_piece0
15/03/04 14:45:19 INFO SparkContext: Created broadcast 9 from broadcast at TableReader.scala:68
15/03/04 14:45:19 INFO FileInputFormat: Total input paths to process : 1
15/03/04 14:45:19 INFO SparkContext: Starting job: collect at BroadcastHashJoin.scala:53
15/03/04 14:45:19 INFO DAGScheduler: Got job 4 (collect at BroadcastHashJoin.scala:53) with 2 output partitions (allowLocal=false)
15/03/04 14:45:19 INFO DAGScheduler: Final stage: Stage 4(collect at BroadcastHashJoin.scala:53)
15/03/04 14:45:19 INFO DAGScheduler: Parents of final stage: List()
15/03/04 14:45:19 INFO DAGScheduler: Missing parents: List()
15/03/04 14:45:19 INFO DAGScheduler: Submitting Stage 4 (MappedRDD[23] at map at BroadcastHashJoin.scala:53), which has no missing parents
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(6552) called with curMem=1998259, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 6.4 KB, free 265.4 MB)
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(3857) called with curMem=2004811, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 3.8 KB, free 265.4 MB)
15/03/04 14:45:19 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on localhost:39062 (size: 3.8 KB, free: 267.1 MB)
15/03/04 14:45:19 INFO BlockManagerMaster: Updated info of block broadcast_10_piece0
15/03/04 14:45:19 INFO SparkContext: Created broadcast 10 from broadcast at DAGScheduler.scala:838
15/03/04 14:45:19 INFO DAGScheduler: Submitting 2 missing tasks from Stage 4 (MappedRDD[23] at map at BroadcastHashJoin.scala:53)
15/03/04 14:45:19 INFO TaskSchedulerImpl: Adding task set 4.0 with 2 tasks
15/03/04 14:45:19 INFO TaskSetManager: Starting task 0.0 in stage 4.0 (TID 7, localhost, ANY, 1320 bytes)
15/03/04 14:45:19 INFO Executor: Running task 0.0 in stage 4.0 (TID 7)
15/03/04 14:45:19 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/course/course.txt:0+60
15/03/04 14:45:19 WARN LazyStruct: Extra bytes detected at the end of the row! Ignoring similar problems.
15/03/04 14:45:19 INFO Executor: Finished task 0.0 in stage 4.0 (TID 7). 2091 bytes result sent to driver
15/03/04 14:45:19 INFO TaskSetManager: Starting task 1.0 in stage 4.0 (TID 8, localhost, ANY, 1320 bytes)
15/03/04 14:45:19 INFO Executor: Running task 1.0 in stage 4.0 (TID 8)
15/03/04 14:45:19 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/course/course.txt:60+61
15/03/04 14:45:19 INFO TaskSetManager: Finished task 0.0 in stage 4.0 (TID 7) in 26 ms on localhost (1/2)
15/03/04 14:45:19 INFO Executor: Finished task 1.0 in stage 4.0 (TID 8). 2044 bytes result sent to driver
15/03/04 14:45:19 INFO DAGScheduler: Stage 4 (collect at BroadcastHashJoin.scala:53) finished in 0.042 s
15/03/04 14:45:19 INFO StatsReportListener: Finished stage: [email protected]
15/03/04 14:45:19 INFO StatsReportListener: task runtime:(count: 2, mean: 25.500000, stdev: 0.500000, max: 26.000000, min: 25.000000)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 25.0 ms 25.0 ms 25.0 ms 25.0 ms 26.0 ms 26.0 ms 26.0 ms 26.0 ms 26.0 ms
15/03/04 14:45:19 INFO DAGScheduler: Job 4 finished: collect at BroadcastHashJoin.scala:53, took 0.075328 s
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(2264) called with curMem=2008668, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 2.2 KB, free 265.3 MB)
15/03/04 14:45:19 INFO StatsReportListener: task result size:(count: 2, mean: 2067.500000, stdev: 23.500000, max: 2091.000000, min: 2044.000000)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 2044.0 B 2044.0 B 2044.0 B 2044.0 B 2.0 KB 2.0 KB 2.0 KB 2.0 KB 2.0 KB
15/03/04 14:45:19 INFO TaskSetManager: Finished task 1.0 in stage 4.0 (TID 8) in 25 ms on localhost (2/2)
15/03/04 14:45:19 INFO TaskSchedulerImpl: Removed TaskSet 4.0, whose tasks have all completed, from pool
15/03/04 14:45:19 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 51.153846, stdev: 8.846154, max: 60.000000, min: 42.307692)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 42 % 42 % 42 % 42 % 60 % 60 % 60 % 60 % 60 %
15/03/04 14:45:19 INFO StatsReportListener: other time pct: (count: 2, mean: 48.846154, stdev: 8.846154, max: 57.692308, min: 40.000000)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 40 % 40 % 40 % 40 % 58 % 58 % 58 % 58 % 58 %
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(350) called with curMem=2010932, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 350.0 B, free 265.3 MB)
15/03/04 14:45:19 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on localhost:39062 (size: 350.0 B, free: 267.1 MB)
15/03/04 14:45:19 INFO BlockManagerMaster: Updated info of block broadcast_11_piece0
15/03/04 14:45:19 INFO SparkContext: Created broadcast 11 from broadcast at BroadcastHashJoin.scala:55
15/03/04 14:45:19 INFO FileInputFormat: Total input paths to process : 1
15/03/04 14:45:19 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/03/04 14:45:19 INFO DAGScheduler: Got job 5 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/03/04 14:45:19 INFO DAGScheduler: Final stage: Stage 5(collect at SparkPlan.scala:84)
15/03/04 14:45:19 INFO DAGScheduler: Parents of final stage: List()
15/03/04 14:45:19 INFO DAGScheduler: Missing parents: List()
15/03/04 14:45:19 INFO DAGScheduler: Submitting Stage 5 (MappedRDD[29] at map at SparkPlan.scala:84), which has no missing parents
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(11896) called with curMem=2011282, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_12 stored as values in memory (estimated size 11.6 KB, free 265.3 MB)
15/03/04 14:45:19 INFO MemoryStore: ensureFreeSpace(6623) called with curMem=2023178, maxMem=280248975
15/03/04 14:45:19 INFO MemoryStore: Block broadcast_12_piece0 stored as bytes in memory (estimated size 6.5 KB, free 265.3 MB)
15/03/04 14:45:19 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on localhost:39062 (size: 6.5 KB, free: 267.1 MB)
15/03/04 14:45:19 INFO BlockManagerMaster: Updated info of block broadcast_12_piece0
15/03/04 14:45:19 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:838
15/03/04 14:45:19 INFO DAGScheduler: Submitting 2 missing tasks from Stage 5 (MappedRDD[29] at map at SparkPlan.scala:84)
15/03/04 14:45:19 INFO TaskSchedulerImpl: Adding task set 5.0 with 2 tasks
15/03/04 14:45:19 INFO TaskSetManager: Starting task 0.0 in stage 5.0 (TID 9, localhost, ANY, 1318 bytes)
15/03/04 14:45:19 INFO Executor: Running task 0.0 in stage 5.0 (TID 9)
15/03/04 14:45:19 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/student/stu.txt:0+28
15/03/04 14:45:19 INFO Executor: Finished task 0.0 in stage 5.0 (TID 9). 1930 bytes result sent to driver
15/03/04 14:45:19 INFO TaskSetManager: Starting task 1.0 in stage 5.0 (TID 10, localhost, ANY, 1318 bytes)
15/03/04 14:45:19 INFO Executor: Running task 1.0 in stage 5.0 (TID 10)
15/03/04 14:45:19 INFO TaskSetManager: Finished task 0.0 in stage 5.0 (TID 9) in 27 ms on localhost (1/2)
15/03/04 14:45:19 INFO HadoopRDD: Input split: hdfs://feng01:9000/user/hive/warehouse/student/stu.txt:28+29
15/03/04 14:45:19 INFO Executor: Finished task 1.0 in stage 5.0 (TID 10). 1884 bytes result sent to driver
15/03/04 14:45:19 INFO DAGScheduler: Stage 5 (collect at SparkPlan.scala:84) finished in 0.066 s
15/03/04 14:45:19 INFO StatsReportListener: Finished stage: [email protected]
15/03/04 14:45:19 INFO DAGScheduler: Job 5 finished: collect at SparkPlan.scala:84, took 0.094749 s
1 nick 24 1 英語 中文 法文 日文
2 doping 25 2 中文 法文
3 caizhi 26 3 中文 法文 日文
4 liaozhi 27 4 中文 法文 拉丁
5 wind 30 5 中文 法文 德文
Time taken: 0.612 seconds
15/03/04 14:45:19 INFO CliDriver: Time taken: 0.612 seconds
spark-sql> 15/03/04 14:45:19 INFO StatsReportListener: task runtime:(count: 2, mean: 34.000000, stdev: 7.000000, max: 41.000000, min: 27.000000)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 27.0 ms 27.0 ms 27.0 ms 27.0 ms 41.0 ms 41.0 ms 41.0 ms 41.0 ms 41.0 ms
15/03/04 14:45:19 INFO StatsReportListener: task result size:(count: 2, mean: 1907.000000, stdev: 23.000000, max: 1930.000000, min: 1884.000000)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 1884.0 B 1884.0 B 1884.0 B 1884.0 B 1930.0 B1930.0 B 1930.0 B 1930.0 B 1930.0 B
15/03/04 14:45:19 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 71.725384, stdev: 8.762421, max: 80.487805, min: 62.962963)
15/03/04 14:45:19 INFO TaskSetManager: Finished task 1.0 in stage 5.0 (TID 10) in 41 ms on localhost (2/2)
15/03/04 14:45:19 INFO TaskSchedulerImpl: Removed TaskSet 5.0, whose tasks have all completed, from pool
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 63 % 63 % 63 % 63 % 80 % 80 % 80 % 80 % 80 %
15/03/04 14:45:19 INFO StatsReportListener: other time pct: (count: 2, mean: 28.274616, stdev: 8.762421, max: 37.037037, min: 19.512195)
15/03/04 14:45:19 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 14:45:19 INFO StatsReportListener: 20 % 20 % 20 % 20 % 37 % 37 % 37 % 37 % 37 %
5:叢集執行
是CLI啟動一個SparkSQL應用程式的引數,如果不設定--master的話,將在啟動spark-sql的機器以local方式執行,只能通過http://機器名:4040進行監控
./bin/spark-sql --master spark://feng02:7077 --driver-class-path /home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/lib/mysql-connector-java-5.1.32-bin.jar
[[email protected] spark-1.2.0-bin-2.4.1]$ ./bin/spark-sql --master spark://feng02:7077 --driver-class-path /home/jifeng/hadoop/spark-1.2.0-bin-2.4.1/lib/mysql-connector-java-5.1.32-bin.jar
Spark assembly has been built with Hive, including Datanucleus jars on classpath
log4j:WARN No appenders could be found for logger (org.apache.hadoop.util.Shell).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/04 15:36:50 INFO SecurityManager: Changing view acls to: jifeng
15/03/04 15:36:50 INFO SecurityManager: Changing modify acls to: jifeng
15/03/04 15:36:50 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(jifeng); users with modify permissions: Set(jifeng)
15/03/04 15:36:50 INFO Slf4jLogger: Slf4jLogger started
15/03/04 15:36:50 INFO Remoting: Starting remoting
15/03/04 15:36:51 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://[email protected]:51996]
15/03/04 15:36:51 INFO Utils: Successfully started service 'sparkDriver' on port 51996.
15/03/04 15:36:51 INFO SparkEnv: Registering MapOutputTracker
15/03/04 15:36:51 INFO SparkEnv: Registering BlockManagerMaster
15/03/04 15:36:51 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150304153651-d089
15/03/04 15:36:51 INFO MemoryStore: MemoryStore started with capacity 267.3 MB
15/03/04 15:36:51 INFO HttpFileServer: HTTP File server directory is /tmp/spark-f3705b9a-d878-441f-8128-042e0363b2d3
15/03/04 15:36:51 INFO HttpServer: Starting HTTP Server
15/03/04 15:36:51 INFO Utils: Successfully started service 'HTTP file server' on port 47583.
15/03/04 15:36:51 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/04 15:36:51 INFO SparkUI: Started SparkUI at http://feng02:4040
15/03/04 15:36:51 INFO AppClient$ClientActor: Connecting to master spark://feng02:7077...
15/03/04 15:36:52 INFO SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150304153652-0001
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor added: app-20150304153652-0001/0 on worker-20150304143413-feng03-41090 (feng03:41090) with 4 cores
15/03/04 15:36:52 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304153652-0001/0 on hostPort feng03:41090 with 4 cores, 512.0 MB RAM
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor added: app-20150304153652-0001/1 on worker-20150304143413-feng01-51840 (feng01:51840) with 4 cores
15/03/04 15:36:52 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304153652-0001/1 on hostPort feng01:51840 with 4 cores, 512.0 MB RAM
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor added: app-20150304153652-0001/2 on worker-20150304142824-feng02-56719 (feng02:56719) with 1 cores
15/03/04 15:36:52 INFO SparkDeploySchedulerBackend: Granted executor ID app-20150304153652-0001/2 on hostPort feng02:56719 with 1 cores, 512.0 MB RAM
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/1 is now LOADING
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/0 is now LOADING
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/0 is now RUNNING
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/2 is now LOADING
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/1 is now RUNNING
15/03/04 15:36:52 INFO AppClient$ClientActor: Executor updated: app-20150304153652-0001/2 is now RUNNING
15/03/04 15:36:52 INFO NettyBlockTransferService: Server created on 44473
15/03/04 15:36:52 INFO BlockManagerMaster: Trying to register BlockManager
15/03/04 15:36:52 INFO BlockManagerMasterActor: Registering block manager feng02:44473 with 267.3 MB RAM, BlockManagerId(<driver>, feng02, 44473)
15/03/04 15:36:52 INFO BlockManagerMaster: Registered BlockManager
15/03/04 15:36:53 INFO SparkDeploySchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.0
SET spark.sql.hive.version=0.13.1
15/03/04 15:36:55 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:41011/user/Executor#987795855] with ID 0
15/03/04 15:36:55 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:34997/user/Executor#-664106945] with ID 1
15/03/04 15:36:55 INFO HiveMetaStore: 0: get_all_databases
15/03/04 15:36:55 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_all_databases
15/03/04 15:36:55 INFO BlockManagerMasterActor: Registering block manager feng03:42326 with 265.4 MB RAM, BlockManagerId(0, feng03, 42326)
15/03/04 15:36:55 INFO BlockManagerMasterActor: Registering block manager feng01:60774 with 265.4 MB RAM, BlockManagerId(1, feng01, 60774)
15/03/04 15:36:55 INFO HiveMetaStore: 0: get_functions: db=default pat=*
15/03/04 15:36:55 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_functions: db=default pat=*
15/03/04 15:36:55 INFO Datastore: The class "org.apache.hadoop.hive.metastore.model.MResourceUri" is tagged as "embedded-only" so does not have its own datastore table.
spark-sql> 15/03/04 15:36:58 INFO SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://[email protected]:45328/user/Executor#-2134631037] with ID 2
15/03/04 15:36:58 INFO BlockManagerMasterActor: Registering block manager feng02:60870 with 267.3 MB RAM, BlockManagerId(2, feng02, 60870)
>
資料量少的時候慢好多
> select a.*,b.* from student a join course b where a.id=b.id ;
15/03/04 15:44:53 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/04 15:44:53 INFO ParseDriver: Parsing command: select a.*,b.* from student a join course b where a.id=b.id
15/03/04 15:44:53 INFO ParseDriver: Parse Completed
15/03/04 15:44:55 INFO HiveMetaStore: 0: get_table : db=default tbl=student
15/03/04 15:44:55 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_table : db=default tbl=student
15/03/04 15:44:55 INFO HiveMetaStore: 0: get_table : db=default tbl=course
15/03/04 15:44:55 INFO audit: ugi=jifeng ip=unknown-ip-addr cmd=get_table : db=default tbl=course
15/03/04 15:44:55 INFO deprecation: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps
15/03/04 15:44:56 INFO MemoryStore: ensureFreeSpace(436469) called with curMem=0, maxMem=280248975
15/03/04 15:44:56 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 426.2 KB, free 266.9 MB)
15/03/04 15:44:56 INFO MemoryStore: ensureFreeSpace(46914) called with curMem=436469, maxMem=280248975
15/03/04 15:44:56 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 45.8 KB, free 266.8 MB)
15/03/04 15:44:56 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on feng02:44473 (size: 45.8 KB, free: 267.2 MB)
15/03/04 15:44:56 INFO BlockManagerMaster: Updated info of block broadcast_0_piece0
15/03/04 15:44:56 INFO SparkContext: Created broadcast 0 from broadcast at TableReader.scala:68
15/03/04 15:44:56 INFO MemoryStore: ensureFreeSpace(436525) called with curMem=483383, maxMem=280248975
15/03/04 15:44:56 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 426.3 KB, free 266.4 MB)
15/03/04 15:44:56 INFO MemoryStore: ensureFreeSpace(46946) called with curMem=919908, maxMem=280248975
15/03/04 15:44:56 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 45.8 KB, free 266.3 MB)
15/03/04 15:44:56 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on feng02:44473 (size: 45.8 KB, free: 267.2 MB)
15/03/04 15:44:56 INFO BlockManagerMaster: Updated info of block broadcast_1_piece0
15/03/04 15:44:56 INFO SparkContext: Created broadcast 1 from broadcast at TableReader.scala:68
15/03/04 15:44:57 WARN HiveConf: DEPRECATED: hive.metastore.ds.retry.* no longer has any effect. Use hive.hmshandler.retry.* instead
15/03/04 15:44:58 INFO FileInputFormat: Total input paths to process : 1
15/03/04 15:44:58 INFO SparkContext: Starting job: collect at BroadcastHashJoin.scala:53
15/03/04 15:44:58 INFO DAGScheduler: Got job 0 (collect at BroadcastHashJoin.scala:53) with 2 output partitions (allowLocal=false)
15/03/04 15:44:58 INFO DAGScheduler: Final stage: Stage 0(collect at BroadcastHashJoin.scala:53)
15/03/04 15:44:58 INFO DAGScheduler: Parents of final stage: List()
15/03/04 15:44:58 INFO DAGScheduler: Missing parents: List()
15/03/04 15:44:58 INFO DAGScheduler: Submitting Stage 0 (MappedRDD[4] at map at BroadcastHashJoin.scala:53), which has no missing parents
15/03/04 15:44:58 INFO MemoryStore: ensureFreeSpace(6544) called with curMem=966854, maxMem=280248975
15/03/04 15:44:58 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 6.4 KB, free 266.3 MB)
15/03/04 15:44:58 INFO MemoryStore: ensureFreeSpace(3848) called with curMem=973398, maxMem=280248975
15/03/04 15:44:58 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 3.8 KB, free 266.3 MB)
15/03/04 15:44:58 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on feng02:44473 (size: 3.8 KB, free: 267.2 MB)
15/03/04 15:44:58 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/04 15:44:58 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:838
15/03/04 15:44:58 INFO DAGScheduler: Submitting 2 missing tasks from Stage 0 (MappedRDD[4] at map at BroadcastHashJoin.scala:53)
15/03/04 15:44:58 INFO TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
15/03/04 15:44:58 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, feng02, NODE_LOCAL, 1320 bytes)
15/03/04 15:44:59 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on feng02:60870 (size: 3.8 KB, free: 267.3 MB)
15/03/04 15:45:01 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, feng01, ANY, 1320 bytes)
15/03/04 15:45:02 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on feng01:60774 (size: 3.8 KB, free: 265.4 MB)
15/03/04 15:45:02 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on feng02:60870 (size: 45.8 KB, free: 267.2 MB)
15/03/04 15:45:03 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on feng01:60774 (size: 45.8 KB, free: 265.4 MB)
15/03/04 15:45:06 INFO TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 4445 ms on feng01 (1/2)
15/03/04 15:45:07 INFO TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 9432 ms on feng02 (2/2)
15/03/04 15:45:07 INFO DAGScheduler: Stage 0 (collect at BroadcastHashJoin.scala:53) finished in 9.436 s
15/03/04 15:45:07 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/04 15:45:07 INFO StatsReportListener: Finished stage: [email protected]
15/03/04 15:45:07 INFO DAGScheduler: Job 0 finished: collect at BroadcastHashJoin.scala:53, took 9.687128 s
15/03/04 15:45:08 INFO MemoryStore: ensureFreeSpace(2264) called with curMem=977246, maxMem=280248975
15/03/04 15:45:08 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 2.2 KB, free 266.3 MB)
15/03/04 15:45:08 INFO MemoryStore: ensureFreeSpace(350) called with curMem=979510, maxMem=280248975
15/03/04 15:45:08 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 350.0 B, free 266.3 MB)
15/03/04 15:45:08 INFO BlockManager: Removing broadcast 2
15/03/04 15:45:08 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on feng02:44473 (size: 350.0 B, free: 267.2 MB)
15/03/04 15:45:08 INFO BlockManager: Removing block broadcast_2
15/03/04 15:45:08 INFO BlockManagerMaster: Updated info of block broadcast_3_piece0
15/03/04 15:45:08 INFO SparkContext: Created broadcast 3 from broadcast at BroadcastHashJoin.scala:55
15/03/04 15:45:08 INFO MemoryStore: Block broadcast_2 of size 6544 dropped from memory (free 279275659)
15/03/04 15:45:08 INFO BlockManager: Removing block broadcast_2_piece0
15/03/04 15:45:08 INFO MemoryStore: Block broadcast_2_piece0 of size 3848 dropped from memory (free 279279507)
15/03/04 15:45:08 INFO StatsReportListener: task runtime:(count: 2, mean: 6938.500000, stdev: 2493.500000, max: 9432.000000, min: 4445.000000)
15/03/04 15:45:08 INFO BlockManagerInfo: Removed broadcast_2_piece0 on feng02:44473 in memory (size: 3.8 KB, free: 267.2 MB)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO BlockManagerMaster: Updated info of block broadcast_2_piece0
15/03/04 15:45:08 INFO BlockManagerInfo: Removed broadcast_2_piece0 on feng01:60774 in memory (size: 3.8 KB, free: 265.4 MB)
15/03/04 15:45:08 INFO StatsReportListener: 4.4 s 4.4 s 4.4 s 4.4 s 9.4 s 9.4 s 9.4 s 9.4 s 9.4 s
15/03/04 15:45:08 INFO StatsReportListener: fetch wait time:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms 0.0 ms
15/03/04 15:45:08 INFO StatsReportListener: remote bytes read:(count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B 0.0 B
15/03/04 15:45:08 INFO BlockManagerInfo: Removed broadcast_2_piece0 on feng02:60870 in memory (size: 3.8 KB, free: 267.2 MB)
15/03/04 15:45:08 INFO ContextCleaner: Cleaned broadcast 2
15/03/04 15:45:08 INFO StatsReportListener: task result size:(count: 2, mean: 2146.000000, stdev: 105.000000, max: 2251.000000, min: 2041.000000)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 2041.0 B 2041.0 B 2041.0 B 2041.0 B 2.2 KB 2.2 KB 2.2 KB 2.2 KB 2.2 KB
15/03/04 15:45:08 INFO StatsReportListener: executor (non-fetch) time pct: (count: 2, mean: 96.539434, stdev: 1.668793, max: 98.208227, min: 94.870641)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 95 % 95 % 95 % 95 % 98 % 98 % 98 % 98 % 98 %
15/03/04 15:45:08 INFO StatsReportListener: fetch wait time pct: (count: 1, mean: 0.000000, stdev: 0.000000, max: 0.000000, min: 0.000000)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 0 % 0 % 0 % 0 % 0 % 0 % 0 % 0 % 0 %
15/03/04 15:45:08 INFO StatsReportListener: other time pct: (count: 2, mean: 3.460566, stdev: 1.668793, max: 5.129359, min: 1.791773)
15/03/04 15:45:08 INFO StatsReportListener: 0% 5% 10% 25% 50% 75% 90% 95% 100%
15/03/04 15:45:08 INFO StatsReportListener: 2 % 2 % 2 % 2 % 5 % 5 % 5 % 5 % 5 %
15/03/04 15:45:09 INFO FileInputFormat: Total input paths to process : 1
15/03/04 15:45:09 INFO SparkContext: Starting job: collect at SparkPlan.scala:84
15/03/04 15:45:09 INFO DAGScheduler: Got job 1 (collect at SparkPlan.scala:84) with 2 output partitions (allowLocal=false)
15/03/04 15:45:09 INFO DAGScheduler: Final stage: Stage 1(collect at SparkPlan.scala:84)
15/03/04 15:45:09 INFO DAGScheduler: Parents of final stage: List()
15/03/04 15:45:09 INFO DAGScheduler: Missing parents: List()
15/03/04 15:45:09 INFO DAGScheduler: Submitting Stage 1 (MappedRDD[10] at map at SparkPlan.scala:84), which has no missing parents
15/03/04 15:45:09 INFO MemoryStore: ensureFreeSpace(11896) called with curMem=969468, maxMem=280248975
15/03/04 15:45:09 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 11.6 KB, free 266.3 MB)
15/03/04 15:45:09 INFO MemoryStore: ensureFreeSpace(6628) called with curMem=981364, maxMem=280248975
15/03/04 15:45:09 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 6.5 KB, free 266.3 MB)
15/03/04 15:45:09 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on feng02:44473 (size: 6.5 KB, free: 267.2 MB)
15/03/04 15:45:09 INFO BlockManagerMaster: Updated info of block broadcast_4_piece0
15/03/04 15:45:09 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:838
15/03/04 15:45:09 INFO DAGScheduler: Submitting 2 missing tasks from Stage 1 (MappedRDD[10] at map at SparkPlan.scala:84)
15/03/04 15:45:09 INFO TaskSchedulerImpl: Adding task set 1.0 with 2 tasks
15/03/04 15:45:09 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 2, feng02, NODE_LOCAL, 1318 bytes)
15/03/04 15:45:09 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on feng02:60870 (size: 6.5 KB, free: 267.2 MB)
15/03/04 15:45:09 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on feng02:60870 (size: 45.8 KB, free: