Lesson 3: Developing HDFS with Java
阿新 • Published: 2018-08-07
(1) A brief summary of HDFS
Hadoop consists of HDFS + YARN + Map/Reduce.
HDFS is the distributed file storage module. It is a cluster system made up of one namenode and n datanodes;
datanodes can be added dynamically. Files are split into fixed-size blocks (128 MB by default),
and each block is stored on 3 datanodes by default. This deliberate redundancy means that if one datanode goes down, no data is lost.
HDFS = NameNode + SecondaryNameNode + JournalNode + DataNode
A typical application built on HDFS-style storage: Baidu Cloud Drive (百度雲盤).
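To make the block-splitting rule above concrete, here is a small standalone sketch (plain Java, no Hadoop dependency; the 128 MB block size and replication factor 3 are the defaults mentioned above) that computes how many blocks and replicas a file occupies:

```java
public class BlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // default dfs.blocksize: 128 MB
    static final int REPLICATION = 3;                  // default dfs.replication

    // Number of 128 MB blocks needed to store fileSize bytes (ceiling division)
    static long blockCount(long fileSize) {
        return (fileSize + BLOCK_SIZE - 1) / BLOCK_SIZE;
    }

    public static void main(String[] args) {
        long fileSize = 300L * 1024 * 1024; // a hypothetical 300 MB file
        long blocks = blockCount(fileSize);
        // 300 MB / 128 MB rounds up to 3 blocks; each block is stored 3 times
        System.out.println(blocks + " blocks, " + blocks * REPLICATION + " replicas stored");
    }
}
```

So a 300 MB file occupies 3 blocks (two full 128 MB blocks plus one 44 MB block), and with the default replication factor the cluster stores 9 block replicas in total.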
(2) Changing the default value of hadoop.tmp.dir
hadoop.tmp.dir defaults to /tmp/hadoop-${user.name}. Since the /tmp directory is wiped on system reboot, this location should be changed.
Edit core-site.xml (on every node):
[root@master ~]# vim core-site.xml
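The property to add looks like the following (the /data/hadoop/tmp path is only an illustrative choice; any directory that survives reboots works):

```xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop/tmp</value>
</property>
```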
After changing the hadoop.tmp.dir parameter on the namenode and all datanodes, the namenode must be reformatted. On master, run:
[root@master ~]# hdfs namenode -format
(4) Disabling permission checks during testing
For simplicity during testing, disable permission checking by adding the following to hdfs-site.xml on the namenode:
<property>
    <name>dfs.permissions.enabled</name>
    <value>false</value>
</property>
Then restart the namenode:
[root@master ~]# hadoop-daemon.sh stop namenode
[root@master ~]# hadoop-daemon.sh start namenode
(5) Reading and writing HDFS with the FileSystem class
package com.hadoop.hdfs;

import java.io.FileInputStream;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HelloHDFS {

    public static Log log = LogFactory.getLog(HelloHDFS.class);

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://192.168.56.100:9000");
        conf.set("dfs.replication", "2"); // default is 3
        FileSystem fileSystem = FileSystem.get(conf);

        // Create, check, and delete a directory
        boolean success = fileSystem.mkdirs(new Path("/yucong"));
        log.info("directory created: " + success);

        success = fileSystem.exists(new Path("/yucong"));
        log.info("path exists: " + success);

        success = fileSystem.delete(new Path("/yucong"), true);
        log.info("path deleted: " + success);

        /* Equivalent upload using the IOUtils helper:
        FSDataOutputStream out = fileSystem.create(new Path("/test.data"), true);
        FileInputStream fis = new FileInputStream("c:/test.txt");
        IOUtils.copyBytes(fis, out, 4096, true); */

        // Upload a local file to HDFS with a manual read/write loop
        FSDataOutputStream out = fileSystem.create(new Path("/test2.data"));
        FileInputStream in = new FileInputStream("c:/test.txt");
        byte[] buf = new byte[4096];
        int len = in.read(buf);
        while (len != -1) {
            out.write(buf, 0, len);
            len = in.read(buf);
        }
        in.close();
        out.close();

        // List the contents of the HDFS root directory
        FileStatus[] statuses = fileSystem.listStatus(new Path("/"));
        log.info(statuses.length);
        for (FileStatus status : statuses) {
            log.info(status.getPath());
            log.info(status.getPermission());
            log.info(status.getReplication());
        }
    }
}
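The manual read/write loop in the middle of main() is a generic stream-copy pattern. As a standalone sketch (plain java.io only, no Hadoop needed; the class and method names are illustrative), the same pattern with try-with-resources closes both streams even if an exception is thrown partway through:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copy everything from in to out in 4 KB chunks; returns the number of bytes copied
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int len;
        while ((len = in.read(buf)) != -1) {
            out.write(buf, 0, len);
            total += len;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10000];
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // try-with-resources closes both streams automatically
        try (InputStream in = new ByteArrayInputStream(data);
             OutputStream out = sink) {
            System.out.println(copy(in, out) + " bytes copied");
        }
    }
}
```

The commented-out IOUtils.copyBytes(fis, out, 4096, true) call in the listing above does the same thing in one line, with the final boolean telling Hadoop to close both streams.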
This is a Maven project; its pom.xml declares the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.7.3</version>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-hdfs</artifactId>
        <version>2.7.3</version>
    </dependency>
</dependencies>