
Finding the 3 replica blocks (BLKs) of an HDFS file

When we first deployed HDFS, every file was stored with 3-way redundancy. So which blocks (BLKs) is a file split into, and where is each replica stored?

The command hadoop fsck <file to check> -files -blocks -locations answers that.

#######################

[[email protected] ~]# hadoop fsck --help
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Usage: DFSck <path> [-list-corruptfileblocks | [-move | -delete | -openforwrite] [-files [-blocks [-locations | -racks]]]]
        <path>  start checking from this path
        -move   move corrupted files to /lost+found
        -delete delete corrupted files
        -files  print out files being checked
        -openforwrite   print out files opened for write
        -includeSnapshots       include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it
        -list-corruptfileblocks print out list of missing blocks and files they belong to
        -blocks print out block report
        -locations      print out locations for every block
        -racks  print out network topology for data-node locations


Please Note:
        1. By default fsck ignores files opened for write, use -openforwrite to report such files. They are usually tagged CORRUPT or HEALTHY depending on their block allocation status
        2. Option -includeSnapshots should not be used for comparing stats, should be used only for HEALTH check, as this may contain duplicates if the same file present in both original fs tree and inside snapshots.


Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>            use value for given property
-fs <local|namenode:port>      specify a namenode
-jt <local|resourcemanager:port>    specify a ResourceManager
-files <comma separated list of files>    specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>    specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.


The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]


[[email protected] ~]#

###################################   Example:   ###################################################################

[[email protected] NEW]$ hadoop fsck /user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv -files -blocks -locations
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.


Connecting to namenode via http://snn.hadoop:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.100.13 for path /user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv at Sat Dec 05 22:18:36 HKT 2015
/user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv 8472 bytes, 1 block(s):  OK
0. BP-170662068-192.168.100.11-1447496766461:blk_1073746407_5690 len=8472 Live_repl=3 [DatanodeInfoWithStorage[192.168.100.12:50010,DS-6db5c6fb-018c-446f-94cb-adfeed0e5222,DISK], DatanodeInfoWithStorage[192.168.100.10:50010,DS-fb298bf6-404c-46ad-848d-070ad0637248,DISK], DatanodeInfoWithStorage[192.168.100.15:50010,DS-fcc3d5f1-410f-4d69-aa4f-24064bf8c681,DISK]]


Status: HEALTHY
 Total size:    8472 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 8472 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Sat Dec 05 22:18:36 HKT 2015 in 4 milliseconds




The filesystem under path '/user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv' is HEALTHY
[[email protected] NEW]$


#############################


The same check with the non-deprecated hdfs command (hadoop fsck prints the DEPRECATED warning shown above):

hdfs fsck <file_name> -files -blocks -locations


#############################


[[email protected] NEW]$ hdfs fsck /user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv -files -blocks -locations
Connecting to namenode via http://snn.hadoop:50070
FSCK started by hdfs (auth:SIMPLE) from /192.168.100.13 for path /user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv at Sat Dec 05 22:20:19 HKT 2015
/user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv 8472 bytes, 1 block(s):  OK
0. BP-170662068-192.168.100.11-1447496766461:blk_1073746407_5690 len=8472 Live_repl=3 [DatanodeInfoWithStorage[192.168.100.12:50010,DS-6db5c6fb-018c-446f-94cb-adfeed0e5222,DISK], DatanodeInfoWithStorage[192.168.100.10:50010,DS-fb298bf6-404c-46ad-848d-070ad0637248,DISK], DatanodeInfoWithStorage[192.168.100.15:50010,DS-fcc3d5f1-410f-4d69-aa4f-24064bf8c681,DISK]]


Status: HEALTHY
 Total size:    8472 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                0
 Total blocks (validated):      1 (avg. block size 8472 B)
 Minimally replicated blocks:   1 (100.0 %)
 Over-replicated blocks:        0 (0.0 %)
 Under-replicated blocks:       0 (0.0 %)
 Mis-replicated blocks:         0 (0.0 %)
 Default replication factor:    3
 Average block replication:     3.0
 Corrupt blocks:                0
 Missing replicas:              0 (0.0 %)
 Number of data-nodes:          5
 Number of racks:               1
FSCK ended at Sat Dec 05 22:20:19 HKT 2015 in 1 milliseconds




The filesystem under path '/user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv' is HEALTHY
[[email protected] NEW]$




##############################


The block ID comes from the fsck output above: blk_1073746407 (the on-disk block file drops the generation-stamp suffix _5690; only the .meta file keeps it). Search each DataNode for it:




find / -name "blk_1073746407*"
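Searching from / works but can be slow on a busy DataNode. A quicker variant (a sketch, assuming the DataNode data directory is /dfs/dn, which matches the listings below) is to read dfs.datanode.data.dir from the cluster configuration and limit find to that directory:

hdfs getconf -confKey dfs.datanode.data.dir
find /dfs/dn -name "blk_1073746407*"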


192.168.100.15:


[[email protected] subdir17]# pwd
/dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17
[[email protected] subdir17]# ll blk_1073746407*
-rw-r--r-- 1 hdfs hdfs 8472 Dec  4 22:19 blk_1073746407
-rw-r--r-- 1 hdfs hdfs   75 Dec  4 22:19 blk_1073746407_5690.meta
[[email protected] subdir17]#
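Because this file is only 8472 bytes it occupies a single block, so the block file on the DataNode should contain exactly the same bytes as the file in HDFS. A quick sanity check (a sketch, reusing the paths shown above) is to compare checksums; both commands should print the same hash:

hdfs dfs -cat /user/hue/external/tbl_8005/Finance_inequality_and_the_poor_data_8005.csv | md5sum
md5sum /dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407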


#######


192.168.100.12


[[email protected] ~]# ssh snn.hadoop
Last login: Sat Dec  5 13:25:46 2015 from 192.168.100.1
[[email protected] ~]# find / -name "blk_1073746407*"
/dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407
/dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407_5690.meta
[[email protected] ~]#




#######


192.168.100.10


[[email protected] ~]# find / -name "blk_1073746407*"
/dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407
/dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407_5690.meta
[[email protected] ~]#
[[email protected] ~]#



########################
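To check that a block file and its .meta checksum file agree, Hadoop 2.7 and later also ship an hdfs debug verifyMeta subcommand. A sketch, run on the DataNode with the on-disk paths found above (older releases may not have this command):

hdfs debug verifyMeta -meta /dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407_5690.meta -block /dfs/dn/current/BP-170662068-192.168.100.11-1447496766461/current/finalized/subdir0/subdir17/blk_1073746407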
