
Manage Files on HDFS via CLI/Ambari Files View: How to Browse HDFS Files in Ambari

Original article: https://www.cloudera.com/tutorials/manage-files-on-hdfs-via-cli-ambari-files-view/2.html

Introduction

In the previous tutorial, we learned to manage files on the Hadoop Distributed File System (HDFS) with the command line. Now we will use Ambari Files View to perform many of the file management operations on HDFS that we learned with CLI, but through the web-based interface.


Download the Drivers Related Datasets

We will download geolocation.csv and trucks.csv onto the local filesystem of the sandbox. The commands are tailored for Mac and Linux users.

1. Open a terminal on your local machine, SSH into the sandbox:

ssh root@sandbox-hdp.hortonworks.com -p 2222

Note: If you're on VMware or Docker, ensure that you map the sandbox IP to the correct hostname in the hosts file. See: Map your Sandbox IP.

2. Open another terminal, change your current directory to Downloads, then copy and paste the commands to download the geolocation.csv and trucks.csv files. We will use them while we learn file management operations.

#Change your current directory to Downloads
cd Downloads

#Download geolocation.csv
wget https://github.com/hortonworks/data-tutorials/raw/master/tutorials/hdp/manage-files-on-hdfs-via-cli-ambari-files-view/assets/drivers-datasets/geolocation.csv

#Download trucks.csv
wget https://github.com/hortonworks/data-tutorials/raw/master/tutorials/hdp/manage-files-on-hdfs-via-cli-ambari-files-view/assets/drivers-datasets/trucks.csv

#Create directory for drivers-datasets
mkdir drivers-datasets

#Move the geolocation and trucks csv files to the directory
mv geolocation.csv trucks.csv drivers-datasets/

Create a Directory in HDFS, Upload a File and List Contents

Create Directory Tree in User

1. Login to the Ambari interface at sandbox-hdp.hortonworks.com:8080. Use the login credentials in Table 1.

Table 1: Ambari Login credentials

Username: admin
Password: set during the setup process

Setup Ambari Admin Password Manually

2. Now that we have admin privileges, we can manage files on HDFS using Files View. Hover over the Ambari Selector icon and enter the Files View web interface.

The Files View Interface will appear with the following default folders.

3. We will create three folders using the Files View web interface: hadoop, geolocation and trucks. The last two will reside in the hadoop folder, which resides in user.

Navigate into the user folder. Click the new folder button; an add new folder window appears. Name the folder hadoop and press enter or +Add.

4. Navigate into the hadoop folder. Create the two folders, geolocation and trucks, following the process stated in the previous instruction.
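For comparison, the same directory tree can be created with the CLI from the previous tutorial. This is a sketch that assumes you are SSH'd into the sandbox as a user allowed to write under /user:

```shell
# Create /user/hadoop plus its two subdirectories in one call;
# -p creates missing parent directories and succeeds if they already exist
hdfs dfs -mkdir -p /user/hadoop/geolocation /user/hadoop/trucks

# Verify the new directories
hdfs dfs -ls /user/hadoop
```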

Upload Local Machine Files to HDFS

We will upload two files from our local machine, geolocation.csv and trucks.csv, to the appropriate HDFS directories.

1. Navigate through the path /user/hadoop/geolocation or, if you're already in hadoop, enter the geolocation folder. Click the upload button to transfer geolocation.csv into HDFS.

An Upload file window appears:

2. Click on the cloud with an arrow. A window with files from your local machine appears. Find geolocation.csv in the Downloads/drivers-datasets folder, select it, and then press the Open button.

3. In Files View, navigate to the hadoop folder and enter the trucks folder. Repeat the upload file process to upload trucks.csv.
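The same uploads can be done from the sandbox shell. A sketch, assuming the files still sit in the Downloads/drivers-datasets folder created earlier:

```shell
# Copy each csv from the local filesystem into its HDFS directory
hdfs dfs -put Downloads/drivers-datasets/geolocation.csv /user/hadoop/geolocation/
hdfs dfs -put Downloads/drivers-datasets/trucks.csv /user/hadoop/trucks/
```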

View and Examine Directory Contents

Each time we open a directory, Files View automatically lists its contents. Earlier we started in the user directory.

1. Let's navigate back to the user directory to examine the details shown for the contents. Reference the image below while you read the Directory Contents Overview.

Directory Contents Overview of Columns

  • Name is the file/folder name
  • Size contains the content's size in bytes
  • Last Modified includes the date/time the content was created or modified
  • Owner is who owns that content
  • Group is which group can make changes to the files/folders
  • Permissions establishes who can read, write and execute the data
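The CLI listing shows the same columns. A sketch (the sample output line is illustrative, not actual output from your sandbox):

```shell
hdfs dfs -ls /user
# Columns: permissions, replication factor, owner, group, size, last modified, name
# e.g.  drwxr-xr-x   - admin hdfs          0 2020-01-01 12:00 /user/hadoop
```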

Find Out Space Utilization in an HDFS Directory

In the command line, when directories and files are listed with hadoop fs -du /user/hadoop/, the size of each directory and file is shown. In Files View, we must navigate to a file to see its size; we are not able to see the size of a directory even if it contains files.

Let's view the size of the geolocation.csv file. Navigate through /user/hadoop/geolocation. How much space has the file utilized? Files View shows 514.3 KB for geolocation.csv.
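From the CLI, the directory size is visible directly. A sketch:

```shell
# -h prints human-readable sizes; the two size columns are the content size
# and the space consumed on disk including replication
hdfs dfs -du -h /user/hadoop/geolocation
```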

Download File From HDFS to Local Machine

Files View enables users to download files and folders to their local machine with ease.

Let's download the geolocation.csv file to our computer. Click on the file's row; the row turns blue and a group of file operations appears. Select the Download button. By default, the file downloads to the Downloads folder on our local machine.
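The CLI equivalent copies a file out of HDFS onto the local filesystem. A sketch, assuming a ~/Downloads folder exists on the machine where you run the command:

```shell
# Copy geolocation.csv from HDFS into the local Downloads folder
hdfs dfs -get /user/hadoop/geolocation/geolocation.csv ~/Downloads/
```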

Explore Two Advanced Features

Concatenate Files

File concatenation merges two files together. If we concatenate trucks.csv with geolocation.csv, the data from geolocation.csv will be appended to the end of trucks.csv. A typical use case for this feature is when a user has similar large datasets that they want to merge. The manual process of combining large datasets is inconvenient, so file concatenation was created to do the operation instantly.

1. Before we merge the csv files, we must place them in the same folder. Click on the geolocation.csv row; it will highlight in blue. Press Copy, and in the copy window that appears, select the trucks folder and press Copy to copy the csv file into it.

2. We will merge the two files by selecting them both and performing the concatenate operation. Navigate to the trucks folder. Select geolocation.csv, hold Shift and click on trucks.csv. Click the concatenate button. The merged file will be downloaded into the Downloads folder on your local machine.

3. By default, Files View saves the merged file as a txt file. We can open the file and save it as a csv file. Then open the csv file and you will notice that all the content from geolocation is appended to the trucks file.
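A comparable CLI operation is getmerge, which concatenates every file in an HDFS directory into a single local file. A sketch, assuming both csv files are in /user/hadoop/trucks as in the steps above:

```shell
# Merge all files under /user/hadoop/trucks into one local file;
# files are concatenated in filename order
hdfs dfs -getmerge /user/hadoop/trucks merged.csv
```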

Copy Files or Directories recursively

Copying a file or directory recursively means that all of the directory's files and subdirectories, down to the bottom of the directory tree, are copied. For instance, we will copy the hadoop directory and all of its contents to a new location within our Hadoop cluster. In production, the copy operation is used to copy large datasets within the Hadoop cluster or between two or more clusters.

1. Navigate to the user directory. Click on the row of the hadoop directory. Select the Copy button.

2. The Copy to window will appear. Select the tmp folder; the row will turn blue. (If you select the folder icon instead, the contents of tmp become visible.) Make sure the row is highlighted blue before copying. Click the blue Copy button to copy the hadoop folder recursively to this new location.

3. A new copy of the hadoop folder and all of its contents can be found in the tmp folder. Navigate to tmp and check that all of the hadoop folder's contents copied successfully.
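The CLI equivalent of this recursive copy is a sketch like the following (paths taken from the steps above):

```shell
# Recursively copy the hadoop directory and all of its contents into /tmp
hdfs dfs -cp /user/hadoop /tmp

# List the copy recursively to verify everything arrived
hdfs dfs -ls -R /tmp/hadoop
```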

Summary

Congratulations! We just learned to use Files View to manage our geolocation.csv and trucks.csv dataset files in HDFS. We learned to create directories, upload files, and list directory contents. We also acquired the skills to download files from HDFS to our local file system and explored a few advanced features of HDFS file management.
