WordCount (Word Counting) on Linux, with a Code Example

阿新 • Published: 2019-01-23

WordCount (word count)

First, the command-line version:

1: Start Hadoop with the start-all.sh command, which starts HDFS (and YARN)

2: Switch to the Hadoop installation directory and create a new directory in HDFS with the HDFS shell commands

cd /usr/local/hadoop-2.8.0 (change directory)

hdfs dfs -mkdir /input

3: hadoop fs -put LICENSE.txt /input (puts the LICENSE.txt file from the Hadoop installation directory into the /input directory)

4: Use hadoop fs -ls /input to check that the file was uploaded to /input successfully

5: Run the following commands

cd /usr/local/hadoop-2.8.0/share/hadoop/mapreduce (change directory)

hadoop jar hadoop-mapreduce-examples-2.8.0.jar wordcount /input /output2 (word count: reads /input and writes the result to /output2)

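The result is shown in the figure below:

6: List the output directory with hadoop fs -ls /output2; the figure shows the final result files

7: View the final result with hadoop fs -cat /output2/part-r-00000

If you see the state shown in the figure, WordCount is set up correctly.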
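To sanity-check what the job prints in step 7, it can help to compute the same kind of result locally on a small string. The following is only a minimal sketch in plain Java, not the code of the bundled example jar: the class name LocalWordCount and the sample text are made up for illustration, and it assumes words are separated by whitespace and that each output line has the form word, tab, count.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; it is not part of Hadoop.
final class LocalWordCount {

    // Counts whitespace-separated tokens. A TreeMap keeps the words sorted,
    // which resembles the sorted keys in a single reducer's output file.
    static Map<String, Long> count(String text) {
        Map<String, Long> counts = new TreeMap<>();
        for (String token : text.split("\\s+")) {
            if (token.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            counts.merge(token, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Sample text assumed for this sketch
        Map<String, Long> counts = count("Hello World\nMe Hello");
        for (Map.Entry<String, Long> e : counts.entrySet()) {
            // One "word<TAB>count" line per distinct word
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For the sample text this prints three lines: Hello with 2, Me with 1, and World with 1.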

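Next comes the code part.

We use Eclipse here.

Create a Maven project; a plain Java project is enough, it does not need to be a web project.

First, modify the pom.xml file.

Add these nodes inside <dependencies>: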

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-hdfs</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>2.2.0</version>
</dependency>
<dependency>
    <groupId>junit</groupId>
    <artifactId>junit</artifactId>
    <version>3.8.1</version>
    <scope>test</scope>
</dependency>

Code:

package cn.happy.Word;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.mapreduce.lib.partition.HashPartitioner;

public class WordCount {
	static final String INPUT_PATH = "hdfs://192.168.1.9:9000/input/LICENSE.txt";
	static final String OUTPUT_PATH = "hdfs://192.168.1.9:9000/output";
	// Mapper type parameters:
	// KEYIN    byte offset at which the line starts in the input file
	// VALUEIN  content of the line, e.g. "Hello World"
	// KEYOUT   a word
	// VALUEOUT number of occurrences (always 1 at this stage)
	// KEYOUT/VALUEOUT are the pair types handed to the Reducer: Hello 1, World 1
	static class MyMapper extends
			Mapper<LongWritable, Text, Text, LongWritable> {
		// key: byte offset of the line
		// value: the line itself, e.g. "Hello World"
		@Override
		protected void map(LongWritable key, Text value, Context context)
				throws IOException, InterruptedException {
			// Convert the line to a String
			String str = value.toString();
			// Split the line into words on whitespace
			String[] split = str.split("\\s+");
			for (String string : split) {
				if (string.isEmpty()) {
					continue; // skip the empty token produced by an empty line or leading whitespace
				}
				/*
				 * Emits one pair per word, for example:
				 * Hello 1
				 * World 1
				 * Me 1
				 * Hello 1
				 */
				context.write(new Text(string), new LongWritable(1));
			}
		}
	}

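	// Reducer type parameters:
	// KEYIN    a word emitted by the mapper
	// VALUEIN  the counts emitted for that word
	// KEYOUT   the distinct word
	// VALUEOUT total number of occurrences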
	
	/*
	 * Hello 1
	 * World 1
	 * Me 1
	 * Hello 1
	 */
	static class MyReducer extends
			Reducer<Text, LongWritable, Text, LongWritable> {
		@Override
		protected void reduce(Text t1, Iterable<LongWritable> arg1,
				Reducer<Text, LongWritable, Text, LongWritable>.Context ctx)
				throws IOException, InterruptedException {
			long t = 0;
			for (LongWritable longWritable : arg1) {
				t += longWritable.get();
			}
			ctx.write(t1, new LongWritable(t));
		}
	}

	public static void main(String[] args) throws Exception {
		// Local Hadoop directory (a Windows path here); adjust it to your machine
		System.setProperty("hadoop.home.dir", "E:\\Y2\\Y2\\Hadoop大資料\\hadoop-2.8.0");
		Configuration conf = new Configuration();
		final FileSystem fileSystem = FileSystem.get(new URI(INPUT_PATH), conf);
		final Path outPath = new Path(OUTPUT_PATH);
		// The job fails if the output directory already exists, so delete it first
		if (fileSystem.exists(outPath)) {
			fileSystem.delete(outPath, true);
		}
		// Job.getInstance replaces the deprecated Job constructor
		final Job job = Job.getInstance(conf, WordCount.class.getSimpleName());
		FileInputFormat.setInputPaths(job, new Path(INPUT_PATH));

		// How the input is parsed: each line becomes an <offset, line> pair
		job.setInputFormatClass(TextInputFormat.class);
		job.setMapperClass(MyMapper.class); // 1.2 custom map class
		// Map output <k2,v2> types; can be omitted when they match the final output <k3,v3> types
		job.setMapOutputKeyClass(Text.class);
		job.setMapOutputValueClass(LongWritable.class);

		job.setPartitionerClass(HashPartitioner.class); // 1.3 partitioning
		job.setNumReduceTasks(1); // run a single reduce task
		// 1.4 TODO sorting, grouping
		// 1.5 TODO combiner
		job.setReducerClass(MyReducer.class); // 2.2 custom reduce class
		job.setOutputKeyClass(Text.class); // reduce output types
		job.setOutputValueClass(LongWritable.class);
		FileOutputFormat.setOutputPath(job, outPath); // 2.3 where to write the output

		job.setOutputFormatClass(TextOutputFormat.class); // output format class

		// Submit the job and wait for it to finish
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
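To make the flow of keys and values through the job easier to follow, here is a small in-memory sketch of the three stages (map, shuffle, reduce) in plain Java, with no Hadoop dependency. It is an illustration under assumptions, not how Hadoop is implemented: the class MapReduceSketch and its method names are hypothetical, the byte offsets are computed by hand assuming UTF-8 text with "\n" line endings, and partition applies the same formula as HashPartitioner to String.hashCode() rather than to Text.hashCode(), so the partition numbers would differ from a real job with several reducers.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory model of the job; none of these names come from Hadoop.
final class MapReduceSketch {

    // Map step: plays the role of MyMapper. The offset key is accepted but unused.
    static List<Map.Entry<String, Long>> map(long offset, String line) {
        List<Map.Entry<String, Long>> out = new ArrayList<>();
        for (String word : line.split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new SimpleEntry<>(word, 1L)); // emit (word, 1)
            }
        }
        return out;
    }

    // Shuffle step: collect all values emitted for the same key, keys sorted.
    static Map<String, List<Long>> shuffle(List<Map.Entry<String, Long>> pairs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, Long> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: plays the role of MyReducer and sums the values of one key.
    static long reduce(String key, Iterable<Long> values) {
        long total = 0;
        for (long v : values) {
            total += v;
        }
        return total;
    }

    // Same formula as HashPartitioner, applied here to String.hashCode().
    static int partition(String key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    static Map<String, Long> run(String input) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        long offset = 0; // byte offset of the current line, like the LongWritable key
        for (String line : input.split("\n")) {
            pairs.addAll(map(offset, line));
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
        }
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<String, List<Long>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run("Hello World\nMe Hello")); // {Hello=2, Me=1, World=1}
        System.out.println(partition("Hello", 1));        // 0: one reducer gets every key
    }
}
```

With a single reduce task, as configured by setNumReduceTasks(1), every key lands in partition 0, which is why the job writes exactly one part-r-00000 file.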

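We also need to replace one class from the Hadoop source with our own copy. The package name and class name must be identical to the original (org.apache.hadoop.io.nativeio.NativeIO). In the copy below, the Windows.access method simply returns true: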

/**
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */
package org.apache.hadoop.io.nativeio;

import com.google.common.annotations.VisibleForTesting;

import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.classification.InterfaceAudience;
import org.apache.hadoop.classification.InterfaceStability;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.CommonConfigurationKeys;
import org.apache.hadoop.fs.HardLink;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SecureIOUtils.AlreadyExistsException;
import org.apache.hadoop.io.nativeio.Errno;
import org.apache.hadoop.io.nativeio.NativeIOException;
import org.apache.hadoop.util.NativeCodeLoader;
import org.apache.hadoop.util.PerformanceAdvisory;
import org.apache.hadoop.util.Shell;

import sun.misc.Unsafe;

import java.io.*;
import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * JNI wrappers for various native IO-related calls not available in Java.
 * These functions should generally be used alongside a fallback to another
 * more portable mechanism.
 */
@InterfaceAudience.Private
@InterfaceStability.Unstable
public class NativeIO {
  public static class POSIX {
    // Flags for open() call from bits/fcntl.h - Set by JNI
    public static int O_RDONLY = -1;
    public static int O_WRONLY = -1;
    public static int O_RDWR = -1;
    public static int O_CREAT = -1;
    public static int O_EXCL = -1;
    public static int O_NOCTTY = -1;
    public static int O_TRUNC = -1;
    public static int O_APPEND = -1;
    public static int O_NONBLOCK = -1;
    public static int O_SYNC = -1;

    // Flags for posix_fadvise() from bits/fcntl.h - Set by JNI
    /* No further special treatment.  */
    public static int POSIX_FADV_NORMAL = -1;
    /* Expect random page references.  */
    public static int POSIX_FADV_RANDOM = -1;
    /* Expect sequential page references.  */
    public static int POSIX_FADV_SEQUENTIAL = -1;
    /* Will need these pages.  */
    public static int POSIX_FADV_WILLNEED = -1;
    /* Don't need these pages.  */
    public static int POSIX_FADV_DONTNEED = -1;
    /* Data will be accessed once.  */
    public static int POSIX_FADV_NOREUSE = -1;


    // Updated by JNI when supported by glibc.  Leave defaults in case kernel
    // supports sync_file_range, but glibc does not.
    /* Wait upon writeout of all pages
       in the range before performing the
       write.  */
    public static int SYNC_FILE_RANGE_WAIT_BEFORE = 1;
    /* Initiate writeout of all those
       dirty pages in the range which are
       not presently under writeback.  */
    public static int SYNC_FILE_RANGE_WRITE = 2;
    /* Wait upon writeout of all pages in
       the range after performing the
       write.  */
    public static int SYNC_FILE_RANGE_WAIT_AFTER = 4;

    private static final Log LOG = LogFactory.getLog(NativeIO.class);

    // Set to true via JNI if possible
    public static boolean fadvisePossible = false;

    private static boolean nativeLoaded = false;
    private static boolean syncFileRangePossible = true;

    static final String WORKAROUND_NON_THREADSAFE_CALLS_KEY =
      "hadoop.workaround.non.threadsafe.getpwuid";
    static final boolean WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT = true;

    private static long cacheTimeout = -1;

    private static CacheManipulator cacheManipulator = new CacheManipulator();

    public static CacheManipulator getCacheManipulator() {
      return cacheManipulator;
    }

    public static void setCacheManipulator(CacheManipulator cacheManipulator) {
      POSIX.cacheManipulator = cacheManipulator;
    }

    /**
     * Used to manipulate the operating system cache.
     */
    @VisibleForTesting
    public static class CacheManipulator {
      public void mlock(String identifier, ByteBuffer buffer,
          long len) throws IOException {
        POSIX.mlock(buffer, len);
      }

      public long getMemlockLimit() {
        return NativeIO.getMemlockLimit();
      }

      public long getOperatingSystemPageSize() {
        return NativeIO.getOperatingSystemPageSize();
      }

      public void posixFadviseIfPossible(String identifier,
        FileDescriptor fd, long offset, long len, int flags)
            throws NativeIOException {
        POSIX.posixFadviseIfPossible(identifier, fd, offset,
            len, flags);
      }

      public boolean verifyCanMlock() {
        return NativeIO.isAvailable();
      }
    }

    /**
     * A CacheManipulator used for testing which does not actually call mlock.
     * This allows many tests to be run even when the operating system does not
     * allow mlock, or only allows limited mlocking.
     */
    @VisibleForTesting
    public static class NoMlockCacheManipulator extends CacheManipulator {
      public void mlock(String identifier, ByteBuffer buffer,
          long len) throws IOException {
        LOG.info("mlocking " + identifier);
      }

      public long getMemlockLimit() {
        return 1125899906842624L;
      }

      public long getOperatingSystemPageSize() {
        return 4096;
      }

      public boolean verifyCanMlock() {
        return true;
      }
    }

    static {
      if (NativeCodeLoader.isNativeCodeLoaded()) {
        try {
          Configuration conf = new Configuration();
          workaroundNonThreadSafePasswdCalls = conf.getBoolean(
            WORKAROUND_NON_THREADSAFE_CALLS_KEY,
            WORKAROUND_NON_THREADSAFE_CALLS_DEFAULT);

          initNative();
          nativeLoaded = true;

          cacheTimeout = conf.getLong(
            CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_KEY,
            CommonConfigurationKeys.HADOOP_SECURITY_UID_NAME_CACHE_TIMEOUT_DEFAULT) *
            1000;
          LOG.debug("Initialized cache for IDs to User/Group mapping with a " +
            " cache timeout of " + cacheTimeout/1000 + " seconds.");

        } catch (Throwable t) {
          // This can happen if the user has an older version of libhadoop.so
          // installed - in this case we can continue without native IO
          // after warning
          PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
        }
      }
    }

    /**
     * Return true if the JNI-based native IO extensions are available.
     */
    public static boolean isAvailable() {
      return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
    }

    private static void assertCodeLoaded() throws IOException {
      if (!isAvailable()) {
        throw new IOException("NativeIO was not loaded");
      }
    }

    /** Wrapper around open(2) */
    public static native FileDescriptor open(String path, int flags, int mode) throws IOException;
    /** Wrapper around fstat(2) */
    private static native Stat fstat(FileDescriptor fd) throws IOException;

    /** Native chmod implementation. On UNIX, it is a wrapper around chmod(2) */
    private static native void chmodImpl(String path, int mode) throws IOException;

    public static void chmod(String path, int mode) throws IOException {
      if (!Shell.WINDOWS) {
        chmodImpl(path, mode);
      } else {
        try {
          chmodImpl(path, mode);
        } catch (NativeIOException nioe) {
          if (nioe.getErrorCode() == 3) {
            throw new NativeIOException("No such file or directory",
                Errno.ENOENT);
          } else {
            LOG.warn(String.format("NativeIO.chmod error (%d): %s",
                nioe.getErrorCode(), nioe.getMessage()));
            throw new NativeIOException("Unknown error", Errno.UNKNOWN);
          }
        }
      }
    }

    /** Wrapper around posix_fadvise(2) */
    static native void posix_fadvise(
      FileDescriptor fd, long offset, long len, int flags) throws NativeIOException;

    /** Wrapper around sync_file_range(2) */
    static native void sync_file_range(
      FileDescriptor fd, long offset, long nbytes, int flags) throws NativeIOException;

    /**
     * Call posix_fadvise on the given file descriptor. See the manpage
     * for this syscall for more information. On systems where this
     * call is not available, does nothing.
     *
     * @throws NativeIOException if there is an error with the syscall
     */
    static void posixFadviseIfPossible(String identifier,
        FileDescriptor fd, long offset, long len, int flags)
        throws NativeIOException {
      if (nativeLoaded && fadvisePossible) {
        try {
          posix_fadvise(fd, offset, len, flags);
        } catch (UnsatisfiedLinkError ule) {
          fadvisePossible = false;
        }
      }
    }

    /**
     * Call sync_file_range on the given file descriptor. See the manpage
     * for this syscall for more information. On systems where this
     * call is not available, does nothing.
     *
     * @throws NativeIOException if there is an error with the syscall
     */
    public static void syncFileRangeIfPossible(
        FileDescriptor fd, long offset, long nbytes, int flags)
        throws NativeIOException {
      if (nativeLoaded && syncFileRangePossible) {
        try {
          sync_file_range(fd, offset, nbytes, flags);
        } catch (UnsupportedOperationException uoe) {
          syncFileRangePossible = false;
        } catch (UnsatisfiedLinkError ule) {
          syncFileRangePossible = false;
        }
      }
    }

    static native void mlock_native(
        ByteBuffer buffer, long len) throws NativeIOException;

    /**
     * Locks the provided direct ByteBuffer into memory, preventing it from
     * swapping out. After a buffer is locked, future accesses will not incur
     * a page fault.
     * 
     * See the mlock(2) man page for more information.
     * 
     * @throws NativeIOException
     */
    static void mlock(ByteBuffer buffer, long len)
        throws IOException {
      assertCodeLoaded();
      if (!buffer.isDirect()) {
        throw new IOException("Cannot mlock a non-direct ByteBuffer");
      }
      mlock_native(buffer, len);
    }
    
    /**
     * Unmaps the block from memory. See munmap(2).
     *
     * There isn't any portable way to unmap a memory region in Java.
     * So we use the sun.nio method here.
     * Note that unmapping a memory region could cause crashes if code
     * continues to reference the unmapped code.  However, if we don't
     * manually unmap the memory, we are dependent on the finalizer to
     * do it, and we have no idea when the finalizer will run.
     *
     * @param buffer    The buffer to unmap.
     */
    public static void munmap(MappedByteBuffer buffer) {
      if (buffer instanceof sun.nio.ch.DirectBuffer) {
        sun.misc.Cleaner cleaner =
            ((sun.nio.ch.DirectBuffer)buffer).cleaner();
        cleaner.clean();
      }
    }

    /** Linux only methods used for getOwner() implementation */
    private static native long getUIDforFDOwnerforOwner(FileDescriptor fd) throws IOException;
    private static native String getUserName(long uid) throws IOException;

    /**
     * Result type of the fstat call
     */
    public static class Stat {
      private int ownerId, groupId;
      private String owner, group;
      private int mode;

      // Mode constants - Set by JNI
      public static int S_IFMT = -1;    /* type of file */
      public static int S_IFIFO  = -1;  /* named pipe (fifo) */
      public static int S_IFCHR  = -1;  /* character special */
      public static int S_IFDIR  = -1;  /* directory */
      public static int S_IFBLK  = -1;  /* block special */
      public static int S_IFREG  = -1;  /* regular */
      public static int S_IFLNK  = -1;  /* symbolic link */
      public static int S_IFSOCK = -1;  /* socket */
      public static int S_ISUID = -1;  /* set user id on execution */
      public static int S_ISGID = -1;  /* set group id on execution */
      public static int S_ISVTX = -1;  /* save swapped text even after use */
      public static int S_IRUSR = -1;  /* read permission, owner */
      public static int S_IWUSR = -1;  /* write permission, owner */
      public static int S_IXUSR = -1;  /* execute/search permission, owner */

      Stat(int ownerId, int groupId, int mode) {
        this.ownerId = ownerId;
        this.groupId = groupId;
        this.mode = mode;
      }
      
      Stat(String owner, String group, int mode) {
        if (!Shell.WINDOWS) {
          this.owner = owner;
        } else {
          this.owner = stripDomain(owner);
        }
        if (!Shell.WINDOWS) {
          this.group = group;
        } else {
          this.group = stripDomain(group);
        }
        this.mode = mode;
      }
      
      @Override
      public String toString() {
        return "Stat(owner='" + owner + "', group='" + group + "'" +
          ", mode=" + mode + ")";
      }

      public String getOwner() {
        return owner;
      }
      public String getGroup() {
        return group;
      }
      public int getMode() {
        return mode;
      }
    }

    /**
     * Returns the file stat for a file descriptor.
     *
     * @param fd file descriptor.
     * @return the file descriptor file stat.
     * @throws IOException thrown if there was an IO error while obtaining the file stat.
     */
    public static Stat getFstat(FileDescriptor fd) throws IOException {
      Stat stat = null;
      if (!Shell.WINDOWS) {
        stat = fstat(fd); 
        stat.owner = getName(IdCache.USER, stat.ownerId);
        stat.group = getName(IdCache.GROUP, stat.groupId);
      } else {
        try {
          stat = fstat(fd);
        } catch (NativeIOException nioe) {
          if (nioe.getErrorCode() == 6) {
            throw new NativeIOException("The handle is invalid.",
                Errno.EBADF);
          } else {
            LOG.warn(String.format("NativeIO.getFstat error (%d): %s",
                nioe.getErrorCode(), nioe.getMessage()));
            throw new NativeIOException("Unknown error", Errno.UNKNOWN);
          }
        }
      }
      return stat;
    }

    private static String getName(IdCache domain, int id) throws IOException {
      Map<Integer, CachedName> idNameCache = (domain == IdCache.USER)
        ? USER_ID_NAME_CACHE : GROUP_ID_NAME_CACHE;
      String name;
      CachedName cachedName = idNameCache.get(id);
      long now = System.currentTimeMillis();
      if (cachedName != null && (cachedName.timestamp + cacheTimeout) > now) {
        name = cachedName.name;
      } else {
        name = (domain == IdCache.USER) ? getUserName(id) : getGroupName(id);
        if (LOG.isDebugEnabled()) {
          String type = (domain == IdCache.USER) ? "UserName" : "GroupName";
          LOG.debug("Got " + type + " " + name + " for ID " + id +
            " from the native implementation");
        }
        cachedName = new CachedName(name, now);
        idNameCache.put(id, cachedName);
      }
      return name;
    }

    static native String getUserName(int uid) throws IOException;
    static native String getGroupName(int uid) throws IOException;

    private static class CachedName {
      final long timestamp;
      final String name;

      public CachedName(String name, long timestamp) {
        this.name = name;
        this.timestamp = timestamp;
      }
    }

    private static final Map<Integer, CachedName> USER_ID_NAME_CACHE =
      new ConcurrentHashMap<Integer, CachedName>();

    private static final Map<Integer, CachedName> GROUP_ID_NAME_CACHE =
      new ConcurrentHashMap<Integer, CachedName>();

    private enum IdCache { USER, GROUP }

    public final static int MMAP_PROT_READ = 0x1; 
    public final static int MMAP_PROT_WRITE = 0x2; 
    public final static int MMAP_PROT_EXEC = 0x4; 

    public static native long mmap(FileDescriptor fd, int prot,
        boolean shared, long length) throws IOException;

    public static native void munmap(long addr, long length)
        throws IOException;
  }

  private static boolean workaroundNonThreadSafePasswdCalls = false;


  public static class Windows {
    // Flags for CreateFile() call on Windows
    public static final long GENERIC_READ = 0x80000000L;
    public static final long GENERIC_WRITE = 0x40000000L;

    public static final long FILE_SHARE_READ = 0x00000001L;
    public static final long FILE_SHARE_WRITE = 0x00000002L;
    public static final long FILE_SHARE_DELETE = 0x00000004L;

    public static final long CREATE_NEW = 1;
    public static final long CREATE_ALWAYS = 2;
    public static final long OPEN_EXISTING = 3;
    public static final long OPEN_ALWAYS = 4;
    public static final long TRUNCATE_EXISTING = 5;

    public static final long FILE_BEGIN = 0;
    public static final long FILE_CURRENT = 1;
    public static final long FILE_END = 2;
    
    public static final long FILE_ATTRIBUTE_NORMAL = 0x00000080L;

    /**
     * Create a directory with permissions set to the specified mode.  By setting
     * permissions at creation time, we avoid issues related to the user lacking
     * WRITE_DAC rights on subsequent chmod calls.  One example where this can
     * occur is writing to an SMB share where the user does not have Full Control
     * rights, and therefore WRITE_DAC is denied.
     *
     * @param path directory to create
     * @param mode permissions of new directory
     * @throws IOException if there is an I/O error
     */
    public static void createDirectoryWithMode(File path, int mode)
        throws IOException {
      createDirectoryWithMode0(path.getAbsolutePath(), mode);
    }

    /** Wrapper around CreateDirectory() on Windows */
    private static native void createDirectoryWithMode0(String path, int mode)
        throws NativeIOException;

    /** Wrapper around CreateFile() on Windows */
    public static native FileDescriptor createFile(String path,
        long desiredAccess, long shareMode, long creationDisposition)
        throws IOException;

    /**
     * Create a file for write with permissions set to the specified mode.  By
     * setting permissions at creation time, we avoid issues related to the user
     * lacking WRITE_DAC rights on subsequent chmod calls.  One example where
     * this can occur is writing to an SMB share where the user does not have
     * Full Control rights, and therefore WRITE_DAC is denied.
     *
     * This method mimics the semantics implemented by the JDK in
     * {@link FileOutputStream}.  The file is opened for truncate or
     * append, the sharing mode allows other readers and writers, and paths
     * longer than MAX_PATH are supported.  (See io_util_md.c in the JDK.)
     *
     * @param path file to create
     * @param append if true, then open file for append
     * @param mode permissions of new directory
     * @return FileOutputStream of opened file
     * @throws IOException if there is an I/O error
     */
    public static FileOutputStream createFileOutputStreamWithMode(File path,
        boolean append, int mode) throws IOException {
      long desiredAccess = GENERIC_WRITE;
      long shareMode = FILE_SHARE_READ | FILE_SHARE_WRITE;
      long creationDisposition = append ? OPEN_ALWAYS : CREATE_ALWAYS;
      return new FileOutputStream(createFileWithMode0(path.getAbsolutePath(),
          desiredAccess, shareMode, creationDisposition, mode));
    }

    /** Wrapper around CreateFile() with security descriptor on Windows */
    private static native FileDescriptor createFileWithMode0(String path,
        long desiredAccess, long shareMode, long creationDisposition, int mode)
        throws NativeIOException;

    /** Wrapper around SetFilePointer() on Windows */
    public static native long setFilePointer(FileDescriptor fd,
        long distanceToMove, long moveMethod) throws IOException;

    /** Windows only methods used for getOwner() implementation */
    private static native String getOwner(FileDescriptor fd) throws IOException;

    /** Supported list of Windows access right flags */
    public static enum AccessRight {
      ACCESS_READ (0x0001),      // FILE_READ_DATA
      ACCESS_WRITE (0x0002),     // FILE_WRITE_DATA
      ACCESS_EXECUTE (0x0020);   // FILE_EXECUTE

      private final int accessRight;
      AccessRight(int access) {
        accessRight = access;
      }

      public int accessRight() {
        return accessRight;
      }
    };

    /** Windows only method used to check if the current process has requested
     *  access rights on the given path. */
    private static native boolean access0(String path, int requestedAccess);

    /**
     * Checks whether the current process has desired access rights on
     * the given path.
     * 
     * Longer term this native function can be substituted with JDK7
     * function Files#isReadable, isWritable, isExecutable.
     *
     * @param path input path
     * @param desiredAccess ACCESS_READ, ACCESS_WRITE or ACCESS_EXECUTE
     * @return true if access is allowed
     * @throws IOException I/O exception on error
     */
    public static boolean access(String path, AccessRight desiredAccess)
        throws IOException {
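      // Modified for this example: always report access as allowed
      // (the stock Hadoop class delegates to the native access0 here).
      return true;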
    }

    /**
     * Extends both the minimum and maximum working set size of the current
     * process.  This method gets the current minimum and maximum working set
     * size, adds the requested amount to each and then sets the minimum and
     * maximum working set size to the new values.  Controlling the working set
     * size of the process also controls the amount of memory it can lock.
     *
     * @param delta amount to increment minimum and maximum working set size
     * @throws IOException for any error
     * @see POSIX#mlock(ByteBuffer, long)
     */
    public static native void extendWorkingSetSize(long delta) throws IOException;

    static {
      if (NativeCodeLoader.isNativeCodeLoaded()) {
        try {
          initNative();
          nativeLoaded = true;
        } catch (Throwable t) {
          // This can happen if the user has an older version of libhadoop.so
          // installed - in this case we can continue without native IO
          // after warning
          PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
        }
      }
    }
  }

  private static final Log LOG = LogFactory.getLog(NativeIO.class);

  private static boolean nativeLoaded = false;

  static {
    if (NativeCodeLoader.isNativeCodeLoaded()) {
      try {
        initNative();
        nativeLoaded = true;
      } catch (Throwable t) {
        // This can happen if the user has an older version of libhadoop.so
        // installed - in this case we can continue without native IO
        // after warning
        PerformanceAdvisory.LOG.debug("Unable to initialize NativeIO libraries", t);
      }
    }
  }

  /**
   * Return true if the JNI-based native IO extensions are available.
   */
  public static boolean isAvailable() {
    return NativeCodeLoader.isNativeCodeLoaded() && nativeLoaded;
  }

  /** Initialize the JNI method ID and class ID cache */
  private static native void initNative();

  /**
   * Get the maximum number of bytes that can be locked into memory at any
   * given point.
   *
   * @return 0 if no bytes can be locked into memory;
   *         Long.MAX_VALUE if there is no limit;
   *         The number of bytes that can be locked into memory otherwise.
   */
  static long getMemlockLimit() {
    return isAvailable() ? getMemlockLimit0() : 0;
  }

  private static native long getMemlockLimit0();
  
  /**
   * @return the operating system's page size.
   */
  static long getOperatingSystemPageSize() {
    try {
      Field f = Unsafe.class.getDeclaredField("theUnsafe");
      f.setAccessible(true);
      Unsafe unsafe = (Unsafe)f.get(null);
      return unsafe.pageSize();
    } catch (Throwable e) {
      LOG.warn("Unable to get operating system page size.  Guessing 4096.", e);
      return 4096;
    }
  }

  private static class CachedUid {
    final long timestamp;
    final String username;
    public CachedUid(String username, long timestamp) {
      this.timestamp = timestamp;
      this.username = username;
    }
  }
  private static final Map<Long, CachedUid> uidCache =
      new ConcurrentHashMap<Long, CachedUid>();
  private static long cacheTimeout;
  private static boolean initialized = false;
  

  private static String stripDomain(String name) {
    int i = name.indexOf('\\');
    if (i != -1)
      name = name.substring(i + 1);
    return name;
  }

  public static String getOwner(FileDescriptor fd) throws IOException {
    ensureInitialized();
    if (Shell.WINDOWS) {
      String owner = Windows.getOwner(fd);
      owner = stripDomain(owner);
      return owner;
    } else {
      long uid = POSIX.getUIDforFDOwnerforOwner(fd);
      CachedUid cUid = uidCache.get(uid);
      long now = System.currentTimeMillis();
      if (cUid != null && (cUid.timestamp + cacheTimeout) > now) {
        return cUid.username;
      }
      String user = POSIX.getUserName(uid);
      LOG.info("Got UserName " + user + " for UID " + uid
          + " from the native implementation");
      cUid = new CachedUid(user, now);
      uidCache.put(uid, cUid);
      return user;
    }
  }

  /**
   * Create a FileInputStream that shares delete permission on the
   * file opened, i.e. other process can delete the file the
   * FileInputStream is reading. Only Windows implementation uses
   * the native interface.
   */
  public static FileInputStream getShareDeleteFileInputStream(File f)
      throws IOException {
    if (!Shell.WINDOWS) {
      // On Linux the default FileInputStream shares delete permission
      // on the file opened.
      //
      return new FileInputStream(f);
    } else {
      // Use Windows native interface to create a FileInputStream that
      // shares delete permission on the file opened.
      //
      FileDescriptor fd = Windows.createFile(
          f.getAbsolutePath(),
          Windows.GENERIC_READ,
          Windows.FILE_SHARE_READ |
              Windows.FILE_SHARE_WRITE |
              Windows.FILE_SHARE_DELETE,
          Windows.OPEN_EXISTING);
      return new FileInputStream(fd);
    }
  }

  /**
   * Create a FileInputStream that shares delete permission on the
   * file opened at a given offset, i.e. other process can delete
   * the file the FileInputStream is reading. Only Windows implementation
   * uses the native interface.
   */
  public static FileInputStream getShareDeleteFileInputStream(File f, long seekOffset)
      throws IOException {
    if (!Shell.WINDOWS) {
      RandomAccessFile rf = new RandomAccessFile(f, "r");
      if (seekOffset > 0) {
        rf.seek(seekOffset);
      }
      return new FileInputStream(rf.getFD());
    } else {
      // Use Windows native interface to create a FileInputStream that
      // shares delete permission on the file opened, and set it to the
      // given offset.
      //
      FileDescriptor fd = Windows.createFile(
          f.getAbsolutePath(),
          Windows.GENERIC_READ,
          Windows.FILE_SHARE_READ |
              Windows.FILE_SHARE_WRITE |
              Windows.FILE_SHARE_DELETE,
          Windows.OPEN_EXISTING);
      if (seekOffset > 0)
        Windows.setFilePointer(fd, seekOffset, Windows.FILE_BEGIN);
      return new FileInputStream(fd);
    }
  }

  /**
   * Create the specified File for write access, ensuring that it does not exist.
   * @param f the file that we want to create
   * @param permissions we want to have on the file (if security is enabled)
   *
   * @throws AlreadyExistsException if the file already exists
   * @throws IOException if any other error occurred
   */
  public static FileOutputStream getCreateForWriteFileOutputStream(File f, int permissions)
      throws IOException {
    if (!Shell.WINDOWS) {
      // Use the native wrapper around open(2)
      try {
        FileDescriptor fd = POSIX.open(f.getAbsolutePath(),
            POSIX.O_WRONLY | POSIX.O_CREAT
                | POSIX.O_EXCL, permissions);
        return new FileOutputStream(fd);
      } catch (NativeIOException nioe) {
        if (nioe.getErrno() == Errno.EEXIST) {
          throw new AlreadyExistsException(nioe);
        }
        throw nioe;
      }
    } else {
      // Use the Windows native APIs to create equivalent FileOutputStream
      try {
        FileDescriptor fd = Windows.createFile(f.getCanonicalPath(),
            Windows.GENERIC_WRITE,
            Windows.FILE_SHARE_DELETE
                | Windows.FILE_SHARE_READ
                | Windows.FILE_SHARE_WRITE,
            Windows.CREATE_NEW);
        POSIX.chmod(f.getCanonicalPath(), permissions);
        return new FileOutputStream(fd);
      } catch (NativeIOException nioe) {
        if (nioe.getErrorCode() == 80) {
          // ERROR_FILE_EXISTS
          // 80 (0x50)
          // The file exists
          throw new AlreadyExistsException(nioe);
        }
        throw nioe;
      }
    }
  }

  private synchronized static void ensureInitialized() {
    if (!initialized) {
      cacheTimeout =
          new Configuration().getLong("hadoop.security.uid.cache.secs",
              4*60*60) * 1000;
      LOG.info("Initialized cache for UID to User mapping with a cache" +
          " timeout of " + cacheTimeout/1000 + " seconds.");
      initialized = true;
    }
  }
  
  /**
   * A version of renameTo that throws a descriptive exception when it fails.
   *
   * @param src                  The source path
   * @param dst                  The destination path
   * 
   * @throws NativeIOException   On failure.
   */
  public static void renameTo(File src, File dst)
      throws IOException {
    if (!nativeLoaded) {
      if (!src.renameTo(dst)) {
        throw new IOException("renameTo(src=" + src + ", dst=" +
          dst + ") failed.");
      }
    } else {
      renameTo0(src.getAbsolutePath(), dst.getAbsolutePath());
    }
  }


  @Deprecated
  public static void link(File src, File dst) throws IOException {
    if (!nativeLoaded) {
      HardLink.createHardLink(src, dst);
    } else {
      link0(src.getAbsolutePath(), dst.getAbsolutePath());
    }
  }

  /**
   * A version of renameTo that throws a descriptive exception when it fails.
   *
   * @param src                  The source path
   * @param dst                  The destination path
   * 
   * @throws NativeIOException   On failure.
   */
  private static native void renameTo0(String src, String dst)
      throws NativeIOException;

  private static native void link0(String src, String dst)
      throws NativeIOException;

  /**
   * Unbuffered file copy from src to dst without tainting OS buffer cache
   *
   * In POSIX platform:
   * It uses FileChannel#transferTo() which internally attempts
   * unbuffered IO on OS with native sendfile64() support and falls back to
   * buffered IO otherwise.
   *
   * It minimizes the number of FileChannel#transferTo calls by passing the
   * src file size directly instead of a smaller size as the 3rd parameter.
   * This reduces the number of sendfile64() system calls when native sendfile64()
   * is supported. In the two fallback cases where sendfile is not supported,
   * FileChannel#transferTo already has its own batching of size 8 MB and 8 KB,
   * respectively.
   *
   * In Windows Platform:
   * It uses its own native wrapper of CopyFileEx with COPY_FILE_NO_BUFFERING
   * flag, which is supported on Windows Server 2008 and above.
   *
   * Ideally, we should use FileChannel#transferTo() across both POSIX and Windows
   * platform. Unfortunately, the wrapper(Java_sun_nio_ch_FileChannelImpl_transferTo0)
   * used by FileChannel#transferTo for unbuffered IO is not implemented on Windows.
   * Based on OpenJDK 6/7/8 source code, Java_sun_nio_ch_FileChannelImpl_transferTo0
   * on Windows simply returns IOS_UNSUPPORTED.
   *
   * Note: This simple native wrapper does minimal parameter checking before copy and
   * consistency check (e.g., size) after copy.
   * It is recommended to use wrapper function like
   * the Storage#nativeCopyFileUnbuffered() function in hadoop-hdfs with pre/post copy
   * checks.
   *
   * @param src                  The source path
   * @param dst                  The destination path
   * @throws IOException
   */
  public static void copyFileUnbuffered(File src, File dst) throws IOException {
    if (nativeLoaded && Shell.WINDOWS) {
      copyFileUnbuffered0(src.getAbsolutePath(), dst.getAbsolutePath());
    } else {
      FileInputStream fis = null;
      FileOutputStream fos = null;
      FileChannel input = null;
      FileChannel output = null;
      try {
        fis = new FileInputStream(src);
        fos = new FileOutputStream(dst);
        input = fis.getChannel();
        output = fos.getChannel();
        long remaining = input.size();
        long position = 0;
        long transferred = 0;
        while (remaining > 0) {
          transferred = input.transferTo(position, remaining, output);
          remaining -= transferred;
          position += transferred;
        }
      } finally {
        IOUtils.cleanup(LOG, output);
        IOUtils.cleanup(LOG, fos);
        IOUtils.cleanup(LOG, input);
        IOUtils.cleanup(LOG, fis);
      }
    }
  }

  private static native void copyFileUnbuffered0(String src, String dst)
      throws NativeIOException;
}
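
The non-Windows branch of copyFileUnbuffered relies on a loop around FileChannel#transferTo, because a single call may move fewer bytes than requested. The following is a minimal standalone sketch of that loop, not the Hadoop implementation: the class name TransferToCopySketch and the copy helper are made up for illustration, it uses try-with-resources instead of IOUtils.cleanup, and it adds a guard (not present in the listing above) that stops if transferTo reports no progress.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class TransferToCopySketch {

  /**
   * Copies src to dst by calling FileChannel#transferTo repeatedly
   * until every byte of the source has been handed to the destination.
   * Returns the number of bytes copied.
   */
  public static long copy(File src, File dst) throws IOException {
    try (FileInputStream fis = new FileInputStream(src);
         FileOutputStream fos = new FileOutputStream(dst);
         FileChannel input = fis.getChannel();
         FileChannel output = fos.getChannel()) {
      long size = input.size();
      long position = 0;
      while (position < size) {
        // Ask for everything that is left; the call may transfer less.
        long transferred = input.transferTo(position, size - position, output);
        if (transferred <= 0) {
          // Assumption for this sketch: treat "no progress" as an error
          // instead of looping forever (for example if the file shrank).
          throw new IOException("transferTo made no progress at offset " + position);
        }
        position += transferred;
      }
      return position;
    }
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical demo files, created in the system temp directory.
    File src = File.createTempFile("copy-src", ".txt");
    File dst = File.createTempFile("copy-dst", ".txt");
    src.deleteOnExit();
    dst.deleteOnExit();
    Files.write(src.toPath(), "hello hadoop".getBytes(StandardCharsets.UTF_8));
    long copied = copy(src, dst);
    System.out.println("copied " + copied + " bytes");
  }
}
```

Passing the full remaining size on each call mirrors the comment in the listing: when the platform supports it, the copy can finish in very few transferTo calls.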

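Once the job logic is written, we need to verify that the code is correct.

Start the virtual machine.

Start the Hadoop services.

Then run the code.

If it runs without errors, most of the work is done.

All that remains is to check the output from the command line.

If the result matches what the command-line run produced earlier, the job works.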
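
For a small input, the result can also be cross-checked without the cluster. The sketch below is a plain-Java word count, not a MapReduce job: the class name LocalWordCountSketch is made up, and it assumes that words are split on whitespace with StringTokenizer (as the bundled Hadoop wordcount example does) and that the output is one word, a tab, and its count per line. Sorting here uses Java's String order, which may differ from Hadoop's byte-wise Text order for non-ASCII words.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCountSketch {

  /** Counts whitespace-separated tokens; the TreeMap keeps the words sorted. */
  public static Map<String, Integer> count(Iterable<String> lines) {
    Map<String, Integer> counts = new TreeMap<>();
    for (String line : lines) {
      StringTokenizer tokens = new StringTokenizer(line);
      while (tokens.hasMoreTokens()) {
        // Add 1 to the running total for this word (starting from 1 if new).
        counts.merge(tokens.nextToken(), 1, Integer::sum);
      }
    }
    return counts;
  }

  public static void main(String[] args) throws IOException {
    // Hypothetical usage: pass a local copy of the input file as the first
    // argument; without an argument, two sample lines are counted instead.
    List<String> lines = args.length > 0
        ? Files.readAllLines(Paths.get(args[0]))
        : Arrays.asList("hello you", "hello me");
    for (Map.Entry<String, Integer> entry : count(lines).entrySet()) {
      System.out.println(entry.getKey() + "\t" + entry.getValue());
    }
  }
}
```

Running it on a local copy of the input and comparing the lines with the output of hadoop fs -cat on the job's part-r-00000 file gives a quick sanity check for small files.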
