Investigating Spring Hadoop YARN HA Issues
While running Spring XD on YARN we found that neither the YarnClient nor the AppMaster handles YARN HA well. When the ResourceManager (RM) restarts or fails over, the YarnClient only keeps working after the RM address in its configuration file is updated manually; the YARN HA settings in the configuration file have no effect. Likewise, the AppMaster is killed by YARN after its heartbeats go unanswered for too long, which brings down the whole XD service. The goal of this investigation is therefore to make XD on YARN support RM HA.
Investigating the YarnClient
Apache YarnClient
The official YarnClient creates its underlying RPC proxy through ClientRMProxy#createRMProxy. That method checks whether YARN HA is enabled, i.e. whether the configuration contains:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
/**
 * Create a proxy for the specified protocol. For non-HA,
 * this is a direct connection to the ResourceManager address. When HA is
 * enabled, the proxy handles the failover between the ResourceManagers as
 * well.
 */
@Private
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy instance) throws IOException {
  YarnConfiguration conf = (configuration instanceof YarnConfiguration)
      ? (YarnConfiguration) configuration
      : new YarnConfiguration(configuration);
  RetryPolicy retryPolicy = createRetryPolicy(conf);
  if (HAUtil.isHAEnabled(conf)) {
    RMFailoverProxyProvider<T> provider =
        instance.createRMFailoverProxyProvider(conf, protocol);
    return (T) RetryProxy.create(protocol, provider, retryPolicy);
  } else {
    InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
If HA is enabled, a dynamic proxy backed by RetryInvocationHandler is created. Whenever the client performs a concrete operation, RetryInvocationHandler#invoke runs:
@Override
public Object invoke(Object proxy, Method method, Object[] args)
    throws Throwable {
  RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
  if (policy == null) {
    policy = defaultPolicy;
  }

  // The number of times this method invocation has been failed over.
  int invocationFailoverCount = 0;
  final boolean isRpc = isRpcInvocation(currentProxy.proxy);
  final int callId = isRpc? Client.nextCallId(): RpcConstants.INVALID_CALL_ID;
  int retries = 0;
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }

    if (isRpc) {
      Client.setCallIdAndRetryCount(callId, retries);
    }
    try {
      Object ret = invokeMethod(method, args);
      hasMadeASuccessfulCall = true;
      return ret;
    } catch (Exception e) {
      boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
          .getMethod(method.getName(), method.getParameterTypes())
          .isAnnotationPresent(Idempotent.class);
      if (!isIdempotentOrAtMostOnce) {
        isIdempotentOrAtMostOnce = proxyProvider.getInterface()
            .getMethod(method.getName(), method.getParameterTypes())
            .isAnnotationPresent(AtMostOnce.class);
      }
      RetryAction action = policy.shouldRetry(e, retries++,
          invocationFailoverCount, isIdempotentOrAtMostOnce);
      if (action.action == RetryAction.RetryDecision.FAIL) {
        if (action.reason != null) {
          LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
              + "." + method.getName() + " over " + currentProxy.proxyInfo
              + ". Not retrying because " + action.reason, e);
        }
        throw e;
      } else { // retry or failover
        // avoid logging the failover if this is the first call on this
        // proxy object, and we successfully achieve the failover without
        // any flip-flopping
        boolean worthLogging =
            !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
        worthLogging |= LOG.isDebugEnabled();
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
            worthLogging) {
          String msg = "Exception while invoking " + method.getName()
              + " of class " + currentProxy.proxy.getClass().getSimpleName()
              + " over " + currentProxy.proxyInfo;
          if (invocationFailoverCount > 0) {
            msg += " after " + invocationFailoverCount + " fail over attempts";
          }
          msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
          LOG.info(msg, e);
        } else {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Exception while invoking " + method.getName()
                + " of class " + currentProxy.proxy.getClass().getSimpleName()
                + " over " + currentProxy.proxyInfo + ". Retrying "
                + formatSleepMessage(action.delayMillis), e);
          }
        }

        if (action.delayMillis > 0) {
          Thread.sleep(action.delayMillis);
        }

        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
          // Make sure that concurrent failed method invocations only cause a
          // single actual fail over.
          synchronized (proxyProvider) {
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
              proxyProvider.performFailover(currentProxy.proxy);
              proxyProviderFailoverCount++;
            } else {
              LOG.warn("A failover has occurred since the start of this method"
                  + " invocation attempt.");
            }
            currentProxy = proxyProvider.getProxy();
          }
          invocationFailoverCount++;
        }
      }
    }
  }
}
This method retries a failed call according to the configured retry policy, failing over between RMs as needed. The concrete failover provider is configured via:
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
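To make the retry-and-failover mechanism above concrete, here is a minimal, self-contained sketch of the same pattern: a `java.lang.reflect.Proxy` whose invocation handler, on failure, "fails over" to the next target before retrying. All names (`RmProtocol`, `FailoverDemo`, the fake RM lambdas) are illustrative stand-ins, not Hadoop classes; the real RetryInvocationHandler additionally consults a RetryPolicy, idempotency annotations, and sleep delays.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class FailoverDemo {

    // Illustrative stand-in for an RM-facing RPC protocol.
    interface RmProtocol {
        String allocate();
    }

    public static String run() {
        // rm1 plays the failed/standby RM, rm2 the active one.
        RmProtocol rm1 = () -> { throw new RuntimeException("rm1 is standby"); };
        RmProtocol rm2 = () -> "active:rm2";
        RmProtocol[] targets = { rm1, rm2 };

        InvocationHandler handler = new InvocationHandler() {
            private int current = 0; // index of the RM currently in use

            @Override
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                // On each failure, fail over to the next target and retry.
                for (int attempt = 0; attempt < targets.length; attempt++) {
                    try {
                        return method.invoke(targets[current], args);
                    } catch (Exception e) {
                        current = (current + 1) % targets.length; // performFailover
                    }
                }
                throw new RuntimeException("all ResourceManagers failed");
            }
        };

        RmProtocol proxy = (RmProtocol) Proxy.newProxyInstance(
                RmProtocol.class.getClassLoader(),
                new Class<?>[] { RmProtocol.class },
                handler);
        return proxy.allocate(); // first call fails over from rm1 to rm2
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints active:rm2
    }
}
```

The caller only ever sees the proxy object; which RM actually serves the request is an internal detail of the handler, which is exactly why HA works transparently once the proxy is built by ClientRMProxy.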
So the official YarnClient does support YARN HA.
Spring YarnClient
In Spring Hadoop, the low-level operations are wrapped in the xxTemplate classes, and YarnRpcAccessor creates the underlying RPC proxy:
@Override
public void afterPropertiesSet() throws Exception {
  Assert.notNull(configuration, "Yarn configuration must be set");
  Assert.notNull(protocolClazz, "Rpc protocol class must be set");
  if (UserGroupInformation.isSecurityEnabled()) {
    UserGroupInformation.setConfiguration(configuration);
  }
  address = getRpcAddress(configuration);
  proxy = createProxy();
}
The proxy created here contains no HA handling at all.
Proposed Fix
Change the proxy-creation logic in YarnRpcAccessor as follows:
@Override
public void afterPropertiesSet() throws Exception {
  Assert.notNull(configuration, "Yarn configuration must be set");
  Assert.notNull(protocolClazz, "Rpc protocol class must be set");
  if (UserGroupInformation.isSecurityEnabled()) {
    UserGroupInformation.setConfiguration(configuration);
  }
  address = getRpcAddress(configuration);
  // proxy = createProxy();
  if (protocolClazz.isAssignableFrom(ClientRMProtocols.class)) {
    proxy = ClientRMProxy.createRMProxy(configuration, protocolClazz);
  } else {
    proxy = createProxy();
  }
}
With this change, Client-RM and AM-RM communication goes through the native ClientRMProxy creation logic, while AM-NM communication keeps the original Spring logic.
AppMaster HA
Besides supporting multiple RM addresses and connection retries, a highly available AppMaster must also re-register itself with the RM after an RM restart. The Apache-native AMRMClientImpl already handles this:
@Override
public AllocateResponse allocate(float progressIndicator)
    throws YarnException, IOException {
  Preconditions.checkArgument(progressIndicator >= 0,
      "Progress indicator should not be negative");
  AllocateResponse allocateResponse = null;
  List<ResourceRequest> askList = null;
  List<ContainerId> releaseList = null;
  AllocateRequest allocateRequest = null;
  List<String> blacklistToAdd = new ArrayList<String>();
  List<String> blacklistToRemove = new ArrayList<String>();

  try {
    synchronized (this) {
      askList = new ArrayList<ResourceRequest>(ask.size());
      for (ResourceRequest r : ask) {
        // create a copy of ResourceRequest as we might change it while the
        // RPC layer is using it to send info across
        askList.add(ResourceRequest.newInstance(r.getPriority(),
            r.getResourceName(), r.getCapability(), r.getNumContainers(),
            r.getRelaxLocality(), r.getNodeLabelExpression()));
      }
      releaseList = new ArrayList<ContainerId>(release);
      // optimistically clear this collection assuming no RPC failure
      ask.clear();
      release.clear();

      blacklistToAdd.addAll(blacklistAdditions);
      blacklistToRemove.addAll(blacklistRemovals);

      ResourceBlacklistRequest blacklistRequest =
          ResourceBlacklistRequest.newInstance(blacklistToAdd,
              blacklistToRemove);

      allocateRequest =
          AllocateRequest.newInstance(lastResponseId, progressIndicator,
              askList, releaseList, blacklistRequest);
      // clear blacklistAdditions and blacklistRemovals before
      // unsynchronized part
      blacklistAdditions.clear();
      blacklistRemovals.clear();
    }

    try {
      allocateResponse = rmClient.allocate(allocateRequest);
    } catch (ApplicationMasterNotRegisteredException e) {
      LOG.warn("ApplicationMaster is out of sync with ResourceManager,"
          + " hence resyncing.");
      synchronized (this) {
        release.addAll(this.pendingRelease);
        blacklistAdditions.addAll(this.blacklistedNodes);
        for (Map<String, TreeMap<Resource, ResourceRequestInfo>> rr : remoteRequestsTable
            .values()) {
          for (Map<Resource, ResourceRequestInfo> capabalities : rr.values()) {
            for (ResourceRequestInfo request : capabalities.values()) {
              addResourceRequestToAsk(request.remoteRequest);
            }
          }
        }
      }
      // re register with RM
      registerApplicationMaster();
      allocateResponse = allocate(progressIndicator);
      return allocateResponse;
    }

    synchronized (this) {
      // update these on successful RPC
      clusterNodeCount = allocateResponse.getNumClusterNodes();
      lastResponseId = allocateResponse.getResponseId();
      clusterAvailableResources = allocateResponse.getAvailableResources();
      if (!allocateResponse.getNMTokens().isEmpty()) {
        populateNMTokens(allocateResponse.getNMTokens());
      }
      if (allocateResponse.getAMRMToken() != null) {
        updateAMRMToken(allocateResponse.getAMRMToken());
      }
      if (!pendingRelease.isEmpty()
          && !allocateResponse.getCompletedContainersStatuses().isEmpty()) {
        removePendingReleaseRequests(allocateResponse
            .getCompletedContainersStatuses());
      }
    }
  } finally {
    // TODO how to differentiate remote yarn exception vs error in rpc
    if (allocateResponse == null) {
      // we hit an exception in allocate()
      // preserve ask and release for next call to allocate()
      synchronized (this) {
        release.addAll(releaseList);
        // requests could have been added or deleted during call to allocate
        // If requests were added/removed then there is nothing to do since
        // the ResourceRequest object in ask would have the actual new value.
        // If ask does not have this ResourceRequest then it was unchanged and
        // so we can add the value back safely.
        // This assumes that there will no concurrent calls to allocate() and
        // so we dont have to worry about ask being changed in the
        // synchronized block at the beginning of this method.
        for (ResourceRequest oldAsk : askList) {
          if (!ask.contains(oldAsk)) {
            ask.add(oldAsk);
          }
        }

        blacklistAdditions.addAll(blacklistToAdd);
        blacklistRemovals.addAll(blacklistToRemove);
      }
    }
  }
  return allocateResponse;
}
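Stripped of the bookkeeping around asks, releases, and blacklists, the resync flow above reduces to a small pattern: if allocate() reports that the AM is no longer registered (as happens after an RM restart or failover), re-register and retry the request. The following self-contained sketch illustrates just that pattern; all names (`Rm`, `NotRegisteredException`, `RestartedRm`) are illustrative, not Hadoop or Spring classes.

```java
public class ResyncDemo {

    // Stand-in for ApplicationMasterNotRegisteredException.
    static class NotRegisteredException extends RuntimeException {}

    // Stand-in for the AM-RM protocol surface we care about here.
    interface Rm {
        String allocate() throws NotRegisteredException;
        void registerApplicationMaster();
    }

    public static String allocateWithResync(Rm rm) {
        try {
            return rm.allocate();
        } catch (NotRegisteredException e) {
            rm.registerApplicationMaster(); // re-register with the (new) active RM
            return rm.allocate();           // then retry the original request
        }
    }

    // A fake RM that rejects allocate() until the AM re-registers,
    // mimicking an RM that restarted and lost the AM's registration.
    static class RestartedRm implements Rm {
        private boolean registered = false;

        @Override
        public String allocate() {
            if (!registered) {
                throw new NotRegisteredException();
            }
            return "allocated";
        }

        @Override
        public void registerApplicationMaster() {
            registered = true;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateWithResync(new RestartedRm())); // prints allocated
    }
}
```

The real implementation must additionally rebuild the pending ask/release state before retrying, as the AMRMClientImpl code above shows; without that, requests in flight at the time of the restart would be silently lost.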
The AM**Template classes used by Spring Hadoop have no such logic, so make the following change: add this method and its implementation to AppmasterRmTemplate:
@Override
public AllocateResponse allocate(final AllocateRequest request, final String host,
    final Integer rpcPort, final String trackUrl) {
  return execute(new YarnRpcCallback<AllocateResponse, ApplicationMasterProtocol>() {

    @Override
    public AllocateResponse doInYarn(ApplicationMasterProtocol proxy)
        throws YarnException, IOException {
      return doAllocate(proxy, request, host, rpcPort, trackUrl);
    }

    private AllocateResponse doAllocate(ApplicationMasterProtocol proxy,
        AllocateRequest request, final String host, final Integer rpcPort,
        final String trackUrl) throws IOException, YarnException {
      AllocateResponse allocateResponse = null;
      try {
        allocateResponse = proxy.allocate(request);
      } catch (ApplicationMasterNotRegisteredException e) {
        log.warn("ApplicationMaster is out of sync with ResourceManager,"
            + " hence resyncing.");
        // re register with RM
        log.info("Re-register am with RM.");
        registerApplicationMaster(host, rpcPort, trackUrl);
        allocateResponse = doAllocate(proxy, request, host, rpcPort, trackUrl);
        return allocateResponse;
      }
      return allocateResponse;
    }
  });
}
Then update the caller, DefaultContainerAllocator, accordingly:
AppmasterService appmasterClientService = YarnContextUtils.getAppmasterClientService(getBeanFactory());
AppmasterTrackService appmasterTrackService = YarnContextUtils.getAppmasterTrackService(getBeanFactory());
String host = appmasterClientService == null ? "" : appmasterClientService.getHost();
int port = appmasterClientService == null ? 0 : appmasterClientService.getPort();
String trackUrl = appmasterTrackService == null ? null : appmasterTrackService.getTrackUrl();
log.info("Host: " + host + " ,port: " + port + ", trackUrl: " + trackUrl);
AllocateResponse allocate = getRmTemplate().allocate(request, host, port, trackUrl);
That is all. Rebuild spring-yarn-core.jar and replace the corresponding jar in Spring XD to enable YARN HA support.
Configuration File
# Hadoop properties
spring:
  hadoop:
    fsUri: hdfs://xxx
    resourceManagerHost: xxx
    # resourceManagerHost: yarn-cluster
    resourceManagerPort: 8032
    # rmAddress: yarn-cluster
    # resourceManagerSchedulerAddress: ${spring.hadoop.resourceManagerHost}:8030
    # jobHistoryAddress: xxx
    ## For phd30 only (values for version 3.0.1.0, also change resourceManagerPort above to 8050)
    # config:
    #   mapreduce.application.framework.path: '/phd/apps/3.0.1.0-1/mapreduce/mapreduce.tar.gz#mr-framework'
    #   mapreduce.application.classpath: '$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/phd/3.0.1.0-1/hadoop/lib/hadoop-lzo-0.6.0.3.0.1.0-1.jar:/etc/hadoop/conf/secure'
    ## For hdp22 only (values for version 2.2.8.0, also change resourceManagerPort above to 8050)
    config:
      mapreduce.application.framework.path: ${spring.yarn.config.mapreduce.application.framework.path}
      mapreduce.application.classpath: ${spring.yarn.config.mapreduce.application.classpath}
      net.topology.script.file.name: /etc/hadoop/conf/topology_script.py
      dfs.namenode.rpc-address: xxx.xxx.xxx.xxx:8020
      dfs.nameservices: xxx
      dfs.ha.namenodes.xxx: nn1,nn2
      dfs.namenode.rpc-address.xxx.nn1: xxx.xxx.xxx.xxx:8020
      dfs.namenode.rpc-address.xxx.nn2: xxx.xxx.xxx.xxx:8020
      dfs.client.failover.proxy.provider.xxx: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      yarn.resourcemanager.ha.enabled: true
      yarn.resourcemanager.ha.rm-ids: rm1,rm2
      yarn.resourcemanager.cluster-id: yarn-cluster
      yarn.resourcemanager.address.rm1: xxx.xxx.xxx.xxx
      yarn.resourcemanager.scheduler.address.rm1: xxx.xxx.xxx.xxx:8030
      yarn.resourcemanager.admin.address.rm1: xxx.xxx.xxx.xxx:8033
      yarn.resourcemanager.webapp.address.rm1: xxx.xxx.xxx.xxx:8088
      yarn.resourcemanager.resource-tracker.address.rm1: xxx.xxx.xxx.xxx:8031
      yarn.resourcemanager.address.rm2: xxx.xxx.xxx.xxx
      yarn.resourcemanager.scheduler.address.rm2: xxx.xxx.xxx.xxx:8030
      yarn.resourcemanager.admin.address.rm2: xxx.xxx.xxx.xxx:8033
      yarn.resourcemanager.webapp.address.rm2: xxx.xxx.xxx.xxx:8088
      yarn.resourcemanager.resource-tracker.address.rm2: xxx.xxx.xxx.xxx:8031
      yarn.resourcemanager.zk-address: xxx.xxx.xxx.xxx:2181,xxx.xxx.xxx.xxx:2181,xxx.xxx.xxx.xxx:2181
      yarn.resourcemanager.recovery.enabled: true
The YARN HA settings must be added under spring -> hadoop -> config as above.