Investigating Spring Hadoop YARN HA Issues
While running Spring XD on YARN we found that neither the YarnClient nor the AppMaster handles YARN HA well. When the ResourceManager (RM) restarts or fails over, the YarnClient only keeps working after the RM address in its configuration file is updated manually; the YARN HA settings in the configuration file have no effect. Likewise, the AppMaster is killed by YARN after its heartbeats go unanswered for too long, which brings down the whole XD service. The goal of this investigation is therefore to make XD on YARN support RM HA.
Investigating the YarnClient
Apache YarnClient
The official YarnClient creates its underlying RPC proxy through ClientRMProxy#createRMProxy. That method checks whether YARN HA is enabled, i.e. whether the configuration contains:
<property>
  <name>yarn.resourcemanager.ha.enabled</name>
  <value>true</value>
</property>
/**
 * Create a proxy for the specified protocol. For non-HA,
 * this is a direct connection to the ResourceManager address. When HA is
 * enabled, the proxy handles the failover between the ResourceManagers as
 * well.
 */
@Private
protected static <T> T createRMProxy(final Configuration configuration,
    final Class<T> protocol, RMProxy instance) throws IOException {
  YarnConfiguration conf = (configuration instanceof YarnConfiguration)
      ? (YarnConfiguration) configuration
      : new YarnConfiguration(configuration);
  RetryPolicy retryPolicy = createRetryPolicy(conf);
  if (HAUtil.isHAEnabled(conf)) {
    RMFailoverProxyProvider<T> provider =
        instance.createRMFailoverProxyProvider(conf, protocol);
    return (T) RetryProxy.create(protocol, provider, retryPolicy);
  } else {
    InetSocketAddress rmAddress = instance.getRMAddress(conf, protocol);
    LOG.info("Connecting to ResourceManager at " + rmAddress);
    T proxy = RMProxy.<T>getProxy(conf, protocol, rmAddress);
    return (T) RetryProxy.create(protocol, proxy, retryPolicy);
  }
}
If HA is enabled, a dynamic proxy backed by RetryInvocationHandler is created. Whenever the client performs a concrete operation, RetryInvocationHandler#invoke runs:
@Override
public Object invoke(Object proxy, Method method, Object[] args)
    throws Throwable {
  RetryPolicy policy = methodNameToPolicyMap.get(method.getName());
  if (policy == null) {
    policy = defaultPolicy;
  }

  // The number of times this method invocation has been failed over.
  int invocationFailoverCount = 0;
  final boolean isRpc = isRpcInvocation(currentProxy.proxy);
  final int callId = isRpc? Client.nextCallId(): RpcConstants.INVALID_CALL_ID;
  int retries = 0;
  while (true) {
    // The number of times this invocation handler has ever been failed over,
    // before this method invocation attempt. Used to prevent concurrent
    // failed method invocations from triggering multiple failover attempts.
    long invocationAttemptFailoverCount;
    synchronized (proxyProvider) {
      invocationAttemptFailoverCount = proxyProviderFailoverCount;
    }

    if (isRpc) {
      Client.setCallIdAndRetryCount(callId, retries);
    }
    try {
      Object ret = invokeMethod(method, args);
      hasMadeASuccessfulCall = true;
      return ret;
    } catch (Exception e) {
      boolean isIdempotentOrAtMostOnce = proxyProvider.getInterface()
          .getMethod(method.getName(), method.getParameterTypes())
          .isAnnotationPresent(Idempotent.class);
      if (!isIdempotentOrAtMostOnce) {
        isIdempotentOrAtMostOnce = proxyProvider.getInterface()
            .getMethod(method.getName(), method.getParameterTypes())
            .isAnnotationPresent(AtMostOnce.class);
      }
      RetryAction action = policy.shouldRetry(e, retries++,
          invocationFailoverCount, isIdempotentOrAtMostOnce);
      if (action.action == RetryAction.RetryDecision.FAIL) {
        if (action.reason != null) {
          LOG.warn("Exception while invoking " + currentProxy.proxy.getClass()
              + "." + method.getName() + " over " + currentProxy.proxyInfo
              + ". Not retrying because " + action.reason, e);
        }
        throw e;
      } else { // retry or failover
        // avoid logging the failover if this is the first call on this
        // proxy object, and we successfully achieve the failover without
        // any flip-flopping
        boolean worthLogging =
            !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);
        worthLogging |= LOG.isDebugEnabled();
        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&
            worthLogging) {
          String msg = "Exception while invoking " + method.getName()
              + " of class " + currentProxy.proxy.getClass().getSimpleName()
              + " over " + currentProxy.proxyInfo;
          if (invocationFailoverCount > 0) {
            msg += " after " + invocationFailoverCount + " fail over attempts";
          }
          msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);
          LOG.info(msg, e);
        } else {
          if (LOG.isDebugEnabled()) {
            LOG.debug("Exception while invoking " + method.getName()
                + " of class " + currentProxy.proxy.getClass().getSimpleName()
                + " over " + currentProxy.proxyInfo + ". Retrying "
                + formatSleepMessage(action.delayMillis), e);
          }
        }

        if (action.delayMillis > 0) {
          Thread.sleep(action.delayMillis);
        }

        if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {
          // Make sure that concurrent failed method invocations only cause a
          // single actual fail over.
          synchronized (proxyProvider) {
            if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {
              proxyProvider.performFailover(currentProxy.proxy);
              proxyProviderFailoverCount++;
            } else {
              LOG.warn("A failover has occurred since the start of this method"
                  + " invocation attempt.");
            }
            currentProxy = proxyProvider.getProxy();
          }
          invocationFailoverCount++;
        }
      }
    }
  }
}
This method retries a failed call according to the configured retry policy, failing over between RMs as needed. The concrete failover provider is configured via:
<property>
  <name>yarn.client.failover-proxy-provider</name>
  <value>org.apache.hadoop.yarn.client.ConfiguredRMFailoverProxyProvider</value>
</property>
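To make the retry-and-failover mechanism above concrete, here is a minimal, self-contained sketch of the same pattern: a `java.lang.reflect.Proxy` whose invocation handler, on failure, "fails over" to the next target before retrying. All names (`RmProtocol`, `FailoverDemo`, the fake RM lambdas) are illustrative stand-ins, not Hadoop classes; the real RetryInvocationHandler additionally consults a RetryPolicy, idempotency annotations, and sleep delays.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;

public class FailoverDemo {

    // Illustrative stand-in for an RM-facing RPC protocol.
    interface RmProtocol {
        String allocate();
    }

    public static String run() {
        // rm1 plays the failed/standby RM, rm2 the active one.
        RmProtocol rm1 = () -> { throw new RuntimeException("rm1 is standby"); };
        RmProtocol rm2 = () -> "active:rm2";
        RmProtocol[] targets = { rm1, rm2 };

        InvocationHandler handler = new InvocationHandler() {
            private int current = 0; // index of the RM currently in use

            @Override
            public Object invoke(Object proxy, Method method, Object[] args)
                    throws Throwable {
                // On each failure, fail over to the next target and retry.
                for (int attempt = 0; attempt < targets.length; attempt++) {
                    try {
                        return method.invoke(targets[current], args);
                    } catch (Exception e) {
                        current = (current + 1) % targets.length; // performFailover
                    }
                }
                throw new RuntimeException("all ResourceManagers failed");
            }
        };

        RmProtocol proxy = (RmProtocol) Proxy.newProxyInstance(
                RmProtocol.class.getClassLoader(),
                new Class<?>[] { RmProtocol.class },
                handler);
        return proxy.allocate(); // first call fails over from rm1 to rm2
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints active:rm2
    }
}
```

The caller only ever sees the proxy object; which RM actually serves the request is an internal detail of the handler, which is exactly why HA works transparently once the proxy is built by ClientRMProxy.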
So the official YarnClient does support YARN HA.
Spring YarnClient
In Spring Hadoop, the low-level operations are wrapped in the xxTemplate classes, and YarnRpcAccessor creates the underlying RPC proxy:
@Override
public void afterPropertiesSet() throws Exception {
  Assert.notNull(configuration, "Yarn configuration must be set");
  Assert.notNull(protocolClazz, "Rpc protocol class must be set");
  if (UserGroupInformation.isSecurityEnabled()) {
    UserGroupInformation.setConfiguration(configuration);
  }
  address = getRpcAddress(configuration);
  proxy = createProxy();
}
The proxy created here contains no HA handling at all.
Proposed Fix
Change the proxy-creation logic in YarnRpcAccessor as follows:
@Override
public void afterPropertiesSet() throws Exception {
  Assert.notNull(configuration, "Yarn configuration must be set");
  Assert.notNull(protocolClazz, "Rpc protocol class must be set");
  if (UserGroupInformation.isSecurityEnabled()) {
    UserGroupInformation.setConfiguration(configuration);
  }
  address = getRpcAddress(configuration);
  // proxy = createProxy();
  if (protocolClazz.isAssignableFrom(ClientRMProtocols.class)) {
    proxy = ClientRMProxy.createRMProxy(configuration, protocolClazz);
  } else {
    proxy = createProxy();
  }
}
With this change, Client-RM and AM-RM communication goes through the native ClientRMProxy creation logic, while AM-NM communication keeps the original Spring logic.
AppMaster HA
Besides supporting multiple RM addresses and connection retries, a highly available AppMaster must also re-register itself with the RM after an RM restart. The Apache-native AMRMClientImpl already handles this:
@Override
public AllocateResponse allocate(float progressIndicator)
    throws YarnException, IOException {
  Preconditions.checkArgument(progressIndicator >= 0,
      "Progress indicator should not be negative");
  AllocateResponse allocateResponse = null;
  List<ResourceRequest> askList = null;
  List<ContainerId> releaseList = null;
  AllocateRequest allocateRequest = null;
  List<String> blacklistToAdd = new ArrayList<String>();
  List<String> blacklistToRemove = new ArrayList<String>();

  try {
    synchronized (this) {
      askList = new ArrayList<ResourceRequest>(ask.size());
      for (ResourceRequest r : ask) {
        // create a copy of ResourceRequest as we might change it while the
        // RPC layer is using it to send info across
        askList.add(ResourceRequest.newInstance(r.getPriority(),
            r.getResourceName(), r.getCapability(), r.getNumContainers(),
            r.getRelaxLocality(), r.getNodeLabelExpression()));
      }
      releaseList = new ArrayList<ContainerId>(release);
      // optimistically clear this collection assuming no RPC failure
      ask.clear();
      release.clear();

      blacklistToAdd.addAll(blacklistAdditions);
      blacklistToRemove.addAll(blacklistRemovals);

      ResourceBlacklistRequest blacklistRequest =
          ResourceBlacklistRequest.newInstance(blacklistToAdd,
              blacklistToRemove);

      allocateRequest =
          AllocateRequest.newInstance(lastResponseId, progressIndicator,
              askList, releaseList, blacklistRequest);
      // clear blacklistAdditions and blacklistRemovals before
      // unsynchronized part
      blacklistAdditions.clear();
      blacklistRemovals.clear();
    }

    try {
      allocateResponse = rmClient.allocate(allocateRequest);
    } catch (ApplicationMasterNotRegisteredException e) {
      LOG.warn("ApplicationMaster is out of sync with ResourceManager,"
          + " hence resyncing.");
      synchronized (this) {
        release.addAll(this.pendingRelease);
        blacklistAdditions.addAll(this.blacklistedNodes);
        for (Map<String, TreeMap<Resource, ResourceRequestInfo>> rr : remoteRequestsTable
            .values()) {
          for (Map<Resource, ResourceRequestInfo> capabalities : rr.values()) {
            for (ResourceRequestInfo request : capabalities.values()) {
              addResourceRequestToAsk(request.remoteRequest);
            }
          }
        }
      }
      // re register with RM
      registerApplicationMaster();
      allocateResponse = allocate(progressIndicator);
      return allocateResponse;
    }

    synchronized (this) {
      // update these on successful RPC
      clusterNodeCount = allocateResponse.getNumClusterNodes();
      lastResponseId = allocateResponse.getResponseId();
      clusterAvailableResources = allocateResponse.getAvailableResources();
      if (!allocateResponse.getNMTokens().isEmpty()) {
        populateNMTokens(allocateResponse.getNMTokens());
      }
      if (allocateResponse.getAMRMToken() != null) {
        updateAMRMToken(allocateResponse.getAMRMToken());
      }
      if (!pendingRelease.isEmpty()
          && !allocateResponse.getCompletedContainersStatuses().isEmpty()) {
        removePendingReleaseRequests(allocateResponse
            .getCompletedContainersStatuses());
      }
    }
  } finally {
    // TODO how to differentiate remote yarn exception vs error in rpc
    if (allocateResponse == null) {
      // we hit an exception in allocate()
      // preserve ask and release for next call to allocate()
      synchronized (this) {
        release.addAll(releaseList);
        // requests could have been added or deleted during call to allocate
        // If requests were added/removed then there is nothing to do since
        // the ResourceRequest object in ask would have the actual new value.
        // If ask does not have this ResourceRequest then it was unchanged and
        // so we can add the value back safely.
        // This assumes that there will no concurrent calls to allocate() and
        // so we dont have to worry about ask being changed in the
        // synchronized block at the beginning of this method.
        for (ResourceRequest oldAsk : askList) {
          if (!ask.contains(oldAsk)) {
            ask.add(oldAsk);
          }
        }

        blacklistAdditions.addAll(blacklistToAdd);
        blacklistRemovals.addAll(blacklistToRemove);
      }
    }
  }
  return allocateResponse;
}
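Stripped of the bookkeeping around asks, releases, and blacklists, the resync flow above reduces to a small pattern: if allocate() reports that the AM is no longer registered (as happens after an RM restart or failover), re-register and retry the request. The following self-contained sketch illustrates just that pattern; all names (`Rm`, `NotRegisteredException`, `RestartedRm`) are illustrative, not Hadoop or Spring classes.

```java
public class ResyncDemo {

    // Stand-in for ApplicationMasterNotRegisteredException.
    static class NotRegisteredException extends RuntimeException {}

    // Stand-in for the AM-RM protocol surface we care about here.
    interface Rm {
        String allocate() throws NotRegisteredException;
        void registerApplicationMaster();
    }

    public static String allocateWithResync(Rm rm) {
        try {
            return rm.allocate();
        } catch (NotRegisteredException e) {
            rm.registerApplicationMaster(); // re-register with the (new) active RM
            return rm.allocate();           // then retry the original request
        }
    }

    // A fake RM that rejects allocate() until the AM re-registers,
    // mimicking an RM that restarted and lost the AM's registration.
    static class RestartedRm implements Rm {
        private boolean registered = false;

        @Override
        public String allocate() {
            if (!registered) {
                throw new NotRegisteredException();
            }
            return "allocated";
        }

        @Override
        public void registerApplicationMaster() {
            registered = true;
        }
    }

    public static void main(String[] args) {
        System.out.println(allocateWithResync(new RestartedRm())); // prints allocated
    }
}
```

The real implementation must additionally rebuild the pending ask/release state before retrying, as the AMRMClientImpl code above shows; without that, requests in flight at the time of the restart would be silently lost.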
The AM**Template classes used by Spring Hadoop have no such logic, so make the following change: add this method and its implementation to AppmasterRmTemplate:
@Override
public AllocateResponse allocate(final AllocateRequest request, final String host,
    final Integer rpcPort, final String trackUrl) {
  return execute(new YarnRpcCallback<AllocateResponse, ApplicationMasterProtocol>() {

    @Override
    public AllocateResponse doInYarn(ApplicationMasterProtocol proxy)
        throws YarnException, IOException {
      return doAllocate(proxy, request, host, rpcPort, trackUrl);
    }

    private AllocateResponse doAllocate(ApplicationMasterProtocol proxy,
        AllocateRequest request, final String host, final Integer rpcPort,
        final String trackUrl) throws IOException, YarnException {
      AllocateResponse allocateResponse = null;
      try {
        allocateResponse = proxy.allocate(request);
      } catch (ApplicationMasterNotRegisteredException e) {
        log.warn("ApplicationMaster is out of sync with ResourceManager,"
            + " hence resyncing.");
        // re register with RM
        log.info("Re-register am with RM.");
        registerApplicationMaster(host, rpcPort, trackUrl);
        allocateResponse = doAllocate(proxy, request, host, rpcPort, trackUrl);
        return allocateResponse;
      }
      return allocateResponse;
    }
  });
}
Then update the caller, DefaultContainerAllocator, accordingly:
AppmasterService appmasterClientService = YarnContextUtils.getAppmasterClientService(getBeanFactory());
AppmasterTrackService appmasterTrackService = YarnContextUtils.getAppmasterTrackService(getBeanFactory());
String host = appmasterClientService == null ? "" : appmasterClientService.getHost();
int port = appmasterClientService == null ? 0 : appmasterClientService.getPort();
String trackUrl = appmasterTrackService == null ? null : appmasterTrackService.getTrackUrl();
log.info("Host: " + host + " ,port: " + port + ", trackUrl: " + trackUrl);
AllocateResponse allocate = getRmTemplate().allocate(request, host, port, trackUrl);
That is all. Rebuild spring-yarn-core.jar and replace the corresponding jar in Spring XD to enable YARN HA support.
Configuration File
# Hadoop properties
spring:
  hadoop:
    fsUri: hdfs://xxx
    resourceManagerHost: xxx
    # resourceManagerHost: yarn-cluster
    resourceManagerPort: 8032
    # rmAddress: yarn-cluster
    # resourceManagerSchedulerAddress: ${spring.hadoop.resourceManagerHost}:8030
    # jobHistoryAddress: xxx
    ## For phd30 only (values for version 3.0.1.0, also change resourceManagerPort above to 8050)
    # config:
    #   mapreduce.application.framework.path: '/phd/apps/3.0.1.0-1/mapreduce/mapreduce.tar.gz#mr-framework'
    #   mapreduce.application.classpath: '$PWD/mr-framework/hadoop/share/hadoop/mapreduce/*:$PWD/mr-framework/hadoop/share/hadoop/mapreduce/lib/*:$PWD/mr-framework/hadoop/share/hadoop/common/*:$PWD/mr-framework/hadoop/share/hadoop/common/lib/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/*:$PWD/mr-framework/hadoop/share/hadoop/yarn/lib/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/*:$PWD/mr-framework/hadoop/share/hadoop/hdfs/lib/*:/usr/phd/3.0.1.0-1/hadoop/lib/hadoop-lzo-0.6.0.3.0.1.0-1.jar:/etc/hadoop/conf/secure'
    ## For hdp22 only (values for version 2.2.8.0, also change resourceManagerPort above to 8050)
    config:
      mapreduce.application.framework.path: ${spring.yarn.config.mapreduce.application.framework.path}
      mapreduce.application.classpath: ${spring.yarn.config.mapreduce.application.classpath}
      net.topology.script.file.name: /etc/hadoop/conf/topology_script.py
      dfs.namenode.rpc-address: xxx.xxx.xxx.xxx:8020
      dfs.nameservices: xxx
      dfs.ha.namenodes.xxx: nn1,nn2
      dfs.namenode.rpc-address.xxx.nn1: xxx.xxx.xxx.xxx:8020
      dfs.namenode.rpc-address.xxx.nn2: xxx.xxx.xxx.xxx:8020
      dfs.client.failover.proxy.provider.xxx: org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
      yarn.resourcemanager.ha.enabled: true
      yarn.resourcemanager.ha.rm-ids: rm1,rm2
      yarn.resourcemanager.cluster-id: yarn-cluster
      yarn.resourcemanager.address.rm1: xxx.xxx.xxx.xxx
      yarn.resourcemanager.scheduler.address.rm1: xxx.xxx.xxx.xxx:8030
      yarn.resourcemanager.admin.address.rm1: xxx.xxx.xxx.xxx:8033
      yarn.resourcemanager.webapp.address.rm1: xxx.xxx.xxx.xxx:8088
      yarn.resourcemanager.resource-tracker.address.rm1: xxx.xxx.xxx.xxx:8031
      yarn.resourcemanager.address.rm2: xxx.xxx.xxx.xxx
      yarn.resourcemanager.scheduler.address.rm2: xxx.xxx.xxx.xxx:8030
      yarn.resourcemanager.admin.address.rm2: xxx.xxx.xxx.xxx:8033
      yarn.resourcemanager.webapp.address.rm2: xxx.xxx.xxx.xxx:8088
      yarn.resourcemanager.resource-tracker.address.rm2: xxx.xxx.xxx.xxx:8031
      yarn.resourcemanager.zk-address: xxx.xxx.xxx.xxx:2181,xxx.xxx.xxx.xxx:2181,xxx.xxx.xxx.xxx:2181
      yarn.resourcemanager.recovery.enabled: true
The YARN HA settings must be added under spring -> hadoop -> config as above.