Open Sourcing TonY: Native Support of TensorFlow on Hadoop

阿新 • Published: 2018-12-29

LinkedIn relies heavily on artificial intelligence to deliver content and create economic opportunities for its 575+ million members. Following recent rapid advances in deep learning technologies, our AI engineers have started adopting deep neural networks in LinkedIn’s relevance-driven products, including feeds and smart-replies. Many of these use cases are built on TensorFlow, a popular deep learning framework written by Google.

In the beginning, our internal TensorFlow users ran the framework on small and unmanaged “bare metal” clusters. But we quickly realized the need to connect TensorFlow to the massive compute and storage power of our Hadoop-based big data platform. With hundreds of petabytes of data stored on our Hadoop clusters that could be leveraged for deep learning, we needed a scalable way to process all of this information. Fortunately, TensorFlow supports distributed training, a useful technique for processing large datasets. However, orchestrating distributed TensorFlow is not a trivial task, and not all data scientists and relevance engineers have the expertise, or the desire, to do it, particularly since it must be done manually. We wanted a flexible and sustainable way to bridge the gap between the analytic powers of distributed TensorFlow and the scaling powers of Hadoop.
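To make the manual orchestration burden concrete, here is a minimal sketch of one piece of it: giving every task in a distributed TensorFlow job its own `TF_CONFIG` environment variable, which describes the whole cluster plus that task's role. The host names, ports, and helper names (`build_tf_config`, `all_task_configs`) are assumptions made for illustration only; they are not taken from TonY.

```python
import json


def build_tf_config(cluster, task_type, task_index):
    """Return the TF_CONFIG JSON string for a single task.

    `cluster` maps a task type (e.g. "ps", "worker") to a list of
    "host:port" addresses. The addresses used below are hypothetical.
    """
    if task_type not in cluster:
        raise ValueError(f"unknown task type: {task_type}")
    if not 0 <= task_index < len(cluster[task_type]):
        raise IndexError(f"no {task_type} task with index {task_index}")
    return json.dumps(
        {"cluster": cluster, "task": {"type": task_type, "index": task_index}},
        sort_keys=True,
    )


def all_task_configs(cluster):
    """Build one TF_CONFIG value per task, keyed by "type:index"."""
    return {
        f"{task_type}:{index}": build_tf_config(cluster, task_type, index)
        for task_type, hosts in cluster.items()
        for index in range(len(hosts))
    }


# Hypothetical cluster: one parameter server and two workers.
cluster = {
    "ps": ["host-a:2222"],
    "worker": ["host-b:2222", "host-c:2222"],
}
configs = all_task_configs(cluster)

for task_name, tf_config in sorted(configs.items()):
    print(task_name, tf_config)
```

Done by hand, each of these values has to be exported on the right machine before the training script starts, and every one of them has to be regenerated whenever a host or port changes.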

Open sourcing TonY

To meet our needs, and because we know there are many others interested in running distributed machine learning who are also running large Hadoop deployments, we have built TensorFlow on YARN (TonY), which we are open sourcing today. Please check out the TonY project on GitHub for details on how to use it. Contributions and suggestions from the community are welcome!

In the rest of this blog post, we will cover the internal details of TonY, the features we have implemented and leveraged to scale distributed TensorFlow on Hadoop, and experimental results.

Existing solutions

In our initial investigation into running distributed TensorFlow on Hadoop, we found a few existing solutions. However, we ultimately determined that none met our particular requirements, leading to our decision to build TonY.

TensorFlow on Spark is an open source solution that enables you to run TensorFlow on the Apache Spark computing engine. We were able to onboard a couple of our internal deep learning applications on this framework, but ran into a few issues, most notably a lack of both GPU scheduling and heterogeneous container scheduling. Also, any scheduling and application lifecycle enhancements we wanted to make in the future would have to be done in Spark, which is much more difficult than making the change in a self-contained YARN application.

TensorFlowOnYARN is another open source solution that runs as a separate library. Unfortunately, fault tolerance support and usability in this project did not fit our needs. Furthermore, this project is no longer maintained.

For these reasons, we decided to build TonY to give us complete control over the resources in our Hadoop clusters. Also, since TonY runs directly on YARN as a lightweight dependency, we can easily evolve it alongside both the lower-level part of the stack in YARN and the higher-level part of the stack in TensorFlow.

How does TonY work?

Similar to how MapReduce provides the engine for running Pig/Hive scripts on Hadoop, and Spark provides the engine for running Scala code that uses Spark APIs, TonY aims to provide the same first-class support for running TensorFlow jobs on Hadoop by handling tasks such as resource negotiation and container environment setup.
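As a rough illustration of what resource negotiation for a heterogeneous job involves, the sketch below expands a per-task-type resource description into one container request per task instance. It is purely hypothetical: the class names, fields, and numbers are assumptions chosen for the example, and it does not reflect TonY's actual API, configuration format, or the YARN client interfaces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskResources:
    """Hypothetical resource description for one task type."""
    instances: int
    memory_mb: int
    vcores: int
    gpus: int = 0  # e.g. parameter servers may not need a GPU


@dataclass(frozen=True)
class ContainerRequest:
    """Hypothetical request for a single container."""
    task_type: str
    task_index: int
    memory_mb: int
    vcores: int
    gpus: int


def plan_container_requests(job):
    """Expand per-task-type specs into one request per task instance."""
    requests = []
    for task_type in sorted(job):  # sorted for a deterministic order
        spec = job[task_type]
        for index in range(spec.instances):
            requests.append(
                ContainerRequest(
                    task_type, index, spec.memory_mb, spec.vcores, spec.gpus
                )
            )
    return requests


# Illustrative job: GPU-backed workers and a CPU-only parameter server.
job = {
    "worker": TaskResources(instances=2, memory_mb=8192, vcores=4, gpus=1),
    "ps": TaskResources(instances=1, memory_mb=4096, vcores=2),
}
requests = plan_container_requests(job)
total_gpus = sum(request.gpus for request in requests)

for request in requests:
    print(request)
print("total GPUs requested:", total_gpus)
```

Keeping the resource description separate for each task type is what allows workers and parameter servers to receive differently sized containers within the same job.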
