Introducing Texar: A Modularized, Versatile, and Extensible Toolkit for Text Generation and Beyond
Crossposted on the Petuum blog.
We are excited to introduce Texar, an open-source, general-purpose toolkit that supports a broad set of machine learning applications with a focus on text generation tasks. Texar is particularly suitable for researchers and practitioners who need fast model prototyping and experimentation.
Text Generation at a Glance
Text generation spans a broad set of natural language processing (NLP) tasks that aim to generate natural language from input data or machine representations. Such tasks include machine translation, dialog systems, text summarization, article writing, text paraphrasing and manipulation, image captioning, and more. While this field has undergone rapid progress in both academic and industry settings, in part due to the integration of modern deep learning approaches, considerable research efforts are still needed in order to improve techniques and enable real-world applications.
Text generation tasks have many common properties and share two central goals:
- Generating human-like, grammatical, and readable text.
- Generating text that contains all relevant information inferred from inputs. For example, in machine translation, the translated sentence that is generated must express the same meaning as the source sentence.
To this end, a few key techniques are increasingly widely used, such as neural encoder-decoders, attention mechanisms, memory networks, adversarial methods, reinforcement learning, and structured supervision, along with procedures for optimization, data pre-processing, result post-processing, and evaluation. These techniques are often combined in various ways to tackle different problems (Figure 1).
It is therefore highly desirable to have an open-source platform that unifies the development of these diverse yet closely-related text generation applications, backed with clean and consistent implementations of the core algorithms. Such a unified platform would enable reuse of common components and functionalities; standardize design, implementation, and experimentation; foster reproducible research; and, importantly, encourage technique sharing among different text generation tasks so that an algorithmic advance developed for a specific task can quickly be evaluated and generalized to many other tasks.
Introducing Texar
To that end, we have developed Texar, an open-source toolkit focused on text generation tasks and built on top of TensorFlow. Texar is modular, versatile, and extensible. It extracts the common patterns underlying the diverse tasks and methodologies of text generation and creates a library of highly reusable modules and functionalities.
Versatility
Texar contains a wide range of modules and functionalities for composing arbitrary model architectures and implementing various learning algorithms such as maximum likelihood learning, reinforcement learning, adversarial learning, probabilistic modeling, and so forth (Figure 2).
Modularity
Texar decomposes diverse, complex machine learning models and algorithms into highly reusable modules for model architectures, losses, learning processes, and more.
Users can easily construct their own models at a high conceptual level by assembling Texar’s modules like building blocks. Texar makes it simple to plug in and swap out modules; for example, switching between maximum likelihood learning and reinforcement learning involves changing only a few lines of code (see the sketch below).
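As an illustration, here is a minimal sketch of assembling a simple sequence-to-sequence model from Texar modules and attaching a maximum likelihood loss. The module and function names (`WordEmbedder`, `UnidirectionalRNNEncoder`, `BasicRNNDecoder`, `sequence_sparse_softmax_cross_entropy`) follow the texar-tf API, but the placeholder tensors, vocabulary size, and reliance on default hyperparameters are hypothetical stand-ins rather than settings taken from the toolkit's examples.

```python
import tensorflow as tf
import texar as tx  # texar-tf; usage is illustrative of its API

# Hypothetical placeholders standing in for a real data batch.
vocab_size = 10000
src_ids = tf.placeholder(tf.int64, shape=[None, None])  # source token ids
tgt_ids = tf.placeholder(tf.int64, shape=[None, None])  # target token ids
tgt_len = tf.placeholder(tf.int32, shape=[None])        # target sequence lengths

# Assemble an encoder-decoder by composing Texar modules like building blocks
# (all modules fall back to their default hyperparameters here).
embedder = tx.modules.WordEmbedder(vocab_size=vocab_size)
encoder = tx.modules.UnidirectionalRNNEncoder()
decoder = tx.modules.BasicRNNDecoder(vocab_size=vocab_size)

_, enc_state = encoder(embedder(src_ids))
outputs, _, _ = decoder(
    initial_state=enc_state,
    inputs=embedder(tgt_ids[:, :-1]),   # teacher-forcing inputs
    sequence_length=tgt_len - 1)

# Maximum likelihood learning: a sequence cross-entropy loss on the decoder logits.
# Switching to, say, policy-gradient (RL) training mainly amounts to replacing this
# loss (and the decoding strategy) while the architecture modules stay the same.
mle_loss = tx.losses.sequence_sparse_softmax_cross_entropy(
    labels=tgt_ids[:, 1:],
    logits=outputs.logits,
    sequence_length=tgt_len - 1)
```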
Extensibility
Texar can be effortlessly integrated with any user-customized external modules, and it is fully compatible with the TensorFlow open-source ecosystem, including TensorFlow-native interfaces, features, and resources.
Usability
With Texar, users can customize models through templates/examples and simple Python/YAML configuration files, or program directly against Texar’s Python library APIs for maximal customizability.
Texar provides convenient automatic variable reuse (no need to worry about complicated TensorFlow variable scopes), simple function-like calls to perform module logic, and rich configuration options with sensible default values for every module.
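For instance, the following small sketch shows how a module's configuration might be inspected and partially overridden. The `default_hparams()` method and attribute-style `hparams` access follow the texar-tf API; the specific override values are arbitrary and purely for illustration.

```python
import texar as tx  # texar-tf

# Every Texar module exposes its full default configuration.
defaults = tx.modules.UnidirectionalRNNEncoder.default_hparams()
print(defaults["rnn_cell"])  # cell type, num_units, dropout, etc.

# Override only the fields you care about; unspecified fields keep their
# sensible default values. The value below is arbitrary, for illustration only.
encoder = tx.modules.UnidirectionalRNNEncoder(
    hparams={"rnn_cell": {"kwargs": {"num_units": 512}}})

print(encoder.hparams.rnn_cell.kwargs.num_units)  # 512
```

The same hyperparameter dictionary could equally be loaded from a Python/YAML configuration file, which is how the templates and examples mentioned above are typically customized.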
Texar emphasizes well-structured, highly-readable code with uniform design patterns and consistent styles, along with clean documentation and rich tutorial examples.
Texar currently supports several research and engineering projects at Petuum, Inc. We hope the toolkit can also empower the community to accelerate technique development in text generation and beyond. We invite researchers and practitioners to join us and further enrich the toolkit so that, together, we can advance text generation research and applications.
Please check out the following resources to learn more about Texar: