AWS Glue Features

Q: When should I use AWS Glue vs. AWS Data Pipeline?

AWS Glue provides a managed ETL service that runs on a serverless Apache Spark environment, so you can focus on your ETL job rather than on configuring and managing the underlying compute resources. AWS Glue takes a data-first approach: you work with the properties of the data and the manipulations needed to transform it into a form from which you can derive business insights. It also provides an integrated data catalog that makes metadata available for ETL as well as for querying via Amazon Athena and Amazon Redshift Spectrum.

AWS Data Pipeline provides a managed orchestration service that gives you greater flexibility over the execution environment, access to and control over the compute resources that run your code, and control over the data processing code itself. AWS Data Pipeline launches compute resources in your account, giving you direct access to the Amazon EC2 instances or Amazon EMR clusters.

Furthermore, AWS Glue ETL jobs are written in Scala or Python. If your use case requires an engine other than Apache Spark, or if you want to run a heterogeneous set of jobs on a variety of engines such as Hive or Pig, then AWS Data Pipeline would be a better choice.
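
The guidance above can be condensed into a small decision helper. This is only an illustrative sketch in Python: the `Workload` class, its fields, and the `recommend_service` function are hypothetical names invented for this example and are not part of any AWS SDK. It assumes the only two criteria are the ones stated in the answer, namely the processing engine and whether you need direct access to the compute resources.

```python
from dataclasses import dataclass

# Engines that AWS Glue ETL jobs run on, per the answer above (Apache Spark only).
GLUE_ENGINES = {"spark"}


@dataclass
class Workload:
    """Hypothetical description of an ETL workload (illustration only)."""
    engines: set                         # engine names the jobs need, e.g. {"spark", "hive"}
    needs_instance_access: bool = False  # direct access to EC2 instances / EMR clusters


def recommend_service(workload: Workload) -> str:
    """Return the service the FAQ answer points to for this workload."""
    engines = {name.lower() for name in workload.engines}
    # Any engine other than Spark (Hive, Pig, ...) rules out AWS Glue.
    if not engines <= GLUE_ENGINES:
        return "AWS Data Pipeline"
    # Direct control over the compute resources is a Data Pipeline trait.
    if workload.needs_instance_access:
        return "AWS Data Pipeline"
    # Spark-only and serverless is the AWS Glue case.
    return "AWS Glue"


if __name__ == "__main__":
    print(recommend_service(Workload(engines={"spark"})))
    print(recommend_service(Workload(engines={"spark", "hive", "pig"})))
```

A real decision would weigh further factors that this answer does not cover, so treat the helper as a summary of the text rather than a complete rule.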
