
[Original] Big Data Fundamentals: Logstash (4) High Availability


High availability in Logstash means not losing data (assuming the server is only briefly unavailable and can be recovered, for example by restarting the server or the process). Two failure scenarios are involved:

  • Process restart (or server restart)
  • Event message processing failure

The corresponding solutions in Logstash are:

  • Persistent Queues
  • Dead Letter Queues

Neither is enabled by default.

As data flows through the event processing pipeline, Logstash may encounter situations that prevent it from delivering events to the configured output. For example, the data might contain unexpected data types, or Logstash might terminate abnormally.

To guard against data loss and ensure that events flow through the pipeline without interruption, Logstash provides the following data resiliency features.

  • Persistent Queues protect against data loss by storing events in an internal queue on disk.
  • Dead Letter Queues provide on-disk storage for events that Logstash is unable to process. You can easily reprocess events in the dead letter queue by using the dead_letter_queue input plugin.

These resiliency features are disabled by default.

1 Persistent Queues

By default, Logstash uses in-memory bounded queues between pipeline stages (inputs → pipeline workers) to buffer events. The size of these in-memory queues is fixed and not configurable. If Logstash experiences a temporary machine failure, the contents of the in-memory queue will be lost. Temporary machine failures are scenarios where Logstash or its host machine are terminated abnormally but are capable of being restarted.

In order to protect against data loss during abnormal termination, Logstash has a persistent queue feature which will store the message queue on disk. Persistent queues provide durability of data within Logstash.

By default, Logstash buffers event messages in an in-memory queue; if the process restarts, everything in that queue is lost.

Benefits

Absorbs bursts of events without needing an external buffering mechanism like Redis or Apache Kafka.
Provides an at-least-once delivery guarantee against message loss during a normal shutdown as well as when Logstash is terminated abnormally.

Implementation

The queue sits between the input and filter stages in the same process:

input → queue → filter + output

When an input has events ready to process, it writes them to the queue. When the write to the queue is successful, the input can send an acknowledgement to its data source.
When processing events from the queue, Logstash acknowledges events as completed, within the queue, only after filters and outputs have completed. The queue keeps a record of events that have been processed by the pipeline. An event is recorded as processed (in this document, called "acknowledged" or "ACKed") if, and only if, the event has been processed completely by the Logstash pipeline.
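For reference, enabling the persistent queue requires no change to the pipeline definition itself; the queue is inserted transparently between the input and the filter stage. A minimal pipeline would look like the sketch below (the Beats port and Elasticsearch host are assumptions):

input {
  beats {
    port => 5044                  # assumed Beats listener port
  }
}
filter {
  mutate {
    add_field => { "processed" => "true" }   # placeholder filter stage
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local Elasticsearch instance
  }
}

With queue.type: persisted, the beats input acknowledges back to its sender as soon as the event is written to the on-disk queue, not when it reaches Elasticsearch.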

Configuration

queue.type: persisted
path.queue: "path/to/data/persistent_queue"
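These settings live in logstash.yml and apply to all pipelines. When multiple pipelines are defined, queue settings can also be overridden per pipeline in pipelines.yml; a sketch (the pipeline id and config path are hypothetical):

- pipeline.id: orders-pipeline                      # hypothetical pipeline id
  path.config: "/etc/logstash/conf.d/orders.conf"   # hypothetical config path
  queue.type: persisted
  queue.max_bytes: 2gb                              # illustrative capacity override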

Other settings

queue.page_capacity (size of each on-disk page file; default 64mb)
queue.drain (if true, wait for the queue to drain before shutting down; default false)
queue.max_events (maximum number of unread events in the queue; default 0, unlimited)
queue.max_bytes (total on-disk capacity of the queue; default 1024mb)
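Putting these together, a logstash.yml fragment might look like the following (the capacity values are illustrative, not recommendations):

queue.type: persisted
path.queue: "path/to/data/persistent_queue"
queue.page_capacity: 64mb    # size of each on-disk page file
queue.max_events: 0          # 0 = no limit on unread events
queue.max_bytes: 4gb         # cap the queue's total disk usage
queue.drain: true            # finish queued events before shutting down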

Going further

First, the queue itself is a set of pages. There are two kinds of pages: head pages and tail pages. The head page is where new events are written. There is only one head page. When the head page is of a certain size (see queue.page_capacity), it becomes a tail page, and a new head page is created. Tail pages are immutable, and the head page is append-only. Second, the queue records details about itself (pages, acknowledgements, etc) in a separate file called a checkpoint file.
When recording a checkpoint, Logstash will:

  • Call fsync on the head page.
  • Atomically write to disk the current state of the queue.

The process of checkpointing is atomic, which means any update to the file is saved if successful.

If Logstash is terminated, or if there is a hardware-level failure, any data that is buffered in the persistent queue, but not yet checkpointed, is lost.
You can force Logstash to checkpoint more frequently by setting queue.checkpoint.writes. This setting specifies the maximum number of events that may be written to disk before forcing a checkpoint. The default is 1024. To ensure maximum durability and avoid losing data in the persistent queue, you can set queue.checkpoint.writes: 1 to force a checkpoint after each event is written. Keep in mind that disk writes have a resource cost. Setting this value to 1 can severely impact performance.

Even with the persistent queue enabled, data loss is still possible; the deciding factor is the flush (checkpoint) interval. By default a checkpoint is taken every 1024 events; setting the value to 1 checkpoints after every event, which avoids losing messages but significantly hurts performance:

queue.checkpoint.writes: 1
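On the ACK side there is a matching setting, queue.checkpoint.acks (also defaulting to 1024, per the Elastic documentation), so a configuration that trades throughput for maximum durability might look like:

queue.checkpoint.writes: 1   # checkpoint after every event written to the queue
queue.checkpoint.acks: 1     # checkpoint after every acknowledged event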


2 Dead Letter Queues

By default, when Logstash encounters an event that it cannot process because the data contains a mapping error or some other issue, the Logstash pipeline either hangs or drops the unsuccessful event. In order to protect against data loss in this situation, you can configure Logstash to write unsuccessful events to a dead letter queue instead of dropping them.
Each event written to the dead letter queue includes the original event, along with metadata that describes the reason the event could not be processed, information about the plugin that wrote the event, and the timestamp for when the event entered the dead letter queue.
To process events in the dead letter queue, you simply create a Logstash pipeline configuration that uses the dead_letter_queue input plugin to read from the queue.

When Logstash encounters data it cannot process (a mapping error, etc.), the pipeline either hangs or drops the unsuccessful event. To avoid losing data in this situation, Logstash can be configured to write unsuccessful events to a dead letter queue instead of dropping them.
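A minimal reprocessing pipeline built on the dead_letter_queue input plugin might look like the sketch below; the path must match path.dead_letter_queue, while the pipeline id and target index are assumptions:

input {
  dead_letter_queue {
    path => "path/to/data/dead_letter_queue"   # same value as path.dead_letter_queue
    pipeline_id => "main"                      # read entries written by the "main" pipeline
    commit_offsets => true                     # remember the read position across restarts
  }
}
filter {
  # the failure reason is exposed under [@metadata][dead_letter_queue]
  mutate {
    add_field => { "dlq_reason" => "%{[@metadata][dead_letter_queue][reason]}" }
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]   # assumed local Elasticsearch instance
    index => "dlq-reprocessed"    # hypothetical index for repaired events
  }
}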

Limitations

The dead letter queue feature is currently supported for the elasticsearch output only. Additionally, the dead letter queue is only used where the response code is either 400 or 404, both of which indicate an event that cannot be retried. Support for additional outputs will be available in future releases of the Logstash plugins. Before configuring Logstash to use this feature, refer to the output plugin documentation to verify that the plugin supports the dead letter queue feature.

Currently the dead letter queue is supported only for the elasticsearch output; support for other outputs will come in future releases.

Configuration

dead_letter_queue.enable: true
path.dead_letter_queue: "path/to/data/dead_letter_queue"
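One related logstash.yml setting worth knowing (the default shown is taken from the Elastic documentation, as far as I recall):

dead_letter_queue.max_bytes: 1024mb   # once the DLQ reaches this size, new entries are dropped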

References:
https://www.elastic.co/guide/en/logstash/current/resiliency.html
https://www.elastic.co/guide/en/logstash/current/persistent-queues.html
https://www.elastic.co/guide/en/logstash/current/dead-letter-queues.html
