Basic Syntax of the InfluxDB Time Series Database
1 Why It Is Worth Understanding InfluxDB
What a time series database mainly stores
Time series data is a series of data points each associated with a specific time. Examples include:
- Server performance metrics
- Financial averages over time
- Sensor data, such as temperature, barometric pressure, wind speeds, etc.
How a time series database differs from a relational database
Relational databases can be used to store and analyze time series data, but depending on the precision of your data, a query can involve potentially millions of rows. InfluxDB is purpose-built to store and query data by time, providing out-of-the-box functionality that optionally downsamples data after a specific age and a query engine optimized for time-based data.
2 Basic Concepts
2.1 database & duration
database
A logical container for users, retention policies, continuous queries, and time series data.
duration
The attribute of the retention policy that determines how long InfluxDB stores data. Data older than the duration are automatically dropped from the database.
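The effect of a retention duration can be pictured with a small Python sketch. The function name `apply_retention` and the list-of-tuples layout are assumptions made for illustration; this only models the visible outcome (old data disappears) and is not how InfluxDB enforces retention internally.

```python
from datetime import datetime, timedelta, timezone

def apply_retention(points, duration, now):
    """Keep only points whose timestamp is within `duration` of `now`."""
    cutoff = now - duration
    return [(ts, value) for ts, value in points if ts >= cutoff]

now = datetime(2024, 1, 8, tzinfo=timezone.utc)
points = [
    (datetime(2024, 1, 1, tzinfo=timezone.utc), 0.64),  # 7 days old
    (datetime(2024, 1, 7, tzinfo=timezone.utc), 0.71),  # 1 day old
]

# With a 3-day duration, only the 1-day-old point survives.
kept = apply_retention(points, timedelta(days=3), now)
print(len(kept))
```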
2.2 field
The key-value pair in an InfluxDB data structure that records metadata and the actual data value. Fields are required in InfluxDB data structures and they are not indexed - queries on field values scan all points that match the specified time range and, as a result, are not performant relative to tags.
Field keys are strings and they store metadata. Field values are the actual data; they can be strings, floats, integers, or booleans. A field value is always associated with a timestamp.
2.3 Tags
Tags are optional. A tag is the key-value pair in the InfluxDB data structure that records metadata. You don’t need to have tags in your data structure, but it’s generally a good idea to make use of them because, unlike fields, tags are indexed. This means that queries on tags are faster and that tags are ideal for storing commonly-queried metadata.
Difference between tags and fields
Tags are indexed and fields are not indexed. This means that queries on tags are more performant than those on fields.
When to use tags and when to use fields
(1) Store commonly-queried metadata in tags
(2) Store data in tags if you plan to use them with the InfluxQL GROUP BY clause
(3) Store data in fields if you plan to use them with an InfluxQL function
(4) Store numeric values as fields (tag values only support string values)
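The performance difference between tags and fields can be sketched in a few lines of Python. This is a toy model under stated assumptions: the dictionary layout, `tag_index`, `query_by_tag`, and `query_by_field` are hypothetical names, and a real storage engine is far more involved. It only shows why an indexed lookup touches fewer points than a scan.

```python
from collections import defaultdict

points = [
    {"tags": {"host": "server01"}, "fields": {"usage": 12.5}},
    {"tags": {"host": "server02"}, "fields": {"usage": 87.0}},
    {"tags": {"host": "server01"}, "fields": {"usage": 91.2}},
]

# Tags are indexed: map each (tag key, tag value) pair to point positions.
tag_index = defaultdict(list)
for pos, point in enumerate(points):
    for key, value in point["tags"].items():
        tag_index[(key, value)].append(pos)

def query_by_tag(key, value):
    """Index lookup: only the matching points are touched."""
    return [points[pos] for pos in tag_index.get((key, value), [])]

def query_by_field(key, predicate):
    """No index on fields: every point has to be examined."""
    return [p for p in points if key in p["fields"] and predicate(p["fields"][key])]

print(len(query_by_tag("host", "server01")))
print(len(query_by_field("usage", lambda v: v > 80)))
```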
2.4 measurement
The measurement acts as a container for tags, fields, and the time column, and the measurement name is the description of the data that are stored in the associated fields. Measurement names are strings, and, for any SQL users out there, a measurement is conceptually similar to a table.
2.5 point
In InfluxDB, a point represents a single data record, similar to a row in a SQL database table. Each point:
- has a measurement, a tag set, a field key, a field value, and a timestamp;
- is uniquely identified by its series and timestamp.
You cannot store more than one point with the same timestamp in a series. If you write a point to a series with a timestamp that matches an existing point, the field set becomes a union of the old and new field set, and any ties go to the new field set.
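The overwrite rule above can be sketched as a dictionary merge. The names `write_point` and `points_by_time` are hypothetical, and the dictionary stands in for all points that share one measurement and tag set; it is an illustration of the rule, not InfluxDB code.

```python
def write_point(points_by_time, timestamp, fields):
    """Write a field set, merging it with any point at the same timestamp."""
    existing = points_by_time.get(timestamp, {})
    # Union of old and new field sets; on a key clash the new value wins.
    points_by_time[timestamp] = {**existing, **fields}

points_by_time = {}
write_point(points_by_time, 1700000000, {"temp": 21.5, "humidity": 40})
write_point(points_by_time, 1700000000, {"temp": 22.0, "pressure": 1013})

# Still one point: temp was overwritten, humidity kept, pressure added.
print(points_by_time[1700000000])
```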
2.6 series
In InfluxDB, a series is a collection of points that share a measurement, tag set, and field key. A point represents a single data record that has four components: a measurement, tag set, field set, and a timestamp. A point is uniquely identified by its series and timestamp.
series key
A series key identifies a particular series by measurement, tag set, and field key.
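As a rough illustration, a series key can be modeled as a string built from the three parts named above. The function `series_key` and the exact string layout are assumptions chosen for readability, not InfluxDB's internal representation.

```python
def series_key(measurement, tags, field_key):
    """Combine measurement, sorted tag set, and field key into one key."""
    parts = [measurement] + [f"{k}={v}" for k, v in sorted(tags.items())]
    return ",".join(parts) + " " + field_key

a = series_key("cpu", {"region": "us", "host": "server01"}, "usage")
b = series_key("cpu", {"host": "server01", "region": "us"}, "usage")
c = series_key("cpu", {"host": "server01", "region": "us"}, "idle")

print(a)
print(a == b)  # tag order does not matter
print(a == c)  # a different field key gives a different series
```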
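3 Queries
3.1 Fuzzy matching with regular expressions
1. Query data that starts with a given string
SELECT fieldName FROM measurementName WHERE fieldName =~ /^given_string/
2. Query data that ends with a given string
SELECT fieldName FROM measurementName WHERE fieldName =~ /given_string$/
3. Query data that contains a given string
SELECT fieldName FROM measurementName WHERE fieldName =~ /given_string/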
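The three patterns differ only in their anchors. The following Python sketch applies the same anchors with the standard `re` module to a made-up list of values; InfluxQL uses Go's regular expression syntax, but `^` and `$` behave the same way for simple cases like these.

```python
import re

values = ["server01", "db-server", "server-backup", "cache01"]

starts_with = [v for v in values if re.search(r"^server", v)]  # /^server/
ends_with = [v for v in values if re.search(r"server$", v)]    # /server$/
contains = [v for v in values if re.search(r"server", v)]      # /server/

print(starts_with)
print(ends_with)
print(contains)
```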
3.2 Notes on SELECT
A field key is required
A query requires at least one field key in the SELECT clause to return data. If the SELECT clause only includes a single tag key or several tag keys, the query returns an empty response. This behavior is a result of how the system stores data.
3.3 WHERE restrictions
Use single quotes; otherwise the query returns no data or raises an error
(1) Single quote string field values in the WHERE clause. Queries with unquoted string field values or double quoted string field values will not return any data and, in most cases, will not return an error.
(2) Single quote tag values in the WHERE clause. Queries with unquoted tag values or double quoted tag values will not return any data and, in most cases, will not return an error.
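When queries are assembled in application code, small helpers can keep the quoting consistent. The sketch below is hypothetical: `quote_string` and `quote_identifier` are made-up helper names, and the backslash escaping is an assumption about how embedded quotes should be handled, so verify it against your InfluxDB version before relying on it.

```python
def quote_string(value):
    """Wrap a string literal (tag value, string field value) in single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"

def quote_identifier(name):
    """Wrap an identifier (measurement, tag key, field key) in double quotes."""
    escaped = name.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'

query = (
    f"SELECT {quote_identifier('usage_idle')} FROM {quote_identifier('cpu')} "
    f"WHERE {quote_identifier('host')} = {quote_string('server01')}"
)
print(query)
```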
3.4 Group By
(1) Note that the GROUP BY clause must come after the WHERE clause.
(2) The GROUP BY clause groups query results by one or more specified tags or by a specified time interval.
(3) You cannot use GROUP BY to group fields.
(4) fill() changes the value reported for time intervals that have no data. By default, a GROUP BY time() interval with no data reports null as its value in the output column. Note that fill() must go at the end of the GROUP BY clause if you’re GROUP(ing) BY several things (for example, both tags and a time interval).
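The interaction between GROUP BY time() and fill() can be sketched in Python. The function `group_by_time` is a hypothetical stand-in that averages values per interval; timestamps are plain integers in seconds, and buckets are aligned to `start`, which is a simplification.

```python
def group_by_time(points, start, end, interval, fill=None):
    """Average values per [t, t + interval) bucket; empty buckets get `fill`."""
    buckets = {t: [] for t in range(start, end, interval)}
    for ts, value in points:
        if start <= ts < end:
            buckets[start + (ts - start) // interval * interval].append(value)
    return [
        (t, sum(vals) / len(vals) if vals else fill)
        for t, vals in sorted(buckets.items())
    ]

points = [(0, 10.0), (5, 20.0), (25, 30.0)]  # (timestamp in seconds, value)

no_fill = group_by_time(points, 0, 30, 10)            # empty bucket reports None (null)
zero_fill = group_by_time(points, 0, 30, 10, fill=0)  # comparable to fill(0)
print(no_fill)
print(zero_fill)
```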
3.5 ORDER BY time DESC
By default, InfluxDB returns results in ascending time order; the first point returned has the oldest timestamp and the last point returned has the most recent timestamp. ORDER BY time DESC reverses that order such that InfluxDB returns the points with the most recent timestamps first.
Note: ORDER BY time DESC must appear after the GROUP BY clause if the query includes a GROUP BY clause. ORDER BY time DESC must appear after the WHERE clause if the query includes a WHERE clause and no GROUP BY clause.
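The ordering rules from sections 3.2 to 3.5 can be collected in one small query builder. This is a sketch under stated assumptions: `build_select` is a hypothetical helper, it does no quoting or validation beyond requiring a field, and callers pass already-quoted identifiers.

```python
def build_select(fields, measurement, where=None, group_by=None,
                 fill=None, descending=False):
    """Assemble clauses in the order SELECT, FROM, WHERE, GROUP BY, ORDER BY."""
    if not fields:
        raise ValueError("SELECT needs at least one field key")
    parts = [f"SELECT {', '.join(fields)}", f"FROM {measurement}"]
    if where:
        parts.append(f"WHERE {where}")
    if group_by:
        clause = f"GROUP BY {', '.join(group_by)}"
        if fill is not None:
            clause += f" fill({fill})"  # fill() goes at the end of GROUP BY
        parts.append(clause)
    if descending:
        parts.append("ORDER BY time DESC")  # after GROUP BY, or after WHERE
    return " ".join(parts)

query = build_select(
    ['mean("usage_idle")'], '"cpu"',
    where="time > now() - 1h",
    group_by=["time(10m)", '"host"'],
    fill=0,
    descending=True,
)
print(query)
```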
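4 SHOW CARDINALITY
SHOW CARDINALITY is a group of commands used to estimate or exactly count the cardinality of measurements, series, tag keys, tag values, and field keys.
The SHOW CARDINALITY commands come in two variants: estimated and exact. Estimated values are calculated using sketches and are a safe default for all cardinality sizes. Exact values are counted directly from TSM (Time-Structured Merge Tree) data, but they are expensive to compute for high-cardinality data.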
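The trade-off between an exact count and a sketch-based estimate can be illustrated in Python. The estimator below is a K-minimum-values sketch, swapped in because it is short; it is not InfluxDB's actual algorithm, and the function names and the choice of `k=256` are assumptions. The point is only that the estimate keeps a bounded amount of state while the exact count remembers every distinct value.

```python
import hashlib
import heapq

def exact_cardinality(values):
    """Exact count: memory grows with the number of distinct values."""
    return len(set(values))

def estimated_cardinality(values, k=256):
    """K-minimum-values estimate: keep only the k smallest distinct hashes."""
    smallest = []    # max-heap (negated) of the k smallest hashes seen so far
    members = set()  # the same hashes, used to skip duplicates
    for v in values:
        h = int.from_bytes(hashlib.sha1(v.encode()).digest()[:8], "big") / 2**64
        if h in members:
            continue
        if len(smallest) < k:
            heapq.heappush(smallest, -h)
            members.add(h)
        elif h < -smallest[0]:
            evicted = -heapq.heappushpop(smallest, -h)
            members.discard(evicted)
            members.add(h)
    if len(smallest) < k:
        return len(smallest)  # fewer than k distinct values: count is exact
    return round((k - 1) / -smallest[0])

hosts = [f"host{i}" for i in range(5000)] * 2  # every value appears twice
print(exact_cardinality(hosts))
print(estimated_cardinality(hosts))  # close to 5000, not necessarily equal
```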
The following uses tag keys and tag values as examples.
4.1 SHOW TAG KEY CARDINALITY
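Estimates or exactly counts the cardinality of the tag key set.
The ON <database>, FROM <sources>, WITH KEY = <key>, WHERE <condition>, GROUP BY <dimensions>, and LIMIT/OFFSET clauses are optional. When any of these clauses is used, the query falls back to an exact count. Filtering by time is supported only when the Time Series Index (TSI) is enabled. Using time in the WHERE clause is not supported.
Example:
-- show estimated tag key cardinality
SHOW TAG KEY CARDINALITY
-- compute the exact value
-- show exact tag key cardinality
SHOW TAG KEY EXACT CARDINALITY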
4.2 SHOW TAG VALUES CARDINALITY
Estimates or exactly counts the cardinality of the tag values for a specified tag key.
The ON <database>, FROM <sources>, WITH KEY = <key>, WHERE <condition>, GROUP BY <dimensions>, and LIMIT/OFFSET clauses are optional. When any of these clauses is used, the query falls back to an exact count. Filtering by time is supported only when the Time Series Index (TSI) is enabled.
Example:
-- show estimated tag key values cardinality for a specified tag key
SHOW TAG VALUES CARDINALITY WITH KEY = "myTagKey"
-- compute the exact value
-- show exact tag key values cardinality for a specified tag key
SHOW TAG VALUES EXACT CARDINALITY WITH KEY = "myTagKey"
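4.3 Example use case
For example, in an earlier post we used Telegraf to store server monitoring data in InfluxDB, where the CPU metrics are indispensable (configured in telegraf.conf). Suppose one day we need to count how many servers Telegraf has been deployed on. This can be obtained with SHOW TAG VALUES EXACT CARDINALITY.
The statement is as follows: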
SHOW TAG VALUES EXACT CARDINALITY from "cpu" WITH KEY = "host"
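That is, it checks how many distinct values the host key has in cpu. Because of the telegraf.conf settings, each server corresponds to one unique host value, so the number of host values equals the number of servers on which Telegraf has been deployed.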
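What this query counts can be modeled in Python as the number of distinct host values among the series of one measurement. The series list, the host names, and the function `tag_value_cardinality` are made up for illustration.

```python
# Hypothetical series written by Telegraf: (measurement, tag set).
series = [
    ("cpu", {"host": "server01", "cpu": "cpu0"}),
    ("cpu", {"host": "server01", "cpu": "cpu1"}),
    ("cpu", {"host": "server02", "cpu": "cpu0"}),
    ("mem", {"host": "server03"}),
]

def tag_value_cardinality(series, measurement, tag_key):
    """Count distinct values of `tag_key` among series of one measurement."""
    return len({tags[tag_key] for name, tags in series
                if name == measurement and tag_key in tags})

# Two servers report cpu data, even though there are three cpu series.
print(tag_value_cardinality(series, "cpu", "host"))
```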
5 DROP and DELETE
5.1 series
The DROP SERIES query deletes all points from a series in a database, and it drops the series from the index.
The query takes the following form, where you must specify either the FROM clause or the WHERE clause.
The syntax is as follows:
DROP SERIES FROM <measurement_name[,measurement_name]> WHERE <tag_key>='<tag_value>'
A successful DROP SERIES query returns an empty result.
Drop all points in the series that have a specific tag pair from all measurements in the database (that is, if FROM is not specified, the matching tag data is deleted from every measurement).
The difference from DELETE is:
The DELETE query deletes all points from a series in a database. Unlike DROP SERIES, DELETE does not drop the series from the index.
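The distinction can be pictured with a toy store that keeps an index of series keys next to the points themselves. `TinyStore` and its methods are hypothetical and only model the behavior described above.

```python
class TinyStore:
    """Toy model: an index of series keys plus the points of each series."""

    def __init__(self):
        self.index = set()
        self.points = {}

    def write(self, series_key, timestamp, value):
        self.index.add(series_key)
        self.points.setdefault(series_key, {})[timestamp] = value

    def delete(self, series_key):
        """Like DELETE: remove the points, keep the series in the index."""
        self.points[series_key] = {}

    def drop_series(self, series_key):
        """Like DROP SERIES: remove the points and the index entry."""
        self.points.pop(series_key, None)
        self.index.discard(series_key)

store = TinyStore()
store.write("cpu,host=server01", 1, 0.5)
store.write("cpu,host=server02", 1, 0.7)

store.delete("cpu,host=server01")       # points gone, series still indexed
store.drop_series("cpu,host=server02")  # points gone, series gone too
print(sorted(store.index))
```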
5.2 measurement_name
DELETE FROM <measurement_name> WHERE [<tag_key>='<tag_value>'] | [<time interval>]
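Deletion is only allowed by tag and by time.
Dropping a measurement is relatively resource-intensive and takes a comparatively long time. Based on what other users have shared, it is advisable to delete all tags (that is, drop the series) before running DROP MEASUREMENT.
That is, first run:
DROP SERIES FROM <measurement_name>
Then run:
DROP MEASUREMENT <measurement_name>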
6 Common Functions
Common functions are summarized below:
Type | Function | Description | Notes |
Aggregations | COUNT() | Returns the number of non-null field values. | |
Aggregations | DISTINCT() | Returns the list of unique field values. | DISTINCT() often returns several results with the same timestamp. When the results are written to a destination measurement, InfluxDB assumes points with the same series and timestamp are duplicate points and simply overwrites any duplicate point with the most recent point in the destination measurement. |
Aggregations | INTEGRAL() | Returns the area under the curve for subsequent field values. | InfluxDB calculates the area under the curve for subsequent field values and converts those results into the summed area per unit. The unit argument is an integer followed by a duration literal and it is optional. If the query does not specify the unit, the unit defaults to one second (1s). |
Aggregations | MEAN() | Returns the arithmetic mean (average) of field values. | |
Aggregations | MEDIAN() | Returns the middle value from a sorted list of field values. | MEDIAN() is nearly equivalent to PERCENTILE(field_key, 50), except MEDIAN() returns the average of the two middle field values if the field contains an even number of values. |
Aggregations | MODE() | Returns the most frequent value in a list of field values. | MODE() returns the field value with the earliest timestamp if there’s a tie between two or more values for the maximum number of occurrences. |
Aggregations | SPREAD() | Returns the difference between the minimum and maximum field values. | |
Aggregations | STDDEV() | Returns the standard deviation of field values. | |
Aggregations | SUM() | Returns the sum of field values. | |
Selectors | BOTTOM() | Returns the smallest N field values. | BOTTOM() returns the field value with the earliest timestamp if there’s a tie between two or more values for the smallest value. |
Selectors | FIRST() | Returns the field value with the oldest timestamp. | |
Selectors | LAST() | Returns the field value with the most recent timestamp. | |
Selectors | MAX() | Returns the greatest field value. | |
Selectors | MIN() | Returns the lowest field value. | |
Selectors | PERCENTILE() | Returns the Nth percentile field value. | |
Selectors | SAMPLE() | Returns a random sample of N field values. | SAMPLE() uses reservoir sampling to generate the random points. |
Selectors | TOP() | Returns the greatest N field values. | TOP() returns the field value with the earliest timestamp if there’s a tie between two or more values for the greatest value. |
Transformations | ABS() | Returns the absolute value of the field value. | |
Transformations | ACOS() | Returns the arccosine (in radians) of the field value. | Field values must be between -1 and 1. |
Transformations | ASIN() | Returns the arcsine (in radians) of the field value. | Field values must be between -1 and 1. |
Transformations | ATAN() | Returns the arctangent (in radians) of the field value. | Field values must be between -1 and 1. |
Transformations | ATAN2() | Returns the arctangent of y/x in radians. | |
Transformations | CEIL() | Returns the subsequent value rounded up to the nearest integer. | |
Transformations | COS() | Returns the cosine of the field value. | |
Transformations | CUMULATIVE_SUM() | Returns the running total of subsequent field values. | |
Transformations | DERIVATIVE() | Returns the rate of change between subsequent field values. | InfluxDB calculates the difference between subsequent field values and converts those results into the rate of change per unit. The unit argument is an integer followed by a duration literal and it is optional. If the query does not specify the unit, the unit defaults to one second (1s). |
Transformations | DIFFERENCE() | Returns the result of subtraction between subsequent field values. | |
Transformations | ELAPSED() | Returns the difference between subsequent field values’ timestamps. | InfluxDB calculates the difference between subsequent timestamps. The unit option is an integer followed by a duration literal and it determines the unit of the returned difference. If the query does not specify the unit option, the query returns the difference between timestamps in nanoseconds. |
Transformations | EXP() | Returns the exponential of the field value. | |
Transformations | FLOOR() | Returns the subsequent value rounded down to the nearest integer. | |
Transformations | LN() | Returns the natural logarithm of the field value. | |
Transformations | LOG() | Returns the logarithm of the field value with base b. | |
Transformations | LOG2() | Returns the logarithm of the field value to the base 2. | |
Transformations | LOG10() | Returns the logarithm of the field value to the base 10. | |
Transformations | MOVING_AVERAGE() | Returns the rolling average across a window of subsequent field values. | |
Transformations | POW() | Returns the field value to the power of x. | |
Transformations | ROUND() | Returns the subsequent value rounded to the nearest integer. | |
Transformations | SIN() | Returns the sine of the field value. | |
Transformations | SQRT() | Returns the square root of the field value. | |
Transformations | TAN() | Returns the tangent of the field value. | |
Predictors | HOLT_WINTERS() | Returns N number of predicted field values. | Predict when data values will cross a given threshold; compare predicted values with actual values to detect anomalies in your data. |
Technical analysis | CHANDE_MOMENTUM_OSCILLATOR() | The Chande Momentum Oscillator (CMO) is a technical momentum indicator developed by Tushar Chande. The CMO indicator is created by calculating the difference between the sum of all recent higher data points and the sum of all recent lower data points, then dividing the result by the sum of all data movement over a given time period. The result is multiplied by 100 to give the -100 to +100 range. | |
Technical analysis | EXPONENTIAL_MOVING_AVERAGE() | An exponential moving average (EMA) is a type of moving average that is similar to a simple moving average, except that more weight is given to the latest data. It’s also known as the “exponentially weighted moving average.” This type of moving average reacts faster to recent data changes than a simple moving average. | |
Technical analysis | DOUBLE_EXPONENTIAL_MOVING_AVERAGE() | The Double Exponential Moving Average (DEMA) attempts to remove the inherent lag associated with moving averages by placing more weight on recent values. The name suggests this is achieved by applying a double exponential smoothing, which is not the case. The name double comes from the fact that the value of an EMA is doubled. To keep it in line with the actual data and to remove the lag, the value “EMA of EMA” is subtracted from the previously doubled EMA. | |
Technical analysis | KAUFMANS_EFFICIENCY_RATIO() | Kaufman’s Efficiency Ratio, or simply “Efficiency Ratio” (ER), is calculated by dividing the data change over a period by the absolute sum of the data movements that occurred to achieve that change. The resulting ratio ranges between 0 and 1, with higher values representing a more efficient or trending market. The ER is very similar to the Chande Momentum Oscillator (CMO). The difference is that the CMO takes market direction into account, but if you take the absolute CMO and divide by 100, you get the Efficiency Ratio. | |
Technical analysis | KAUFMANS_ADAPTIVE_MOVING_AVERAGE() | Kaufman’s Adaptive Moving Average (KAMA) is a moving average designed to account for sample noise or volatility. KAMA will closely follow data points when the data swings are relatively small and noise is low. KAMA will adjust when the data swings widen and follow data from a greater distance. This trend-following indicator can be used to identify the overall trend, time turning points, and filter data movements. | |
Technical analysis | TRIPLE_EXPONENTIAL_MOVING_AVERAGE() | The triple exponential moving average (TEMA) was developed to filter out volatility from conventional moving averages. While the name implies that it’s a triple exponential smoothing, it’s actually a composite of a single exponential moving average, a double exponential moving average, and a triple exponential moving average. | |
Technical analysis | TRIPLE_EXPONENTIAL_DERIVATIVE() | The triple exponential derivative indicator, commonly referred to as “TRIX,” is an oscillator used to identify oversold and overbought markets, and can also be used as a momentum indicator. TRIX calculates a triple exponential moving average of the log of the data input over the period of time. The previous value is subtracted from the current value. This prevents cycles that are shorter than the defined period from being considered by the indicator. Like many oscillators, TRIX oscillates around a zero line. When used as an oscillator, a positive value indicates an overbought market while a negative value indicates an oversold market. When used as a momentum indicator, a positive value suggests momentum is increasing while a negative value suggests momentum is decreasing. Many analysts believe that when the TRIX crosses above the zero line it gives a buy signal, and when it crosses below the zero line, it gives a sell signal. | |
Technical analysis | RELATIVE_STRENGTH_INDEX() | The relative strength index (RSI) is a momentum indicator that compares the magnitude of recent increases and decreases over a specified time period to measure speed and change of data movements. | |
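A few of the descriptions in the table are easier to follow with numbers. The Python sketch below re-implements MEDIAN() versus a 50th percentile, DERIVATIVE() with a unit, and the EMA and DEMA formulas. All function names are local to this sketch, and several details are assumptions: the percentile uses a nearest-rank rule, and the EMA is seeded with the first value and uses a smoothing factor of 2 / (period + 1). InfluxDB's own functions take additional arguments and may differ in these details.

```python
import math

def median(values):
    """Middle value; for an even count, the average of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def percentile(values, n):
    """Nearest-rank percentile (assumed rule): always an existing value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(len(ordered) * n / 100))
    return ordered[rank - 1]

def derivative(points, unit=1):
    """Rate of change per `unit` seconds between subsequent (time, value) pairs."""
    return [
        (t2, (v2 - v1) * unit / (t2 - t1))
        for (t1, v1), (t2, v2) in zip(points, points[1:])
    ]

def ema(values, period):
    """Exponential moving average, seeded with the first value (assumption)."""
    alpha = 2 / (period + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append(alpha * v + (1 - alpha) * out[-1])
    return out

def dema(values, period):
    """Double EMA: twice the EMA minus the EMA of the EMA."""
    first = ema(values, period)
    second = ema(first, period)
    return [2 * a - b for a, b in zip(first, second)]

print(median([1, 2, 3, 4]), percentile([1, 2, 3, 4], 50))
print(derivative([(0, 10.0), (10, 30.0), (20, 25.0)]))
print(derivative([(0, 10.0), (10, 30.0), (20, 25.0)], unit=60))
print(dema([5.0, 5.0, 5.0, 5.0], 3))
```

With an even number of values, median() averages the two middle values while the percentile returns one of the stored values, which is the "nearly equivalent" caveat in the MEDIAN() row.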
References:
https://blog.csdn.net/xuxiannian/article/details/103559246
https://blog.csdn.net/funnyPython/article/details/89888972
https://docs.influxdata.com/influxdb/v1.8/query_language/explore-data/
https://docs.influxdata.com/influxdb/v1.8/query_language/manage-database/#drop-series-from-the-index-with-drop-series
https://docs.influxdata.com/influxdb/v1.8/query_language/functions/
https://help.aliyun.com/document_detail/113127.html?spm=5176.21213303.J_6704733920.12.345d3eda8r81jQ&scm=20140722.S_help%40%40%E6%96%87%E6%A1%A3%40%40113127.S_0%2Bos.ID_113127-RL_show%20tag%20values-OR_helpmain-V_2-P0_1