FFmpeg Study Notes 4: Audio Format Conversion

阿新 • Published: 2019-02-05

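A while ago, while learning to play audio with FFmpeg, I kept getting noise. Many online tutorials are based on older FFmpeg versions, whereas the newer FFmpeg 3 uses planar sample formats for audio, and SDL cannot play planar audio. Data decoded by FFmpeg therefore cannot be handed to SDL directly and needs a format conversion first. With the help of some online material I got audio playing correctly, but I did not really understand the conversion process itself, so this article summarizes how FFmpeg stores audio and how it converts between formats. It covers the following topics:
* AVSampleFormat: the storage format of audio samples
* channel layout: the storage order of the channels
* Converting audio data between formats with FFmpeg
* The audio decoding API: avcodec_decode_audio4 is deprecated in newer versions and replaced by the simpler avcodec_send_packet and avcodec_receive_frame, whose usage is briefly introduced here.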

AVSampleFormat

FFmpeg uses the enum AVSampleFormat to represent audio sample formats. It is declared as follows:

enum AVSampleFormat {
    AV_SAMPLE_FMT_NONE = -1,
    AV_SAMPLE_FMT_U8,          ///< unsigned 8 bits
    AV_SAMPLE_FMT_S16,         ///< signed 16 bits
    AV_SAMPLE_FMT_S32,         ///< signed 32 bits
    AV_SAMPLE_FMT_FLT,         ///< float
    AV_SAMPLE_FMT_DBL,         ///< double

    AV_SAMPLE_FMT_U8P,         ///< unsigned 8 bits, planar
    AV_SAMPLE_FMT_S16P,        ///< signed 16 bits, planar
    AV_SAMPLE_FMT_S32P,        ///< signed 32 bits, planar
    AV_SAMPLE_FMT_FLTP,        ///< float, planar
    AV_SAMPLE_FMT_DBLP,        ///< double, planar

    AV_SAMPLE_FMT_NB           ///< Number of sample formats. DO NOT USE if linking dynamically
};

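As with pixel formats for images, a sample can be stored as an unsigned 8-bit integer, a signed 16-bit integer, a signed 32-bit integer, a single-precision float, or a double-precision float. There is no signed 24-bit format, however, because these formats map onto native C types and C has no 24-bit type. The FFmpeg documentation puts it this way:

Sample values can be expressed by native C types, hence the lack of a signed 24-bit sample format even though it is a common raw audio data format.

For the floating-point formats, values lie in [-1.0, 1.0]; anything outside that range exceeds full volume.
Like YUV image formats, audio sample formats come in two kinds, planar and packed. In the enum, the first half are the packed formats and the second half (those with the P suffix) are the planar ones.
In a planar format every channel has its own plane, and all planes must have the same size. In a packed format all data lives in a single plane, with the samples of the different channels interleaved.
Note also that the format field of AVFrame, which holds the audio sample format, is an int, so it has to be cast to the AVSampleFormat enum before being used as one (the cast must be explicit in C++).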
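To make the planar/packed distinction concrete, here is a minimal sketch in plain C that interleaves planar float samples into packed signed 16-bit samples. It does not use FFmpeg at all: the helper names float_to_s16 and planar_float_to_packed_s16 are hypothetical, and the scaling by 32767 is just one common convention, not necessarily what libswresample does internally.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clip a float sample to [-1.0, 1.0] and scale it to
 * 16 bits. The factor 32767 is an assumption made for this sketch. */
static int16_t float_to_s16(float x)
{
    if (x > 1.0f)  x = 1.0f;
    if (x < -1.0f) x = -1.0f;
    float scaled = x * 32767.0f;
    /* Round to nearest; the cast truncates toward zero after the offset. */
    return (int16_t)(scaled >= 0.0f ? scaled + 0.5f : scaled - 0.5f);
}

/* Planar input:  planes[c][i] is sample i of channel c.
 * Packed output: out[i * nb_channels + c] receives the same sample, so the
 * channels of one sample instant sit next to each other. */
static void planar_float_to_packed_s16(const float *const *planes,
                                       int nb_channels, int nb_samples,
                                       int16_t *out)
{
    assert(planes != NULL && out != NULL);
    assert(nb_channels > 0 && nb_samples >= 0);

    for (int i = 0; i < nb_samples; i++)
        for (int c = 0; c < nb_channels; c++)
            out[(size_t)i * nb_channels + c] = float_to_s16(planes[c][i]);
}
```

For stereo input the packed buffer ends up as L0 R0 L1 R1 and so on, which is the interleaved order SDL expects.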

The header samplefmt.h provides a number of functions related to sample formats. Some of them are:
* const char *av_get_sample_fmt_name(enum AVSampleFormat sample_fmt)
Returns the name (string) of the format for an enum value.
* enum AVSampleFormat av_get_sample_fmt(const char *name)
Returns the enum value for a format name (string).
* enum AVSampleFormat av_get_packed_sample_fmt(enum AVSampleFormat sample_fmt)
Takes a planar sample format and returns the corresponding packed format. For example, passing AV_SAMPLE_FMT_S32P returns AV_SAMPLE_FMT_S32.
* enum AVSampleFormat av_get_planar_sample_fmt(enum AVSampleFormat sample_fmt)
The counterpart of the previous function: it takes a packed format and returns the corresponding planar one.
* int av_sample_fmt_is_planar(enum AVSampleFormat sample_fmt)
Tells whether a sample format is planar.
* int av_get_bytes_per_sample(enum AVSampleFormat sample_fmt)
Returns the number of bytes occupied by one sample.
* int av_samples_get_buffer_size(int *linesize, int nb_channels, int nb_samples, enum AVSampleFormat sample_fmt, int align)
Computes the required buffer size in bytes from the given parameters. linesize may be NULL; align is the buffer alignment (0 = default, 1 = no alignment).
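The arithmetic behind the last two functions can be sketched without FFmpeg. The following is an illustration only: the enum SampleKind and both helpers are hypothetical stand-ins, the byte widths assume the usual 4-byte float and 8-byte double, and the size covers only the "no alignment" case (align = 1), where no padding is added.

```c
#include <assert.h>

/* Hypothetical stand-in for the five basic sample types listed above. */
enum SampleKind { KIND_U8, KIND_S16, KIND_S32, KIND_FLT, KIND_DBL };

/* Bytes occupied by one sample of one channel. */
static int bytes_per_sample(enum SampleKind kind)
{
    switch (kind) {
    case KIND_U8:  return 1;
    case KIND_S16: return 2;
    case KIND_S32: return 4;
    case KIND_FLT: return 4;  /* assumes a 4-byte float */
    case KIND_DBL: return 8;  /* assumes an 8-byte double */
    }
    return 0;
}

/* Total bytes for nb_samples per channel across nb_channels channels,
 * without any alignment padding. */
static int unaligned_buffer_size(int nb_channels, int nb_samples,
                                 enum SampleKind kind)
{
    assert(nb_channels > 0 && nb_samples >= 0);
    return nb_channels * nb_samples * bytes_per_sample(kind);
}
```

For example, 1024 stereo samples of signed 16-bit audio need 2 * 1024 * 2 = 4096 bytes. The same product reappears at the end of this article as the data_size calculation.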

channel_layout

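As shown above, samples are stored in one of two ways, planar or packed. In planar storage each channel occupies its own plane; in packed storage the samples of all channels are interleaved in a single plane. That still leaves two questions open: for a planar format, which plane holds a particular channel, and for a packed format, in what order are the channels interleaved? This is where channel_layout comes in.
First, FFmpeg's definition of a channel layout:
A channel_layout is a 64-bit integer in which each bit set to 1 corresponds to one channel. In other words, the number of 1 bits in the channel_layout equals the number of channels.

A channel layout is a 64-bits integer with a bit set for every channel. The number of bits set must be equal to the number of channels.

The header channel_layout.h defines a mask for each channel: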

#define AV_CH_FRONT_LEFT             0x00000001
#define AV_CH_FRONT_RIGHT            0x00000002
#define AV_CH_FRONT_CENTER           0x00000004
#define AV_CH_LOW_FREQUENCY          0x00000008
#define AV_CH_BACK_LEFT              0x00000010
#define AV_CH_BACK_RIGHT             0x00000020
#define AV_CH_FRONT_LEFT_OF_CENTER   0x00000040
#define AV_CH_FRONT_RIGHT_OF_CENTER  0x00000080
#define AV_CH_BACK_CENTER            0x00000100
#define AV_CH_SIDE_LEFT              0x00000200
#define AV_CH_SIDE_RIGHT             0x00000400
#define AV_CH_TOP_CENTER             0x00000800
#define AV_CH_TOP_FRONT_LEFT         0x00001000
#define AV_CH_TOP_FRONT_CENTER       0x00002000
#define AV_CH_TOP_FRONT_RIGHT        0x00004000
#define AV_CH_TOP_BACK_LEFT          0x00008000
#define AV_CH_TOP_BACK_CENTER        0x00010000
#define AV_CH_TOP_BACK_RIGHT         0x00020000
#define AV_CH_STEREO_LEFT            0x20000000  ///< Stereo downmix.
#define AV_CH_STEREO_RIGHT           0x40000000  ///< See AV_CH_STEREO_LEFT.

A channel_layout is then a combination of these channel masks. Some of the predefined layouts are:

#define AV_CH_LAYOUT_MONO              (AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_STEREO            (AV_CH_FRONT_LEFT|AV_CH_FRONT_RIGHT)
#define AV_CH_LAYOUT_2POINT1           (AV_CH_LAYOUT_STEREO|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_1               (AV_CH_LAYOUT_STEREO|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_SURROUND          (AV_CH_LAYOUT_STEREO|AV_CH_FRONT_CENTER)
#define AV_CH_LAYOUT_3POINT1           (AV_CH_LAYOUT_SURROUND|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_4POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_BACK_CENTER)
#define AV_CH_LAYOUT_4POINT1           (AV_CH_LAYOUT_4POINT0|AV_CH_LOW_FREQUENCY)
#define AV_CH_LAYOUT_2_2               (AV_CH_LAYOUT_STEREO|AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)
#define AV_CH_LAYOUT_QUAD              (AV_CH_LAYOUT_STEREO|AV_CH_BACK_LEFT|AV_CH_BACK_RIGHT)
#define AV_CH_LAYOUT_5POINT0           (AV_CH_LAYOUT_SURROUND|AV_CH_SIDE_LEFT|AV_CH_SIDE_RIGHT)
#define AV_CH_LAYOUT_5POINT1           (AV_CH_LAYOUT_5POINT0|AV_CH_LOW_FREQUENCY)
...

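AV_CH_LAYOUT_STEREO is stereo (2 channels), stored in the order LEFT | RIGHT. AV_CH_LAYOUT_4POINT0 has 4 channels, stored in the order LEFT | RIGHT | FRONT-CENTER | BACK-CENTER. Layouts with other channel counts work the same way: channels are ordered by their mask bits, lowest bit first.
Some functions related to channel_layout:
* uint64_t av_get_channel_layout(const char *name) returns the channel_layout matching the given string. The argument can be:
  * the name of a common channel layout: mono, stereo, 4.0, quad, 5.0, 5.0(side), 5.1, and so on
  * the name of a single channel: FL, FR, FC, BL, BR, FLC, FRC, and so on
  * a number of channels
  * a channel_layout mask, as a hexadecimal string starting with "0x"

  See the function's documentation for more details.
* int av_get_channel_layout_nb_channels(uint64_t channel_layout) returns the number of channels in a layout.
* int64_t av_get_default_channel_layout(int nb_channels) returns the default layout for a given number of channels.
* int av_get_channel_layout_channel_index(uint64_t channel_layout, uint64_t channel) returns the index of a channel within a layout, that is, the position at which that channel is stored.

av_get_channel_layout_channel_index is implemented as follows: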

int av_get_channel_layout_channel_index(uint64_t channel_layout,
                                        uint64_t channel)
{
    if (!(channel_layout & channel) ||
        av_get_channel_layout_nb_channels(channel) != 1)
        return AVERROR(EINVAL);
    channel_layout &= channel - 1;
    return av_get_channel_layout_nb_channels(channel_layout);
}

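The function first checks that the layout contains the requested channel and that the channel argument is a single channel (exactly one bit set).
Take the 4-channel layout AV_CH_LAYOUT_4POINT0 as an example: AV_CH_LAYOUT_4POINT0 = AV_CH_FRONT_LEFT | AV_CH_FRONT_RIGHT | AV_CH_FRONT_CENTER | AV_CH_BACK_CENTER, which is 0001,0000,0111 in binary. Suppose we want the index of AV_CH_BACK_CENTER in this layout. AV_CH_BACK_CENTER is 0x0100 in hexadecimal, or 0001,0000,0000 in binary, so AV_CH_BACK_CENTER - 1 = 1111,1111, and 0001,0000,0111 & 0000,1111,1111 = 0111. The masked value keeps only the channels stored before AV_CH_BACK_CENTER. av_get_channel_layout_nb_channels returns the number of channels in a layout, which, as noted earlier, equals the number of 1 bits. The masked value has three 1 bits, so the index of AV_CH_BACK_CENTER in AV_CH_LAYOUT_4POINT0 is 3.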
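The worked example above can be checked with a small standalone sketch. It mirrors the logic of the FFmpeg function shown earlier but is not FFmpeg code: the CH_* macros copy the mask values from the channel_layout.h excerpt, while count_set_bits and channel_index are hypothetical names, and -1 stands in for AVERROR(EINVAL).

```c
#include <assert.h>
#include <stdint.h>

/* Mask values copied from the channel_layout.h excerpt above. */
#define CH_FRONT_LEFT    UINT64_C(0x00000001)
#define CH_FRONT_RIGHT   UINT64_C(0x00000002)
#define CH_FRONT_CENTER  UINT64_C(0x00000004)
#define CH_LOW_FREQUENCY UINT64_C(0x00000008)
#define CH_BACK_CENTER   UINT64_C(0x00000100)
#define LAYOUT_4POINT0 \
    (CH_FRONT_LEFT | CH_FRONT_RIGHT | CH_FRONT_CENTER | CH_BACK_CENTER)

/* Number of 1 bits in x, i.e. the number of channels in a layout. */
static int count_set_bits(uint64_t x)
{
    int n = 0;
    while (x) {
        x &= x - 1;  /* clears the lowest set bit */
        n++;
    }
    return n;
}

/* Index of a single channel within a layout, or -1 if the layout does not
 * contain it or the argument is not exactly one channel. */
static int channel_index(uint64_t layout, uint64_t channel)
{
    if (!(layout & channel) || count_set_bits(channel) != 1)
        return -1;
    /* Keep only the channels stored before this one and count them. */
    return count_set_bits(layout & (channel - 1));
}

/* The worked example from the text: 0x107 & 0xFF = 0x07, three bits set. */
static void worked_example(void)
{
    assert(channel_index(LAYOUT_4POINT0, CH_BACK_CENTER) == 3);
}
```

The index is what tells you which plane (planar) or which slot within each interleaved group (packed) belongs to a channel.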

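Audio format conversion

Converting audio formats in FFmpeg takes three main steps.
1. Instantiate an SwrContext and set the parameters the conversion needs: channel layout (and hence channel count), sample format, and sample rate

There are two ways to instantiate an SwrContext and set its parameters:
* Using swr_alloc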

 SwrContext *swr = swr_alloc();
 av_opt_set_channel_layout(swr, "in_channel_layout",  AV_CH_LAYOUT_5POINT1, 0);
 av_opt_set_channel_layout(swr, "out_channel_layout", AV_CH_LAYOUT_STEREO,  0);
 av_opt_set_int(swr, "in_sample_rate",     48000,                0);
 av_opt_set_int(swr, "out_sample_rate",    44100,                0);
 av_opt_set_sample_fmt(swr, "in_sample_fmt",  AV_SAMPLE_FMT_FLTP, 0);
 av_opt_set_sample_fmt(swr, "out_sample_fmt", AV_SAMPLE_FMT_S16,  0);

* Using swr_alloc_set_opts

 SwrContext *swr = swr_alloc_set_opts(NULL,  // we're allocating a new context
                        AV_CH_LAYOUT_STEREO,  // out_ch_layout
                        AV_SAMPLE_FMT_S16,    // out_sample_fmt
                        44100,                // out_sample_rate
                        AV_CH_LAYOUT_5POINT1, // in_ch_layout
                        AV_SAMPLE_FMT_FLTP,   // in_sample_fmt
                        48000,                // in_sample_rate
                        0,                    // log_offset
                        NULL);                // log_ctx

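Both snippets configure the same conversion: 5.1 input (channel layout AV_CH_LAYOUT_5POINT1, sample format AV_SAMPLE_FMT_FLTP, 48 kHz) to stereo output (channel layout AV_CH_LAYOUT_STEREO, sample format AV_SAMPLE_FMT_S16, 44.1 kHz).
2. Compute the number of samples after conversion
The number of output samples is src_nb_samples * dst_sample_rate / src_sample_rate, computed as follows:

int dst_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx, frame->sample_rate) + frame->nb_samples, frame->sample_rate, frame->sample_rate, AV_ROUND_INF);

av_rescale_rnd computes a * b / c with the specified rounding mode; AV_ROUND_INF rounds away from zero. This call passes frame->sample_rate as both b and c because the SDL example later in this article keeps the sample rate unchanged. When the rates differ, b is the output sample rate and c is the input sample rate.
swr_get_delay returns the delay between input samples and output samples, expressed in a unit that depends on its second argument: pass the input sample rate and the delay is given in input samples; pass the output sample rate and it is given in output samples.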
3. Call swr_convert to perform the conversion

int nb = swr_convert(swr_ctx, &audio_buf, dst_nb_samples, (const uint8_t**)frame->data, frame->nb_samples);

The return value is the number of samples output per channel, or a negative value on error.
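The sample-count formula from step 2 is easy to verify by hand. Here is a minimal sketch in plain C, assuming non-negative inputs and rounding up; rescale_round_up and output_sample_count are hypothetical helpers, not FFmpeg functions, and unlike av_rescale_rnd they make no attempt to guard against overflow in a * b.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ceil(a * b / c) for a, b >= 0 and c > 0.
 * No overflow protection, so keep the operands small. */
static int64_t rescale_round_up(int64_t a, int64_t b, int64_t c)
{
    assert(a >= 0 && b >= 0 && c > 0);
    return (a * b + c - 1) / c;
}

/* Upper bound on the output samples per channel:
 * (delay + src_nb_samples) * dst_rate / src_rate, rounded up.
 * delay is measured in input samples, as when the input rate is passed
 * as the second argument of swr_get_delay. */
static int64_t output_sample_count(int64_t delay, int64_t src_nb_samples,
                                   int64_t src_rate, int64_t dst_rate)
{
    return rescale_round_up(delay + src_nb_samples, dst_rate, src_rate);
}
```

With no delay, 1024 input samples at 48000 Hz become 1024 * 44100 / 48000 = 940.8, rounded up to 941 output samples at 44100 Hz. When the two rates are equal, the count is unchanged apart from the delay.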

Format conversion for SDL audio playback

First, obtain the decoded raw data with avcodec_send_packet and avcodec_receive_frame:

    int ret = avcodec_send_packet(aCodecCtx, &pkt);
    if (ret < 0 && ret != AVERROR(EAGAIN) && ret != AVERROR_EOF)
        return -1;

    ret = avcodec_receive_frame(aCodecCtx, frame);
    if (ret < 0 && ret != AVERROR_EOF)
        return -1;

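avcodec_decode_audio4 is no longer used for audio decoding here. It is deprecated in FFmpeg 3 and replaced by avcodec_send_packet and avcodec_receive_frame. The new decoding API is more convenient to use; see "send/receive encoding and decoding API overview" in the official documentation for details.

Setting the channel count and channel layout
The channel count or the channel layout may have been lost during encoding, so whichever one is missing is filled in with a default derived from the other: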

    if (frame->channels > 0 && frame->channel_layout == 0)
        frame->channel_layout = av_get_default_channel_layout(frame->channels);
    else if (frame->channels == 0 && frame->channel_layout > 0)
        frame->channels = av_get_channel_layout_nb_channels(frame->channel_layout);

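If the channel layout is unknown (channel_layout = 0), the default layout for the channel count is used; if the channel count is unknown, it is derived from the channel layout.

Setting the output format
SDL2 does not support planar formats, so the output must be packed. For simplicity the output format is set to AV_SAMPLE_FMT_S16 (signed 16-bit integer) rather than a floating-point format, and the output channel layout is set to the default for the channel count: dst_layout = av_get_default_channel_layout(frame->channels). Then instantiate the SwrContext: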

    swr_ctx = swr_alloc_set_opts(nullptr, dst_layout, dst_format, frame->sample_rate,
        frame->channel_layout, (AVSampleFormat)frame->format, frame->sample_rate, 0, nullptr);
    if (!swr_ctx || swr_init(swr_ctx) < 0)
        return -1;

After setting the parameters, swr_init must be called to initialize the context.

Conversion

    // Compute the number of samples after conversion: a * b / c
    int dst_nb_samples = av_rescale_rnd(swr_get_delay(swr_ctx, frame->sample_rate) + frame->nb_samples, frame->sample_rate, frame->sample_rate, AV_ROUND_INF);
    // Convert; the return value is the number of samples output per channel
    int nb = swr_convert(swr_ctx, &audio_buf, dst_nb_samples, (const uint8_t**)frame->data, frame->nb_samples);
    data_size = frame->channels * nb * av_get_bytes_per_sample(dst_format);

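Finally, data_size holds the number of bytes of converted data: channel count * samples per channel * bytes per sample.

Summary

This article covered how FFmpeg represents two important properties of audio, the sample format and the channel layout, and implemented a simple audio format conversion.

Sample formats are represented by the AVSampleFormat enum and fall into two classes, planar and packed.
A channel layout is a 64-bit integer that describes the storage order of the channels; the number of 1 bits in its binary form equals the number of channels.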
