Muxing into the MP4 (TS) Container Format with FFmpeg

阿新 • Published: 2019-02-05

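1. How MP4 muxing works:

Every audio or video frame has a duration.
The sampling rate is the number of amplitude samples taken per second when an analog sound waveform is digitized.
Human hearing spans roughly 20 Hz to 20 kHz, so by the Nyquist sampling theorem the sampling rate must be at least about 40 kHz to reproduce sound without distortion. Common audio sampling rates include 8 kHz, 11.025 kHz, 16 kHz, 22.05 kHz, 37.8 kHz, 44.1 kHz and 48 kHz; higher rates reach DVD audio quality.
When AAC audio sampled at 44.1 kHz is decoded for real-time playback, each frame must be decoded within 23.22 ms.
Background:
(One raw AAC frame holds 1024 samples covering a span of time, plus related data.)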
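Analysis:
1) AAC
Playback time of an audio frame = samples per AAC frame / sampling rate (in seconds).
One frame holds 1024 samples. At a sample rate of 44100 Hz there are 44100 samples per second, so by the formula above
the playback time of one AAC frame is 1024 * 1000000 / 44100 ≈ 23220 µs, i.e. 23.22 ms.
2) MP3
Every MPEG-1 Layer III (MP3) frame holds 1152 samples (not bytes), so:
frame_duration = 1152 * 1000000 / sample_rate (in µs)
For example, with sample_rate = 44100 Hz the duration is 26122 µs, i.e. 26.122 ms, which is where the often-quoted "an MP3 frame always plays for 26 ms" comes from.
3) H.264
Video playback time depends on the frame rate: frame_duration = 1000 / fps (in ms)
For example, with fps = 25.00 the duration is 40 ms, which is the "40 ms per video frame" that people in the field refer to.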
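
The three formulas above can be sketched in a few lines of C. This is only an illustration: the function names are made up for this example and are not part of FFmpeg, and results are truncated to whole microseconds.

```c
#include <stdint.h>

/* Duration of one audio frame in microseconds:
 * samples_per_frame / sample_rate seconds, scaled to us (truncated).
 * Hypothetical helper, not an FFmpeg function. */
int64_t audio_frame_duration_us(int samples_per_frame, int sample_rate)
{
    return (int64_t)samples_per_frame * 1000000 / sample_rate;
}

/* Duration of one video frame in microseconds for a rational frame rate
 * fps_num / fps_den (for example 25/1, or 30000/1001 for NTSC).
 * Hypothetical helper, not an FFmpeg function. */
int64_t video_frame_duration_us(int fps_num, int fps_den)
{
    return (int64_t)fps_den * 1000000 / fps_num;
}
```

Calling `audio_frame_duration_us(1024, 44100)` gives the AAC figure, `audio_frame_duration_us(1152, 44100)` the MP3 figure, and `video_frame_duration_us(25, 1)` the 40 ms video figure, each expressed in microseconds.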

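In theory, audio/video (playback) synchronization works like this:
With the duration of every frame known, audio and video frames are stored interleaved in the container along a single timeline (values in ms):
Timeline: 0        23.22    40       46.44    69.66    80       92.88    116.10   120      ...
Audio:    0        23.22             46.44    69.66             92.88    116.10            ...
Video:    0                 40                         80                         120      ...
That is, compare the running sum of video durations with the running sum of audio durations, and write whichever stream is behind.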
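
A minimal sketch of that "write whichever is behind" rule, assuming constant frame durations. The function name is hypothetical, and sending ties to video is an arbitrary choice made for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Fills `order` with 'A' or 'V' for the first `count` packets, always
 * picking the stream whose accumulated timestamp is smaller.
 * Ties go to video (an arbitrary choice for this sketch). */
void interleave_order(int64_t audio_dur_us, int64_t video_dur_us,
                      char *order, size_t count)
{
    int64_t next_audio = 0; /* timestamp of the next audio frame to write */
    int64_t next_video = 0; /* timestamp of the next video frame to write */

    for (size_t i = 0; i < count; i++) {
        if (next_video <= next_audio) {
            order[i] = 'V';
            next_video += video_dur_us;
        } else {
            order[i] = 'A';
            next_audio += audio_dur_us;
        }
    }
}
```

With a 23220 µs audio frame and a 40000 µs video frame, the first eight packets come out as one video frame followed by two audio frames, repeating, which matches the shape of the timeline above.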

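In practice, however, this does not hold for playback.

1: First, one question to settle

Why not simply play audio on the audio schedule and video on the video schedule, that is, play an audio frame at 23.22 ms and a video frame at 40 ms as above?

Because the 23.22 ms and 40 ms figures cannot be timed accurately; put differently, they do not match the time the sound card actually takes to play. So we need to know how long the sound card takes to play one frame, or rather one buffer, of audio.

2: The sound card plays one sample at a time, not one frame at a time. Losing even a single audio sample is audible, whereas video is far more forgiving.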

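3: A/V sync method 1: callback mode

Suppose the sound card has two buffers, "A" and "B", both holding PCM audio to be played, and buffer "B" is currently playing. A few points to establish first:

(1) The buffer size is fixed, so the time to play one buffer is fixed; assume 30 ms.

(2) When buffer "B" finishes playing, that is, when it is used up, buffer "A" is played next, so the PCM audio stays continuous.

(3) When one buffer finishes, the system (sound card) has advanced by 30 ms. The wall-clock time elapsed may be 40 ms (this does not matter here); what the callback tells us is that one 30 ms period has passed.

(4) Video is then matched against these 30 ms audio ticks, which gives an accurate clock: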

Timeline: 0                 30                         60                         90                         120      ...
Audio:    0        23.22                      46.44             69.66                      92.88    116.10            ...
Video:    0                          40                                  80                                  120      ...

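(5) One open question is how the 10 ms between 30 ms and 40 ms in the video is accounted for. It can be ignored, because the human eye cannot perceive a 10 ms offset.

So when the 30 ms audio callback fires, the second video frame can be played, as in the diagram above:

1st callback (30 ms): play the 40 ms video frame,

2nd callback (60 ms): play the 80 ms video frame,

3rd callback (90 ms): play no video,

4th callback (120 ms): play the 120 ms video frame.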
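
One way to read the callback schedule above is: on each audio callback, show the next video frame if its timestamp falls before the following callback. The sketch below encodes that reading; the rule itself is an interpretation of the four cases listed, and the function name and parameters are hypothetical.

```c
#include <stdint.h>

/* Called once per audio-buffer callback. `audio_clock_ms` is the time the
 * sound card has played so far (number of callbacks * buffer duration).
 * Returns 1 and advances *next_video_pts_ms when the next video frame is
 * due before the following callback; returns 0 otherwise. */
int video_due_on_callback(int64_t audio_clock_ms, int64_t buffer_ms,
                          int64_t video_frame_ms, int64_t *next_video_pts_ms)
{
    if (*next_video_pts_ms < audio_clock_ms + buffer_ms) {
        *next_video_pts_ms += video_frame_ms;
        return 1;
    }
    return 0;
}
```

Starting with the first frame already on screen (next video timestamp 40 ms), a 30 ms buffer and a 40 ms frame duration, the callbacks at 30, 60, 90 and 120 ms yield play, play, skip, play, the same pattern as the list above.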

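4: A/V sync method 2: blocking mode

Refer again to the diagram above.

(1) While buffer "B" is playing, the caller hands data from its own buffer to buffer "A"; the call does not return immediately but only once buffer "B" has finished playing.

The time from the call to its return after blocking is therefore one buffer period, the 30 ms above.

(2) Then, while buffer "A" is playing, the caller hands data to buffer "B"; again the call returns only once buffer "A" has finished playing.

The time from the call to its return is again one buffer period, the 30 ms above.

(3) Repeating (1) and (2) yields the same 30 ms tick as callback mode. The rest is identical; see (4) and (5) under callback mode.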
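
Both modes rely on a fixed-size buffer having a fixed playback time. As a rough illustration, the sketch below derives that time from the PCM format and turns a count of consumed buffers into an audio clock. The function names are hypothetical, and the 5292-byte size mentioned afterwards is chosen only so that the result lands on the 30 ms used in the text.

```c
#include <stdint.h>

/* Playback time of one PCM buffer in microseconds (truncated). */
int64_t pcm_buffer_duration_us(int64_t buffer_bytes, int sample_rate,
                               int channels, int bytes_per_sample)
{
    int64_t bytes_per_second =
        (int64_t)sample_rate * channels * bytes_per_sample;
    return buffer_bytes * 1000000 / bytes_per_second;
}

/* Audio clock after `buffers_played` full buffers have been consumed,
 * whether that is learned from a callback or from a blocking write
 * returning. */
int64_t audio_clock_us(int64_t buffers_played, int64_t buffer_duration_us)
{
    return buffers_played * buffer_duration_us;
}
```

For 16-bit stereo PCM at 44100 Hz, a 5292-byte buffer holds 1323 sample frames, which is exactly 30 ms; four such buffers advance the audio clock to 120 ms.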

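2. Muxing with FFmpeg:

This article documents an FFmpeg-based audio/video muxer (Simplest FFmpeg muxer). A muxer combines compressed video data (for example H.264) and compressed audio data (for example AAC) into a single container format (for example MKV), as shown in the figure. No encoding or decoding is involved in this process.

The program described here takes a file containing an H.264 video bitstream and a file containing an MP3 audio bitstream and combines them into an MP4 container file.

Workflow

The program's workflow is shown in the figure below. As the flowchart shows, three AVFormatContext structures are initialized: two for input and one for output. Once they are initialized, avcodec_copy_context() copies the parameters of the input video/audio streams into the AVCodecContext of the output video/audio streams. The program then calls av_read_frame() on the video input and on the audio input to fetch video and audio AVPackets, and writes each fetched AVPacket to the output file. Along the way it uses a less common function, av_compare_ts(), which compares timestamps and thereby decides whether a video or an audio packet is written next.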

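The muxer described here does not require a raw H.264 file as video input or a pure audio file as audio input. Two already-muxed audio/video files can serve as inputs: the program picks the video stream out of the video input file and the audio stream out of the audio input file, then muxes the selected streams together.
PS1: H.264 taken from certain container formats (for example MP4/FLV/MKV) needs the bitstream filter named "h264_mp4toannexb".
PS2: AAC going into certain container formats (for example MP4/FLV/MKV) needs the bitstream filter named "aac_adtstoasc".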
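
To give a feel for what the H.264 filter deals with: MP4-style storage prefixes each NAL unit with its length, while Annex B streams prefix each NAL unit with a start code. The sketch below covers only that one step, assuming 4-byte big-endian length prefixes; it is not the FFmpeg filter, the function name is hypothetical, and the real "h264_mp4toannexb" also inserts SPS/PPS in front of IDR frames, which is omitted here.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrites a buffer of NAL units that each carry a 4-byte big-endian
 * length prefix into Annex B form by overwriting every prefix with the
 * start code 00 00 00 01. Returns the number of NAL units converted,
 * or -1 if a length runs past the end of the buffer or bytes are left
 * over. Hypothetical helper for illustration only. */
int length_prefixed_to_annexb(uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int count = 0;

    while (pos + 4 <= size) {
        uint32_t nal_len = ((uint32_t)buf[pos] << 24) |
                           ((uint32_t)buf[pos + 1] << 16) |
                           ((uint32_t)buf[pos + 2] << 8) |
                           (uint32_t)buf[pos + 3];
        if (nal_len > size - pos - 4)
            return -1; /* declared length exceeds remaining data */

        /* Same width as the length field, so the payload stays in place. */
        buf[pos] = 0;
        buf[pos + 1] = 0;
        buf[pos + 2] = 0;
        buf[pos + 3] = 1;

        pos += 4 + (size_t)nal_len;
        count++;
    }
    return pos == size ? count : -1;
}
```

Because the length field and the start code are both four bytes here, the conversion can happen in place; with 1-byte or 2-byte length fields the data would have to be copied into a larger buffer.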

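A brief description of the key functions in the workflow:

avformat_open_input(): opens an input file.
avcodec_copy_context(): copies the parameters of an AVCodecContext.
avformat_alloc_output_context2(): initializes the output context.
avio_open(): opens the output file.
avformat_write_header(): writes the file header.
av_compare_ts(): compares timestamps to decide whether video or audio is written next. This function is seen less often.
av_read_frame(): reads one AVPacket from an input file.
av_interleaved_write_frame(): writes one AVPacket to the output file.
av_write_trailer(): writes the file trailer.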
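
Since av_compare_ts() is the least familiar call in the list, here is a stand-alone sketch of the idea behind it: two timestamps in different time bases are compared by cross-multiplying rather than dividing. The `Rational` type and `compare_ts` function are stand-ins written for this example so that it builds without FFmpeg; they assume positive denominators and products that fit in 64 bits, which this sketch does not check.

```c
#include <stdint.h>

/* A time base of num/den seconds per tick. Hypothetical stand-in for
 * FFmpeg's AVRational so the sketch builds without FFmpeg. */
typedef struct {
    int num;
    int den;
} Rational;

/* Returns -1, 0 or 1 when ts_a (in tb_a units) is earlier than, equal to,
 * or later than ts_b (in tb_b units). Cross-multiplies instead of
 * dividing so no precision is lost; assumes positive denominators and
 * that the products fit in int64_t. */
int compare_ts(int64_t ts_a, Rational tb_a, int64_t ts_b, Rational tb_b)
{
    int64_t a = ts_a * tb_a.num * tb_b.den;
    int64_t b = ts_b * tb_b.num * tb_a.den;
    return (a > b) - (a < b);
}
```

For instance, tick 1 in a 1/25 time base and tick 1764 in a 1/44100 time base both mean 0.04 s, so they compare equal, while tick 1024 in the 1/44100 time base is earlier than tick 1 in the 1/25 time base.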

Code

The code is listed below:

 /** 
  * Simplest FFmpeg Muxer 
  * 
  * This program muxes a video bitstream and an audio bitstream 
  * together into one container file. 
  * In this example, an H.264 video bitstream and an MP3 audio 
  * bitstream are muxed into an MP4 file. 
  * Note that the program does not change the audio or video 
  * codec; no encoding or decoding takes place. 
  */ 
 
//Must be defined before any header that may pull in <stdint.h>  
#define __STDC_CONSTANT_MACROS  
 
#include <stdio.h>  
 
#ifdef _WIN32  
//Windows  
extern "C"  
{  
#include "libavformat/avformat.h"  
};  
#else  
//Linux...  
#ifdef __cplusplus  
extern "C"  
{  
#endif  
#include <libavformat/avformat.h>  
#ifdef __cplusplus  
};  
#endif  
#endif  
  
/* 
FIX: H.264 taken from some container formats (FLV, MP4, MKV etc.) needs 
the "h264_mp4toannexb" bitstream filter (BSF), which 
  * adds SPS and PPS in front of IDR frames 
  * adds a start code ("0,0,0,1") in front of each NALU 
H.264 taken from MPEG2TS does not need this BSF. 
*/  
//'1': Use H.264 Bitstream Filter   
#define USE_H264BSF 0  
  
/* 
FIX: AAC written into some container formats (FLV, MP4, MKV etc.) needs 
the "aac_adtstoasc" bitstream filter (BSF) 
*/  
//'1': Use AAC Bitstream Filter   
#define USE_AACBSF 0  
  
  
  
int main(int argc, char* argv[])  
{  
    AVOutputFormat *ofmt = NULL;  
    //Input AVFormatContext and Output AVFormatContext  
    AVFormatContext *ifmt_ctx_v = NULL, *ifmt_ctx_a = NULL,*ofmt_ctx = NULL;  
    AVPacket pkt;  
    int ret, i;  
    int videoindex_v=-1,videoindex_out=-1;  
    int audioindex_a=-1,audioindex_out=-1;  
    int frame_index=0;  
    int64_t cur_pts_v=0,cur_pts_a=0;  
  
    //const char *in_filename_v = "cuc_ieschool.ts";//Input file URL  
    const char *in_filename_v = "cuc_ieschool.h264";  
    //const char *in_filename_a = "cuc_ieschool.mp3";  
    //const char *in_filename_a = "gowest.m4a";  
    //const char *in_filename_a = "gowest.aac";  
    const char *in_filename_a = "huoyuanjia.mp3";  
  
    const char *out_filename = "cuc_ieschool.mp4";//Output file URL  
    av_register_all();  
    //Input  
    if ((ret = avformat_open_input(&ifmt_ctx_v, in_filename_v, 0, 0)) < 0) {  
        printf( "Could not open input file.");  
        goto end;  
    }  
    if ((ret = avformat_find_stream_info(ifmt_ctx_v, 0)) < 0) {  
        printf( "Failed to retrieve input stream information");  
        goto end;  
    }  
  
    if ((ret = avformat_open_input(&ifmt_ctx_a, in_filename_a, 0, 0)) < 0) {  
        printf( "Could not open input file.");  
        goto end;  
    }  
    if ((ret = avformat_find_stream_info(ifmt_ctx_a, 0)) < 0) {  
        printf( "Failed to retrieve input stream information");  
        goto end;  
    }  
    printf("===========Input Information==========\n");  
    av_dump_format(ifmt_ctx_v, 0, in_filename_v, 0);  
    av_dump_format(ifmt_ctx_a, 0, in_filename_a, 0);  
    printf("======================================\n");  
    //Output  
    avformat_alloc_output_context2(&ofmt_ctx, NULL, NULL, out_filename);  
    if (!ofmt_ctx) {  
        printf( "Could not create output context\n");  
        ret = AVERROR_UNKNOWN;  
        goto end;  
    }  
    ofmt = ofmt_ctx->oformat;  
  
    for (i = 0; i < ifmt_ctx_v->nb_streams; i++) {  
        //Create output AVStream according to input AVStream  
        if(ifmt_ctx_v->streams[i]->codec->codec_type==AVMEDIA_TYPE_VIDEO){  
            AVStream *in_stream = ifmt_ctx_v->streams[i];  
            AVStream *out_stream = avformat_new_stream(ofmt_ctx, in_stream->codec->codec);  
            videoindex_v=i;  
            if (!out_stream) {  
                printf( "Failed allocating output stream\n");  
                ret = AVERROR_UNKNOWN;  
                goto end;  
            }  
            videoindex_out=out_stream->index;  
            //Copy the settings of AVCodecContext  
            if (avcodec_copy_context(out_stream->codec, in_stream->codec) < 0) {  
                printf( "Failed to copy context from input to output stream codec context\n");  
                goto end;  
            }  
            out_stream->codec->codec_tag = 0;  
            if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)  
                out_stream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;  
            break;  
        }  
    }  
  
    for (i = 0; i < ifmt_ctx_a->nb_streams; i++) {  
        //Create output AVStream according to input AVStream  
        if(ifmt_ctx_a->streams[i]->codec->codec_type==AVMEDIA_TYPE_AUDIO){  
            AVStream *in_stream = ifmt_ctx_a->streams[i];  
            AVStream *out_stream = avformat_new_stream(ofmt_ctx, in_stream->codec->codec);  
            audioindex_a=i;  
            if (!out_stream) {  
                printf( "Failed allocating output stream\n");  
                ret = AVERROR_UNKNOWN;  
                goto end;  
            }  
            audioindex_out=out_stream->index;  
            //Copy the settings of AVCodecContext  
            if (avcodec_copy_context(out_stream->codec, in_stream->codec) < 0) {  
                printf( "Failed to copy context from input to output stream codec context\n");  
                goto end;  
            }  
            out_stream->codec->codec_tag = 0;  
            if (ofmt_ctx->oformat->flags & AVFMT_GLOBALHEADER)  
                out_stream->codec->flags |= CODEC_FLAG_GLOBAL_HEADER;  
  
            break;  
        }  
    }  
  
    printf("==========Output Information==========\n");  
    av_dump_format(ofmt_ctx, 0, out_filename, 1);  
    printf("======================================\n");  
    //Open output file  
    if (!(ofmt->flags & AVFMT_NOFILE)) {  
        if (avio_open(&ofmt_ctx->pb, out_filename, AVIO_FLAG_WRITE) < 0) {  
            printf( "Could not open output file '%s'", out_filename);  
            goto end;  
        }  
    }  
    //Write file header  
    if (avformat_write_header(ofmt_ctx, NULL) < 0) {  
        printf( "Error occurred when opening output file\n");  
        goto end;  
    }  
  
  
    //FIX  
#if USE_H264BSF  
    AVBitStreamFilterContext* h264bsfc =  av_bitstream_filter_init("h264_mp4toannexb");   
#endif  
#if USE_AACBSF  
    AVBitStreamFilterContext* aacbsfc =  av_bitstream_filter_init("aac_adtstoasc");   
#endif  
  
    while (1) {  
        AVFormatContext *ifmt_ctx;  
        int stream_index=0;  
        AVStream *in_stream, *out_stream;  
  
        //Get an AVPacket  
        if(av_compare_ts(cur_pts_v,ifmt_ctx_v->streams[videoindex_v]->time_base,cur_pts_a,ifmt_ctx_a->streams[audioindex_a]->time_base) <= 0){  
            ifmt_ctx=ifmt_ctx_v;  
            stream_index=videoindex_out;  
  
            if(av_read_frame(ifmt_ctx, &pkt) >= 0){  
                do{  
                    in_stream  = ifmt_ctx->streams[pkt.stream_index];  
                    out_stream = ofmt_ctx->streams[stream_index];  
  
                    if(pkt.stream_index==videoindex_v){  
                        //FIX：No PTS (Example: Raw H.264)  
                        //Simple Write PTS  
                        if(pkt.pts==AV_NOPTS_VALUE){  
                            //Write PTS  
                            AVRational time_base1=in_stream->time_base;  
                            //Duration between 2 frames (us)  
                            int64_t calc_duration=(double)AV_TIME_BASE/av_q2d(in_stream->r_frame_rate);  
                            //Parameters  
                            pkt.pts=(double)(frame_index*calc_duration)/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            pkt.dts=pkt.pts;  
                            pkt.duration=(double)calc_duration/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            frame_index++;  
                        }  
  
                        cur_pts_v=pkt.pts;  
                        break;  
                    }  
                }while(av_read_frame(ifmt_ctx, &pkt) >= 0);  
            }else{  
                break;  
            }  
        }else{  
            ifmt_ctx=ifmt_ctx_a;  
            stream_index=audioindex_out;  
            if(av_read_frame(ifmt_ctx, &pkt) >= 0){  
                do{  
                    in_stream  = ifmt_ctx->streams[pkt.stream_index];  
                    out_stream = ofmt_ctx->streams[stream_index];  
  
                    if(pkt.stream_index==audioindex_a){  
  
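                        //FIX: No PTS  
                        //Simple Write PTS  
                        //NOTE: this fallback shares frame_index with the video branch and  
                        //relies on r_frame_rate, so it is unreliable for audio streams  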
                        if(pkt.pts==AV_NOPTS_VALUE){  
                            //Write PTS  
                            AVRational time_base1=in_stream->time_base;  
                            //Duration between 2 frames (us)  
                            int64_t calc_duration=(double)AV_TIME_BASE/av_q2d(in_stream->r_frame_rate);  
                            //Parameters  
                            pkt.pts=(double)(frame_index*calc_duration)/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            pkt.dts=pkt.pts;  
                            pkt.duration=(double)calc_duration/(double)(av_q2d(time_base1)*AV_TIME_BASE);  
                            frame_index++;  
                        }  
                        cur_pts_a=pkt.pts;  
  
                        break;  
                    }  
                }while(av_read_frame(ifmt_ctx, &pkt) >= 0);  
            }else{  
                break;  
            }  
  
        }  
  
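        //FIX:Bitstream Filter  
#if USE_H264BSF  
        //Only video packets go through the H.264 filter  
        if(stream_index==videoindex_out)  
            av_bitstream_filter_filter(h264bsfc, in_stream->codec, NULL, &pkt.data, &pkt.size, pkt.data, pkt.size, 0);  
#endif  
#if USE_AACBSF  
        //Only audio packets go through the AAC filter  
        if(stream_index==audioindex_out)  
            av_bitstream_filter_filter(aacbsfc, out_stream->codec, NULL, &pkt.data, &pkt.size, pkt.data, pkt.size, 0);  
#endif  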
  
  
        //Convert PTS/DTS  
        pkt.pts = av_rescale_q_rnd(pkt.pts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  
        pkt.dts = av_rescale_q_rnd(pkt.dts, in_stream->time_base, out_stream->time_base, (AVRounding)(AV_ROUND_NEAR_INF|AV_ROUND_PASS_MINMAX));  
        pkt.duration = av_rescale_q(pkt.duration, in_stream->time_base, out_stream->time_base);  
        pkt.pos = -1;  
        pkt.stream_index=stream_index;  
  
        printf("Write 1 Packet. size:%5d\tpts:%lld\n",pkt.size,(long long)pkt.pts);  
        //Write  
        if (av_interleaved_write_frame(ofmt_ctx, &pkt) < 0) {  
            printf( "Error muxing packet\n");  
            break;  
        }  
        av_free_packet(&pkt);  
  
    }  
    //Write file trailer  
    av_write_trailer(ofmt_ctx);  
  
#if USE_H264BSF  
    av_bitstream_filter_close(h264bsfc);  
#endif  
#if USE_AACBSF  
    av_bitstream_filter_close(aacbsfc);  
#endif  
  
end:  
    avformat_close_input(&ifmt_ctx_v);  
    avformat_close_input(&ifmt_ctx_a);  
    /* close output */  
    if (ofmt_ctx && !(ofmt->flags & AVFMT_NOFILE))  
        avio_close(ofmt_ctx->pb);  
    avformat_free_context(ofmt_ctx);  
    if (ret < 0 && ret != AVERROR_EOF) {  
        printf( "Error occurred.\n");  
        return -1;  
    }  
    return 0;  
}
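
The "No PTS" branch in the listing synthesizes timestamps for raw H.264 from the frame index, the frame rate and the stream time base, and the "Convert PTS/DTS" step then rescales them to the output time base. The arithmetic can be sketched with plain integers as follows. Both function names are hypothetical, and the time bases mentioned afterwards (1/1200000 for the input, 1/12800 for the output) are assumed values chosen for illustration.

```c
#include <stdint.h>

/* PTS of frame `frame_index` in stream time-base ticks, for a constant
 * frame rate fps_num/fps_den and a time base of tb_num/tb_den seconds
 * per tick: pts = frame_index * (fps_den / fps_num) / (tb_num / tb_den). */
int64_t synth_pts(int64_t frame_index, int fps_num, int fps_den,
                  int tb_num, int tb_den)
{
    return frame_index * fps_den * tb_den / ((int64_t)fps_num * tb_num);
}

/* Converts a timestamp from a source to a destination time base,
 * truncating toward zero. FFmpeg's av_rescale_q() rounds to nearest and
 * guards against overflow; this sketch does neither. */
int64_t rescale_ts(int64_t ts, int src_num, int src_den,
                   int dst_num, int dst_den)
{
    return ts * src_num * dst_den / ((int64_t)src_den * dst_num);
}
```

At 25 fps with an input time base of 1/1200000, each frame advances the PTS by 48000 ticks, so frame 3 sits at 144000; rescaled to an output time base of 1/12800 that becomes 1536, which is the same 0.12 s instant.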

Results

The input files are:
Video: cuc_ieschool.ts

Audio: huoyuanjia.mp3

The output file is:
cuc_ieschool.mp4
The output file carries the "cuc_ieschool" video together with the "Huo Yuanjia" (霍元甲) audio.
