Analysis of the OGG Audio Format

I. Overview of the OGG Audio Format

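Ogg is a free, open-standard container format maintained by the Xiph.Org Foundation. The Ogg format is not restricted by software patents and is designed for efficient streaming and processing of high-quality digital multimedia.

"Ogg" refers to a file format that can carry a wide range of free and open-source codecs, covering audio, video, text (such as subtitles), and metadata.

Within the Ogg multimedia framework, Theora provides the lossy video layer, while the music-oriented Vorbis codec usually serves as the audio layer. The speech-oriented compression codec Speex, the lossless audio codec FLAC, and OggPCM can also serve as the audio layer.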

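The term "Ogg" is often used to mean the Ogg Vorbis audio file format, that is, Vorbis-encoded audio wrapped in an Ogg container. In the past, the .ogg file extension was used for content in any Ogg-supported format. In 2007, however, for the sake of backward compatibility, the Xiph.Org Foundation asked that .ogg be reserved for Vorbis only. The foundation decided to introduce new file extensions and media types to describe the different kinds of content: .oga for audio-only files, .ogv for video with or without sound (including Theora), and .ogx for applications.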

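Ogg Vorbis is a new audio compression format, comparable to music formats such as MP3. Ogg Vorbis is completely free, open, and unencumbered by patents. Ogg Vorbis files use the .ogg extension. The format can keep being improved in file size and sound quality without affecting existing encoders or players.

One notable feature of Ogg Vorbis is its support for multiple channels.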

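II. Dissecting the OGG Audio Format

1. Organization of an OGG file

OGG organizes and links logical streams in units of pages. Every page has a page header and page data, as shown in Figure 1: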

page:  A*    B*    C*    ..    A#    B#    C#    D*    D#
flag:  bos   bos   bos         eos   eos   eos   bos   eos

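Figure 1. Organization of an OGG file

The file in Figure 1 chains two physical streams: logical streams A, B, and C together form one physical stream, and logical stream D is a physical stream by itself. The bos pages of all logical streams within one physical stream must be physically adjacent, as with the three bos pages A*, B*, and C* in Figure 1.

bos: beginning of stream;

eos: end of stream.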
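The adjacency rule suggests a simple way to find where one chained physical stream ends and the next begins: a bos page that follows a non-bos page opens a new physical stream. The sketch below illustrates that idea only. It assumes the bos and eos flag bits (0x02 and 0x04) described in the page structure section, the function name `split_chained_streams` is hypothetical, and the serial numbers in the demo are made up to mirror Figure 1.

```python
BOS = 0x02  # header_type bit: first page of a logical stream
EOS = 0x04  # header_type bit: last page of a logical stream


def split_chained_streams(pages):
    """Group (serial_number, header_type) pairs into physical streams.

    A run of adjacent bos pages opens a physical stream; the next bos page
    seen after at least one non-bos page starts the following one.
    """
    chains = []
    current = []
    previous_was_bos = False
    for serial, header_type in pages:
        is_bos = bool(header_type & BOS)
        if is_bos and not previous_was_bos and current:
            chains.append(current)
            current = []
        previous_was_bos = is_bos
        current.append((serial, header_type))
    if current:
        chains.append(current)
    return chains


# Page order of Figure 1, with hypothetical serial numbers 1..4 for A..D.
figure_1 = [
    (1, BOS), (2, BOS), (3, BOS),   # A* B* C*
    (1, 0), (2, 0), (3, 0),         # ..
    (1, EOS), (2, EOS), (3, EOS),   # A# B# C#
    (4, BOS), (4, EOS),             # D* D#
]
chains = split_chained_streams(figure_1)
print(len(chains))  # 2
```

A real demultiplexer would also track which serial numbers are still open; this sketch only looks at the bos flag.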

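2. OGG page structure

Pages are independent of one another, and each carries all the information it needs. The page size is variable, typically 4 KB to 8 KB, and can never exceed 65307 bytes (27 + 255 + 255*255 = 65307). The page header format is shown in Figure 2: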

Field               Offset (bytes)       Size (bytes)
OggS                0                    4
Version             4                    1
Header_type         5                    1
Granule_position    6                    8
Serial_number       14                   4
Page_sequence       18                   4
CRC_checksum        22                   4
Num_segments        26                   1
Segment_table       27                   Num_segments
payload             27 + Num_segments    sum of the segment sizes

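Figure 2. OGG page header structure

1) Capture pattern: the ASCII characters 0x4f 'O', 0x67 'g', 0x67 'g', 0x53 'S', 4 bytes in total. It marks the start of a page and lets a decoder recognize a new page when it unpacks the Ogg encapsulation to recover the encoded media.

2) Version: the version id, currently 0 by default; 1 byte.

3) Header_type: flags that identify the type of the current page, 1 byte:

0x01: the data in this page continues a packet from the previous page of the same logical stream; if this bit is not set, the page begins with a new packet;

0x02: this page is the first page of a logical stream (the bos flag); if this bit is not set, it is not the first page;

0x04: this page is the last page of a logical stream (the eos flag); if this bit is not set, it is not the last page.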

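4) Granule_position: codec-specific position information, 8 bytes, little-endian. For an audio stream it holds the number of PCM output samples of the logical stream up to and including this page, from which a timestamp can be computed. For a video stream it holds the number of video frames encoded up to this page. A value of -1 means that no packet of the logical stream finishes on this page.

5) Serial_number: the id of the stream the current page belongs to, 4 bytes, little-endian. It distinguishes the logical stream of this page from all other logical streams, so it can be used to separate the streams.

6) Page_sequence: the sequence number of this page within its logical stream, 4 bytes. An OGG decoder uses it to detect lost pages.

7) CRC_checksum: the cyclic redundancy check, 4 bytes. It holds the 32-bit CRC of the whole page, computed over the header with the CRC field set to zero plus the page data. Its generator polynomial is 0x04c11db7.

8) Num_segments: the number of segment entries in the segment_table field of this page, 1 byte. Its maximum value is 255, so the maximum physical page size is 65307 bytes, which is less than 64 KB.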

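9) Segment_table: as the name suggests, a table that gives the length of each segment; each entry ranges from 0 to 255.

Packet lengths are recovered from the segments: a packet ends at the first segment whose value is not 255, so the segment_table in the page header yields the length of every packet. For example, if the segment values (in hexadecimal) are FF 45 FF FF FF 40 FF 05 FF FF FF 66, the first packet is 255 + 69 = 324 bytes long and the second is 3*255 + 64 = 829 bytes. The remaining packets follow the same rule: 255 + 5 = 260 bytes and 3*255 + 102 = 867 bytes.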
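The rule above can be sketched in a few lines. This is a minimal illustration, not a full demuxer: the function name `packet_lengths` is hypothetical, and the second return value (bytes of a packet that continues on the next page) is an addition for completeness.

```python
def packet_lengths(segment_table):
    """Return (lengths of packets completed in this table, leftover bytes).

    The leftover count is non-zero when the table ends with a 255 entry,
    meaning the last packet continues on the following page.
    """
    lengths = []
    running = 0
    for lacing in segment_table:
        running += lacing
        if lacing < 255:  # any value below 255 terminates the packet
            lengths.append(running)
            running = 0
    return lengths, running


table = [0xFF, 0x45, 0xFF, 0xFF, 0xFF, 0x40, 0xFF, 0x05, 0xFF, 0xFF, 0xFF, 0x66]
lengths, unfinished = packet_lengths(table)
print(lengths)     # [324, 829, 260, 867]
print(unfinished)  # 0
```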

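The page header consists essentially of the fields above, which gives us the length of the header and of the whole page:

header_size = 27 + Num_segments (bytes);

page_size = header_size + the sum of the sizes of all segments in segment_table;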

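3. OGG encapsulation process (appendix)

1) Before being handed to Ogg for encapsulation, encoded audio or video is presented as "packets" with packet boundaries. Where those boundaries fall depends on the specific codec. See Figure 3.

2) Each packet of a logical stream is split into segments (segmentation). Every segment is 255 bytes long except the last segment of a packet, which is usually shorter than 255 bytes, because a packet may have any length, as decided by the media encoder.

3) The segments are encapsulated into pages. Every page gets a page header, and page lengths may differ as the situation requires. The segment_table field of the page header records the "lacing values", that is, the length of each segment; the value for the last segment of a packet is below 255 and may be 0 (a packet whose length is an exact multiple of 255 ends with a zero-length segment). Packets are processed one at a time, and a packet is encapsulated into one or more pages (an upper limit is set on the page length, typically about 4 kB). In the simplified flow of Figure 3 every packet starts on a new page; more generally a page may hold several packets, as in the segment_table example above, and the header_type field indicates whether a page begins with data continued from the previous page.

Multiple logical streams that have already been encapsulated in pages (speech, text, images, audio, video, and so on) are then multiplexed into a physical stream according to the timing relationships the application requires.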
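Step 2) can be sketched as follows. This is only an illustration of the segmentation rule; the helper names `lacing_values` and `split_into_segments` are hypothetical.

```python
def lacing_values(packet_length):
    """Segment sizes for one packet: full 255-byte segments, then the rest.

    The final value is always below 255, so a length that is an exact
    multiple of 255 ends with an explicit 0.
    """
    full, rest = divmod(packet_length, 255)
    return [255] * full + [rest]


def split_into_segments(packet):
    """Cut a packet (bytes) into the segments described by its lacing values."""
    segments = []
    position = 0
    for size in lacing_values(len(packet)):
        segments.append(packet[position:position + size])
        position += size
    return segments


print(lacing_values(324))  # [255, 69]
print(lacing_values(510))  # [255, 255, 0]
```

The trailing value below 255 is what lets a decoder find the packet boundary again from the segment table alone.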

Logical bitstream with packet boundaries
 -----------------------------------------------------------------
 > |       packet_1             | packet_2         | packet_3 |  <
 -----------------------------------------------------------------

                     | segmentation (logically only)
                     v

      packet_1 (5 segments)          packet_2 (4 segs)    p_3 (2 segs)
     ------------------------------ -------------------- ------------
 ..  |seg_1|seg_2|seg_3|seg_4|s_5 | |seg_1|seg_2|seg_3|| |seg_1|s_2 | ..
     ------------------------------ -------------------- ------------

                     | page encapsulation
                     v

 page_1 (packet_1 data)   page_2 (pket_1 data)   page_3 (packet_2 data)
------------------------  ----------------  ------------------------
|H|------------------- |  |H|----------- |  |H|------------------- |
|D||seg_1|seg_2|seg_3| |  |D|seg_4|s_5 | |  |D||seg_1|seg_2|seg_3| | ...
|R|------------------- |  |R|----------- |  |R|------------------- |
------------------------  ----------------  ------------------------

                    |
pages of            |
other    --------|  |
logical         -------
bitstreams      | MUX |
                -------
                   |
                   v

              page_1  page_2          page_3
      ------  ------  -------  -----  -------
 ...  ||   |  ||   |  ||    |  ||  |  ||    |  ...
      ------  ------  -------  -----  -------
              physical Ogg bitstream

Figure 3. OGG encapsulation process
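To make the page-encapsulation step concrete, the sketch below wraps a single packet in a single page and fills in the CRC_checksum field described in the page structure section. It rests on a few stated assumptions: the CRC is computed most-significant-bit first with the polynomial 0x04c11db7, an initial value of 0, and no final XOR, and it is stored little-endian at byte offset 22. The function names are hypothetical, and splitting a large packet across several pages or packing several packets into one page is deliberately left out.

```python
import struct


def ogg_crc(data):
    """Bitwise CRC-32: polynomial 0x04c11db7, MSB first, init 0, no final XOR."""
    crc = 0
    for byte in data:
        crc ^= byte << 24
        for _ in range(8):
            if crc & 0x80000000:
                crc = ((crc << 1) ^ 0x04C11DB7) & 0xFFFFFFFF
            else:
                crc = (crc << 1) & 0xFFFFFFFF
    return crc


def build_page(packet, serial, sequence, granule, header_type=0):
    """Wrap one complete packet in a single page (sketch, no page splitting)."""
    full, rest = divmod(len(packet), 255)
    table = bytes([255] * full + [rest])
    if len(table) > 255:
        raise ValueError("packet does not fit into a single page")
    header = struct.pack("<4sBBqIIIB", b"OggS", 0, header_type,
                         granule, serial, sequence, 0, len(table))
    page = bytearray(header + table + packet)
    # The checksum covers the whole page with the CRC field still zero.
    struct.pack_into("<I", page, 22, ogg_crc(page))
    return bytes(page)


def crc_ok(page):
    """Recompute the CRC with the stored field zeroed and compare."""
    (stored,) = struct.unpack_from("<I", page, 22)
    zeroed = bytearray(page)
    zeroed[22:26] = b"\x00\x00\x00\x00"
    return ogg_crc(zeroed) == stored


page = build_page(b"x" * 324, serial=1, sequence=0, granule=0, header_type=0x02)
print(len(page), crc_ok(page))  # 353 True
```

A table-driven CRC would be much faster; the bit-by-bit loop is used here only because it shows the polynomial directly.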

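4. OGG Vorbis bitstream structure

A Vorbis bitstream begins with three header packets. In order, they are the identification header, the comment header, and the setup header. All three are essential for decoding a Vorbis audio file.

1) Common header packet structure

Every header packet begins with the same header fields:

• [packet_type] : 8 bit value

• 0x76, 0x6f, 0x72, 0x62, 0x69, 0x73: the characters 'v','o','r','b','i','s' as six octets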
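A minimal check of this common prefix might look like the sketch below. The packet type values 1, 3, and 5 for the identification, comment, and setup headers are taken from the Vorbis I specification rather than from the text above, and the name `vorbis_header_type` is hypothetical.

```python
# Packet type values as defined by the Vorbis I specification.
HEADER_TYPES = {1: "identification", 3: "comment", 5: "setup"}


def vorbis_header_type(packet):
    """Return the kind of Vorbis header packet, or raise ValueError."""
    if len(packet) < 7 or packet[1:7] != b"vorbis":
        raise ValueError("missing 'vorbis' signature")
    packet_type = packet[0]
    if packet_type not in HEADER_TYPES:
        raise ValueError("unknown header packet type %d" % packet_type)
    return HEADER_TYPES[packet_type]


print(vorbis_header_type(b"\x01vorbis"))  # identification
```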

2) The identification header

The identification header identifies the bitstream as Vorbis, the Vorbis version, and the simple audio characteristics of the stream such as sample rate and number of channels.

• [vorbis_version] = read 32 bits as unsigned integer

• [audio_channels] = read 8 bit integer as unsigned (must be greater than 0)

• [audio_sample_rate] = read 32 bits as unsigned integer (must be greater than 0)

• [bitrate_maximum] = read 32 bits as signed integer

• [bitrate_nominal] = read 32 bits as signed integer

• [bitrate_minimum] = read 32 bits as signed integer

• [blocksize_0] = 2 exponent (read 4 bits as unsigned integer) (must be less than or equal to [blocksize_1])

• [blocksize_1] = 2 exponent (read 4 bits as unsigned integer)

• [framing_flag] = read one bit (must not be 0)

The bitrate fields above are used only as hints. The nominal bitrate field especially may be considerably off in purely VBR streams. The fields are meaningful only when greater than zero.

a) All three fields set to the same value implies a fixed rate, or tightly bounded, nearly fixed-rate bitstream

b) Only nominal set implies a VBR or ABR stream that averages the nominal bitrate

c) Maximum and or minimum set implies a VBR bitstream that obeys the bitrate limits

d) None set indicates the encoder does not care to speculate.
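Because every field of the identification header falls on a byte boundary, it can be read with `struct` instead of a bit reader. The sketch below assumes the packet starts with the common header (packet type 1 followed by "vorbis") and that, since Vorbis packs bits least-significant first, [blocksize_0] sits in the low nibble of its byte and [blocksize_1] in the high nibble. The function names and the wording of the hint strings are illustrative only.

```python
import struct


def parse_identification_header(packet):
    """Decode the fields listed above from a 30-byte identification header."""
    if packet[:7] != b"\x01vorbis":
        raise ValueError("not a Vorbis identification header")
    (version, channels, sample_rate, br_max, br_nominal, br_min,
     blocksizes, framing) = struct.unpack_from("<IBIiiiBB", packet, 7)
    blocksize_0 = 1 << (blocksizes & 0x0F)  # low nibble is read first
    blocksize_1 = 1 << (blocksizes >> 4)
    if channels == 0 or sample_rate == 0:
        raise ValueError("channels and sample rate must be greater than 0")
    if blocksize_0 > blocksize_1:
        raise ValueError("blocksize_0 must not exceed blocksize_1")
    if not framing & 1:
        raise ValueError("framing flag must be set")
    return {
        "vorbis_version": version,
        "audio_channels": channels,
        "audio_sample_rate": sample_rate,
        "bitrate_maximum": br_max,
        "bitrate_nominal": br_nominal,
        "bitrate_minimum": br_min,
        "blocksize_0": blocksize_0,
        "blocksize_1": blocksize_1,
    }


def bitrate_hint(maximum, nominal, minimum):
    """Map the three bitrate fields to cases a) through d) above."""
    has_max, has_nom, has_min = maximum > 0, nominal > 0, minimum > 0
    if has_max and has_nom and has_min and maximum == nominal == minimum:
        return "fixed or nearly fixed rate"
    if has_nom and not has_max and not has_min:
        return "VBR/ABR averaging the nominal bitrate"
    if has_max or has_min:
        return "VBR within the stated limits"
    return "no hint"


# Hand-built example: stereo, 44100 Hz, nominal 128000, block sizes 256/2048.
example = b"\x01vorbis" + struct.pack("<IBIiiiBB",
                                      0, 2, 44100, 0, 128000, 0, 0xB8, 1)
info = parse_identification_header(example)
print(info["blocksize_0"], info["blocksize_1"])  # 256 2048
```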

3) The comment header

The comment header includes user text comments ("tags") and a vendor string for the application/library that produced the bitstream.

The comment header is logically a list of eight-bit-clean vectors; the number of vectors is bounded to 2^32 - 1 and the length of each vector is limited to 2^32 - 1 bytes. The vector length is encoded; the vector contents themselves are not null terminated. In addition to the vector list, there is a single vector for vendor name (also 8 bit clean, length encoded in 32 bits). For example, the 1.0 release of libvorbis set the vendor string to "Xiph.Org libVorbis I 20020717".

The vector lengths and number of vectors are stored lsb first, according to the bit packing conventions of the Vorbis codec. However, since data in the comment header is octet-aligned, they can simply be read as unaligned 32 bit little endian unsigned integers.

The comment vectors are structured similarly to a UNIX environment variable. That is, comment fields consist of a field name and a corresponding value and look like:

comment[0]="ARTIST=me";

comment[1]="TITLE=the sound of Vorbis";

The field name is case-insensitive and may consist of ASCII 0x20 through 0x7D, 0x3D ('=') excluded. ASCII 0x41 through 0x5A inclusive (characters A-Z) is to be considered equivalent to ASCII 0x61 through 0x7A inclusive (characters a-z). The field name is immediately followed by ASCII 0x3D ('='); this equals sign is used to terminate the field name. 0x3D is followed by the 8 bit clean UTF-8 encoded value of the field contents to the end of the field.

Field names

Below is a proposed, minimal list of standard field names with a description of intended use. No single or group of field names is mandatory; a comment header may contain one, all or none of the names in this list.

• TITLE Track/Work name

• VERSION The version field may be used to differentiate multiple versions of the same track title in a single collection. (e.g. remix info)

• ALBUM The collection name to which this track belongs

• TRACKNUMBER The track number of this piece if part of a specific larger collection or album

• ARTIST The artist generally considered responsible for the work. In popular music this is usually the performing band or singer. For classical music it would be the composer. For an audio book it would be the author of the original text.

• PERFORMER The artist(s) who performed the work. In classical music this would be the conductor, orchestra, soloists. In an audio book it would be the actor who did the reading. In popular music this is typically the same as the ARTIST and is omitted.

• COPYRIGHT Copyright attribution.

• LICENSE License information, e.g. 'All Rights Reserved', 'Any Use Permitted'.

• ORGANIZATION Name of the organization producing the track (i.e. the 'record label')

• DESCRIPTION A short text description of the contents

• GENRE A short text indication of music genre

• DATE Date the track was recorded

• LOCATION Location where track was recorded

• CONTACT Contact information for the creators or distributors of the track. This could be a URL, an email address, the physical address of the producing label.

• ISRC International Standard Recording Code for the track; see the ISRC intro page for more information on ISRC numbers.

Hint: Field names are not required to be unique (occur once) within a comment header. As an example, assume a track was recorded by three well known artists; the following is permissible, and encouraged:

ARTIST=Dizzy Gillespie

ARTIST=Sonny Rollins

ARTIST=Sonny Stitt

4) The setup header

The setup header includes extensive CODEC setup information as well as the complete VQ and Huffman codebooks needed for decode.

The setup header contains, in order, the lists of codebook configurations, time-domain transform configurations (placeholders in Vorbis I), floor configurations, residue configurations, channel mapping configurations and mode configurations. It finishes with a framing bit of '1', as shown in the figure below: