A Detailed Look at Parsing HTTP Messages with http-parser
Overview
Our project uses http-parser, so this post gives a brief walkthrough of how to use it.
Download: https://github.com/joyent/http-parser
Its usage documentation is quite detailed.
Open-Source Examples
Source code from the open-source tcpflow 1.4.4 that uses http-parser:
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */
/**
 *
 * scan_http:
 * Decodes HTTP responses
 */

#include "config.h"
#include "tcpflow.h"
#include "tcpip.h"
#include "tcpdemux.h"
#include "http-parser/http_parser.h"
#include "mime_map.h"

#ifdef HAVE_SYS_WAIT_H
#include <sys/wait.h>
#endif

#ifdef HAVE_LIBZ
#  define ZLIB_CONST
#  ifdef GNUC_HAS_DIAGNOSTIC_PRAGMA
#    pragma GCC diagnostic ignored "-Wundef"
#    pragma GCC diagnostic ignored "-Wcast-qual"
#  endif
#  ifdef HAVE_ZLIB_H
#    include <zlib.h>
#  endif
#else
#  define z_stream void *           // prevents z_stream from generating an error
#endif

#define MIN_HTTP_BUFSIZE 80         // don't bother parsing smaller than this

#include <sys/types.h>
#include <iostream>
#include <algorithm>
#include <map>
#include <iomanip>

#define HTTP_CMD      "http_cmd"
#define HTTP_ALERT_FD "http_alert_fd"

/* options */
std::string http_cmd;               // command to run on each http object
int http_subproc_max = 10;          // how many subprocesses are we allowed?
int http_subproc = 0;               // how many do we currently have?
int http_alert_fd = -1;             // where should we send alerts?

/* define a callback object for sharing state between scan_http() and its callbacks */
class scan_http_cbo {
private:
    typedef enum {NOTHING,FIELD,VALUE} last_on_header_t;
    scan_http_cbo(const scan_http_cbo& c); // not implemented
    scan_http_cbo &operator=(const scan_http_cbo &c); // not implemented
public:
    virtual ~scan_http_cbo(){
        on_message_complete();      // make sure message was ended
    }
    scan_http_cbo(const std::string& path_,const char *base_,std::stringstream *xmlstream_) :
        path(path_), base(base_),xmlstream(xmlstream_),xml_fo(),request_no(0),
        headers(), last_on_header(NOTHING), header_value(), header_field(),
        output_path(), fd(-1), first_body(true),bytes_written(0),
        unzip(false),zs(),zinit(false),zfail(false){};
private:
    const std::string path;         // where data gets written
    const char *base;               // where data started in memory
    std::stringstream *xmlstream;   // if present, where to put the fileobject annotations
    std::stringstream xml_fo;       // xml stream for this file object
    int request_no;                 // request number

    /* parsed headers */
    std::map<std::string, std::string> headers;

    /* placeholders for possibly-incomplete header data */
    last_on_header_t last_on_header;
    std::string header_value, header_field;
    std::string output_path;
    int fd;                         // fd for writing
    bool first_body;                // first call to on_body after headers
    uint64_t bytes_written;

    /* decompression for gzip-encoded streams. */
    bool unzip;                     // should we be decompressing?
    z_stream zs;                    // zstream (avoids casting and memory allocation)
    bool zinit;                     // we have initialized the zstream
    bool zfail;                     // zstream failed in some manner, so ignore the rest of this stream

    /* The static functions are callbacks; they wrap the method calls */
#define CBO (reinterpret_cast<scan_http_cbo*>(parser->data))
public:
    static int scan_http_cb_on_message_begin(http_parser * parser) { return CBO->on_message_begin();}
    static int scan_http_cb_on_url(http_parser * parser, const char *at, size_t length) { return 0;}
    static int scan_http_cb_on_header_field(http_parser * parser, const char *at, size_t length) { return CBO->on_header_field(at,length);}
    static int scan_http_cb_on_header_value(http_parser * parser, const char *at, size_t length) { return CBO->on_header_value(at,length); }
    static int scan_http_cb_on_headers_complete(http_parser * parser) { return CBO->on_headers_complete();}
    static int scan_http_cb_on_body(http_parser * parser, const char *at, size_t length) { return CBO->on_body(at,length);}
    static int scan_http_cb_on_message_complete(http_parser * parser) {return CBO->on_message_complete();}
#undef CBO
private:
    int on_message_begin();
    int on_url(const char *at, size_t length);
    int on_header_field(const char *at, size_t length);
    int on_header_value(const char *at, size_t length);
    int on_headers_complete();
    int on_body(const char *at, size_t length);
    int on_message_complete();
};

/**
 * on_message_begin:
 * Increment request nubmer. Note that the first request is request_no = 1
 */
int scan_http_cbo::on_message_begin()
{
    request_no ++;
    return 0;
}

/**
 * on_url currently not implemented.
 */
int scan_http_cbo::on_url(const char *at, size_t length)
{
    return 0;
}

/* Note 1: The state machine is defined in http-parser/README.md
 * Note 2: All header field names are converted to lowercase.
 *         This is consistent with the RFC.
 */
int scan_http_cbo::on_header_field(const char *at,size_t length)
{
    std::string field(at,length);
    std::transform(field.begin(), field.end(), field.begin(), ::tolower);

    switch(last_on_header){
    case NOTHING:
        // Allocate new buffer and copy callback data into it
        header_field = field;
        break;
    case VALUE:
        // New header started.
        // Copy current name,value buffers to headers
        // list and allocate new buffer for new name
        headers[header_field] = header_value;
        header_field = field;
        break;
    case FIELD:
        // Previous name continues. Reallocate name
        // buffer and append callback data to it
        header_field.append(field);
        break;
    }
    last_on_header = FIELD;
    return 0;
}

int scan_http_cbo::on_header_value(const char *at, size_t length)
{
    const std::string value(at,length);

    switch(last_on_header){
    case FIELD:
        //Value for current header started. Allocate
        //new buffer and copy callback data to it
        header_value = value;
        break;
    case VALUE:
        //Value continues. Reallocate value buffer
        //and append callback data to it
        header_value.append(value);
        break;
    case NOTHING:
        // this shouldn't happen
        DEBUG(10)("Internal error in http-parser");
        break;
    }
    last_on_header = VALUE;
    return 0;
}

/**
 * called when last header is read.
 * Determine the filename based on request_no and extension.
 * Also see if decompressing is happening...
 */
int scan_http_cbo::on_headers_complete()
{
    tcpdemux *demux = tcpdemux::getInstance();

    /* Add the most recently read header to the map, if any */
    if (last_on_header==VALUE) {
        headers[header_field] = header_value;
        header_field="";
    }

    /* Set output path to <path>-HTTPBODY-nnn.ext for each part.
     * This is not consistent with tcpflow <= 1.3.0, which supported only one HTTPBODY,
     * but it's correct...
     */
    std::stringstream os;
    os << path << "-HTTPBODY-" << std::setw(3) << std::setfill('0') << request_no << std::setw(0);

    /* See if we can guess a file extension */
    std::string extension = get_extension_for_mime_type(headers["content-type"]);
    if (extension.size()) {
        os << "." << extension;
    }

    output_path = os.str();

    /* Choose an output function based on the content encoding */
    std::string content_encoding(headers["content-encoding"]);

    if ((content_encoding == "gzip" || content_encoding == "deflate") && (demux->opt.gzip_decompress)){
#ifdef HAVE_LIBZ
        DEBUG(10) ( "%s: detected zlib content, decompressing", output_path.c_str());
        unzip = true;
#else
        /* We can't decompress, so just give it a .gz */
        output_path.append(".gz");
        DEBUG(5) ( "%s: refusing to decompress since zlib is unavailable", output_path.c_str() );
#endif
    }

    /* Open the output path */
    fd = demux->retrying_open(output_path.c_str(), O_WRONLY|O_CREAT|O_BINARY|O_TRUNC, 0644);
    if (fd < 0) {
        DEBUG(1) ("unable to open HTTP body file %s", output_path.c_str());
    }
    if(http_alert_fd>=0){
        std::stringstream ss;
        ss << "open\t" << output_path << "\n";
        const std::string &sso = ss.str();
        if(write(http_alert_fd,sso.c_str(),sso.size())!=(int)sso.size()){
            perror("write");
        }
    }
    first_body = true;              // next call to on_body will be the first one

    /* We can do something smart with the headers here.
     *
     * For example, we could:
     * - Record all headers into the report.xml
     * - Pick the intended filename if we see Content-Disposition: attachment; name="..."
     * - Record headers into filesystem extended attributes on the body file
     */
    return 0;
}

/* Write to fd, optionally decompressing as we go */
int scan_http_cbo::on_body(const char *at,size_t length)
{
    if (fd < 0)    return -1;       // no open fd? (internal error)x
    if (length==0) return 0;        // nothing to write

    if(first_body){                 // stuff for first time on_body is called
        xml_fo << " <byte_run file_offset='" << (at-base) << "'><fileobject><filename>"
               << output_path << "</filename>";
        first_body = false;
    }

    /* If not decompressing, just write the data and return. */
    if(unzip==false){
        int rv = write(fd,at,length);
        if(rv<0) return -1;         // write error; that's bad
        bytes_written += rv;
        return 0;
    }

#ifndef HAVE_LIBZ
    assert(0);                      // shoudln't have gotten here
#endif
    if(zfail) return 0;             // stream was corrupt; ignore rest

    /* set up this round of decompression, using a small local buffer */
    /* Call init if we are not initialized */
    char decompressed[65536];       // where decompressed data goes
    if (!zinit) {
        memset(&zs,0,sizeof(zs));
        zs.next_in = (Bytef*)at;
        zs.avail_in = length;
        zs.next_out = (Bytef*)decompressed;
        zs.avail_out = sizeof(decompressed);

        int rv = inflateInit2(&zs, 32 + MAX_WBITS); /* 32 auto-detects gzip or deflate */
        if (rv != Z_OK) {
            /* fail! */
            DEBUG(3) ("decompression failed at stream initialization; rv=%d bad Content-Encoding?",rv);
            zfail = true;
            return 0;
        }
        zinit = true;               // successfully initted
    } else {
        zs.next_in = (Bytef*)at;
        zs.avail_in = length;
        zs.next_out = (Bytef*)decompressed;
        zs.avail_out = sizeof(decompressed);
    }

    /* iteratively decompress, writing each time */
    while (zs.avail_in > 0) {
        /* decompress as much as possible */
        int rv = inflate(&zs, Z_SYNC_FLUSH);

        if (rv == Z_STREAM_END) {
            /* are we done with the stream? */
            if (zs.avail_in > 0) {
                /* ...no. */
                DEBUG(3) ("decompression completed, but with trailing garbage");
                return 0;
            }
        } else if (rv != Z_OK) {
            /* some other error */
            DEBUG(3) ("decompression failed (corrupted stream?)");
            zfail = true;           // ignore the rest of this stream
            return 0;
        }

        /* successful decompression, at least partly */
        /* write the result */
        int bytes_decompressed = sizeof(decompressed) - zs.avail_out;
        ssize_t written = write(fd, decompressed, bytes_decompressed);
        if (written < bytes_decompressed) {
            DEBUG(3) ("writing decompressed data failed");
            zfail= true;
            return 0;
        }
        bytes_written += written;

        /* reset the buffer for the next iteration */
        zs.next_out = (Bytef*)decompressed;
        zs.avail_out = sizeof(decompressed);
    }
    return 0;
}

/**
 * called at the conclusion of each HTTP body.
 * Clean out all of the state for this HTTP header/body pair.
 */
int scan_http_cbo::on_message_complete()
{
    /* Close the file */
    headers.clear();
    header_field = "";
    header_value = "";
    last_on_header = NOTHING;
    if(fd >= 0) {
        if (::close(fd) != 0) {
            perror("close() of http body");
        }
        fd = -1;
    }

    /* Erase zero-length files and update the DFXML */
    if(bytes_written>0){
        /* Update DFXML */
        if(xmlstream){
            xml_fo << "<filesize>" << bytes_written << "</filesize></fileobject></byte_run>\n";
            if(xmlstream) *xmlstream << xml_fo.str();
        }
        if(http_alert_fd>=0){
            std::stringstream ss;
            ss << "close\t" << output_path << "\n";
            const std::string &sso = ss.str();
            if(write(http_alert_fd,sso.c_str(),sso.size()) != (int)sso.size()){
                perror("write");
            }
        }
        if(http_cmd.size()>0 && output_path.size()>0){
            /* If we are at maximum number of subprocesses, wait for one to exit */
            std::string cmd = http_cmd + " " + output_path;
#ifdef HAVE_FORK
            int status=0;
            pid_t pid = 0;
            while(http_subproc >= http_subproc_max){
                pid = wait(&status);
                http_subproc--;
            }
            /* Fork off a child */
            pid = fork();
            if(pid<0) die("Cannot fork child");
            if(pid==0){
                /* We are the child */
                exit(system(cmd.c_str()));
            }
            http_subproc++;
#else
            system(cmd.c_str());
#endif
        }
    } else {
        /* Nothing written; erase the file */
        if(output_path.size() > 0){
            ::unlink(output_path.c_str());
        }
    }

    /* Erase the state variables for this part */
    xml_fo.str("");
    output_path = "";
    bytes_written=0;
    unzip = false;
    if(zinit){
        inflateEnd(&zs);
        zinit = false;
    }
    zfail = false;
    return 0;
}

/***
 * the HTTP scanner plugin itself
 */

extern "C"
void scan_http(const class scanner_params &sp,const recursion_control_block &rcb)
{
    if(sp.sp_version!=scanner_params::CURRENT_SP_VERSION){
        std::cerr << "scan_http requires sp version " << scanner_params::CURRENT_SP_VERSION << "; "
                  << "got version " << sp.sp_version << "\n";
        exit(1);
    }

    if(sp.phase==scanner_params::PHASE_STARTUP){
        sp.info->name  = "http";
        sp.info->flags = scanner_info::SCANNER_DISABLED; // default disabled
        sp.info->get_config(HTTP_CMD,&http_cmd,"Command to execute on each HTTP attachment");
        sp.info->get_config(HTTP_ALERT_FD,&http_alert_fd,"File descriptor to send information about completed HTTP attachments");
        return;     /* No feature files created */
    }

    if(sp.phase==scanner_params::PHASE_SCAN){
        /* See if there is an HTTP response */
        if(sp.sbuf.bufsize>=MIN_HTTP_BUFSIZE &&
           sp.sbuf.memcmp(reinterpret_cast<const uint8_t *>("HTTP/1."),0,7)==0){

            /* Smells enough like HTTP to try parsing */
            /* Set up callbacks */
            http_parser_settings scan_http_parser_settings;
            memset(&scan_http_parser_settings,0,sizeof(scan_http_parser_settings)); // in the event that new callbacks get created
            scan_http_parser_settings.on_message_begin    = scan_http_cbo::scan_http_cb_on_message_begin;
            scan_http_parser_settings.on_url              = scan_http_cbo::scan_http_cb_on_url;
            scan_http_parser_settings.on_header_field     = scan_http_cbo::scan_http_cb_on_header_field;
            scan_http_parser_settings.on_header_value     = scan_http_cbo::scan_http_cb_on_header_value;
            scan_http_parser_settings.on_headers_complete = scan_http_cbo::scan_http_cb_on_headers_complete;
            scan_http_parser_settings.on_body             = scan_http_cbo::scan_http_cb_on_body;
            scan_http_parser_settings.on_message_complete = scan_http_cbo::scan_http_cb_on_message_complete;

            if(sp.sxml) (*sp.sxml) << "\n <byte_runs>\n";
            for(size_t offset=0;;){
                /* Set up a parser instance for the next chunk of HTTP responses and data.
                 * This might be repeated several times due to connection re-use and multiple requests.
                 * Note that the parser is not a C++ library but it can pass a "data" to the
                 * callback. We put the address for the scan_http_cbo object in the data and
                 * recover it with a cast in each of the callbacks.
                 */

                /* Make an sbuf for the remaining data.
                 * Note that this may not be necessary, because in our test runs the parser
                 * processed all of the data the first time through...
                 */
                sbuf_t sub_buf(sp.sbuf, offset);
                const char *base = reinterpret_cast<const char*>(sub_buf.buf);

                http_parser parser;
                http_parser_init(&parser, HTTP_RESPONSE);

                scan_http_cbo cbo(sp.sbuf.pos0.path,base,sp.sxml);
                parser.data = &cbo;

                /* Parse */
                size_t parsed = http_parser_execute(&parser, &scan_http_parser_settings, base, sub_buf.size());
                assert(parsed <= sub_buf.size());

                /* Indicate EOF (flushing callbacks) and terminate if we parsed the entire buffer. */
                if (parsed == sub_buf.size()) {
                    http_parser_execute(&parser, &scan_http_parser_settings, NULL, 0);
                    break;
                }

                /* Stop parsing if we parsed nothing, as that indicates something header! */
                if (parsed == 0) {
                    break;
                }

                /* Stop parsing if we're a connection upgrade (e.g. WebSockets) */
                if (parser.upgrade) {
                    DEBUG(9) ("upgrade connection detected (WebSockets?); cowardly refusing to dump further");
                    break;
                }

                /* Bump the offset for next iteration */
                offset += parsed;
            }
            if(sp.sxml) (*sp.sxml) << " </byte_runs>";
        }
    }
}
Here, a struct http_parser_settings is used to register the callbacks, and an http_parser instance performs the parsing.
Usage in the open-source libtnet (master)
#include "httpparser.h"
#include "httputil.h"
#include "log.h"

using namespace std;

namespace tnet
{
    struct http_parser_settings ms_settings;

    class HttpParserSettings
    {
    public:
        HttpParserSettings();

        static int onMessageBegin(struct http_parser*);
        static int onUrl(struct http_parser*, const char*, size_t);
        static int onStatusComplete(struct http_parser*);
        static int onHeaderField(struct http_parser*, const char*, size_t);
        static int onHeaderValue(struct http_parser*, const char*, size_t);
        static int onHeadersComplete(struct http_parser*);
        static int onBody(struct http_parser*, const char*, size_t);
        static int onMessageComplete(struct http_parser*);
    };

    HttpParserSettings::HttpParserSettings()
    {
        ms_settings.on_message_begin = &HttpParserSettings::onMessageBegin;
        ms_settings.on_url = &HttpParserSettings::onUrl;
        ms_settings.on_status_complete = &HttpParserSettings::onStatusComplete;
        ms_settings.on_header_field = &HttpParserSettings::onHeaderField;
        ms_settings.on_header_value = &HttpParserSettings::onHeaderValue;
        ms_settings.on_headers_complete = &HttpParserSettings::onHeadersComplete;
        ms_settings.on_body = &HttpParserSettings::onBody;
        ms_settings.on_message_complete = &HttpParserSettings::onMessageComplete;
    }

    static HttpParserSettings initObj;

    int HttpParserSettings::onMessageBegin(struct http_parser* parser)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_MessageBegin, 0, 0);
    }

    int HttpParserSettings::onUrl(struct http_parser* parser, const char* at, size_t length)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_Url, at, length);
    }

    int HttpParserSettings::onStatusComplete(struct http_parser* parser)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_StatusComplete, 0, 0);
    }

    int HttpParserSettings::onHeaderField(struct http_parser* parser, const char* at, size_t length)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_HeaderField, at, length);
    }

    int HttpParserSettings::onHeaderValue(struct http_parser* parser, const char* at, size_t length)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_HeaderValue, at, length);
    }

    int HttpParserSettings::onHeadersComplete(struct http_parser* parser)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_HeadersComplete, 0, 0);
    }

    int HttpParserSettings::onBody(struct http_parser* parser, const char* at, size_t length)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_Body, at, length);
    }

    int HttpParserSettings::onMessageComplete(struct http_parser* parser)
    {
        HttpParser* p = (HttpParser*)parser->data;
        return p->onParser(HttpParser::Parser_MessageComplete, 0, 0);
    }

    HttpParser::HttpParser(enum http_parser_type type)
    {
        http_parser_init(&m_parser, type);
        m_parser.data = this;

        m_lastWasValue = true;
    }

    HttpParser::~HttpParser()
    {
    }

    int HttpParser::onParser(Event event, const char* at, size_t length)
    {
        switch(event)
        {
            case Parser_MessageBegin:
                return handleMessageBegin();
            case Parser_Url:
                return onUrl(at, length);
            case Parser_StatusComplete:
                return 0;
            case Parser_HeaderField:
                return handleHeaderField(at, length);
            case Parser_HeaderValue:
                return handleHeaderValue(at, length);
            case Parser_HeadersComplete:
                return handleHeadersComplete();
            case Parser_Body:
                return onBody(at, length);
            case Parser_MessageComplete:
                return onMessageComplete();
            default:
                break;
        }
        return 0;
    }

    int HttpParser::handleMessageBegin()
    {
        m_curField.clear();
        m_curValue.clear();
        m_lastWasValue = true;
        m_errorCode = 0;
        return onMessageBegin();
    }

    int HttpParser::handleHeaderField(const char* at, size_t length)
    {
        if(m_lastWasValue)
        {
            if(!m_curField.empty())
            {
                onHeader(HttpUtil::normalizeHeader(m_curField), m_curValue);
            }
            m_curField.clear();
            m_curValue.clear();
        }
        m_curField.append(at, length);
        m_lastWasValue = 0;
        return 0;
    }

    int HttpParser::handleHeaderValue(const char* at, size_t length)
    {
        m_curValue.append(at, length);
        m_lastWasValue = 1;
        return 0;
    }

    int HttpParser::handleHeadersComplete()
    {
        if(!m_curField.empty())
        {
            string field = HttpUtil::normalizeHeader(m_curField);
            onHeader(field, m_curValue);
        }
        return onHeadersComplete();
    }

    int HttpParser::execute(const char* buffer, size_t count)
    {
        int n = http_parser_execute(&m_parser, &ms_settings, buffer, count);
        if(m_parser.upgrade)
        {
            onUpgrade(buffer + n, count - n);
            return 0;
        }
        else if(n != count)
        {
            int code = (m_errorCode != 0 ? m_errorCode : 400);
            HttpError error(code, http_errno_description((http_errno)m_parser.http_errno));
            LOG_ERROR("parser error %s", error.message.c_str());
            onError(error);
            return code;
        }
        return 0;
    }
}
Usage Notes
Summary
http-parser is an HTTP message parser written in C. It can parse both HTTP request and response messages, and is commonly used in high-performance HTTP applications. During parsing it makes no system calls, allocates no heap memory, and buffers no data, and parsing can be interrupted at any time without side effects. Each HTTP message (in a web server, each request) requires only about 40 bytes of state (the parser's own data structure), although the total footprint depends on your actual code architecture.
Features:
- No third-party dependencies
- Handles persistent (keep-alive) connections
- Decodes chunked transfer-encoded messages
- Supports Upgrade (protocol upgrades, typically WebSocket)
- Defends against buffer-overflow attacks
The parser can handle the following types of HTTP messages: requests, responses, or both (HTTP_REQUEST, HTTP_RESPONSE, HTTP_BOTH).
Basic usage:
Use one http_parser object per HTTP request. Initialize the struct with http_parser_init and register the parsing callbacks. Code that parses HTTP requests might look like this:
// Set up the callbacks
http_parser_settings settings;
settings.on_url = my_url_callback;
settings.on_header_field = my_header_field_callback;
/* ... */
// Allocate memory for the parser struct
http_parser *parser = malloc(sizeof(http_parser));
// Initialize the parser
http_parser_init(parser, HTTP_REQUEST);
// Stash caller data for use inside the callbacks
parser->data = my_socket;
Once data has been received, run the parser and check for errors:
size_t len = 80*1024; // size of the receive buffer: 80 KB
size_t nparsed;       // number of bytes the parser consumed
char buf[len];        // receive buffer
ssize_t recved;       // number of bytes actually received
// Receive data
recved = recv(fd, buf, len, 0);
// recved < 0 means the read from the socket failed
if (recved < 0) {
/* Handle error. */
}
/* Start up / continue the parser.
 * Note we pass recved==0 to signal that EOF has been received.
 */
// Parse:
// @parser    the parser object
// @&settings the parsing callbacks
// @buf       the data to parse
// @recved    the number of bytes to parse
nparsed = http_parser_execute(parser, &settings, buf, recved);
// The parser detected an upgrade request (e.g. WebSocket)
if (parser->upgrade) {
/* handle new protocol */
// A parse error: the number of bytes consumed differs from what was passed to http_parser_execute
} else if (nparsed != recved) {
/* Handle error. Usually just close the connection. */
}
HTTP needs to know where the data stream ends. For example, some servers send a response without a Content-Length header and expect the client to keep reading from the socket until EOF. To handle this, call http_parser_execute one last time with a length of 0, which tells http_parser that the input has ended. Errors can still occur while http_parser processes the EOF, so they should be handled in the callbacks.
Note:
In other words, when the HTTP server/client data cannot be received in a single read, http_parser_execute must be called multiple times, once after each batch of data arrives. When EOF is read from the socket, finish parsing and notify http_parser that the input has ended.
Scalar fields such as status_code, method, and the HTTP version are stored in the parser's data structure. They are kept in http_parser only temporarily and are reset when the next message arrives (when several messages on one connection share the same parser); if you need to retain this data, you must save it before on_headers_complete returns.
Note: if you initialize a separate parser for each HTTP connection's data, the problem above does not arise.
The parser decodes the transfer-encoding field of both HTTP requests and responses. That is, chunked encoding is decoded before on_body is called.
The Upgrade Problem
HTTP supports upgrading a connection to a different protocol. For example, here is a request for the increasingly common WebSocket protocol:
GET /demo HTTP/1.1
Upgrade: WebSocket
Connection: Upgrade
Host: example.com
Origin: http://example.com
WebSocket-Protocol: sample
Once the WebSocket request headers have been transmitted, the data that follows is no longer HTTP.
For details on the WebSocket protocol, see: http://tools.ietf.org/html/draft-hixie-thewebsocketprotocol-75
To support WebSocket-like protocols, the parser treats such a request as a message with headers but no body, and then invokes the on_headers_complete and on_message_complete callbacks. In any case, http_parser_execute stops parsing and returns as soon as it detects the end of the HTTP headers.
After http_parser_execute returns, users should check whether parser->upgrade has been set to 1. The return value of http_parser_execute is the offset at which the non-HTTP data (everything after the HTTP headers) begins.
Callbacks
While http_parser_execute runs, the callbacks registered in http_parser_settings are invoked. The parser maintains its own state, and your data is not retained by it, so there is no need for the parser to buffer anything. If you do need to keep any of this data, save it from within the callbacks.
There are two types of callbacks:
- Notification: typedef int (*http_cb) (http_parser *);
  Callbacks: on_message_begin, on_headers_complete, on_message_complete
- Data: typedef int (*http_data_cb) (http_parser *, const char *at, size_t length);
  Callbacks: (requests only) on_url, (common) on_header_field, on_header_value, on_body
Your callbacks should return 0 on success. Returning a non-zero value tells the parser that an error occurred, and it will abort immediately.
If you parse an HTTP message in chunks (e.g. read() the request line from the socket, parse it, read() half of the headers, parse again, and so on), your data callbacks may be invoked more than once. http-parser guarantees that the data pointer passed to a callback is valid only for the duration of that callback (once the callback returns, the pointer is invalid), because http-parser reports results by handing user code a pointer into the data being parsed along with a length. If possible, you can also store the read() data in heap-allocated memory to avoid unnecessary data copies.
The simplest approach is to pass the data to http_parser_execute after every read.
Note: when one complete HTTP message is parsed across multiple calls, the same parser object must be used!
In practice, things are more complicated:
- Per the rules for HTTP headers, keep reading from the socket until you encounter \r\n\r\n, which marks the end of the header section. At that point you can hand the data to http_parser, or continue reading the entity body according to the rules below.
- If the message uses Content-Length to specify the size of the entity, both HTTP clients and servers should then read exactly the number of entity bytes given by Content-Length.
- An entity transferred in chunks uses the chunked transfer encoding, i.e. Transfer-Encoding: chunked. Chunked encoding is generally used only for HTTP response bodies (an HTTP request may also declare Transfer-Encoding: chunked, but not all HTTP servers support it). In this case, read a fixed amount of data at a time (say 4096 bytes), hand it to the parser, and repeat until the chunked stream ends.
Simple enough, isn't it? Go ahead and use it in your project!
References:
https://github.com/joyent/http-parser
https://github.com/simsong/tcpflow
https://github.com/siddontang/libtnet
http://rootk.com/post/tutorial-for-http-parser.html