
UTF-7 transcoding


List:       imap
I tried the code you referenced (the exact program and compilation script
are in the attachments), but it failed. The program takes modified UTF-7 as
input, uses the MailboxToURL routine to change it to UTF-8, and then uses the
URLtoMailbox routine to change it back to UTF-7:

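int main(int argc, char* argv[]){
  char out[OUTSIZE];
  char in[OUTSIZE];

  if (argc < 2) {                  /* expects one mailbox name */
    fprintf(stderr, "usage: %s mailbox\n", argv[0]);
    return 1;
  }
  /* bounded copy: MailboxToURL can expand its input about 4x */
  snprintf(in, OUTSIZE / 4, "%s", argv[1]);
  printf("in:   %s\n",in);

  MailboxToURL(out,in);
  printf("out:  %s\n",out);

  URLtoMailbox(in,out);
  printf("in:   %s\n",in);
  return 0;
}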
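The program above pushes each character through the chain modified UTF-7 → UTF-16 → UCS-4 → UTF-8, which is the same chain named in a comment inside MailboxToURL, and then back again. The following is a minimal sketch of the middle two steps. `surrogates_to_ucs4` and `ucs4_to_utf8` are hypothetical helper names. The code assumes its input has already been checked: a genuine surrogate pair, and a code point no higher than U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: combine a UTF-16 surrogate pair into one UCS-4 code point. */
unsigned long surrogates_to_ucs4(unsigned long hi, unsigned long lo)
{
    assert(hi >= 0xD800UL && hi <= 0xDBFFUL);   /* high (leading) surrogate */
    assert(lo >= 0xDC00UL && lo <= 0xDFFFUL);   /* low (trailing) surrogate */
    return ((hi - 0xD800UL) << 10) + (lo - 0xDC00UL) + 0x10000UL;
}

/* Hypothetical helper: write the UTF-8 form of ucs4 into out and return
 * the number of octets written (1 to 4). Assumes ucs4 <= 0x10FFFF. */
size_t ucs4_to_utf8(unsigned long ucs4, unsigned char out[4])
{
    assert(ucs4 <= 0x10FFFFUL);
    if (ucs4 <= 0x7FUL) {                       /* 0xxxxxxx */
        out[0] = (unsigned char) ucs4;
        return 1;
    }
    if (ucs4 <= 0x7FFUL) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char) (0xC0 | (ucs4 >> 6));
        out[1] = (unsigned char) (0x80 | (ucs4 & 0x3F));
        return 2;
    }
    if (ucs4 <= 0xFFFFUL) {                     /* 1110xxxx + 2 continuation octets */
        out[0] = (unsigned char) (0xE0 | (ucs4 >> 12));
        out[1] = (unsigned char) (0x80 | ((ucs4 >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (ucs4 & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (ucs4 >> 18)); /* 11110xxx + 3 continuation octets */
    out[1] = (unsigned char) (0x80 | ((ucs4 >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((ucs4 >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (ucs4 & 0x3F));
    return 4;
}
```

For example, `ucs4_to_utf8(0x0105, buf)` returns 2 and fills `buf` with the octets C4 85. MailboxToURL then hex-encodes those octets as `%C4%85`.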

As input I gave it the following UTF-7 string:
a&AQUBBA-e&AFC-f
which is what Microsoft Outlook produces; it contains a bunch of Polish
letters.
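Decoding the two base64 runs by hand helps explain the result. The sketch below assumes the modified base64 alphabet from RFC 3501, which uses ',' in place of '/'. `decode_run` and `run_is_canonical` are hypothetical names and are not part of the RFC 2192 code.

Here is what the two runs contain:

* `AQUBBA` decodes to U+0105 and U+0104 (ą, Ą).
* `AFC` decodes to U+0050, a plain ASCII "P", and leaves two trailing bits that are not zero.

The second run breaks two rules. RFC 3501 says modified base64 must not be used for a printing US-ASCII character that can represent itself. RFC 2152 treats non-zero discarded bits as ill-formed. A conforming encoder such as URLtoMailbox therefore writes the "P" literally. That matches the `a&AQUBBA-ePf` in the output shown next, and would explain why that string then survives further round trips unchanged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Modified base64 alphabet (RFC 3501): ',' replaces '/'. */
static const char b64[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";

/* Hypothetical helper: decode one run (the text between '&' and '-') into
 * UTF-16 code units. Returns the number of units written to out (at most
 * max), or -1 on a character outside the alphabet or a full buffer.
 * The unused trailing bits go to *leftover and their count to *nbits. */
int decode_run(const char *run, size_t len, unsigned short *out, size_t max,
               unsigned long *leftover, unsigned *nbits)
{
    unsigned long bitbuf = 0;
    unsigned bitcount = 0;
    size_t i;
    int n = 0;

    for (i = 0; i < len; ++i) {
        const char *p;
        if (run[i] == '\0' || (p = strchr(b64, run[i])) == NULL)
            return -1;
        bitbuf = (bitbuf << 6) | (unsigned long) (p - b64);
        bitcount += 6;
        if (bitcount >= 16) {               /* a full UTF-16 unit is available */
            bitcount -= 16;
            if ((size_t) n == max)
                return -1;
            out[n++] = (unsigned short) ((bitbuf >> bitcount) & 0xFFFFUL);
        }
    }
    *leftover = bitbuf & ((1UL << bitcount) - 1);
    *nbits = bitcount;
    return n;
}

/* Hypothetical helper: 1 if the run encodes no printing US-ASCII (0x20-0x7E)
 * and its discarded bits are zero, else 0. Empty runs ("&-") and runs longer
 * than 64 UTF-16 units are not handled by this sketch and return 0. */
int run_is_canonical(const char *run, size_t len)
{
    unsigned short units[64];
    unsigned long leftover;
    unsigned nbits;
    int i, n = decode_run(run, len, units, 64, &leftover, &nbits);

    if (n <= 0 || leftover != 0)
        return 0;
    for (i = 0; i < n; ++i)
        if (units[i] >= 0x20 && units[i] <= 0x7E)
            return 0;
    return 1;
}
```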

The output of the program is the following:
$ ./utf7test 'a&AQUBBA-e&AFC-f'
in:   a&AQUBBA-e&AFC-f
out:  a%C4%85%C4%84e%50f
in:   a&AQUBBA-ePf

So, as you can see, the conversion is not 1:1 ;-))))
Strangely enough, if I use the resulting output (a&AQUBBA-ePf) as input
to another iteration, it starts behaving correctly ;-)))

Can you help me?

Marek.

ps. I tried the code on Linux. There are a couple of strange assignments
in the code (like unsigned long variable = char variable), so I mention
them just in case they matter.

> -----Original Message-----
> From: Chris Newman [mailto:[email protected]]
> Sent: Tuesday, July 24, 2001 8:43 PM
> To: Marek Kowal; [email protected]
> Subject: Re: modified UTF7 to UTF8 conversion
>
>
> Try the code in:
> <http://www.innosoft.com/rfc/rfc2192.html>
>
> It's missing a security check for invalid UTF-8 characters on input, but
> is otherwise correct to my knowledge. If it's broken, please email me the
> example which breaks it so I can fix the code.
>
> - Chris
>
> --On Monday, July 23, 2001 19:52 +0200 Marek Kowal
> <[email protected]> wrote:
>
> > Hi there,
> >
> > I am having a HARD time trying to convert modified UTF7 mailbox names to
> > UTF8 (which I then convert to ISO-8859-2 using the iconv library, BTW). I
> > tried the UTF7 to URL UTF8 code (which I found on the imap discussion list,
> > http://www.washington.edu/imap/listarch/1997/msg00800.html), but it does
> > not seem to work correctly - if I run it one way and then back on some
> > string, sometimes I get different results - the resulting UTF7 code is
> > not the same.
> >
> > Anyway, can anybody point me to the proper conversion routines which can
> > transform between modified UTF7 and UTF8? It could be separate code, but
> > if anybody did it already as an iconv conversion table, that would be great.

["compile" (application/octet-stream)]
["utf7test.c" (application/octet-stream)]

#include <stdio.h>
#include <string.h>
#include <ctype.h>   /* isupper(), tolower(); not included in the original */

#define OUTSIZE 1024

/* hexadecimal lookup table */
static char hex[] = "0123456789ABCDEF";

/* URL unsafe printable characters */
static char urlunsafe[] = " \"#%&+:;<=>?@[\\]^`{|}";

/* UTF7 modified base64 alphabet */
static char base64chars[] =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+,";
#define UNDEFINED 64

/* UTF16 definitions */
#define UTF16MASK       0x03FFUL
#define UTF16SHIFT      10
#define UTF16BASE       0x10000UL
#define UTF16HIGHSTART  0xD800UL
#define UTF16HIGHEND    0xDBFFUL
#define UTF16LOSTART    0xDC00UL
#define UTF16LOEND      0xDFFFUL

/* Convert an IMAP mailbox to a URL path
 *  dst needs to have roughly 4 times the storage space of src
 *    Hex encoding can triple the size of the input
 *    UTF-7 can be slightly denser than UTF-8
 *     (worst case: 8 octets UTF-7 becomes 9 octets UTF-8)
 */
void MailboxToURL(char *dst, char *src)
{
  unsigned char c, i, bitcount;
  unsigned long ucs4, utf16, bitbuf;
  unsigned char base64[256], utf8[6];

  /* initialize modified base64 decoding table (skip the trailing NUL) */
  memset(base64, UNDEFINED, sizeof (base64));
  for (i = 0; i < sizeof (base64chars) - 1; ++i) {
    base64[(unsigned char) base64chars[i]] = i;
  }

  /* loop until end of string */
  while (*src != '\0') {
    c = *src++;
    /* deal with literal characters and &- */
    if (c != '&' || *src == '-') {
      if (c < ' ' || c > '~' || strchr(urlunsafe, c) != NULL) {
        /* hex encode if necessary */
        dst[0] = '%';
        dst[1] = hex[c >> 4];
        dst[2] = hex[c & 0x0f];
        dst += 3;
      } else {
        /* encode literally */
        *dst++ = c;
      }
      /* skip over the '-' if this is an &- sequence */
      if (c == '&') ++src;
    } else {
      /* convert modified UTF-7 -> UTF-16 -> UCS-4 -> UTF-8 -> HEX */
      bitbuf = 0;
      bitcount = 0;
      ucs4 = 0;
      while ((c = base64[(unsigned char) *src]) != UNDEFINED) {
        ++src;
        bitbuf = (bitbuf << 6) | c;
        bitcount += 6;
        /* enough bits for a UTF-16 character? */
        if (bitcount >= 16) {
          bitcount -= 16;
          utf16 = (bitcount ? bitbuf >> bitcount : bitbuf) & 0xffff;
          /* convert UTF16 to UCS4 */
          if (utf16 >= UTF16HIGHSTART && utf16 <= UTF16HIGHEND) {
            ucs4 = (utf16 - UTF16HIGHSTART) << UTF16SHIFT;
            continue;
          } else if (utf16 >= UTF16LOSTART && utf16 <= UTF16LOEND) {
            ucs4 += utf16 - UTF16LOSTART + UTF16BASE;
          } else {
            ucs4 = utf16;
          }
          /* convert UTF-16 range of UCS4 to UTF-8 */
          if (ucs4 <= 0x7fUL) {
            utf8[0] = ucs4;
            i = 1;
          } else if (ucs4 <= 0x7ffUL) {
            utf8[0] = 0xc0 | (ucs4 >> 6);
            utf8[1] = 0x80 | (ucs4 & 0x3f);
            i = 2;
          } else if (ucs4 <= 0xffffUL) {
            utf8[0] = 0xe0 | (ucs4 >> 12);
            utf8[1] = 0x80 | ((ucs4 >> 6) & 0x3f);
            utf8[2] = 0x80 | (ucs4 & 0x3f);
            i = 3;
          } else {
            utf8[0] = 0xf0 | (ucs4 >> 18);
            utf8[1] = 0x80 | ((ucs4 >> 12) & 0x3f);
            utf8[2] = 0x80 | ((ucs4 >> 6) & 0x3f);
            utf8[3] = 0x80 | (ucs4 & 0x3f);
            i = 4;
          }
          /* convert utf8 to hex */
          for (c = 0; c < i; ++c) {
            dst[0] = '%';
            dst[1] = hex[utf8[c] >> 4];
            dst[2] = hex[utf8[c] & 0x0f];
            dst += 3;
          }
        }
      }
      /* skip over trailing '-' in modified UTF-7 encoding */
      if (*src == '-') ++src;
    }
  }
  /* terminate destination string */
  *dst = '\0';
}

/* Convert hex coded UTF-8 URL path to modified UTF-7 IMAP mailbox
 *  dst should be about twice the length of src to deal with non-hex
 *  coded URLs
 */
void URLtoMailbox(char *dst, char *src)
{
  unsigned int utf8pos = 0, utf8total, i, c, utf7mode, bitstogo, utf16flag;
  unsigned long ucs4 = 0, bitbuf = 0;  /* the original read bitbuf before setting it */
  unsigned char hextab[256];

  /* initialize hex lookup table (skip the trailing NUL) */
  memset(hextab, 0, sizeof (hextab));
  for (i = 0; i < sizeof (hex) - 1; ++i) {
    hextab[(unsigned char) hex[i]] = i;
    if (isupper((unsigned char) hex[i]))
      hextab[tolower((unsigned char) hex[i])] = i;
  }

  utf7mode = 0;
  utf8total = 0;
  bitstogo = 0;
  /* read octets as unsigned char: with a signed char, bytes >= 0x80
   * would sign-extend into huge unsigned int values */
  while ((c = (unsigned char) *src) != '\0') {
    ++src;
    /* undo hex-encoding */
    if (c == '%' && src[0] != '\0' && src[1] != '\0') {
      c = (hextab[(unsigned char) src[0]] << 4) | hextab[(unsigned char) src[1]];
      src += 2;
    }
    /* normal character? */
    if (c >= ' ' && c <= '~') {
      /* switch out of UTF-7 mode */
      if (utf7mode) {
        if (bitstogo) {
          *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
        }
        *dst++ = '-';
        utf7mode = 0;
        bitstogo = 0;  /* added: stale bits would corrupt the next base64 run */
      }
      *dst++ = c;
      /* encode '&' as '&-' */
      if (c == '&') {
        *dst++ = '-';
      }
      continue;
    }
    /* switch to UTF-7 mode */
    if (!utf7mode) {
      *dst++ = '&';
      utf7mode = 1;
    }
    /* US-ASCII control characters (and DEL) become a single UTF-16 unit */
    if (c < 0x80) {
      ucs4 = c;
      utf8total = 1;
    } else if (utf8total) {
      /* save UTF8 bits into UCS4 */
      ucs4 = (ucs4 << 6) | (c & 0x3FUL);
      if (++utf8pos < utf8total) {
        continue;
      }
    } else {
      utf8pos = 1;
      if (c < 0xE0) {
        utf8total = 2;
        ucs4 = c & 0x1F;
      } else if (c < 0xF0) {
        utf8total = 3;
        ucs4 = c & 0x0F;
      } else {
        /* NOTE: can't convert UTF8 sequences longer than 4 */
        utf8total = 4;
        ucs4 = c & 0x07;  /* 11110xxx carries 3 bits; the original used 0x03 */
      }
      continue;
    }
    /* loop to split ucs4 into two utf16 chars if necessary */
    utf8total = 0;
    do {
      if (ucs4 >= UTF16BASE) {
        ucs4 -= UTF16BASE;
        bitbuf = (bitbuf << 16) | ((ucs4 >> UTF16SHIFT) + UTF16HIGHSTART);
        ucs4 = (ucs4 & UTF16MASK) + UTF16LOSTART;
        utf16flag = 1;
      } else {
        bitbuf = (bitbuf << 16) | ucs4;
        utf16flag = 0;
      }
      bitstogo += 16;
      /* spew out base64 */
      while (bitstogo >= 6) {
        bitstogo -= 6;
        *dst++ = base64chars[(bitstogo ? (bitbuf >> bitstogo) : bitbuf) & 0x3F];
      }
    } while (utf16flag);
  }
  /* if in UTF-7 mode, finish in ASCII */
  if (utf7mode) {
    if (bitstogo) {
      *dst++ = base64chars[(bitbuf << (6 - bitstogo)) & 0x3F];
    }
    *dst++ = '-';
  }
  /* tie off string */
  *dst = '\0';
}

int main(int argc, char* argv[])
{
  char out[OUTSIZE];
  char in[OUTSIZE];

  if (argc < 2) {
    fprintf(stderr, "usage: %s mailbox\n", argv[0]);
    return 1;
  }
  /* keep the input short enough for MailboxToURL's ~4x expansion */
  snprintf(in, OUTSIZE / 4, "%s", argv[1]);
  printf("in:   %s\n", in);

  MailboxToURL(out, in);
  printf("out:  %s\n", out);

  URLtoMailbox(in, out);
  printf("in:   %s\n", in);
  return 0;
}
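Chris Newman's reply notes that the RFC 2192 code lacks a security check for invalid UTF-8 on input. What follows is a minimal sketch of such a check, not the fix he had in mind. The hypothetical `utf8_is_valid` follows the rules of RFC 3629 and rejects:

* stray continuation bytes;
* overlong forms;
* UTF-16 surrogates;
* truncated sequences;
* code points above U+10FFFF.

URLtoMailbox undoes the %XX escapes itself, so the sketch assumes the check runs on the already hex-decoded bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if s[0..len) is well-formed UTF-8, else 0. */
int utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned long cp;
        size_t need, k;

        if (c < 0x80) {                              /* plain US-ASCII */
            ++i;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) {         /* 2-octet lead (C0/C1 are overlong) */
            need = 1; cp = c & 0x1F;
        } else if (c >= 0xE0 && c <= 0xEF) {         /* 3-octet lead */
            need = 2; cp = c & 0x0F;
        } else if (c >= 0xF0 && c <= 0xF4) {         /* 4-octet lead */
            need = 3; cp = c & 0x07;
        } else {
            return 0;        /* stray continuation byte, 0xC0/0xC1, or 0xF5-0xFF */
        }

        if (len - i <= need)                         /* truncated sequence */
            return 0;
        for (k = 1; k <= need; ++k) {
            if ((s[i + k] & 0xC0) != 0x80)           /* must be 10xxxxxx */
                return 0;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        /* reject overlong 3/4-octet forms, surrogates, and values past U+10FFFF */
        if ((need == 2 && cp < 0x800UL) || (need == 3 && cp < 0x10000UL) ||
            cp > 0x10FFFFUL || (cp >= 0xD800UL && cp <= 0xDFFFUL))
            return 0;
        i += need + 1;
    }
    return 1;
}
```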
