Linux & Windows: detecting UTF-8 and GBK encodings and converting between them

阿新 • Published: 2019-01-14

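Linux defaults to UTF-8, while Windows on Simplified Chinese systems defaults to GBK (code page 936). On Linux, the locale command shows the system encoding.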
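The same information that the locale command prints can also be read from inside a program. The following is a minimal sketch, assuming a POSIX system that provides nl_langinfo; the function name current_codeset is hypothetical and not part of the original article.

```cpp
#include <clocale>
#include <langinfo.h>
#include <string>

// Hypothetical helper: returns the character encoding of the current locale.
// Note that it changes the process-wide locale as a side effect.
std::string current_codeset()
{
    std::setlocale(LC_ALL, "");      // adopt the locale from the environment (LANG, LC_ALL, ...)
    return nl_langinfo(CODESET);     // name of the locale's character encoding
}
```

On a typical UTF-8 Linux installation this is expected to return "UTF-8", but the exact string depends on the environment the program runs in.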

Converting file encodings on Linux with the iconv command

iconv -f source-encoding -t target-encoding 1.txt > 2.txt

For example, GBK to UTF-8:

iconv -f gbk -t utf8 1.txt > 2.txt

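2.txt is the converted file. Note that iconv writes to standard output, so dropping "> 2.txt" only prints the result to the terminal and does not overwrite 1.txt. To replace the original, convert to a temporary file and then move it over the original, for example: iconv -f gbk -t utf8 1.txt > 2.txt && mv 2.txt 1.txt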

Detecting UTF-8 and GBK

UTF-8 encoding

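bool is_str_utf8(const char* str)
{
    // Continuation bytes still expected. This check accepts the legacy 1-6 byte
    // forms (RFC 3629 allows at most 4 bytes); ASCII takes one byte.
    unsigned int nBytes = 0;
    unsigned char chr = *str;
    bool bAllAscii = true;

    for (unsigned int i = 0; str[i] != '\0'; ++i){
        chr = *(str + i);
        // ASCII is 7-bit with the high bit clear (0xxxxxxx); anything else may be UTF-8
        if (nBytes == 0 && (chr & 0x80) != 0){
            bAllAscii = false;
        }

        if (nBytes == 0) {
            // Not ASCII, so this should be the lead byte of a multi-byte character: work out its length
            if (chr >= 0x80) {
                if (chr >= 0xFE){
                    return false; // 0xFE and 0xFF never appear in UTF-8
                }
                else if (chr >= 0xFC){
                    nBytes = 6;
                }
                else if (chr >= 0xF8){
                    nBytes = 5;
                }
                else if (chr >= 0xF0){
                    nBytes = 4;
                }
                else if (chr >= 0xE0){
                    nBytes = 3;
                }
                else if (chr >= 0xC0){
                    nBytes = 2;
                }
                else{
                    return false; // a continuation byte (10xxxxxx) cannot start a character
                }

                nBytes--;
            }
        }
        else{
            // Non-lead byte of a multi-byte character: must be 10xxxxxx
            if ((chr & 0xC0) != 0x80){
                return false;
            }
            // Count down until it reaches zero
            nBytes--;
        }
    }

    // The string ended in the middle of a character, which violates the UTF-8 rules
    if (nBytes != 0)  {
        return false;
    }

    if (bAllAscii){ // Pure ASCII is also valid UTF-8
        return true;
    }

    return true;
}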
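
The check above only looks at lead-byte ranges and continuation-byte patterns, so it also accepts the legacy 5 and 6 byte forms and overlong encodings. As a hedged illustration of a stricter test, the sketch below follows the limits in RFC 3629 (at most 4 bytes, no overlong forms, no surrogates, nothing above U+10FFFF). The name is_strict_utf8 is hypothetical and not part of the original article.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: strict UTF-8 validation following RFC 3629.
bool is_strict_utf8(const std::string& s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n) {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; }          // ASCII byte

        std::size_t len = 0;                      // total bytes in this sequence
        unsigned int cp = 0;                      // decoded code point
        unsigned int min_cp = 0;                  // smallest code point allowed for this length
        if      ((c & 0xE0) == 0xC0) { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;                        // stray continuation byte or 0xF8-0xFF

        if (i + len > n) return false;            // sequence is cut off
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char t = p[i + k];
            if ((t & 0xC0) != 0x80) return false; // continuation bytes must be 10xxxxxx
            cp = (cp << 6) | (t & 0x3F);
        }
        if (cp < min_cp) return false;                  // overlong encoding
        if (cp > 0x10FFFF) return false;                // beyond the Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogates are not characters
        i += len;
    }
    return true;
}
```

Taking a std::string rather than a const char* is a design choice for this sketch: it lets the function handle input that contains embedded NUL bytes.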

GBK encoding


bool is_str_gbk(const char* str)
{
    unsigned int nBytes = 0; // trail bytes still expected; GBK uses 1-2 bytes: two for Chinese characters, one for ASCII
    unsigned char chr = *str;
    bool bAllAscii = true; // stays true while every byte is ASCII

    for (unsigned int i = 0; str[i] != '\0'; ++i){
        chr = *(str + i);
        if ((chr & 0x80) != 0 && nBytes == 0){ // not ASCII, so this may be GBK
            bAllAscii = false;
        }

        if (nBytes == 0) {
            if (chr >= 0x80) {
                if (chr >= 0x81 && chr <= 0xFE){ // valid GBK lead byte
                    nBytes = 2;
                }
                else{
                    return false;
                }

                nBytes--;
            }
        }
        else{
            // The trail byte must be in 0x40-0xFE, excluding 0x7F
            if (chr < 0x40 || chr == 0x7F || chr > 0xFE){
                return false;
            }
            nBytes--;
        }//else end
    }

    if (nBytes != 0)  {     // the string ended in the middle of a character
        return false;
    }

    if (bAllAscii){ // Pure ASCII is also valid GBK
        return true;
    }

    return true;
}
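
The two checks overlap: many byte sequences that are valid UTF-8 also satisfy the GBK lead-byte and trail-byte ranges, so the order of the tests matters. The sketch below is one possible way to combine them, trying the narrower UTF-8 test first. The names Encoding, looks_like_utf8, looks_like_gbk, and guess_encoding are hypothetical, and the result is only a guess based on byte patterns, not a guarantee.

```cpp
#include <cstddef>
#include <string>

enum class Encoding { Ascii, Utf8, Gbk, Unknown };   // hypothetical result type

// Compact UTF-8 check (lead-byte ranges limited to 2-4 byte sequences).
static bool looks_like_utf8(const std::string& s, bool& all_ascii)
{
    all_ascii = true;
    std::size_t pending = 0;                 // continuation bytes still expected
    for (unsigned char c : s) {
        if (pending == 0) {
            if (c < 0x80) continue;
            all_ascii = false;
            if      (c >= 0xC2 && c <= 0xDF) pending = 1;
            else if (c >= 0xE0 && c <= 0xEF) pending = 2;
            else if (c >= 0xF0 && c <= 0xF4) pending = 3;
            else return false;
        } else {
            if ((c & 0xC0) != 0x80) return false;
            --pending;
        }
    }
    return pending == 0;
}

// Compact GBK check: lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F.
static bool looks_like_gbk(const std::string& s)
{
    bool need_trail = false;
    for (unsigned char c : s) {
        if (need_trail) {
            if (c < 0x40 || c == 0x7F || c > 0xFE) return false;
            need_trail = false;
        } else if (c >= 0x80) {
            if (c < 0x81 || c > 0xFE) return false;
            need_trail = true;
        }
    }
    return !need_trail;
}

// Try UTF-8 first because its byte patterns are more restrictive than GBK's.
Encoding guess_encoding(const std::string& s)
{
    bool all_ascii = false;
    if (looks_like_utf8(s, all_ascii))
        return all_ascii ? Encoding::Ascii : Encoding::Utf8;
    if (looks_like_gbk(s))
        return Encoding::Gbk;
    return Encoding::Unknown;
}
```

For instance, the six UTF-8 bytes of "中文" (E4 B8 AD E6 96 87) also pass looks_like_gbk as three two-byte pairs, which is why guess_encoding runs the UTF-8 test first.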

Converting encodings with the Boost library

Boost makes this straightforward.

#include <boost/locale.hpp>
boost/locale/encoding.hpp contains the following functions:
 template<typename CharType>
std::string from_utf(CharType const *text,std::string const &charset,method_type how=default_method)
{
    CharType const *text_end = text;
    while(*text_end) 
        text_end++;
    return from_utf(text,text_end,charset,how);
}
template<typename CharType>
std::basic_string<CharType> to_utf(char const *text,std::string const &charset,method_type how=default_method)
{
    char const *text_end = text;
    while(*text_end) 
        text_end++;
    return to_utf<CharType>(text,text_end,charset,how);
}
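Both functions return a string.
UTF-8 to GBK:
std::string gbkstr =  boost::locale::conv::from_utf<char>(strUtf8.c_str(), "gb2312");// strUtf8 is the source UTF-8 string
GBK to UTF-8 works the same way:
std::string utf8str = boost::locale::conv::to_utf<char>(strANSI.c_str(), "gb2312");// strANSI is the source GBK string
Note that GB2312 is a subset of GBK, so passing "GBK" as the charset name is likely safer when the text may contain characters outside GB2312.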

Converting between UTF-8 and GBK on Linux

UTF-8 to GBK:
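#include <iconv.h>
#include <cstdlib>
#include <cstring>
#include <string>

// Converts a UTF-8 string to GBK. Returns a malloc'd, NUL-terminated buffer
// that the caller must free(), or NULL on failure.
char * UTF8toANSI(const std::string &from)
{
    char *inbuf=const_cast<char*>(from.c_str());
    size_t inlen = strlen(inbuf);
    size_t outlen = inlen *4;                     // generous upper bound for the output
    char *outbuf = (char *)calloc(outlen + 1, 1); // zero-filled; +1 keeps room for the terminator
    if (outbuf == NULL)
        return NULL;
    char *in = inbuf;
    char *out = outbuf;
    iconv_t cd=iconv_open("GBK","UTF-8");         // argument order is (tocode, fromcode)
    if (cd == (iconv_t)-1) {
        free(outbuf);
        return NULL;
    }
    size_t rc = iconv(cd,&in,&inlen,&out,&outlen);
    iconv_close(cd);
    if (rc == (size_t)-1) {                       // invalid or incomplete input
        free(outbuf);
        return NULL;
    }
    return outbuf;
}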
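
Only the UTF-8 to GBK direction is shown above. Because iconv_open takes the target and source encodings as plain strings, one function can cover both directions. The following is a minimal sketch, assuming glibc's iconv on Linux (where the input pointer is declared as char**); the name convert_encoding is hypothetical and not part of the original article.

```cpp
#include <iconv.h>
#include <cerrno>
#include <stdexcept>
#include <string>

// Hypothetical helper: converts `input` from encoding `from` to encoding `to`.
std::string convert_encoding(const std::string& input, const char* from, const char* to)
{
    iconv_t cd = iconv_open(to, from);           // argument order is (tocode, fromcode)
    if (cd == (iconv_t)-1)
        throw std::runtime_error("iconv_open failed: unsupported conversion");

    std::string result;
    char buffer[256];
    char* in = const_cast<char*>(input.data());  // iconv does not modify the input bytes
    size_t in_left = input.size();

    while (in_left > 0) {
        char* out = buffer;
        size_t out_left = sizeof(buffer);
        size_t rc = iconv(cd, &in, &in_left, &out, &out_left);
        int err = errno;                         // save errno before any other call
        result.append(buffer, sizeof(buffer) - out_left);
        if (rc == (size_t)-1 && err != E2BIG) {  // E2BIG only means the buffer filled up
            iconv_close(cd);
            throw std::runtime_error("iconv failed: invalid or incomplete input");
        }
    }
    iconv_close(cd);
    return result;
}
```

With this sketch, convert_encoding(text, "GBK", "UTF-8") handles GBK to UTF-8 and convert_encoding(text, "UTF-8", "GBK") handles the reverse. Returning a std::string avoids the manual free() that the malloc-based version requires.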

Converting between UTF-8 and GBK on Windows
