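LED dot-matrix display: fixing layout and garbled-text problems for special scripts (Arabic, Hebrew, Thai)
Preface: my company recently needed a display control that simulates an LED screen.
There were plenty of twists along the way, so this post is a record of them. The rewrite was finished long ago, but for various reasons the write-up slipped by more than two months.
The article is fairly detailed; readers who know the topic can skip ahead to the code.
PS: the complete code is at the end.
Main text: I will not cover reading the font file or drawing the dot matrix in detail; see the article by ForeverCy on Jianshu (簡書) for that.
Without further ado, here is what the raw data looks like when displayed directly:
Problem 1: the gap between characters is too large
Cause: the screenshot uses font size 12. In the 12*12 dot-matrix font, one character takes 12*2 = 24 bytes: each row is 2 bytes, i.e. 16 bits (16 dots when drawn). A Latin letter actually uses only 5 to 7 of those columns, so the gaps look large. The same applies to other characters, not just Latin letters.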
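To make the arithmetic concrete, here is a minimal sketch that expands one 24-byte glyph into a 12 x 16 boolean grid and reports the last column that is actually lit. The class and method names are hypothetical, and the bit order (most significant bit = leftmost dot) is an assumption that may not match every font file.

```java
class GlyphUnpack {
    /**
     * Expands a row-major glyph (2 bytes per row) into rows x 16 booleans.
     * Assumption: the most significant bit of each byte is the leftmost dot.
     */
    static boolean[][] unpack(byte[] glyph, int rows) {
        boolean[][] dots = new boolean[rows][16];
        for (int r = 0; r < rows; r++) {
            for (int c = 0; c < 16; c++) {
                int b = glyph[r * 2 + c / 8] & 0xFF;       // byte holding column c of row r
                dots[r][c] = (b & (0x80 >> (c % 8))) != 0; // test the bit for column c
            }
        }
        return dots;
    }

    /** Index of the right-most lit column, or -1 if the glyph is empty. */
    static int lastUsedColumn(boolean[][] dots) {
        int last = -1;
        for (boolean[] row : dots) {
            for (int c = 0; c < row.length; c++) {
                if (row[c]) last = Math.max(last, c);
            }
        }
        return last;
    }

    public static void main(String[] args) {
        byte[] glyph = new byte[24];   // 12 rows * 2 bytes
        glyph[0] = (byte) 0xF8;        // row 0: columns 0..4 lit
        System.out.println(lastUsedColumn(unpack(glyph, 12)));
    }
}
```

A glyph whose last used column is 4 leaves 11 of its 16 columns blank, which is the gap seen in the screenshot.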
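Approach: the glyph data read from the font file ends up as a two-dimensional boolean array that the display control draws, so that array is the only thing we can manipulate. Take the array column by column: a column that is entirely false is blank and can be deleted. To keep the code robust against glyphs that fill all 16 columns (which would otherwise be drawn touching the next glyph), one blank column is inserted between characters before the extra blanks are deleted.
Code:
Step 1: insert one blank column between characters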
PS: this function also handles real spaces (a 16-column blank glyph) and Arabic (whose letters join, so no gap is needed between them).
/**
 * Inserts one blank column between characters
 */
private void inserAemptyData(){
    Log.e("matrix",matrix[0].length+"");
    // initialise the all-false column
    empty_data = new boolean[dots];
    for (int i=0;i<dots;i++) {
        empty_data[i] = false;
    }
    ArrayList<boolean[]> tem = new ArrayList<>();
    int position = -1;     // current index in the modified array
    int matrix_index=-1;   // current index in the original array
    spaceIndexs = new ArrayList<>();
    ArabicIndexs = new ArrayList<>();
    for (int i=0;i<str.length();i++){
        String indexstr = str.substring(i, i + 1);
        String followstr = "";
        if (i<str.length()-1){
            followstr = str.substring(i+1,i+2);
        }
        boolean isSpace = indexstr.equals(" ");
        boolean isArabic = (!followstr.equals(""))&&ArabicUtils.isArbic(followstr)&&ArabicUtils.isArbic(indexstr);
        Log.e("indexstr>>>>",indexstr+"");
        for (int j=0;j<16;j++){ // a row is two bytes (16 bits) for both the 12 and the 16 font
            position=position+1;
            matrix_index+=1;
            boolean[] indx = getstrbycolumn(matrix,matrix_index); // take one column
            if (isArabic){
                ArabicIndexs.add(position);
            }
            if (isSpace){
                spaceIndexs.add(position);
            }
            tem.add(position,indx);
            indx =null;
        }
        // no blank column between two consecutive Arabic characters
        if (isArabic){
        }else{
            Log.e("insert","1");
            tem.add(position,empty_data);
            position+=1;
        }
    }
    // ... the remainder of the method is not shown in the original listing
}
Step 2: walk the array column by column and delete surplus blank columns (keeping only one)
/*
 * Added 2017.12.25
 * Removes surplus blank columns
 */
private void fillMatrixEmpty(){
    // Rule: if two adjacent columns are both all false, drop one; otherwise copy the column to the new list
    ArrayList<boolean[]> tem = new ArrayList<>();
    int space_number=0;
    for (int i = 0;i<matrix[0].length-1;i++){
        boolean[] indx = getstrbycolumn(matrix,i); // take one column
        boolean[] indy = getstrbycolumn(matrix,i+1);
        // NOTE: the loop stops at length-2, so this "keep the last column" branch is never taken as written
        if (i==matrix[0].length-1&&!Arrays.equals(empty_data,indy)){
            // Log.e(i+">>>>","last_data");
            tem.add(indy);
        }
        else if (isSpaceVaules(i)&&isSpaceVaules(i+1)){ // inside a real space character
            space_number+=1;
            // Log.e(i+">>>>","space");
            if (space_number<5){ // a 16-column space is too wide; keep only 4 columns
                tem.add(indx);
            }
            if (space_number==16){
                space_number=0;
            }
        }
        else if (!isSpaceVaules(i)&&Arrays.equals(empty_data,indx)&&Arrays.equals(empty_data,indy)){ // two adjacent blank columns: do not keep
            // Log.e(i+">>>>","empty_data");
        }
        else if (isArabicVaules(i)&&!isSpaceVaules(i)&&Arrays.equals(empty_data,indx)){ // Arabic: remove every blank column
        }
        else{ // otherwise keep the column
            // Log.e(i+">>>>","data");
            tem.add(indx);
        }
        indx = null;
        indy = null;
    }
    boolean[][] temps1 = new boolean[matrix.length][tem.size()]; // 12*n
    for (int i = 0;i<tem.size();i++){
        boolean[] pos = tem.get(i);
        for (int j=0;j<matrix.length;j++){
            temps1[j][i] = pos[j];
        }
    }
    spaceIndexs=null;
    ArabicIndexs =null;
    matrix = temps1;
    // Log.e("matrix>>>>",matrix[0].length+"");
    // Log.e("tem>>>>",tem.size()+"");
    // Log.e("temps1>>>>",temps1[0].length+"");
}
/**
* Returns the values of one column
* */
public boolean[] getstrbycolumn(boolean[][] strarray, int column){
int columnlength = strarray.length;
boolean[] result = new boolean[strarray.length];
for(int i=0;i<columnlength;i++) {
result[i] = strarray[i][column];
}
return result;
}
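The core of step 2 can be shown in isolation. The following is a simplified sketch, not the author's method: it only collapses each run of blank columns down to a single blank column, and leaves out the special handling of real spaces and Arabic. `ColumnCompactor` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ColumnCompactor {
    /** True if no row has a lit dot in the given column. */
    static boolean isEmptyColumn(boolean[][] m, int col) {
        for (boolean[] row : m) {
            if (row[col]) return false;
        }
        return true;
    }

    /** Returns a copy of m in which every run of blank columns is reduced to one. */
    static boolean[][] collapseEmptyColumns(boolean[][] m) {
        if (m.length == 0) return m;
        List<Integer> keep = new ArrayList<>();
        boolean previousEmpty = false;
        for (int c = 0; c < m[0].length; c++) {
            boolean empty = isEmptyColumn(m, c);
            if (!(empty && previousEmpty)) keep.add(c); // drop 2nd, 3rd, ... blank of a run
            previousEmpty = empty;
        }
        boolean[][] out = new boolean[m.length][keep.size()];
        for (int r = 0; r < m.length; r++) {
            for (int k = 0; k < keep.size(); k++) {
                out[r][k] = m[r][keep.get(k)];
            }
        }
        return out;
    }
}
```

Building a new array from the list of kept column indices avoids shifting elements in place, at the cost of one extra pass.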
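Now look at the result again (below): the spacing is clearly fine. Since the topic is special scripts, let's see how they look at this point.
From left to right: Thai, Hebrew, Arabic
First, Thai:
Problem 2: marks above and below the base character (loosely called superscripts and subscripts here) are not displayed correctly
Cause: text with such marks uses more than one character per visible glyph. We read glyph data one character at a time, so each mark is drawn as a separate glyph instead of above or below its base character.
Approach: for background on Thai characters, first see the article 泰文排版規則 (Thai typesetting rules) by 建國雄心 on Sina Weibo. I was not able to make use of the solution given at the end of that article, so I took a different route.
Look closely at the Thai above-mark glyph in the figure: region 1 (call it the real part) is useful, and region 2 (call it the virtual part) is not needed. Opening the font file in a font viewer shows:
The virtual part occupies the same rows in every mark glyph. So when reading the glyph data we can clear the virtual part and overlay what remains, the real part, onto the corresponding base character. Which characters are affected has to be found one by one. Here are the Unicode code points of the marks for the three scripts, for reference: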
// Unicode code points of Arabic marks (above/below)
static final int[] ArabicSup_Subs = {0x64b,0x64c,0x64d,0x64e,0x64f,0x650,0x651,0x652,0x653,0x654,0x655,0x656,0x657,0x658,0x659,0x65a,0x65b,0x65c,0x65d,0x65e,
0x6d6,0x6d7,0x6d8,0x6d9,0x6da,0x6db,0x6dc,
0x6df,0x6e0,0x6e1,0x6e2,0x6e3,0x6e4,
0x6e7,0x6e8,0x6ea,0x6eb,0x6ec
};
// Unicode code points of Hebrew marks (above/below)
static final int[] HebrewSup_Subs = {0x591,0x592,0x593,0x594,0x595,0x596,0x597,0x598,0x599,0x59a,0x59b,0x59c,0x59d,0x59e,0x59f,0x5a0,
0x5a1,0x5a2,0x5a3,0x5a4,0x5a5,0x5a6,0x5a7,0x5a8,0x5a9,0x5aa,0x5ab,0x5ac,0x5ad,0x5ae,0x5af,0x5b0,0x5b1,0x5b2,
0x5b3,0x5b4,0x5b5,0x5b6,0x5b7,0x5b8,
0x5bb,0x5bd,0x5bf,0x5c1,0x5c2,0x5c4,0x5c5,0x5c7
};
// Unicode code points of Thai marks (above/below)
static final int[] ThaiSup_Subs = {0x0e31,0x0e34,0x0e35,0x0e36,0x0e37,0x0e38,0x0e39,0x0e3a,
0x0e47,0x0e48,0x0e49,0x0e4a,0x0e4b,0x0e4c,0x0e4d,0x0e4e
};
PS: Arabic and Hebrew have the same kind of marks, so they are handled here as well and will not be discussed again later.
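As a cross-check on hand-built lists like these, Java's own Unicode data can be queried. The sketch below is an alternative, not what the article's code does: it treats a code point as a mark when its general category is non-spacing mark, and looks up the script through `Character.UnicodeBlock`. The resulting set may differ from the lists above, so treat it as a starting point. `CombiningMarks` is a hypothetical name.

```java
class CombiningMarks {
    /** True for code points whose Unicode general category is Mn (non-spacing mark). */
    static boolean isNonSpacingMark(int codePoint) {
        return Character.getType(codePoint) == Character.NON_SPACING_MARK;
    }

    /** True if the code point lies in the Thai block (U+0E00..U+0E7F). */
    static boolean isThaiBlock(int codePoint) {
        return Character.UnicodeBlock.of(codePoint) == Character.UnicodeBlock.THAI;
    }

    public static void main(String[] args) {
        int[] samples = { 0x0E01, 0x0E31, 0x064E, 0x05B4, 'A' };
        for (int cp : samples) {
            System.out.println(Integer.toHexString(cp) + " mark=" + isNonSpacingMark(cp));
        }
    }
}
```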
Implementation:
Step 1: when reading glyph data, clear the virtual part and overlay the mark onto its base character
// Thai mark handling
if (ArabicUtils.isThai(subjectStr)&&ArabicUtils.isThai(followStr)){ // both characters are Thai
    String follow2str = "";
    if(index<str.length()-2){
        follow2str = str.substring(index+2,index+3); // Thai can carry an upper and a lower mark at once
    }
    if (!follow2str.equals("")&&ArabicUtils.isSup_SubThai(follow2str)&&ArabicUtils.isSup_SubThai(followStr)){
        byte[] data_follow2 = readAllZiMo(follow2str);
        if(data_follow2!=null){ // glyph data for the second mark exists
            data = ArabicUtils.adminSup_SubThai(data_follow,data,dots);  // overlay the first mark onto the current glyph
            data = ArabicUtils.adminSup_SubThai(data_follow2,data,dots); // overlay the second mark onto the current glyph
            index+=2; // both marks are consumed
            System.arraycopy(data, 0, dataResult, hasDealByte, data.length);
            hasDealByte = hasDealByte + data.length;
            // append replacedata once for each consumed mark
            System.arraycopy(replacedata, 0, dataResult, hasDealByte, replacedata.length);
            hasDealByte = hasDealByte + replacedata.length;
            System.arraycopy(replacedata, 0, dataResult, hasDealByte, replacedata.length);
            hasDealByte = hasDealByte + replacedata.length;
        }
    }else if (ArabicUtils.isSup_SubThai(followStr)){ // the next character is a mark
        if(data_follow!=null){ // glyph data for the mark exists
            data = ArabicUtils.adminSup_SubThai(data_follow,data,dots); // overlay the mark onto the current glyph
            index+=1; // the mark is consumed
            System.arraycopy(data, 0, dataResult, hasDealByte, data.length);
            hasDealByte = hasDealByte + data.length;
            System.arraycopy(replacedata, 0, dataResult, hasDealByte, replacedata.length);
            hasDealByte = hasDealByte + replacedata.length;
        }
    }else {
        System.arraycopy(data, 0, dataResult, hasDealByte, data.length);
        hasDealByte = hasDealByte + data.length;
    }
}else {
    System.arraycopy(data, 0, dataResult, hasDealByte, data.length);
    hasDealByte = hasDealByte + data.length;
}
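The code above also handles the case where a character carries both an upper and a lower mark. Before/after comparison below (a 12*12 matrix is too small; the effect only shows at 16*16):
The result is clearly much better. Only the Thai handling is shown here; the other two scripts are handled the same way, so I will not paste them. The complete code is linked later.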
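The overlay step itself is a bitwise OR of two glyphs after the mark's virtual rows have been blanked. Here is a minimal sketch of that idea which, unlike the `adminSup_Sub...` helpers in the full listing, leaves both input arrays untouched. The row range to clear is passed in as a parameter because it depends on the font; `GlyphOverlay` and its signature are hypothetical.

```java
class GlyphOverlay {
    /**
     * Returns base OR mark, ignoring the mark's rows in [clearFromRow, clearToRow).
     * Both glyphs are row-major with 2 bytes per row; neither input is modified.
     */
    static byte[] overlay(byte[] base, byte[] mark, int clearFromRow, int clearToRow) {
        byte[] out = base.clone();
        for (int i = 0; i < out.length && i < mark.length; i++) {
            int row = i / 2; // two bytes per row
            boolean virtualPart = row >= clearFromRow && row < clearToRow;
            if (!virtualPart) {
                out[i] |= mark[i]; // merge the mark's dots into the base glyph
            }
        }
        return out;
    }
}
```

Cloning costs one small allocation per character, which should be acceptable here and avoids surprises when the same glyph buffer is reused for the next character.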
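Next, Arabic.
Problem 3: the characters read are wrong, and the direction is reversed
Thanks again to 建國雄心's Weibo article 阿拉伯文排版規則 (Arabic typesetting rules) for clearing this up.
Cause: to quote the article: "Arabic letters have no upper or lower case, but they do have printed and handwritten forms. Apart from the five letters دذ ر زو, the other 23 letters can join to the following letter, and their shape changes depending on whether they are at the start, middle, or end of a word. Arabic is written horizontally from right to left, unlike Chinese." In other words, what we see on screen is the joined (handwritten) form, while the font file stores the printed form. Because Arabic is a joined script, glyphs read directly from the input string are wrong: the string must first be rewritten into a new string of positional forms, and the byte data read from that.
Approach: apply the reshaping rules given in the Weibo article. As for direction, pure Arabic text can simply be reversed, but Arabic mixed with Chinese or English needs extra handling.
Implementation: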
/**
* Arabic reshaping
* **/
@NonNull
public static String getArbicResult(String str){
StringBuffer stringBuffer = new StringBuffer();
for (int i=0;i<str.length();i++){
// look at three consecutive characters: previous, current, next
String substr = str.substring(i,i+1);
String pre_sub ;
String for_sub ;
if (i==0){
pre_sub = "";
}else {
pre_sub = str.substring(i-1,i);
}
if (i==str.length()-1){
for_sub = "";
}else {
for_sub = str.substring(i+1,i+2);
}
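if (isArbic(substr)){ // the current character is Arabic
boolean ispreconnect = false ;
boolean isforconnect = false;
// Shaping rule 1:
// 1. does it join to the previous character?
if (isArbic(pre_sub)&&!pre_sub.equals("")){// previous character is Arabic: check whether it joins forward
ispreconnect = getIsPreConnect(pre_sub);
}else{// no check needed
}
// 2. does it join to the next character?
if (isArbic(for_sub)&&!for_sub.equals("")){// next character is Arabic: check whether it can be joined to
isforconnect = getIsForConnect(for_sub);
}else{// no check needed
}
// Shaping rule 2:
// 0x644 followed by 0x622, 0x623, 0x625 or 0x627 forms a ligature
if (Integer.parseInt(gbEncoding(substr),16)==0x0644&&!for_sub.equals("")) {// current is 0x0644
int fors = Integer.parseInt(gbEncoding(for_sub),16);
if (fors==0x0622||fors==0x0623||fors==0x0625||fors==0x0627){// followed by 0x622, 0x623, 0x625 or 0x627
// in this case the two characters are merged into one
// check whether 0x0644 joins to the character before it
int temp = 0;
if (ispreconnect){// joined: use column 1 of arabic_specs
temp = 1;
}else{// not joined: use column 0 of arabic_specs
temp = 0;
}
// build a backslash-u hex escape, the only form getStrFromUniCode() decodes
// (the decimal string produced by int+"" would decode to an empty string)
switch (fors){
case 0x0622:
substr = "\\u"+Integer.toHexString(arabic_specs[0][temp]);
break;
case 0x0623:
substr = "\\u"+Integer.toHexString(arabic_specs[1][temp]);
break;
case 0x0625:
substr = "\\u"+Integer.toHexString(arabic_specs[2][temp]);
break;
case 0x0627:
substr = "\\u"+Integer.toHexString(arabic_specs[3][temp]);
break;
}
substr = getStrFromUniCode(substr);
i+=1;
}
}else if (isNeedChange(substr)){// not 0x0644, and listed in the reshaping table
int index = 0;
if(!isforconnect&&ispreconnect){// joined to the previous character only
index = 1;
}
if (isforconnect&&!ispreconnect){// joined to the next character only
index = 2;
}
if (isforconnect&&ispreconnect){// joined on both sides
index = 3;
}
if (!isforconnect&&!ispreconnect){// isolated
index = 4;
}
substr = getChangeReturn(substr,index);
substr = getStrFromUniCode(substr);
}
}else{// not Arabic
}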
stringBuffer.append(substr);
}
return stringBuffer.toString();
}
/**
* Returns the reshaped character
* */
private static String getChangeReturn(String substr,int index) {
int subunicode = Integer.parseInt(gbEncoding(substr),16);
for (int i=0;i<Arbic_Position.length;i++){
if (Arbic_Position[i][0]==subunicode){
substr = "\\u"+Integer.toHexString(Arbic_Position[i][index]);
}
}
return substr;
}
// Arabic: does the current character need reshaping?
private static boolean isNeedChange(String substr) {
int subunicode = Integer.parseInt(gbEncoding(substr),16);
for (int i=0;i<Arbic_Position.length;i++){
if (Arbic_Position[i][0]==subunicode){
return true;
}
}
return false;
}
// joins to the next character
private static boolean getIsForConnect(String for_sub) {
int subunicode = Integer.parseInt(gbEncoding(for_sub),16);
for (int i=0;i<theSet2.length;i++){
if (theSet2[i]==subunicode){
return true;
}
}
return false;
}
// joins to the previous character
private static boolean getIsPreConnect(String pre_sub) {
int subunicode = Integer.parseInt(gbEncoding(pre_sub),16);
for (int i=0;i<theSet1.length;i++){
if (theSet1[i]==subunicode){
return true;
}
}
return false;
}
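The helpers above go from character to hex string to integer through `gbEncoding` and `Integer.parseInt`. For single characters, the same values can be obtained directly from the standard library. This is a sketch of that alternative, with hypothetical class and method names; it is not part of the original utility class.

```java
class CodePoints {
    /** Code point of the first character of s (s must not be empty). */
    static int codePointOf(String s) {
        return s.codePointAt(0);
    }

    /** One-character string for the given code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        int lam = codePointOf("\u0644");
        System.out.println(Integer.toHexString(lam));
        System.out.println(fromCodePoint(0xFEFB).length());
    }
}
```

Going through integers also sidesteps the text escape format entirely, so there is no string parsing step that can silently produce an empty result.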
The Unicode tables it needs:
// Arabic letters 0x621-0x64a that need reshaping, with the presentation form for each position
static final int[][] Arbic_Position = // base, final (joined to previous), initial (joined to next), medial, isolated
{
{0x621, 0xfe80, 0xfe80, 0xfe80, 0xfe80}, // 0x621
{0x622, 0xfe82, 0xfe81, 0xfe82, 0xfe81},
{ 0x623,0xfe84, 0xfe83, 0xfe84, 0xfe83},
{ 0x624,0xfe86, 0xfe85, 0xfe86, 0xfe85},
{0x625, 0xfe88, 0xfe87, 0xfe88, 0xfe87},
{ 0x626,0xfe8a, 0xfe8b, 0xfe8c, 0xfe89},
{0x627, 0xfe8e, 0xfe8d, 0xfe8e, 0xfe8d},
{0x628, 0xfe90, 0xfe91, 0xfe92, 0xfe8f}, // 0x628
{ 0x629,0xfe94, 0xfe93, 0xfe94, 0xfe93},
{0x62a, 0xfe96, 0xfe97, 0xfe98, 0xfe95}, // 0x62A
{0x62b, 0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
{0x62c, 0xfe9e, 0xfe9f, 0xfea0, 0xfe9d},
{0x62d, 0xfea2, 0xfea3, 0xfea4, 0xfea1},
{ 0x62e,0xfea6, 0xfea7, 0xfea8, 0xfea5},
{0x62f, 0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
{0x630, 0xfeac, 0xfeab, 0xfeac, 0xfeab}, // 0x630
{0x631, 0xfeae, 0xfead, 0xfeae, 0xfead},
{ 0x632,0xfeb0, 0xfeaf, 0xfeb0, 0xfeaf},
{0x633, 0xfeb2, 0xfeb3, 0xfeb4, 0xfeb1},
{0x634, 0xfeb6, 0xfeb7, 0xfeb8, 0xfeb5},
{ 0x635,0xfeba, 0xfebb, 0xfebc, 0xfeb9},
{0x636, 0xfebe, 0xfebf, 0xfec0, 0xfebd},
{0x637, 0xfec2, 0xfec3, 0xfec4, 0xfec1},
{0x638, 0xfec6, 0xfec7, 0xfec8, 0xfec5}, // 0x638
{0x639, 0xfeca, 0xfecb, 0xfecc, 0xfec9},
{ 0x63a,0xfece, 0xfecf, 0xfed0, 0xfecd}, //0x63A
{0x63b, 0x63b, 0x63b, 0x63b, 0x63b},
{0x63c, 0x63c, 0x63c, 0x63c, 0x63c},
{0x63d, 0x63d, 0x63d, 0x63d, 0x63d},
{0x63e, 0x63e, 0x63e, 0x63e, 0x63e},
{0x63f, 0x63f, 0x63f, 0x63f, 0x63f},
{ 0x640,0x640, 0x640, 0x640, 0x640}, // 0x640
{0x641, 0xfed2, 0xfed3, 0xfed4, 0xfed1},
{ 0x642,0xfed6, 0xfed7, 0xfed8, 0xfed5},
{0x643, 0xfeda, 0xfedb, 0xfedc, 0xfed9},
{ 0x644,0xfede, 0xfedf, 0xfee0, 0xfedd},
{0x645, 0xfee2, 0xfee3, 0xfee4, 0xfee1},
{0x646, 0xfee6, 0xfee7, 0xfee8, 0xfee5},
{ 0x647,0xfeea, 0xfeeb, 0xfeec, 0xfee9},
{ 0x648,0xfeee, 0xfeed, 0xfeee, 0xfeed}, // 0x648
{0x649, 0xfef0, 0xfef3, 0xfef4, 0xfeef},
{0x64a,0xfef2, 0xfef3, 0xfef4, 0xfef1}, // 0x64A
};
// Set 1: letters that join to the following letter
// A character is joined to the previous one if that previous character is in theSet1.
static final int[] theSet1={
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626, 0x640}; // 0x640 新增
// Set 2: letters that can be joined to from the preceding letter
// A character is joined to the next one if that next character is in theSet2.
static final int[] theSet2={
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
0x648, 0x624, 0x629, 0x649, 0x640}; // 0x640 新增
// Ligatures: 0x644 followed by 0x622, 0x623, 0x625 or 0x627. Use column 1 if the character before 0x644 is in set 1 (theSet1 above), otherwise column 0
static final int[][] arabic_specs=
{
{0xFEF5,0xFEF6},//0x622
{0xFEF7,0xFEF8},//0x623
{0xFEF9,0xFEFA},//0x625
{0xFEFB,0xFEFC},//0x627
};
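To see how the tables drive form selection, here is a deliberately tiny, self-contained sketch that shapes words built from only four letters. It is not the article's implementation: the class name, the map layout (isolated, final, initial, medial) and the joining test are assumptions made for illustration, although the presentation-form values are the same as in the table above.

```java
import java.util.HashMap;
import java.util.Map;

class MiniArabicShaper {
    // base letter -> {isolated, final, initial, medial}; 0 means the form does not exist
    private static final Map<Integer, int[]> FORMS = new HashMap<>();
    static {
        FORMS.put(0x0627, new int[] {0xFE8D, 0xFE8E, 0, 0});           // alef: joins to the previous letter only
        FORMS.put(0x0628, new int[] {0xFE8F, 0xFE90, 0xFE91, 0xFE92}); // beh
        FORMS.put(0x062A, new int[] {0xFE95, 0xFE96, 0xFE97, 0xFE98}); // teh
        FORMS.put(0x064A, new int[] {0xFEF1, 0xFEF2, 0xFEF3, 0xFEF4}); // yeh
    }

    /** A letter joins forward only if it has an initial form. */
    static boolean joinsToNext(int cp) {
        int[] f = FORMS.get(cp);
        return f != null && f[2] != 0;
    }

    /** Every letter in the table can be joined to from the previous letter. */
    static boolean joinsToPrev(int cp) {
        return FORMS.containsKey(cp);
    }

    static String shape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            int cp = text.charAt(i);
            int[] f = FORMS.get(cp);
            if (f == null) {            // not in the table: copy unchanged
                out.append((char) cp);
                continue;
            }
            boolean prev = i > 0 && joinsToNext(text.charAt(i - 1));
            boolean next = i + 1 < text.length()
                    && joinsToNext(cp) && joinsToPrev(text.charAt(i + 1));
            int form = prev ? (next ? f[3] : f[1]) : (next ? f[2] : f[0]);
            out.append((char) form);
        }
        return out.toString();
    }
}
```

For the input beh, alef, beh the sketch picks initial beh, final alef and isolated beh, because alef never joins to the letter after it.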
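Character order handling:
/**
* Reorders a string that contains Arabic/Hebrew
* Problem: Arabic/Hebrew is read from right to left, so usually reversing the whole string is enough.
* But when characters are taken one at a time with String.substring() (right to left), a run of non-Arabic/Hebrew characters must still be read from its left end,
* so if the two cases are not separated the output is scrambled.
* Approach: walk the string from its last character; that character must be an ordinary one (if it is not, "1" is appended as a sentinel),
* and each run of ordinary characters is reversed back into reading order.
* */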
private String adminInverso(String str){
String result =str;
boolean isEndOfSpecial = false;// string ends with an Arabic/Hebrew character
boolean isendofspace =false;// string ends with a space
String endstr = str.substring(str.length()-1,str.length());
if (ArabicUtils.isArbic(endstr)||ArabicUtils.isHebrew(endstr)){// string ends with Arabic/Hebrew: the whole string runs right to left
Log.e("isArbic>>>>>>",str.substring(str.length()-1,str.length()));
isEndOfSpecial = true;
str = str+"1";
// result = inverso(str);// reverse the whole string
}
// else if (endstr.equals(" ")){
// isendofspace = true;
// str = str+"1";
// }
// if (!str.substring(str.length()-1,str.length()).equals(" "))
// {// string does not end with Arabic: decode right to left; the trailing non-Arabic characters run left to right
Log.e("isnomal>>>>>>",str.substring(str.length()-1,str.length()));
StringBuffer stringBuffer = new StringBuffer();
StringBuffer laststrs = new StringBuffer();
StringBuffer arabicstrs = new StringBuffer();
for (int i=str.length();i>0;i--){
String sub = str.substring(i-1,i);
String follow = "";
if (i>1){
follow = str.substring(i-2,i-1);
}
if (!ArabicUtils.isArbic(sub)&&!ArabicUtils.isHebrew(sub)){
laststrs.append(sub);
if ((!follow.equals(""))&&(ArabicUtils.isArbic(follow)||ArabicUtils.isHebrew(follow))){
stringBuffer.append(inverso(laststrs.toString()));
laststrs.delete(0,laststrs.length());
}
}
else {
stringBuffer.append(sub);
}
}
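// flush a left-to-right run left over at the start of the string; otherwise it would be dropped
if (laststrs.length()>0){
stringBuffer.append(inverso(laststrs.toString()));
}
if (isEndOfSpecial){
stringBuffer.delete(0,1);
}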
result = stringBuffer.toString();
return result;
}
// reverses a string
private String inverso (String str){
StringBuffer stringBuffer = new StringBuffer();
List<String> list = new ArrayList<>();
for (int i=0;i<str.length();i++){
String sub = str.substring(i,i+1);
list.add(sub);
}
for (int j=0;j<list.size();j++){
stringBuffer.append(list.get(list.size()-j-1));
}
return stringBuffer.toString();
}
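The reordering idea can be condensed into a few lines. This is a simplified sketch under stated assumptions, not a replacement for the Unicode Bidirectional Algorithm: it reverses the string as a whole while keeping each run of non-right-to-left characters in reading order, and it treats spaces, digits and combining marks as ordinary left-to-right characters. `VisualOrder` and its methods are hypothetical names.

```java
class VisualOrder {
    /** True for characters with strong right-to-left direction (Hebrew, Arabic). */
    static boolean isRtl(char c) {
        byte d = Character.getDirectionality(c);
        return d == Character.DIRECTIONALITY_RIGHT_TO_LEFT
                || d == Character.DIRECTIONALITY_RIGHT_TO_LEFT_ARABIC;
    }

    /** Converts logical order to a left-to-right drawing order. */
    static String toVisual(String logical) {
        StringBuilder out = new StringBuilder();
        StringBuilder ltrRun = new StringBuilder();
        for (int i = logical.length() - 1; i >= 0; i--) {
            char c = logical.charAt(i);
            if (isRtl(c)) {
                out.append(ltrRun);   // flush the pending run, already in reading order
                ltrRun.setLength(0);
                out.append(c);        // RTL characters come out reversed
            } else {
                ltrRun.insert(0, c);  // prepend, so the run keeps its reading order
            }
        }
        return out.append(ltrRun).toString(); // flush a run at the start of the string
    }
}
```

The final `append` is what keeps a leading left-to-right run from being lost; a string with no right-to-left characters comes back unchanged.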
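Before/after comparison:
That covers most of the problems. Other scripts can be handled along the same lines. Because a long time has passed, some details are not written out one by one; the complete utility classes follow for anyone who needs them.
Resource link:
Note: the article ends here; what follows is a long source listing.
Complete code:
Utility class for special scripts: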
package xc.LEDILove.utils;
import android.support.annotation.NonNull;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
/**
* Created by xcgd on 2018/1/5.
*/
public class ArabicUtils {
private static ArabicUtils single = null;
private ArabicUtils(){};
public static ArabicUtils getInstance(){
if (single==null){
single = new ArabicUtils();
return single;
}else {
return single;
}
}
public static void main(String[] args){
// put the method you want to test here, e.g.:
// getArbicResult("مرحباً");
System.out.println("مرحباً");
System.out.println(getArbicResult("مرحباً"));
}
// Arabic letters 0x621-0x64a that need reshaping, with the presentation form for each position
static final int[][] Arbic_Position = // base, final (joined to previous), initial (joined to next), medial, isolated
{
{0x621, 0xfe80, 0xfe80, 0xfe80, 0xfe80}, // 0x621
{0x622, 0xfe82, 0xfe81, 0xfe82, 0xfe81},
{ 0x623,0xfe84, 0xfe83, 0xfe84, 0xfe83},
{ 0x624,0xfe86, 0xfe85, 0xfe86, 0xfe85},
{0x625, 0xfe88, 0xfe87, 0xfe88, 0xfe87},
{ 0x626,0xfe8a, 0xfe8b, 0xfe8c, 0xfe89},
{0x627, 0xfe8e, 0xfe8d, 0xfe8e, 0xfe8d},
{0x628, 0xfe90, 0xfe91, 0xfe92, 0xfe8f}, // 0x628
{ 0x629,0xfe94, 0xfe93, 0xfe94, 0xfe93},
{0x62a, 0xfe96, 0xfe97, 0xfe98, 0xfe95}, // 0x62A
{0x62b, 0xfe9a, 0xfe9b, 0xfe9c, 0xfe99},
{0x62c, 0xfe9e, 0xfe9f, 0xfea0, 0xfe9d},
{0x62d, 0xfea2, 0xfea3, 0xfea4, 0xfea1},
{ 0x62e,0xfea6, 0xfea7, 0xfea8, 0xfea5},
{0x62f, 0xfeaa, 0xfea9, 0xfeaa, 0xfea9},
{0x630, 0xfeac, 0xfeab, 0xfeac, 0xfeab}, // 0x630
{0x631, 0xfeae, 0xfead, 0xfeae, 0xfead},
{ 0x632,0xfeb0, 0xfeaf, 0xfeb0, 0xfeaf},
{0x633, 0xfeb2, 0xfeb3, 0xfeb4, 0xfeb1},
{0x634, 0xfeb6, 0xfeb7, 0xfeb8, 0xfeb5},
{ 0x635,0xfeba, 0xfebb, 0xfebc, 0xfeb9},
{0x636, 0xfebe, 0xfebf, 0xfec0, 0xfebd},
{0x637, 0xfec2, 0xfec3, 0xfec4, 0xfec1},
{0x638, 0xfec6, 0xfec7, 0xfec8, 0xfec5}, // 0x638
{0x639, 0xfeca, 0xfecb, 0xfecc, 0xfec9},
{ 0x63a,0xfece, 0xfecf, 0xfed0, 0xfecd}, //0x63A
{0x63b, 0x63b, 0x63b, 0x63b, 0x63b},
{0x63c, 0x63c, 0x63c, 0x63c, 0x63c},
{0x63d, 0x63d, 0x63d, 0x63d, 0x63d},
{0x63e, 0x63e, 0x63e, 0x63e, 0x63e},
{0x63f, 0x63f, 0x63f, 0x63f, 0x63f},
{ 0x640,0x640, 0x640, 0x640, 0x640}, // 0x640
{0x641, 0xfed2, 0xfed3, 0xfed4, 0xfed1},
{ 0x642,0xfed6, 0xfed7, 0xfed8, 0xfed5},
{0x643, 0xfeda, 0xfedb, 0xfedc, 0xfed9},
{ 0x644,0xfede, 0xfedf, 0xfee0, 0xfedd},
{0x645, 0xfee2, 0xfee3, 0xfee4, 0xfee1},
{0x646, 0xfee6, 0xfee7, 0xfee8, 0xfee5},
{ 0x647,0xfeea, 0xfeeb, 0xfeec, 0xfee9},
{ 0x648,0xfeee, 0xfeed, 0xfeee, 0xfeed}, // 0x648
{0x649, 0xfef0, 0xfef3, 0xfef4, 0xfeef},
{0x64a,0xfef2, 0xfef3, 0xfef4, 0xfef1}, // 0x64A
};
// Set 1: letters that join to the following letter
// A character is joined to the previous one if that previous character is in theSet1.
static final int[] theSet1={
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626, 0x640}; // 0x640 新增
// Set 2: letters that can be joined to from the preceding letter
// A character is joined to the next one if that next character is in theSet2.
static final int[] theSet2={
0x62c, 0x62d, 0x62e, 0x647, 0x639, 0x63a, 0x641, 0x642,
0x62b, 0x635, 0x636, 0x637, 0x643, 0x645, 0x646, 0x62a,
0x644, 0x628, 0x64a, 0x633, 0x634, 0x638, 0x626,
0x627, 0x623, 0x625, 0x622, 0x62f, 0x630, 0x631, 0x632,
0x648, 0x624, 0x629, 0x649, 0x640}; // 0x640 新增
// Ligatures: 0x644 followed by 0x622, 0x623, 0x625 or 0x627. Use column 1 if the character before 0x644 is in set 1 (theSet1 above), otherwise column 0
static final int[][] arabic_specs=
{
{0xFEF5,0xFEF6},//0x622
{0xFEF7,0xFEF8},//0x623
{0xFEF9,0xFEFA},//0x625
{0xFEFB,0xFEFC},//0x627
};
// Unicode code points of Arabic marks (above/below)
static final int[] ArabicSup_Subs = {0x64b,0x64c,0x64d,0x64e,0x64f,0x650,0x651,0x652,0x653,0x654,0x655,0x656,0x657,0x658,0x659,0x65a,0x65b,0x65c,0x65d,0x65e,
0x6d6,0x6d7,0x6d8,0x6d9,0x6da,0x6db,0x6dc,
0x6df,0x6e0,0x6e1,0x6e2,0x6e3,0x6e4,
0x6e7,0x6e8,0x6ea,0x6eb,0x6ec
};
// Unicode code points of Hindi (Devanagari) marks (above/below)
static final int[] HindiSup_Subs = {0x901,0x902,0x903,0x93c,0x941,0x942,0x943,0x944,0x945,0x946,0x947,0x948,0x94d,
0x951,0x952,0x953,0x954,0x962,0x963,
};
// Unicode code points of Hebrew marks (above/below)
static final int[] HebrewSup_Subs = {0x591,0x592,0x593,0x594,0x595,0x596,0x597,0x598,0x599,0x59a,0x59b,0x59c,0x59d,0x59e,0x59f,0x5a0,
0x5a1,0x5a2,0x5a3,0x5a4,0x5a5,0x5a6,0x5a7,0x5a8,0x5a9,0x5aa,0x5ab,0x5ac,0x5ad,0x5ae,0x5af,0x5b0,0x5b1,0x5b2,
0x5b3,0x5b4,0x5b5,0x5b6,0x5b7,0x5b8,
0x5bb,0x5bd,0x5bf,0x5c1,0x5c2,0x5c4,0x5c5,0x5c7
};
// Unicode code points of Thai marks (above/below)
static final int[] ThaiSup_Subs = {0x0e31,0x0e34,0x0e35,0x0e36,0x0e37,0x0e38,0x0e39,0x0e3a,
0x0e47,0x0e48,0x0e49,0x0e4a,0x0e4b,0x0e4c,0x0e4d,0x0e4e
};
// Unicode code points of characters with a left-right structure
static final int[] CRLH = {0x903,0x93e,
};
// Unicode range of the 28 Arabic letters: 0x060C--0x06FE
/**
* Arabic reshaping
* **/
@NonNull
public static String getArbicResult(String str){
StringBuffer stringBuffer = new StringBuffer();
for (int i=0;i<str.length();i++){
// look at three consecutive characters: previous, current, next
String substr = str.substring(i,i+1);
String pre_sub ;
String for_sub ;
if (i==0){
pre_sub = "";
}else {
pre_sub = str.substring(i-1,i);
}
if (i==str.length()-1){
for_sub = "";
}else {
for_sub = str.substring(i+1,i+2);
}
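if (isArbic(substr)){ // the current character is Arabic
boolean ispreconnect = false ;
boolean isforconnect = false;
// Shaping rule 1:
// 1. does it join to the previous character?
if (isArbic(pre_sub)&&!pre_sub.equals("")){// previous character is Arabic: check whether it joins forward
ispreconnect = getIsPreConnect(pre_sub);
}else{// no check needed
}
// 2. does it join to the next character?
if (isArbic(for_sub)&&!for_sub.equals("")){// next character is Arabic: check whether it can be joined to
isforconnect = getIsForConnect(for_sub);
}else{// no check needed
}
// Shaping rule 2:
// 0x644 followed by 0x622, 0x623, 0x625 or 0x627 forms a ligature
if (Integer.parseInt(gbEncoding(substr),16)==0x0644&&!for_sub.equals("")) {// current is 0x0644
int fors = Integer.parseInt(gbEncoding(for_sub),16);
if (fors==0x0622||fors==0x0623||fors==0x0625||fors==0x0627){// followed by 0x622, 0x623, 0x625 or 0x627
// in this case the two characters are merged into one
// check whether 0x0644 joins to the character before it
int temp = 0;
if (ispreconnect){// joined: use column 1 of arabic_specs
temp = 1;
}else{// not joined: use column 0 of arabic_specs
temp = 0;
}
// build a backslash-u hex escape, the only form getStrFromUniCode() decodes
// (the decimal string produced by int+"" would decode to an empty string)
switch (fors){
case 0x0622:
substr = "\\u"+Integer.toHexString(arabic_specs[0][temp]);
break;
case 0x0623:
substr = "\\u"+Integer.toHexString(arabic_specs[1][temp]);
break;
case 0x0625:
substr = "\\u"+Integer.toHexString(arabic_specs[2][temp]);
break;
case 0x0627:
substr = "\\u"+Integer.toHexString(arabic_specs[3][temp]);
break;
}
substr = getStrFromUniCode(substr);
i+=1;
}
}else if (isNeedChange(substr)){// not 0x0644, and listed in the reshaping table
int index = 0;
if(!isforconnect&&ispreconnect){// joined to the previous character only
index = 1;
}
if (isforconnect&&!ispreconnect){// joined to the next character only
index = 2;
}
if (isforconnect&&ispreconnect){// joined on both sides
index = 3;
}
if (!isforconnect&&!ispreconnect){// isolated
index = 4;
}
substr = getChangeReturn(substr,index);
substr = getStrFromUniCode(substr);
}
}else{// not Arabic
}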
stringBuffer.append(substr);
}
return stringBuffer.toString();
}
/**
* Returns the reshaped character
* */
private static String getChangeReturn(String substr,int index) {
int subunicode = Integer.parseInt(gbEncoding(substr),16);
for (int i=0;i<Arbic_Position.length;i++){
if (Arbic_Position[i][0]==subunicode){
substr = "\\u"+Integer.toHexString(Arbic_Position[i][index]);
}
}
return substr;
}
// Arabic: does the current character need reshaping?
private static boolean isNeedChange(String substr) {
int subunicode = Integer.parseInt(gbEncoding(substr),16);
for (int i=0;i<Arbic_Position.length;i++){
if (Arbic_Position[i][0]==subunicode){
return true;
}
}
return false;
}
// joins to the next character
private static boolean getIsForConnect(String for_sub) {
int subunicode = Integer.parseInt(gbEncoding(for_sub),16);
for (int i=0;i<theSet2.length;i++){
if (theSet2[i]==subunicode){
return true;
}
}
return false;
}
// joins to the previous character
private static boolean getIsPreConnect(String pre_sub) {
int subunicode = Integer.parseInt(gbEncoding(pre_sub),16);
for (int i=0;i<theSet1.length;i++){
if (theSet1[i]==subunicode){
return true;
}
}
return false;
}
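// Arabic mark handling: overlays the mark glyph (str_byte) onto the base glyph (follow_byte)
public static byte[] adminSup_SubArabic(byte[] str_byte,byte[] follow_byte,int dots){
byte[] resultbyte= follow_byte;
if (dots==12){// font size 12: the real part of a mark is in rows 1-4 and 11-12, the virtual part in rows 5-10
for (int i=4*2;i<10*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}else if (dots==16){// font size 16: the real part of a mark is in rows 1-6 and 14-16, the virtual part in rows 7-13
for (int i=6*2;i<12*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}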
for (int k=0;k<str_byte.length;k++){
resultbyte[k]= (byte) (str_byte[k]|follow_byte[k]);
}
return resultbyte;
}
// is this an Arabic mark that needs special handling?
public static boolean isSup_SubArabic(String str){
int subunicode = Integer.parseInt(gbEncoding(str),16);
for (int i=0;i<ArabicSup_Subs.length;i++){
if (ArabicSup_Subs[i]==subunicode){
return true;
}
}
return false;
}
// is the character Arabic?
public static boolean isArbic (String sub){
for (int j=0;j<sub.length();j++){
String substr = sub.substring(j,j+1);
if (substr.equals("")){
return false;
}
int subunicode = 0x00;
subunicode = Integer.parseInt(gbEncoding(substr),16);
if (((subunicode>0x0600)&&(subunicode<0x06ff))||// 0600-06FF: Arabic
((subunicode>0xfb50)&&(subunicode<0xfdff))||// FB50-FDFF: Arabic Presentation Forms-A
((subunicode>0xfe70)&&(subunicode<0xfeff))){// FE70-FEFF: Arabic Presentation Forms-B
return true;
}else {
return false;
}
}
return false;
}
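// Thai mark handling: overlays the mark glyph (str_byte) onto the base glyph (follow_byte)
public static byte[] adminSup_SubThai(byte[] str_byte,byte[] follow_byte,int dots){
byte[] resultbyte= follow_byte;
if (dots==12){// font size 12: the real part of a mark is in rows 1-4 and 11-12, the virtual part in rows 5-10
for (int i=5*2;i<10*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}else if (dots==16){// font size 16: the real part of a mark is in rows 1-6 and 14-16, the virtual part in rows 7-13
for (int i=6*2;i<12*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}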
for (int k=0;k<str_byte.length;k++){
resultbyte[k]= (byte) (str_byte[k]|follow_byte[k]);
}
return resultbyte;
}
// is this a Thai mark that needs special handling?
public static boolean isSup_SubThai(String str){
int subunicode = Integer.parseInt(gbEncoding(str),16);
for (int i=0;i<ThaiSup_Subs.length;i++){
if (ThaiSup_Subs[i]==subunicode){
return true;
}
}
return false;
}
// is the character Thai?
public static boolean isThai (String sub){
for (int j=0;j<sub.length();j++){
String substr = sub.substring(j,j+1);
if (substr.equals("")){
return false;
}
int subunicode = 0x00;
subunicode = Integer.parseInt(gbEncoding(substr),16);
// Thai code point ranges 0E01-0E3A and 0E3F-0E5B, inclusive so that the mark 0x0e3a listed in ThaiSup_Subs is recognised
if (((subunicode>=0x0e01)&&(subunicode<=0x0e3a))||
((subunicode>=0x0e3f)&&(subunicode<=0x0e5b))){
return true;
}else {
return false;
}
}
return false;
}
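// Hebrew mark handling: overlays the mark glyph (str_byte) onto the base glyph (follow_byte)
public static byte[] adminSup_SubHebrew(byte[] str_byte,byte[] follow_byte,int dots){
byte[] resultbyte= follow_byte;
if (dots==12){// font size 12: the real part of a mark is in rows 1-4 and 11-12, the virtual part in rows 5-10
for (int i=4*2;i<10*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}else if (dots==16){// font size 16: the real part of a mark is in rows 1-6 and 14-16, the virtual part in rows 7-13
for (int i=6*2;i<13*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}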
for (int k=0;k<str_byte.length;k++){
resultbyte[k]= (byte) (str_byte[k]|follow_byte[k]);
}
return resultbyte;
}
// is this a Hebrew mark that needs special handling?
public static boolean isSup_SubHebrew(String str){
int subunicode = Integer.parseInt(gbEncoding(str),16);
for (int i=0;i<HebrewSup_Subs.length;i++){
if (HebrewSup_Subs[i]==subunicode){
return true;
}
}
return false;
}
// is the character Hebrew?
public static boolean isHebrew (String sub){
for (int j=0;j<sub.length();j++){
String substr = sub.substring(j,j+1);
if (substr.equals("")){
return false;
}
int subunicode = 0x00;
subunicode = Integer.parseInt(gbEncoding(substr),16);
// Hebrew code point range: 0590-05ff
if (((subunicode>0x0590)&&(subunicode<0x05ff))
){
return true;
}else {
return false;
}
}
return false;
}
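// Hindi mark handling: overlays the mark glyph (str_byte) onto the base glyph (follow_byte)
public static byte[] adminSup_SubHindi(byte[] str_byte,byte[] follow_byte,int dots){
byte[] resultbyte= follow_byte;
if (dots==12){// font size 12: the real part of a mark is in rows 1-5 and 12, the virtual part in rows 6-11
for (int i=5*2;i<11*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}else if (dots==16){// font size 16: the real part of a mark is in rows 1-6 and 13-16, the virtual part in rows 7-12
for (int i=7*2;i<13*2;i++){// two bytes per row
str_byte[i]=0x00;// clear the virtual part
}
}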
for (int k=0;k<str_byte.length;k++){
resultbyte[k]= (byte) (str_byte[k]|follow_byte[k]);
}
return resultbyte;
}
// is this a Hindi mark that needs special handling?
public static boolean isSup_SubHindi(String str){
int subunicode = Integer.parseInt(gbEncoding(str),16);
for (int i=0;i<HindiSup_Subs.length;i++){
if (HindiSup_Subs[i]==subunicode){
return true;
}
}
return false;
}
// is the character Hindi (Devanagari)?
public static boolean isHindi (String sub){
for (int j=0;j<sub.length();j++){
String substr = sub.substring(j,j+1);
if (substr.equals("")){
return false;
}
int subunicode = 0x00;
subunicode = Integer.parseInt(gbEncoding(substr),16);
// Hindi (Devanagari) code point range: 0900-097f
if (((subunicode>0x0900)&&(subunicode<0x097f))
){
return true;
}else {
return false;
}
}
return false;
}
/*
* Converts characters to their Unicode code units as hex text
* */
private static String gbEncoding(final String gbString) {
char[] utfBytes = gbString.toCharArray();
String unicodeBytes = "";
for (int byteIndex = 0; byteIndex < utfBytes.length; byteIndex++) {
String hexB = Integer.toHexString(utfBytes[byteIndex]);
if (hexB.length() <= 2) {
hexB = "00" + hexB;
}
// unicodeBytes = unicodeBytes + "\\u" + hexB;
unicodeBytes = unicodeBytes + hexB;
}
// System.out.println("unicodeBytes is: " + unicodeBytes);
return unicodeBytes;
}
/*
* Converts backslash-u hex escapes back to characters
* */
@NonNull
private static String getStrFromUniCode(String unicode){
StringBuffer string = new StringBuffer();
String[] hex = unicode.split("\\\\u");
for (int i = 1; i < hex.length; i++) {
// parse each code unit
int data = Integer.parseInt(hex[i], 16);
// append it to the result
string.append((char) data);
}
String s = string.toString();
return string.toString();
}
public static String replaceUnicode(String sourceStr)
{
String regEx= "["+
"\u0000-\u001F"+// C0 controls
"\u007F-\u00A0" +// DEL, C1 controls and no-break space
// "\u0600-\u06FF"+// Arabic
"\u064b-\u064b"+// Arabic fathatan (a single mark)
// "\u0E00-\u0E7F"+// Thai
"]";
// "\u4E00-\u9FBF"+//:CJK 統一表意符號 (CJK Unified Ideographs)
// "\u4DC0-\u4DFF"+//:易經六十四卦符號 (Yijing Hexagrams Symbols)
// "\u0000-\u007F"+//:C0控制符及基本拉丁文 (C0 Control and Basic Latin)
// "\u0080-\u00FF"+//:C1控制符及拉丁:補充-1 (C1 Control and Latin 1 Supplement)
// "\u0100-\u017F"+//:拉丁文擴充套件-A (Latin Extended-A)
// "\u0180-\u024F"+//:拉丁文擴充套件-B (Latin Extended-B)
// "\u0250-\u02AF"+//:國際音標擴充套件 (IPA Extensions)
// "\u02B0-\u02FF"+//:空白修飾字母 (Spacing Modifiers)
// "\u0300-\u036F"+//:結合用讀音符號 (Combining Diacritics Marks)
// "\u0370-\u03FF"+//:希臘文及科普特文 (Greek and Coptic)
// "\u0400-\u04FF"+//:西裡爾字母 (Cyrillic)
// "\u0500-\u052F"+//:西裡爾字母補充 (Cyrillic Supplement)
// "\u0530-\u058F"+//:亞美尼亞語 (Armenian)
// "\u0590-\u05FF"+//:希伯來文 (Hebrew)
// "\u0600-\u06FF"+//:阿拉伯文 (Arabic)
// "\u0700-\u074F"+//:敘利亞文 (Syriac)
// "\u0750-\u077F"+//:阿拉伯文補充 (Arabic Supplement)
// "\u0780-\u07BF"+//:馬爾地夫語 (Thaana)
// //"\u07C0-\u077F"+//:西非書面語言 (N'Ko)
// "\u0800-\u085F"+//:阿維斯塔語及巴列維語 (Avestan and Pahlavi)
// "\u0860-\u087F"+//:Mandaic
// "\u0880-\u08AF"+//:撒馬利亞語 (Samaritan)
// "\u0900-\u097F"+//:天城文書 (Devanagari)
// "\u0980-\u09FF"+//:孟加拉語 (Bengali)
// "\u0A00-\u0A7F"+//:錫克教文 (Gurmukhi)
// "\u0A80-\u0AFF"+//:古吉拉特文 (Gujarati)
// "\u0B00-\u0B7F"+//:奧里亞文 (Oriya)
// "\u0B80-\u0BFF"+//:泰米爾文 (Tamil)
// "\u0C00-\u0C7F"+//:泰盧固文 (Telugu)
// "\u0C80-\u0CFF"+//:卡納達文 (Kannada)
// "\u0D00-\u0D7F"+//:德拉維族語 (Malayalam)
// "\u0D80-\u0DFF"+//:僧伽羅語 (Sinhala)
// "\u0E00-\u0E7F"+//:泰文 (Thai)
// "\u0E80-\u0EFF"+//:寮國文 (Lao)
// "\u0F00-\u0FFF"+//:藏文 (Tibetan)
// "\u1000-\u109F"+//:緬甸語 (Myanmar)
// "\u10A0-\u10FF"+//:喬治亞語 (Georgian)
// "\u1100-\u11FF"+//:朝鮮文 (Hangul Jamo)
// "\u1200-\u137F"+//:衣索比亞語 (Ethiopic)
// "\u1380-\u139F"+//:衣索比亞語補充 (Ethiopic Supplement)
// "\u13A0-\u13FF"+//:切羅基語 (Cherokee)
// "\u1400-\u167F"+//:統一加拿大土著語音節 (Unified Canadian Aboriginal Syllabics)
// "\u1680-\u169F"+//:歐甘字母 (Ogham)
// "\u16A0-\u16FF"+//:如尼文 (Runic)
// "\u1700-\u171F"+//:塔加拉語 (Tagalog)
// "\u1720-\u173F"+//:Hanunóo
// "\u1740-\u175F"+//:Buhid
// "\u1760-\u177F"+//:Tagbanwa
// "\u1780-\u17FF"+//:高棉語 (Khmer)
// "\u1800-\u18AF"+//:蒙古文 (Mongolian)
// "\u18B0-\u18FF"+//:Cham
// "\u1900-\u194F"+//:Limbu
// "\u1950-\u197F"+//:德巨集泰語 (Tai Le)
// "\u1980-\u19DF"+//:新傣仂語 (New Tai Lue)
// "\u19E0-\u19FF"+//:高棉語記號 (Kmer Symbols)
// "\u1A00-\u1A1F"+//:Buginese
// "\u1A20-\u1A5F"+//:Batak
// "\u1A80-\u1AEF"+//:Lanna
// "\u1B00-\u1B7F"+//:巴釐語 (Balinese)
// "\u1B80-\u1BB0"+//:巽他語 (Sundanese)
// "\u1BC0-\u1BFF"+//:Pahawh Hmong
// "\u1C00-\u1C4F"+//:雷布查語(Lepcha)
// "\u1C50-\u1C7F"+//:Ol Chiki
// "\u1C80-\u1CDF"+//:曼尼普爾語 (Meithei/Manipuri)
// "\u1D00-\u1D7F"+//:語音學擴充套件 (Phone tic Extensions)
// "\u1D80-\u1DBF"+//:語音學擴充套件補充 (Phonetic Extensions Supplement)
// "\u1DC0-\u1DFF"+//結合用讀音符號補充 (Combining Diacritics Marks Supplement)
// "\u1E00-\u1EFF"+//:拉丁文擴充附加 (Latin Extended Additional)
// "\u1F00-\u1FFF"+//:希臘語擴充 (Greek Extended)
// "\u2000-\u206F"+//:常用標點 (General Punctuation)
// "\u2070-\u209F"+//:上標及下標 (Superscripts and Subscripts)
// "\u20A0-\u20CF"+//:貨幣符號 (Currency Symbols)
// "\u20D0-\u20FF"+//:組合用記號 (Combining Diacritics Marks for Symbols)
// "\u2100-\u214F"+//:字母式符號 (Letterlike Symbols)
// "\u2150-\u218F"+//:數字形式 (Number Form)
// "\u2190-\u21FF"+//:箭頭 (Arrows)
// "\u2200-\u22FF"+//:數學運算子 (Mathematical Operator)
// "\u2300-\u23FF"+//:雜項工業符號 (Miscellaneous Technical)
// "\u2400-\u243F"+//:控制圖片 (Control Pictures)
// "\u2440-\u245F"+//:光學識別符 (Optical Character Recognition)
// "\u2460-\u24FF"+//:封閉式字母數字 (Enclosed Alphanumerics)
// "\u2500-\u257F"+//:製表符 (Box Drawing)
// "\u2580-\u259F"+//:方塊元素 (Block Element)
// "\u25A0-\u25FF"+//:幾何圖形 (Geometric Shapes)
// "\u2600-\u26FF"+//:雜項符號 (Miscellaneous Symbols)
// "\u2700-\u27BF"+//:印刷符號 (Dingbats)
// "\u27C0-\u27EF"+//:雜項數學符號-A (Miscellaneous Mathematical Symbols-A)
// "\u27F0-\u27FF"+//:追加箭頭-A (Supplemental Arrows-A)
// "\u2800-\u28FF"+//:盲文點字模型 (Braille Patterns)
// "\u2900-\u297F"+//:追加箭頭-B (Supplemental Arrows-B)
// "\u2980-\u29FF"+//:雜項數學符號-B (Miscellaneous Mathematical Symbols-B)
// "\u2A00-\u2AFF"+//:追加數學運算子 (Supplemental Mathematical Operator)
// "\u2B00-\u2BFF"+//:雜項符號和箭頭 (Miscellaneous Symbols and Arrows)
// "\u2C00-\u2C5F"+//:格拉哥里字母 (Glagolitic)
// "\u2C60-\u2C7F"+//:拉丁文擴充套件-C (Latin Extended-C)
// "\u2C80-\u2CFF"+//:古埃及語 (Coptic)
// "\u2D00-\u2D2F"+//:喬治亞語補充 (Georgian Supplement)
// "\u2D30-\u2D7F"+//:提非納文 (Tifinagh)
// "\u2D80-\u2DDF"+//:衣索比亞語擴充套件 (Ethiopic Extended)
// "\u2E00-\u2E7F"+//:追加標點 (Supplemental Punctuation)
// "\u2E80-\u2EFF"+//:CJK 部首補充 (CJK Radicals Supplement)
// "\u2F00-\u2FDF"+//:康熙字典部首 (Kangxi Radicals)
// "\u2FF0-\u2FFF"+//:表意文字描述符 (Ideographic Description Characters)
// "\u3000-\u303F"+//:CJK 符號和標點 (CJK Symbols and Punctuation)
// "\u3040-\u309F"+//:日文平假名 (Hiragana)
// "\u30A0-\u30FF"+//:日文片假名 (Katakana)
// "\u3100-\u312F"+//:注音字母 (Bopomofo)
// "\u3130-\u318F"+//:朝鮮文相容字母 (Hangul Compatibility Jamo)
// "\u3190-\u319F"+//:象形字註釋標誌 (Kanbun)
// "\u31A0-\u31BF"+//:注音字母擴充套件 (Bopomofo Extended)
// "\u31C0-\u31EF"+//:CJK 筆畫 (CJK Strokes)
// "\u31F0-\u31FF"+//:日文片假名語音擴充套件 (Katakana Phonetic Extensions)
// "\u3200-\u32FF"+//:封閉式 CJK 文字和月份 (Enclosed CJK Letters and Months)
// "\u3300-\u33FF"+//:CJK 相容 (CJK Compatibility)
// "\u3400-\u4DBF"+//:CJK 統一表意符號擴充套件 A (CJK Unified Ideographs Extension A)
// "\u4DC0-\u4DFF"+//:易經六十四卦符號 (Yijing Hexagrams Symbols)
// "\u4E00-\u9FBF"+//:CJK 統一表意符號 (CJK Unified Ideographs)
// "\uA000-\uA48F"+//:彝文音節 (Yi Syllables)
// "\uA490-\uA4CF"+//:彝文字根 (Yi Radicals)
// "\uA500-\uA61F"+//:Vai
// "\uA660-\uA6FF"+//:統一加拿大土著語音節補充 (Unified Canadian Aboriginal Syllabics Supplement)
// "\uA700-\uA71F"+//:聲調修飾字母 (Modifier Tone Letters)
// "\uA720-\uA7FF"+//:拉丁文擴充套件-D (Latin Extended-D)
// "\uA800-\uA82F"+//:Syloti Nagri
// "\uA840-\uA87F"+//:八思巴字 (Phags-pa)
// "\uA880-\uA8DF"+//:Saurashtra
// "\uA900-\uA97F"+//:爪哇語 (Javanese)
// "\uA980-\uA9DF"+//:Chakma
// "\uAA00-\uAA3F"+//:Varang Kshiti
// "\uAA40-\uAA6F"+//:Sorang Sompeng
// "\uAA80-\uAADF"+//:Newari
// "\uAB00-\uAB5F"+//:越南傣語 (Vi?t Thái)
// "\uAB80-\uABA0"+//:Kayah Li
// "\uAC00-\uD7AF"+//:朝鮮文音節 (Hangul Syllables)
// //"\uD800-\uDBFF"+//:High-half zone of UTF-16
// //"\uDC00-\uDFFF"+//:Low-half zone of UTF-16
// "\uE000-\uF8FF"+//:自行使用區域 (Private Use Zone)
// "\uF900-\uFAFF"+//:CJK 相容象形文字 (CJK Compatibility Ideographs)
// "\uFB00-\uFB4F"+//:字母表達形式 (Alphabetic Presentation Form)
// "\uFB50-\uFDFF"+//:阿拉伯表達形式A (Arabic Presentation Form-A)
// "\uFE00-\uFE0F"+//:變數選擇符 (Variation Selector)
// "\uFE10-\uFE1F"+//:豎排形式 (Vertical Forms)
// "\uFE20-\uFE2F"+//:組合用半符號 (Combining Half Marks)
// "\uFE30-\uFE4F"+//:CJK 相容形式 (CJK Compatibility Forms)
// "\uFE50-\uFE6F"+//:小型變體形式 (Small Form Variants)
// "\uFE70-\uFEFF"+//:阿拉伯表達形式B (Arabic Presentation Form-B)
// "\uFF00-\uFFEF"+//:半型及全型形式 (Halfwidth and Fullwidth Form)
// "\uFFF0-\uFFFF]";//:特殊 (Specials);
Pattern pattern= Pattern.compile(regEx);
Matcher matcher=pattern.matcher(sourceStr);
return matcher.replaceAll("");
}
}
Font file reader utility class:
package xc.LEDILove.font;
import android.content.Context;
import android.content.res.AssetManager;
import android.util.Log;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.logging.Handler;
import xc.LEDILove.utils.ArabicUtils;
import xc.LEDILove.utils.Helpful;
import xc.LEDILove.utils.LangUtils;
/**
* created by YuChang on 2017/3/6 09:34
* <p>
* Font file helper class
* Currently supports sizes 12 and 16
*/
public class FontUtils {
private Context context;
private boolean hasChinese = false;
private boolean hasJapanese = false;
private boolean hasKorean = false;
private boolean hasWestern = false;
// Latin glyphs: 12 dots high takes 12 bytes at 8 bits wide; 16 dots high takes 16 bytes at 8 bits wide
private int asciiwordByteByDots = 12;
/*
* Font file name
*/
public String dotMatrix