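DEX File Parsing -- 7. Parsing Classes and Class Data (Final Part)
阿新 • • Published: 2020-07-16
# 1. Preface
**Prerequisite reading:**
**[DEX File Parsing -- 1. Parsing the DEX file header](https://www.cnblogs.com/aWxvdmVseXc0/p/11879093.html)**
**[DEX File Parsing -- 2. Parsing the DEX file checksum](https://www.cnblogs.com/aWxvdmVseXc0/p/12008146.html)**
**[DEX File Parsing -- 3. Parsing DEX file strings](https://www.cnblogs.com/aWxvdmVseXc0/p/12632624.html)**
**[DEX File Parsing -- 4. Parsing DEX class types](https://www.cnblogs.com/aWxvdmVseXc0/p/12661142.html)**
**[DEX File Parsing -- 5. Parsing DEX method prototypes](https://www.cnblogs.com/aWxvdmVseXc0/p/12713171.html)**
**[DEX File Parsing -- 6. Parsing DEX field and method definitions](https://www.cnblogs.com/aWxvdmVseXc0/p/12727731.html)**
**PS: This series has finally reached the most important and structurally most complex part of the DEX file. If you are missing the background, see the earlier articles listed above. The DEX sample analyzed in this article comes from a complex APK, but the sample the code actually runs against is a very simple DEX found online. The reason is simple: the complex DEX uses far too many smali instructions (there are roughly 200 of them), and decoding each one by hand is too much work. When I have time I will write a general-purpose Python parsing module and upload it to my GitHub repository; have a look once it is finished if you are interested. The simple DEX uses only 5 instructions, so the code is much less painful to write. (Tip: parsing DEX class data feels like opening Russian nesting dolls; after a few read-throughs it becomes easy to follow.)**
**PS: A combined version of this article and the earlier ones in the series (simply all the articles merged together) has also been posted on a public account. I will not name it, to avoid looking like an advertisement, so this is not plagiarism.**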
---
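# 2. ULEB128 encoding
**PS: There is plenty of material online about ULEB128, so I did not originally plan to cover it. But those articles mostly copy one another, so what you can find is basically all the same, or they simply paste the official code. The bit manipulation in the official code is clever, but reading it directly, I at least could not work out how the decoding actually happens.**
**ULEB128 is a variable-length encoding that takes `1-5 bytes`. The most significant bit of each byte says whether the next byte is used: if it is 1, the next byte is part of the value as well, and this continues until a byte whose top bit is 0 is reached or 5 bytes have been read. An example makes this easier to follow.**
**Suppose we have the following ULEB128-encoded data (all in hex): `81 80 04`. The first byte, `81`, is `10000001` in binary. Its top bit is `1`, so the next byte is also used, and its payload is `0000001`. The second byte, `80`, is `10000000`; its top bit is `1`, so a third byte is needed, and its payload is `0000000`. The third byte, `04`, is `00000100`; its top bit is `0`, so three bytes are used in total, and its payload is `0000100`. With the payloads extracted, the next step is to combine the bits. Data in a DEX file is stored little-endian and ULEB128 is no exception: the payload `0000100` of the third byte `04` becomes the `high 7 bits` of the decoded value, the payload `0000000` of the second byte `80` the `middle 7 bits`, and the payload `0000001` of the first byte `81` the `low 7 bits`. The decoded value is therefore `0000100 0000000 0000001` in binary, which is `0x10001` in hex. Values that use 4 or 5 bytes work the same way. Below is the Python code that reads a ULEB128 value (PS: this function is taken from the final class-data parser and cannot run on its own; it is for reference only and uses the bit-manipulation algorithm from the official code):**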
```python
def readuleb128(f,addr):
    # Returns [decoded value, address of the byte after the encoded value].
    result = [-1,-1]
    n = 0
    f.seek(addr)
    data = oneByte2Int(f.read(1))
    if data > 0x7f:
        f.seek(addr + 1)
        n = 1
        tmp = oneByte2Int(f.read(1))
        data = (data & 0x7f) | ((tmp & 0x7f) << 7)
        if tmp > 0x7f:
            f.seek(addr + 2)
            n = 2
            tmp = oneByte2Int(f.read(1))
            data |= (tmp & 0x7f) << 14
            if tmp > 0x7f:
                f.seek(addr + 3)
                n = 3
                tmp = oneByte2Int(f.read(1))
                data |= (tmp & 0x7f) << 21
                if tmp > 0x7f:
                    f.seek(addr + 4)
                    n = 4
                    tmp = oneByte2Int(f.read(1))
                    data |= tmp << 28
    result[0] = data
    result[1] = addr + n + 1
    return result
```
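For experimenting with the `81 80 04` example without a DEX file, here is a minimal sketch that decodes ULEB128 from an in-memory byte string. The name `decode_uleb128` and its signature are assumptions made for this illustration; it uses a loop rather than the unrolled form above, but follows the same rule of 7 payload bits per byte, least significant group first.

```python
def decode_uleb128(buf, offset=0):
    """Decode one ULEB128 value from buf, starting at offset.

    Returns (value, next_offset). Hypothetical helper, not part of the
    article's script.
    """
    value = 0
    for i in range(5):  # a DEX ULEB128 value uses at most 5 bytes
        byte = buf[offset + i]
        value |= (byte & 0x7F) << (7 * i)  # low 7 bits carry the payload
        if (byte & 0x80) == 0:             # top bit clear: this was the last byte
            return value, offset + i + 1
    return value, offset + 5


value, next_offset = decode_uleb128(bytes([0x81, 0x80, 0x04]))
print(hex(value), next_offset)  # 0x10001 3
```

Returning the next offset alongside the value mirrors what `readuleb128` does with `result[1]`, which is convenient when several ULEB128 values are stored back to back.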
---
# 3. First layer of class parsing: class_def_item
**1. Bytes `0x60-0x63` of the DEX header give the number of classes, and bytes `0x64-0x67` give the file offset of the `class_def_item` list, as shown below:**
![1.png](https://pic.liesio.com/2020/07/15/87729ec682ff1.png)
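**2. The offset above leads to the start of the class_def_item list. A class_def_item holds a class's name, interfaces, superclass, Java source file name, and so on. Each class_def_item is 32 bytes long and consists of 8 fields of 4 bytes each (stored little-endian):**
* `Bytes 1-4 -- class_idx`: an index into the type list parsed earlier; it gives the name of this class.
* `Bytes 5-8 -- access_flags`: the access flags of the class, i.e. whether it is public, private, and so on. The values come from a table in the official documentation; the algorithm is explained later in this article.
* `Bytes 9-12 -- superclass_idx`: also an index into the type list parsed earlier; it gives the name of the superclass.
* `Bytes 13-16 -- interfaces_off`: the offset of the interface information. The structure at that address is a type_list, which an earlier article covered, so it is not repeated here. If the class implements no interfaces, the value is 0.
* `Bytes 17-20 -- source_file_idx`: an index into the DEX string list; it gives the name of the Java source file that contains the class.
* `Bytes 21-24 -- annotations_off`: the offset of the annotation information. Annotations are not the focus here; see the official documentation (linked at the end of the article) for the structure.
* `Bytes 25-28 -- class_data_off`: the offset of the second-layer structure for this class, which describes its fields and methods.
* `Bytes 29-32 -- static_values_off`: also an offset, pointing to a structure that holds the initial values of static fields. It is not the focus here; see the official documentation if you are interested. If there is no such data, the value is 0.
**The analysis is walked through in the figure below:**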
![2.png](https://pic.liesio.com/2020/07/15/adec0b1eceb69.png)
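The layout described above can be sketched with the standard `struct` module. This is an illustration under stated assumptions rather than the script used later in the article: `ClassDef` and `read_class_defs` are names invented here, and the input is a small synthetic buffer instead of a real DEX file.

```python
import struct
from collections import namedtuple

# Field names follow the eight 4-byte fields of class_def_item.
ClassDef = namedtuple(
    'ClassDef',
    'class_idx access_flags superclass_idx interfaces_off '
    'source_file_idx annotations_off class_data_off static_values_off')


def read_class_defs(dex):
    """Return every class_def_item in dex (a bytes-like DEX image)."""
    # Header: class count at 0x60, class_def_item list offset at 0x64.
    size, off = struct.unpack_from('<II', dex, 0x60)
    return [ClassDef(*struct.unpack_from('<8I', dex, off + 32 * i))
            for i in range(size)]


# Synthetic buffer: a fake header plus one class_def_item at 0x70.
buf = bytearray(0x70 + 32)
struct.pack_into('<II', buf, 0x60, 1, 0x70)
struct.pack_into('<8I', buf, 0x70, 2, 0x1, 1, 0, 5, 0, 0x1234, 0)

defs = read_class_defs(bytes(buf))
print(defs[0].class_idx, hex(defs[0].class_data_off))  # 2 0x1234
```

`'<8I'` reads eight little-endian unsigned 32-bit integers, which matches the 32-byte record size, so stepping by `32 * i` lands on each successive class.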
---
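# 4. Second layer of class parsing: class_data_item
**1. The class_def_item analysis above gives the basic information about a class, such as its name and superclass. The next step is to find the fields and methods of the class. The class_data_off field of class_def_item gives the offset of the `class_data_item` that holds this information, so we now need to parse the `class_data_item` structure. (PS: unless stated otherwise, every value below is ULEB128-encoded.)**
**2. The `class_data_item` structure contains the following:**
* `1st uleb128 -- static_fields_size`: the number of static fields in the class.
* `2nd uleb128 -- instance_fields_size`: the number of instance fields in the class.
* `3rd uleb128 -- direct_methods_size`: the number of direct methods in the class (static methods, private methods, and constructors).
* `4th uleb128 -- virtual_methods_size`: the number of virtual methods in the class (all methods that are not static, private, or constructors).
* `encoded_field -- static_fields`: the static fields themselves; present only if `static_fields_size > 0`. Each entry consists of two uleb128 values. The first is `field_idx_diff`: for the first entry it is an index into the field list parsed earlier, and for each later entry it is the difference from the previous entry's index. The second is the access flags of the field.
* `encoded_field -- instance_fields`: same layout as above; present only if `instance_fields_size > 0`.
* `encoded_method -- direct_methods`: the direct methods themselves; present only if `direct_methods_size > 0`. Each entry consists of three uleb128 values. The first is `method_idx_diff`, an index into the method list parsed in the previous article, stored as a difference from the previous entry in the same way as for fields. The second is the access flags of the method. The third is code_off, the offset of the method's bytecode; the structure at that offset is a code_item, which holds the method body as bytecode, i.e. what a disassembler shows as smali. (PS: if the method is abstract or native, code_off is 0, meaning the method has no code.)
* `encoded_method -- virtual_methods`: the virtual methods of the class; present only if `virtual_methods_size > 0`. The layout is the same as above.
**The analysis is walked through in the figure below:**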
![3.png](https://pic.liesio.com/2020/07/15/4574932fceed0.png)
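As a rough sketch of how the pieces of a class_data_item fit together, the following parses one from an in-memory byte string. The names `uleb128` and `parse_class_data`, and the sample bytes, are made up for this illustration. The point it tries to show is that the first value of each entry is a difference from the previous index within the same list, so a running sum is needed.

```python
def uleb128(buf, pos):
    """Decode one ULEB128 value; return (value, position after it)."""
    value = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not (byte & 0x80):
            return value, pos


def parse_class_data(buf, pos=0):
    """Parse a class_data_item into four lists of tuples.

    Fields become (field_idx, access_flags); methods become
    (method_idx, access_flags, code_off).
    """
    counts = []
    for _ in range(4):  # the four *_size values come first
        n, pos = uleb128(buf, pos)
        counts.append(n)
    names = ('static_fields', 'instance_fields',
             'direct_methods', 'virtual_methods')
    result = {}
    for name, count in zip(names, counts):
        width = 2 if name.endswith('fields') else 3
        entries, idx = [], 0  # the index restarts for every list
        for _ in range(count):
            values = []
            for _ in range(width):
                v, pos = uleb128(buf, pos)
                values.append(v)
            idx += values[0]  # *_idx_diff is relative to the previous entry
            entries.append((idx, *values[1:]))
        result[name] = entries
    return result


# 1 static field, 0 instance fields, 2 direct methods, 0 virtual methods.
sample = bytes.fromhex('01 00 02 00 03 09 00 81 80 04 80 02 01 09 a0 02')
parsed = parse_class_data(sample)
print(parsed['static_fields'])   # [(3, 9)]
print(parsed['direct_methods'])  # [(0, 65537, 256), (1, 9, 288)]
```

In the sample, the second direct method stores a difference of 1, which resolves to method index 1 only because the previous entry was index 0.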
---
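# 5. Third layer of class parsing: code_item
**1. In the class_data_item structure above, the third uleb128 value of each `encoded_method` gives the offset of the method's code, that is, the offset of the instructions the Dalvik VM executes when it runs the method. The structure at that offset is a `code_item`, which holds the register count, the instructions themselves, and related information. Let us go through it.**
**2. The `code_item` structure contains the following:**
* `Bytes 1-2 -- registers_size`: the number of registers the method uses; it corresponds to the value of `.registers` in smali.
* `Bytes 3-4 -- ins_size`: the number of registers taken up by the incoming arguments.
* `Bytes 5-6 -- outs_size`: the number of registers needed to pass arguments to other methods called from this one.
* `Bytes 7-8 -- tries_size`: the number of `try_item` entries, i.e. `try-catch` blocks, in the method.
* `Bytes 9-12 -- debug_info_off`: the offset of the debug information structure, or 0 if there is none.
* `Bytes 13-16 -- insns_size`: the size of the instruction list in 16-bit units, so the instructions occupy `2 x insns_size` bytes.
* `ushort[insns_size] -- insns`: the instruction list, containing the bytes of every instruction in the method. The number of bytes each instruction occupies is given in the official documentation; there is no algorithm here, only a table lookup. For example, `invoke-direct` occupies 6 bytes and `return-void` occupies 2 bytes.
* `2 bytes -- padding`: present only if `tries_size > 0` and `insns_size` is odd; it aligns the following `tries` data to 4 bytes.
* `try_item -- tries`: present only if `tries_size > 0`; it records where exceptions are caught and how they are handled. It is not the focus here (the focus is decoding instructions); see the official documentation if you are interested.
* `encoded_catch_handler_list -- handlers`: present only if `tries_size > 0`; likewise not the focus here, see the official documentation.
**The analysis is walked through in the figures below:**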
![4.png](https://pic.liesio.com/2020/07/15/5e4743670f596.png)
![5.png](https://pic.liesio.com/2020/07/15/3a732f0942e47.png)
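The fixed part of a code_item can be read in one `struct` call. The sketch below is illustrative only: `read_code_item` is a name chosen here, and the sample bytes are a hand-built code_item containing `invoke-direct {v0}` (6 bytes) followed by `return-void` (2 bytes), not data taken from the article's sample.

```python
import struct


def read_code_item(dex, off):
    """Read the 16-byte code_item header and the raw instruction bytes."""
    (registers_size, ins_size, outs_size, tries_size,
     debug_info_off, insns_size) = struct.unpack_from('<4H2I', dex, off)
    insns_start = off + 16
    insns_end = insns_start + 2 * insns_size  # insns_size counts 16-bit units
    return {
        'registers_size': registers_size,
        'ins_size': ins_size,
        'outs_size': outs_size,
        'tries_size': tries_size,
        'debug_info_off': debug_info_off,
        'insns_size': insns_size,
        'insns': bytes(dex[insns_start:insns_end]),
        # padding is present only when there are tries and insns_size is odd
        'has_padding': tries_size != 0 and insns_size % 2 == 1,
    }


# Hand-built code_item: 1 register, 1 in, 1 out, no tries, no debug info,
# 4 code units = invoke-direct {v0}, method@0 ; return-void
sample = struct.pack('<4H2I', 1, 1, 1, 0, 0, 4) + bytes.fromhex('7010000000000e00')
item = read_code_item(sample, 0)
print(item['registers_size'], item['insns'].hex())  # 1 7010000000000e00
```

The format string `'<4H2I'` covers four little-endian 16-bit values followed by two 32-bit values, 16 bytes in total, which is why the instructions start at `off + 16`.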
---
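# 6. The access_flags algorithm
**The individual access_flags values are listed in the official documentation; the figure below shows only part of the table. The algorithm is `access_flags = flag1 | flag2 | ...`. If only one flag is set, a direct table lookup is enough; if two or more are set, the value is the bitwise OR of the individual flag values. An example follows.**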
![6.png](https://pic.liesio.com/2020/07/15/27eb48f94aad4.png)
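**Suppose a class has the access flags `public static`. The table gives `0x01` for `public` and `0x08` for `static`, so the access flags for `public static` are `0x01 | 0x08 = 0x09`. Conversely, if the access_flags value read from the file is 0x09, the access flags are `public static`. All other combinations work the same way.**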
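Because every flag occupies its own bit, decoding does not need to try combinations: it is enough to test each bit in turn. Here is a small sketch of that idea. The function name `decode_access_flags` is an assumption, and the table covers only a subset of the flags, using the names that apply to fields (on methods, `0x40` and `0x80` mean something different).

```python
# Subset of the access flag table; one bit per flag.
ACCESS_FLAGS = {
    0x0001: 'public',
    0x0002: 'private',
    0x0004: 'protected',
    0x0008: 'static',
    0x0010: 'final',
    0x0020: 'synchronized',
    0x0040: 'volatile',
    0x0080: 'transient',
    0x0100: 'native',
    0x0200: 'interface',
    0x0400: 'abstract',
    0x0800: 'strictfp',
    0x2000: 'annotation',
    0x4000: 'enum',
    0x10000: 'constructor',
}


def decode_access_flags(flags):
    """Return the names of all flags set in the bit mask, lowest bit first."""
    return ' '.join(name for bit, name in ACCESS_FLAGS.items() if flags & bit)


print(decode_access_flags(0x09))     # public static
print(decode_access_flags(0x10001))  # public constructor
```

Testing bits individually handles any number of combined flags, for example `public static final` (`0x19`), without special cases.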
---
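# 7. Parsing code
**PS: Python 3.6 or later is recommended. The script uses the `binascii` module, which is part of the standard library. The sample it runs against is `Hello.dex`, available from the download link at the end of the article.**
**Screenshot of a run**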
![7.png](https://pic.liesio.com/2020/07/15/b301ac6a334d5.png)
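**Comparison of the smali produced by the script with the smali produced by decompiling with apktool**
**(PS: the apktool output is on the left and the script output is on the right; the two are largely the same)**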
![8.png](https://pic.liesio.com/2020/07/15/ce2ac2c65b706.png)
![9.png](https://pic.liesio.com/2020/07/15/aee3528612570.png)
**Parsing code (PS: it is fairly long):**
```python
'''
May the mythical beast bless this code: no bugs, ever
@Author: windy_ll
@Date: 2020-07-08 16:21:27
@LastEditors: windy_ll
@LastEditTime: 2020-07-14 23:45:28
@Description: file content
'''
import binascii
import re
import os
import sys


def byte2int(bs):
    # Interpret a little-endian byte string as an unsigned integer.
    tmp = bytearray(bs)
    tmp.reverse()
    rl = bytes(tmp)
    rl = str(binascii.b2a_hex(rl), encoding='UTF-8')
    rl = int(rl, 16)
    return rl


def oneByte2Int(bs):
    # Convert a single byte to an integer.
    num = str(binascii.b2a_hex(bs), encoding='UTF-8')
    num = int(num, 16)
    return num


def getSmaliName(oldname):
    # Output path: <script dir>/smali/<source file name without extension>.smali
    tmpname = oldname.split('.')
    return os.path.join(sys.path[0], 'smali', str(tmpname[0]) + '.smali')


def readuleb128(f, addr):
    # Returns [decoded value, address of the byte after the encoded value].
    result = [-1, -1]
    n = 0
    f.seek(addr)
    data = oneByte2Int(f.read(1))
    if data > 0x7f:
        f.seek(addr + 1)
        n = 1
        tmp = oneByte2Int(f.read(1))
        data = (data & 0x7f) | ((tmp & 0x7f) << 7)
        if tmp > 0x7f:
            f.seek(addr + 2)
            n = 2
            tmp = oneByte2Int(f.read(1))
            data |= (tmp & 0x7f) << 14
            if tmp > 0x7f:
                f.seek(addr + 3)
                n = 3
                tmp = oneByte2Int(f.read(1))
                data |= (tmp & 0x7f) << 21
                if tmp > 0x7f:
                    f.seek(addr + 4)
                    n = 4
                    tmp = oneByte2Int(f.read(1))
                    data |= tmp << 28
    result[0] = data
    result[1] = addr + n + 1
    return result


def getAccessFlags(flag):
    # access_flags is a bit mask, so collect the name of every bit that is set.
    # 0x40 and 0x80 use their field meanings (volatile/transient); on methods
    # the same bits mean bridge/varargs.
    flagdict = {0x01: 'public', 0x02: 'private', 0x04: 'protected', 0x08: 'static',
                0x10: 'final', 0x20: 'synchronized', 0x40: 'volatile', 0x80: 'transient',
                0x100: 'native', 0x200: 'interface', 0x400: 'abstract', 0x800: 'strictfp',
                0x1000: 'synthetic', 0x2000: 'annotation', 0x4000: 'enum',
                0x10000: 'constructor', 0x20000: 'declared-synchronized'}
    return ' '.join(name for bit, name in flagdict.items() if flag & bit)


def parseTypeList(f, addr, tList):
    # A type_list is a 4-byte count followed by that many 2-byte type indices.
    paramList = []
    f.seek(addr)
    size = byte2int(f.read(4))
    addr = addr + 4
    for k in range(size):
        f.seek(addr + (k * 2))
        paramList.append(tList[byte2int(f.read(2))])
    return paramList


def getStringByteArr(f, addr):
    # A string_data_item is a uleb128 length followed by the string bytes and a
    # terminating 0 byte. Skip the length prefix (it can be longer than one
    # byte), then read up to the terminator.
    byteArr = bytearray()
    f.seek(readuleb128(f, addr)[1])
    b = oneByte2Int(f.read(1))
    while b != 0:
        byteArr.append(b)
        b = oneByte2Int(f.read(1))
    return byteArr


def BytesToString(byteArr):
    # DEX strings are MUTF-8, which is not always valid UTF-8, so undecodable
    # bytes are replaced instead of raising.
    return bytes(byteArr).decode('UTF-8', errors='replace')


def getTypeAmount(f):
    f.seek(0x40)
    stringsId = f.read(4)
    count = byte2int(stringsId)
    return count


def getclassCount(f):
    f.seek(0x60)
    class_num = f.read(4)
    class_size = byte2int(class_num)
    return class_size


def getStringsCount(f):
    f.seek(0x38)
    stringsId = f.read(4)
    count = byte2int(stringsId)
    return count


def getStrings(f, stringAmount):
    stringsList = []
    f.seek(0x3c)
    stringOff = f.read(4)
    Off = byte2int(stringOff)
    f.seek(Off)
    for i in range(stringAmount):
        addr = f.read(4)
        address = byte2int(addr)
        byteArr = getStringByteArr(f, address)
        stringItem = BytesToString(byteArr)
        stringsList.append(stringItem)
        Off = Off + 4
        f.seek(Off)
    return stringsList


def getTypeItem(f, count, strLists):
    typeList = []
    f.seek(0x44)
    type_ids_off = f.read(4)
    type_off = byte2int(type_ids_off)
    f.seek(type_off)
    for i in range(count):
        typeIndex = f.read(4)
        typeIndex = byte2int(typeIndex)
        typeList.append(strLists[typeIndex])
        type_off = type_off + 0x04
        f.seek(type_off)
    return typeList


def parserField(f, stringList, typelist):
    # field_id_item: class_idx (2 bytes), type_idx (2 bytes), name_idx (4 bytes).
    fieldList = []
    f.seek(0x50)
    fieldSize = byte2int(f.read(4))
    fieldAddr = byte2int(f.read(4))
    for i in range(fieldSize):
        f.seek(fieldAddr)
        classIdx = typelist[byte2int(f.read(2))]
        typeIdx = typelist[byte2int(f.read(2))]
        nameIdx = stringList[byte2int(f.read(4))]
        fieldAddr += 8
        fieldList.append(nameIdx + ':' + typeIdx)
    return fieldList


def parseProtold(f, typeList, stringList):
    # proto_id_item: shorty_idx, return_type_idx, parameters_off (4 bytes each).
    pList = []
    f.seek(0x48)
    protoldSize = byte2int(f.read(4))
    f.seek(0x4c)
    protoldAddr = byte2int(f.read(4))
    for i in range(protoldSize):
        f.seek(protoldAddr + 4)
        returnString = typeList[byte2int(f.read(4))]
        paramAddr = byte2int(f.read(4))
        # parameters_off == 0 means the method takes no parameters.
        paramList = parseTypeList(f, paramAddr, typeList) if paramAddr != 0 else []
        # smali writes parameter types back to back, without a separator.
        pList.append(returnString + '(' + ''.join(paramList) + ')')
        protoldAddr += 12
    return pList


def parserMethod(f, stringlist, typelist, protoldlist):
    # method_id_item: class_idx (2 bytes), proto_idx (2 bytes), name_idx (4 bytes).
    methodlist = []
    f.seek(0x58)
    methodSize = byte2int(f.read(4))
    f.seek(0x5c)
    methodAddr = byte2int(f.read(4))
    for i in range(methodSize):
        f.seek(methodAddr)
        classIdx = typelist[byte2int(f.read(2))]
        protoldIdx = protoldlist[byte2int(f.read(2))]
        nameIdx = stringlist[byte2int(f.read(4))]
        # Turn 'ret(params)' into 'name(params)ret'.
        tmp = protoldIdx.split('(', 1)
        methodItem = nameIdx + '(' + str(tmp[1]) + str(tmp[0])
        methodlist.append(methodItem)
        methodAddr += 8
    return methodlist


def decodeInvokeRegisters(data, data1, data2):
    # Format 35c packs its operands as A|G, BBBB, D|C, F|E: A is the argument
    # count and the registers are used in the order C, D, E, F, G.
    count = (data & 0xf0) >> 4
    regs = [data1 & 0xf, (data1 & 0xf0) >> 4, data2 & 0xf, (data2 & 0xf0) >> 4, data & 0xf]
    return '{' + ', '.join('v' + str(r) for r in regs[:count]) + '}, '


def parseBytecode(f, addr, bytecount, stringsList, fieldsList, methodsList):
    # Only the five opcodes used by Hello.dex are decoded.
    codestr = ''
    n = 0
    while n < bytecount:
        f.seek(addr + n)
        op = oneByte2Int(f.read(1))
        if op == 0x0e:
            codestr += '\treturn-void\r\n'
            n += 2
        elif op == 0x1a:
            register = oneByte2Int(f.read(1))
            idx = byte2int(f.read(2))
            # Drop line breaks so the string literal stays on one line.
            stringIdx = re.sub("[\r\n]", "", stringsList[idx])
            codestr += '\tconst-string v' + str(register) + ', "' + stringIdx + '"\r\n'
            n += 4
        elif op == 0x62:
            register = oneByte2Int(f.read(1))
            idx = byte2int(f.read(2))
            codestr += '\tsget-object v' + str(register) + ', ' + fieldsList[idx] + '\r\n'
            n += 4
        elif op == 0x70 or op == 0x6e:
            data = oneByte2Int(f.read(1))
            idx = byte2int(f.read(2))
            data1 = oneByte2Int(f.read(1))
            data2 = oneByte2Int(f.read(1))
            register = decodeInvokeRegisters(data, data1, data2)
            name = 'invoke-direct' if op == 0x70 else 'invoke-virtual'
            codestr += '\t' + name + ' ' + register + methodsList[idx] + '\r\n'
            n += 6
        else:
            # The length of any other instruction is unknown here, so stop
            # instead of looping forever.
            codestr += '\t# unsupported opcode 0x%02x\r\n' % op
            break
    return codestr


def parseCode(f, addr, fn, slist, flist, mlist):
    # code_item header: four 2-byte values, then debug_info_off and insns_size.
    f.seek(addr)
    register_size = byte2int(f.read(2))
    ins_size = byte2int(f.read(2))
    out_size = byte2int(f.read(2))
    try_size = byte2int(f.read(2))
    debug_off = byte2int(f.read(4))
    insns_size = byte2int(f.read(4))
    address = addr + 16
    bytecount = insns_size * 2  # insns_size counts 16-bit units
    fn.write('\t.registers ' + str(register_size) + '\r\n')
    fn.write(parseBytecode(f, address, bytecount, slist, flist, mlist))


def parseFields(f, address, count, title, fn, fList):
    # Each encoded_field is two uleb128 values: field_idx_diff and access_flags.
    # field_idx_diff is relative to the previous entry (the first is absolute).
    fieldStr = ''
    fieldidx = 0
    if count != 0:
        fieldStr += title
    for i in range(count):
        res = readuleb128(f, address)
        fieldidx += res[0]
        res = readuleb128(f, res[1])
        accflag = res[0]
        address = res[1]
        fieldStr += '.field ' + getAccessFlags(accflag) + ' ' + fList[fieldidx] + '\r\n'
    fieldStr += '\r\n\r\n'
    fn.write(fieldStr)
    return address


def parseMethods(f, address, count, title, fn, fList, mList, strsList):
    # Each encoded_method is three uleb128 values: method_idx_diff,
    # access_flags and code_off.
    methodidx = 0
    if count != 0:
        fn.write(title)
    for i in range(count):
        res = readuleb128(f, address)
        methodidx += res[0]
        res = readuleb128(f, res[1])
        accflag = res[0]
        res = readuleb128(f, res[1])
        code_off = res[0]
        address = res[1]
        fn.write('.method ' + getAccessFlags(accflag) + ' ' + mList[methodidx] + '\r\n')
        if code_off != 0:  # abstract and native methods have no code_item
            parseCode(f, code_off, fn, strsList, fList, mList)
        fn.write('.end method\r\n')
        fn.write('\r\n\r\n')
    return address


def parseClassData(f, addr, fn, fList, mList, strsList):
    # class_data_item starts with four uleb128 counts.
    res = readuleb128(f, addr)
    static_fields_size = res[0]
    res = readuleb128(f, res[1])
    instance_fields_size = res[0]
    res = readuleb128(f, res[1])
    direct_method_size = res[0]
    res = readuleb128(f, res[1])
    virtual_method_size = res[0]
    address = res[1]
    address = parseFields(f, address, static_fields_size, '# static fields\r\n', fn, fList)
    address = parseFields(f, address, instance_fields_size, '# instance fields\r\n', fn, fList)
    address = parseMethods(f, address, direct_method_size, '# direct methods\r\n',
                           fn, fList, mList, strsList)
    parseMethods(f, address, virtual_method_size, '# virtual methods\r\n',
                 fn, fList, mList, strsList)


def parseClassDefItem(f, class_num, tList, sList, fieldlist, methodlist):
    f.seek(0x64)
    addr = byte2int(f.read(4))
    for i in range(class_num):
        # class_def_item: eight consecutive 4-byte little-endian fields.
        f.seek(addr)
        classIdx = tList[byte2int(f.read(4))]
        accessFlags = getAccessFlags(byte2int(f.read(4)))
        superclass_idx = tList[byte2int(f.read(4))]
        interfaces_off = byte2int(f.read(4))
        sourceFileIdx = sList[byte2int(f.read(4))]
        annotions_off = byte2int(f.read(4))
        class_data_off = byte2int(f.read(4))
        static_value_off = byte2int(f.read(4))
        # Interfaces are parsed but not written to the output.
        interfaces = parseTypeList(f, interfaces_off, tList) if interfaces_off != 0 else []
        fname = getSmaliName(sourceFileIdx)
        # Append mode: several classes can share one source file.
        # newline='' keeps the explicit '\r\n' line endings untouched.
        fn = open(fname, 'a+', encoding='UTF-8', newline='')
        headstr = '.class ' + str(accessFlags) + ' ' + str(classIdx) + '\r\n'
        headstr += '.super ' + str(superclass_idx) + '\r\n'
        headstr += '.source ' + '"' + str(sourceFileIdx) + '"\r\n\r\n'
        fn.write(headstr)
        if class_data_off != 0:
            parseClassData(f, class_data_off, fn, fieldlist, methodlist, sList)
        fn.close()
        print('[*] Finished writing class %s to file %s' % (classIdx, fname))
        addr += 32


if __name__ == '__main__':
    filename = os.path.join(sys.path[0], 'Hello.dex')
    outdir = os.path.join(sys.path[0], 'smali')
    if not os.path.exists(outdir):
        os.makedirs(outdir)
    f = open(filename, 'rb')
    stringsCount = getStringsCount(f)
    strList = getStrings(f, stringsCount)
    typeCount = getTypeAmount(f)
    typeList = getTypeItem(f, typeCount, strList)
    fieldList = parserField(f, strList, typeList)
    protoldList = parseProtold(f, typeList, strList)
    methodList = parserMethod(f, strList, typeList, protoldList)
    classNum = getclassCount(f)
    parseClassDefItem(f, classNum, typeList, strList, fieldList, methodList)
    f.close()
```
---
# 8. References and sample download
**References:**
**1. Android Reverse Engineering Journey: Parsing the Compiled Dex File Format: [http://www.520monkey.com/archives/579](http://www.520monkey.com/archives/579)**
**2. One Article to Understand the Structure of DEX Files: [https://blog.csdn.net/sinat_18268881/article/details/55832757](https://blog.csdn.net/sinat_18268881/article/details/55832757)**
**3. Official documentation: [https://source.android.google.cn/devices/tech/dalvik/dex-format#embedded-in-class_def_item,-encoded_field,-encoded_method,-and-innerclass](https://source.android.google.cn/devices/tech/dalvik/dex-format#embedded-in-class_def_item,-encoded_field,-encoded_method,-and-innerclass)**
**Sample and code download:**
**Lanzou Cloud link: [https://wws.lanzous.com/iG8Cuemlw4d](https://wws.lanzous.com/iG8Cuemlw4d); password: chb6**
**GitHub link: [https://github.com/windy-purple/parserDex](https://github.com/windy-purple/pars