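MonoTouch Notes (3): Unpacking Programs Bundled with mono mkbundle
After a long break, here is the third post in this series.
Anyone who has played with mono probably knows that it ships a tool called mkbundle, which packs the mono runtime, the class libraries, and your program's dependent assemblies into a single executable: an exe file on Windows (for example mandroid.exe and mtouch.exe) or a Mach-O file on the Mac (for example mtouch and mtouch-64).
From its source code at https://github.com/mono/mono/tree/master/mcs/tools/mkbundle, we get: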
template_main.c
int main (int argc, char* argv[])
{
    char **newargs;
    int i, k = 0;

    newargs = (char **) malloc (sizeof (char *) * (argc + 2 + count_mono_options_args ()));
    newargs [k++] = argv [0];
    if (mono_options != NULL) {
        i = 0;
        while (mono_options[i] != NULL)
            newargs[k++] = mono_options[i++];
    }
    newargs [k++] = image_name;
    for (i = 1; i < argc; i++) {
        newargs [k++] = argv [i];
    }
    newargs [k] = NULL;

    if (config_dir != NULL && getenv ("MONO_CFG_DIR") == NULL)
        mono_set_dirs (getenv ("MONO_PATH"), config_dir);

    mono_mkbundle_init();
    return mono_main (k, newargs);
}
It calls the function mono_mkbundle_init, which has two implementations, located at:
https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template.c
and
https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c
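Depending on whether the -z option (compress the assemblies) was given, the tool uses the mono_mkbundle_init implementation from either template.c or template_z.c. Bundles are usually built with compression, so the latter is the one normally in play.
Looking at https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c: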
void mono_mkbundle_init ()
{
    CompressedAssembly **ptr;
    MonoBundledAssembly **bundled_ptr;
    Bytef *buffer;
    int nbundles;

    install_dll_config_files ();

    ptr = (CompressedAssembly **) compressed;
    nbundles = 0;
    while (*ptr++ != NULL)
        nbundles++;

    bundled = (MonoBundledAssembly **) malloc (sizeof (MonoBundledAssembly *) * (nbundles + 1));
    bundled_ptr = bundled;
    ptr = (CompressedAssembly **) compressed;
    while (*ptr != NULL) {
        uLong real_size;
        uLongf zsize;
        int result;
        MonoBundledAssembly *current;

        real_size = (*ptr)->assembly.size;
        zsize = (*ptr)->compressed_size;
        buffer = (Bytef *) malloc (real_size);
        result = my_inflate ((*ptr)->assembly.data, zsize, buffer, real_size);
        if (result != 0) {
            fprintf (stderr, "mkbundle: Error %d decompressing data for %s\n", result, (*ptr)->assembly.name);
            exit (1);
        }
        (*ptr)->assembly.data = buffer;
        current = (MonoBundledAssembly *) malloc (sizeof (MonoBundledAssembly));
        memcpy (current, *ptr, sizeof (MonoBundledAssembly));
        current->name = (*ptr)->assembly.name;
        *bundled_ptr = current;
        bundled_ptr++;
        ptr++;
    }
    *bundled_ptr = NULL;
    mono_register_bundled_assemblies((const MonoBundledAssembly **) bundled);
}
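We can see that decompression uses compressed, a variable that is not defined in this file. From the tool's source we learn that it is an array of pointers to the following struct: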
typedef struct {
    const char *name;
    const unsigned char *data;
    const unsigned int size;
} MonoBundledAssembly;

typedef struct _compressed_data {
    MonoBundledAssembly assembly;
    int compressed_size;
} CompressedAssembly;
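To make the layout concrete, here is a minimal sketch of decoding one such record from raw bytes with Python's struct module. It assumes little-endian data and the usual C alignment rules (4 bytes of padding after size on 64-bit targets); the function name parse_compressed_assembly and the returned keys are mine, not part of mono.

```python
import struct


def parse_compressed_assembly(blob, offset, is_64bit):
    """Decode one CompressedAssembly record found at `offset` in `blob`.

    Assumed layouts (little-endian, natural alignment):
      32-bit: name(4) data(4) size(4) compressed_size(4)
      64-bit: name(8) data(8) size(4) pad(4) compressed_size(4)
    """
    fmt = "<QQI4xi" if is_64bit else "<IIIi"
    name_ptr, data_ptr, size, compressed_size = struct.unpack_from(fmt, blob, offset)
    return {
        "name_ptr": name_ptr,                # address of the assembly name string
        "data_ptr": data_ptr,                # address of the compressed bytes
        "size": size,                        # size after decompression
        "compressed_size": compressed_size,  # size of the compressed bytes
    }
```

Note that name_ptr and data_ptr are addresses inside the loaded image, not file offsets, so they still have to be mapped back to the file before the bytes can be read.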
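In other words, if we locate the function mono_mkbundle_init in the bundled program and find where it references compressed, we find a NULL-terminated array with one entry per assembly. Each entry is an int32 (an int64 when the bundle targets 64-bit) holding a pointer to a CompressedAssembly struct. (This is hard to put into words; keep reading and look at the code I give.)
Because compressed points at constant data, it generally lives in a section of the executable named something like .data or .const.
Because a bundled program such as mandroid.exe usually carries no symbols at all, locating mono_mkbundle_init and compressed is not easy and often comes down to manual judgment, so I wanted to automate it. Analysis of various versions of the Xa***** assemblies shows that, as long as there are no major changes at the C source level, the offsets that the generated assembly uses to reference data may change from build to build, but if you ignore the data references, the semantic sequence and order of the instructions usually stays fixed. We can therefore use this signature to locate the reference to compressed inside mono_mkbundle_init and, from it, the virtual address (VA) of compressed in the executable.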
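As a rough illustration of that array, the sketch below walks a zero-terminated list of pointers in a raw byte buffer. It assumes little-endian pointers and that you already know where the array starts in the buffer; read_pointer_array is a name I made up for this example.

```python
import struct


def read_pointer_array(blob, offset, is_64bit):
    """Collect pointer values starting at `offset` until a NULL entry is hit."""
    width, fmt = (8, "<Q") if is_64bit else (4, "<I")
    pointers = []
    while offset + width <= len(blob):
        (value,) = struct.unpack_from(fmt, blob, offset)
        if value == 0:
            break  # the NULL terminator marks the end of the array
        pointers.append(value)
        offset += width
    return pointers
```

Each value returned is the address of one CompressedAssembly struct, so the length of the list is the number of bundled assemblies.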
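If you work on the raw file rather than inside a disassembler, addresses in such a section have to be translated to file offsets. A minimal sketch, assuming you have already read the section table into (virtual_address, size, file_offset) tuples by some other means; both the tuple shape and the name va_to_file_offset are assumptions for illustration only.

```python
def va_to_file_offset(va, sections):
    """Map a virtual address to a file offset using a list of sections.

    `sections` is an iterable of (virtual_address, size, file_offset) tuples.
    Returns None when the address is not covered by any section.
    """
    for sec_va, size, file_off in sections:
        if sec_va <= va < sec_va + size:
            return file_off + (va - sec_va)
    return None
```

For example, with a section mapped at 0x402000 and stored at file offset 0x1400, the address 0x402010 resolves to file offset 0x1410.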
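One common way to express "same instruction sequence, different data offsets" is a byte pattern with wildcards over the operand bytes. The sketch below is only an illustration of that idea: the pattern in the usage note is a made-up x86 example (mov eax, [imm32] followed by test eax, eax), not a signature taken from a real mtouch or mandroid binary.

```python
def find_masked(code, pattern):
    """Return every offset in `code` where `pattern` matches.

    `pattern` is a list whose items are either an int (the byte must match
    exactly) or None (wildcard, used for operand bytes that change per build).
    """
    n = len(pattern)
    hits = []
    for start in range(len(code) - n + 1):
        if all(p is None or code[start + i] == p for i, p in enumerate(pattern)):
            hits.append(start)
    return hits
```

With code = bytes.fromhex("90a17856341285c0c3") and pattern = [0xA1, None, None, None, None, 0x85, 0xC0], the match is at offset 1, and the four wildcard bytes that follow the opcode hold the referenced address.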
Next we bring in the mighty IDA Pro 6.5, the leaked build (if you don't have it, search for it yourself; the pediy resource section has it).
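We know the function contains the constant string [mkbundle: Error %d decompressing data for %s\n] (depending on the Windows or Mac compiler, the leading "mkbundle: " is sometimes absent), and usually only one function in the whole program references it. That gives us mono_mkbundle_init, which an IDAPython script can find. We then take the first reference to the data segment inside that function; that reference is the compressed variable. Here is the code: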
#!/usr/bin/env python
# coding=gbk
# Supports unpacking mtouch, mtouch-64, mtouch.exe and mandroid.exe.
# Open the target file in IDA, wait for the analysis to finish, then run this script.
# It creates a temporary folder next to the target file and extracts the files into it.
# by BinSys
import os
import zlib
import gzip
import StringIO
from datetime import datetime

import idaapi
import idc
import idautils

InputFileType_EXE = 11
InputFileType_MachO = 25
InputFileType = -1
Is64Bit = False

string_type_map = {
    0: "ASCSTR_C",        # C-string, zero terminated
    1: "ASCSTR_PASCAL",   # Pascal-style ASCII string (length byte)
    2: "ASCSTR_LEN2",     # Pascal-style, length is 2 bytes
    3: "ASCSTR_UNICODE",  # Unicode string
    4: "ASCSTR_LEN4",     # Delphi string, length is 4 bytes
    5: "ASCSTR_ULEN2",    # Pascal-style Unicode, length is 2 bytes
    6: "ASCSTR_ULEN4",    # Pascal-style Unicode, length is 4 bytes
}

filetype_t_map = {
    0: "f_EXE_old",   # MS DOS EXE File
    1: "f_COM_old",   # MS DOS COM File
    2: "f_BIN",       # Binary File
    3: "f_DRV",       # MS DOS Driver
    4: "f_WIN",       # New Executable (NE)
    5: "f_HEX",       # Intel Hex Object File
    6: "f_MEX",       # MOS Technology Hex Object File
    7: "f_LX",        # Linear Executable (LX)
    8: "f_LE",        # Linear Executable (LE)
    9: "f_NLM",       # Netware Loadable Module (NLM)
    10: "f_COFF",     # Common Object File Format (COFF)
    11: "f_PE",       # Portable Executable (PE)
    12: "f_OMF",      # Object Module Format
    13: "f_SREC",     # R-records
    14: "f_ZIP",      # ZIP file (this file is never loaded to IDA database)
    15: "f_OMFLIB",   # Library of OMF Modules
    16: "f_AR",       # ar library
    17: "f_LOADER",   # file is loaded using LOADER DLL
    18: "f_ELF",      # Executable and Linkable Format (ELF)
    19: "f_W32RUN",   # Watcom DOS32 Extender (W32RUN)
    20: "f_AOUT",     # Linux a.out (AOUT)
    21: "f_PRC",      # PalmPilot program file
    22: "f_EXE",      # MS DOS EXE File
    23: "f_COM",      # MS DOS COM File
    24: "f_AIXAR",    # AIX ar library
    25: "f_MACHO",    # Mac OS X
}


def FindStringEA():
    # Two variants of the error message are searched for.
    searchstr = "mkbundle: Error %d decompressing data for %s\n"
    searchstr2 = "Error %d decompresing data for %s\n"
    # Do not use the default setup; we call setup() ourselves.
    s = idautils.Strings(default_setup=False)
    # We want C and Unicode strings, and *only* existing strings.
    s.setup(strtypes=idautils.Strings.STR_C | idautils.Strings.STR_UNICODE,
            ignore_instructions=True,
            display_only_existing_strings=True)
    for i, v in enumerate(s):
        if not v:
            # Skip entries whose string could not be retrieved.
            continue
        if str(v) == searchstr or str(v) == searchstr2:
            return v.ea
    return -1


def FindUnFunction(StringEA):
    # The function that references the error string is mono_mkbundle_init.
    for ref in idautils.DataRefsTo(StringEA):
        f = idaapi.get_func(ref)
        if f:
            return f
    return None


def FindDataOffset(FuncEA):
    # The first data reference made inside the function is taken to be `compressed`.
    for funcitem in idautils.FuncItems(FuncEA):
        for dataref in idautils.DataRefsFrom(funcitem):
            return dataref
    return None


def GetPointerAccess():
    # Returns (pointer size, "define data item" function, "read value" function).
    if Is64Bit:
        return 8, idc.MakeQword, idc.Qword
    return 4, idc.MakeDword, idc.Dword


def GetStructOffsetList(DataOffset):
    # Walk the NULL-terminated array of CompressedAssembly pointers.
    addv, mf, vf = GetPointerAccess()
    AsmListStructListOffsetList = []
    currentoffset = DataOffset
    while True:
        mf(currentoffset)
        currentvalue = vf(currentoffset)
        if currentvalue == 0:
            break
        AsmListStructListOffsetList.append(currentvalue)
        currentoffset += addv
    return AsmListStructListOffsetList


def MakeFileItemStruct(FileItemStructOffset):
    addv, mf, vf = GetPointerAccess()
    offset = FileItemStructOffset
    mf(offset)
    FileNameOffset = vf(offset)
    FileName = idc.GetString(FileNameOffset)
    offset += addv
    mf(offset)
    FileDataOffset = vf(offset)
    offset += addv
    # size and compressed_size are 32-bit fields even in a 64-bit bundle,
    # so only the low 32 bits of each pointer-sized read are kept.
    mf(offset)
    FileSize = vf(offset) & 0xFFFFFFFF
    FileSizeOffset = offset
    offset += addv
    mf(offset)
    FileCompressedSize = vf(offset) & 0xFFFFFFFF
    FileCompressedSizeOffset = offset
    FileDataCompressed = idc.GetManyBytes(FileDataOffset, FileCompressedSize)
    # A gzip stream starts with the bytes 1f 8b 08; anything else is treated as zlib.
    if FileDataCompressed[0:3] == '\x1f\x8b\x08':
        IsGZip = 1
    else:
        IsGZip = 0
    return {
        "FileItemStructOffset": FileItemStructOffset,
        "FileNameOffset": FileNameOffset,
        "FileName": FileName,
        "FileDataOffset": FileDataOffset,
        "FileSize": FileSize,
        "FileSizeOffset": FileSizeOffset,
        "FileCompressedSizeOffset": FileCompressedSizeOffset,
        "FileCompressedSize": FileCompressedSize,
        "IsGZip": IsGZip,
        "FileDataCompressed": FileDataCompressed,
    }


# Python Cookbook: a friendlier mkdir than the one that ships with the system
# from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/82465
def _mkdir(newdir):
    """works the way a good mkdir should :)
        - already exists, silently complete
        - regular file in the way, raise an exception
        - parent directory(ies) does not exist, make them as well
    """
    if os.path.isdir(newdir):
        pass
    elif os.path.isfile(newdir):
        raise OSError("a file with the same name as the desired "
                      "dir, '%s', already exists." % newdir)
    else:
        head, tail = os.path.split(newdir)
        if head and not os.path.isdir(head):
            _mkdir(head)
        if tail:
            os.mkdir(newdir)


def DecompressZLib(Data, Path):
    data2 = zlib.decompress(Data)
    with open(Path, 'wb') as f:
        f.write(data2)


def DecompressGzipTo(Data, Path):
    compressedstream = StringIO.StringIO(Data)
    gziper = gzip.GzipFile(fileobj=compressedstream)
    data2 = gziper.read()  # read the decompressed data
    with open(Path, 'wb') as f:
        f.write(data2)


def DecompressFileTo(FileItem, OutputDir):
    # os.path.join keeps the path valid on both Windows and Mac.
    newpath = os.path.join(OutputDir, FileItem["FileName"])
    if FileItem["IsGZip"] == 1:
        DecompressGzipTo(FileItem["FileDataCompressed"], newpath)
    else:
        DecompressZLib(FileItem["FileDataCompressed"], newpath)


def main():
    global Is64Bit
    global InputFileType
    print("Input File:{}".format(idc.GetInputFile()))
    print("Input File Path:{}".format(idc.GetInputFilePath()))
    print("Idb File Path:{}".format(idc.GetIdbPath()))
    print("cpu_name:{}".format(idc.GetShortPrm(idc.INF_PROCNAME).lower()))
    # ida.hpp filetype_t: f_PE=11, f_MACHO=25
    InputFileType = idc.GetShortPrm(idc.INF_FILETYPE)
    print("InputFileType:{}".format(filetype_t_map.get(InputFileType, None)))
    if InputFileType != InputFileType_EXE and InputFileType != InputFileType_MachO:
        print("Error: input file type must be PE or Mach-O!")
        return
    Is64Bit = (idc.GetShortPrm(idc.INF_LFLAGS) & idc.LFLG_64BIT) == idc.LFLG_64BIT
    print("Is64Bit:{}".format(Is64Bit))
    OutputDir = '{}_{:%Y%m%d%H%M%S%f}'.format(idc.GetInputFilePath(), datetime.now())
    _mkdir(OutputDir)
    print("OutputDir:{}".format(OutputDir))
    StringEA = FindStringEA()
    if StringEA == -1:
        print("Can't find StringEA!")
        return
    Func = FindUnFunction(StringEA)
    if not Func:
        print("Can't find Func!")
        return
    FuncName = idc.GetFunctionName(Func.startEA)
    print("Found Data Function:" + FuncName)
    DataOffset = FindDataOffset(Func.startEA)
    if not DataOffset:
        print("Can't find DataOffset!")
        return
    print("DataOffset:0x{:016X}".format(DataOffset))
    StructOffsetList = GetStructOffsetList(DataOffset)
    if len(StructOffsetList) == 0:
        print("Can't find StructOffsetList!")
        return
    FileItems = []
    for StructOffsetItem in StructOffsetList:
        FileItems.append(MakeFileItemStruct(StructOffsetItem))
    for FileItem in FileItems:
        print("FileItemStructOffset:{:016X} FileNameOffset:{:016X} FileDataOffset:{:016X} "
              "FileSize:{:016X} FileCompressedSize:{:016X} IsGZip:{} FileName:{}".format(
                  FileItem["FileItemStructOffset"],
                  FileItem["FileNameOffset"],
                  FileItem["FileDataOffset"],
                  FileItem["FileSize"],
                  FileItem["FileCompressedSize"],
                  FileItem["IsGZip"],
                  FileItem["FileName"]))
        DecompressFileTo(FileItem, OutputDir)


if __name__ == "__main__":
    main()
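If you only want to check quickly whether a file looks like a compressed mkbundle binary before opening it in IDA, a plain byte search for the error message is enough. This is a small standalone sketch in Python 3, separate from the IDAPython script; find_error_string is a name I made up, and the two message variants are the ones the script searches for.

```python
MARKERS = (
    b"mkbundle: Error %d decompressing data for %s\n",
    b"Error %d decompresing data for %s\n",
)


def find_error_string(blob):
    """Return (file_offset, marker) for the first marker found, else (-1, None)."""
    for marker in MARKERS:
        pos = blob.find(marker)
        if pos != -1:
            return pos, marker
    return -1, None
```

The result is a file offset, not a virtual address, and it says nothing about which function references the string; for that you still need the cross-references a disassembler provides.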
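The compressed data comes in two formats, one used by newer versions and one by older versions; the first few bytes of the data tell you which compression format is in use.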
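The check itself fits in a few lines. Here is a minimal Python 3 sketch, assuming the only two cases are a gzip stream (which starts with the bytes 1f 8b 08) and a raw zlib stream; decompress_assembly is a name chosen for this example.

```python
import gzip
import zlib


def decompress_assembly(data):
    """Decompress one bundled assembly, picking the codec from its header."""
    if data[:3] == b"\x1f\x8b\x08":
        # gzip magic (1f 8b) followed by the deflate method byte (08)
        return gzip.decompress(data)
    # otherwise assume a zlib stream
    return zlib.decompress(data)
```

A decompressed assembly is an ordinary PE file, so its first two bytes should be MZ, which makes for an easy sanity check on the output.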