A Few Things About MonoTouch (Part 3): Unpacking Programs Bundled with mono mkbundle

阿新 • Published: 2018-12-27

After a long break, here is the third post in this series.

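If you have played with Mono, you may know that it ships a tool called mkbundle, which packs the Mono runtime, the class libraries, and the assemblies your program depends on into a single executable: an exe file on Windows (for example mandroid.exe and mtouch.exe) or a Mach-O file on Mac (for example mtouch and mtouch-64).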

From its source code at https://github.com/mono/mono/tree/master/mcs/tools/mkbundle we get:

template_main.c

int main (int argc, char* argv[])
{
    char **newargs;
    int i, k = 0;

    newargs = (char **) malloc (sizeof (char *) * (argc + 2 + count_mono_options_args ()));

    newargs [k++] = argv [0];

    if (mono_options != NULL) {
        i = 0;
        while (mono_options[i] != NULL)
            newargs[k++] = mono_options[i++];
    }

    newargs [k++] = image_name;

    for (i = 1; i < argc; i++) {
        newargs [k++] = argv [i];
    }
    newargs [k] = NULL;
    
    if (config_dir != NULL && getenv ("MONO_CFG_DIR") == NULL)
        mono_set_dirs (getenv ("MONO_PATH"), config_dir);
    
    mono_mkbundle_init();

    return mono_main (k, newargs);
}

We can see that it calls mono_mkbundle_init. This function has two implementations, located at:

https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template.c

and

https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c

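The tool picks the mono_mkbundle_init implementation from template.c or template_z.c depending on whether the -z option (compress the assemblies) is given. We usually choose compression, so the latter implementation is the one normally in use.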

Looking at https://github.com/mono/mono/blob/master/mcs/tools/mkbundle/template_z.c:

void mono_mkbundle_init ()
{
    CompressedAssembly **ptr;
    MonoBundledAssembly **bundled_ptr;
    Bytef *buffer;
    int nbundles;

    install_dll_config_files ();

    ptr = (CompressedAssembly **) compressed;
    nbundles = 0;
    while (*ptr++ != NULL)
        nbundles++;

    bundled = (MonoBundledAssembly **) malloc (sizeof (MonoBundledAssembly *) * (nbundles + 1));
    bundled_ptr = bundled;
    ptr = (CompressedAssembly **) compressed;
    while (*ptr != NULL) {
        uLong real_size;
        uLongf zsize;
        int result;
        MonoBundledAssembly *current;

        real_size = (*ptr)->assembly.size;
        zsize = (*ptr)->compressed_size;
        buffer = (Bytef *) malloc (real_size);
        result = my_inflate ((*ptr)->assembly.data, zsize, buffer, real_size);
        if (result != 0) {
            fprintf (stderr, "mkbundle: Error %d decompressing data for %s\n", result, (*ptr)->assembly.name);
            exit (1);
        }
        (*ptr)->assembly.data = buffer;
        current = (MonoBundledAssembly *) malloc (sizeof (MonoBundledAssembly));
        memcpy (current, *ptr, sizeof (MonoBundledAssembly));
        current->name = (*ptr)->assembly.name;
        *bundled_ptr = current;
        bundled_ptr++;
        ptr++;
    }
    *bundled_ptr = NULL;
    mono_register_bundled_assemblies((const MonoBundledAssembly **) bundled);
}

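We can see that decompression uses compressed, a variable that is not defined in this file. From the tool's source code we know that it is an array of pointers to the CompressedAssembly struct defined below: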

typedef struct {
    const char *name;
    const unsigned char *data;
    const unsigned int size;
} MonoBundledAssembly;


typedef struct _compressed_data {
    MonoBundledAssembly assembly;
    int compressed_size;
} CompressedAssembly;

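In other words, once we find mono_mkbundle_init in the bundled program and the instruction in it that references compressed, we have found an array with one element per assembly. Each element is 32 bits wide (64 bits when the bundle targets a 64-bit platform) and is a pointer to a CompressedAssembly struct; as the loop in mono_mkbundle_init shows, a NULL entry terminates the array. (This is hard to describe in words, so read on to the code I give below.)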
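To make the layout concrete, here is a minimal sketch in plain Python 3 (no IDA needed) that walks such a pointer array inside a byte buffer. It rests on several assumptions that are not from the original post: pointers are treated as plain offsets into a flat little-endian image, the 64-bit record is assumed to carry 4 bytes of alignment padding after each 32-bit field, and the names `read_cstring` and `parse_compressed_table` are hypothetical.

```python
import struct

def read_cstring(image, offset):
    """Read a NUL-terminated ASCII string starting at `offset`."""
    end = image.index(b"\x00", offset)
    return bytes(image[offset:end]).decode("ascii")

def parse_compressed_table(image, table_offset, is_64bit):
    """Walk the NULL-terminated pointer array and decode each record.

    Assumption: every pointer is a plain offset into `image`.
    """
    if is_64bit:
        ptr_fmt = "<Q"
        # name ptr, data ptr, size, 4 pad bytes, compressed_size, 4 pad bytes
        rec_fmt = "<QQI4xi4x"
    else:
        ptr_fmt = "<I"
        # name ptr, data ptr, size, compressed_size
        rec_fmt = "<IIIi"
    ptr_size = struct.calcsize(ptr_fmt)

    records = []
    offset = table_offset
    while True:
        (rec_ptr,) = struct.unpack_from(ptr_fmt, image, offset)
        if rec_ptr == 0:  # the NULL entry ends the array
            break
        name_ptr, data_ptr, size, zsize = struct.unpack_from(rec_fmt, image, rec_ptr)
        records.append({
            "name": read_cstring(image, name_ptr),
            "data_offset": data_ptr,
            "size": size,
            "compressed_size": zsize,
        })
        offset += ptr_size
    return records
```

In a real executable the pointers are virtual addresses, so they would first have to be mapped to file offsets (or read through IDA, as the script further down does).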

Because compressed points to constant data, it usually sits in a section of the executable named something like .data or .const.

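A bundled program such as mandroid.exe usually carries no symbols at all, so locating mono_mkbundle_init and compressed is not easy and often takes human judgment; I wanted to automate it. Analyzing various versions of the Xa***** assemblies showed that, as long as the C-level code has not changed substantially, the offsets of the data references in the assembly generated for a given statement may change, but if you ignore the data references, the sequence and order of the instructions' semantics usually stay fixed. We can therefore use this property to find the spot inside mono_mkbundle_init that references compressed, and from it the virtual address (VA) of the compressed variable in the executable.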
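The idea of matching on instruction order while ignoring the referenced addresses can be sketched roughly as follows. This is only an illustration in plain Python 3: the instruction list, the mnemonic pattern, and the function names `find_pattern` and `locate_data_reference` are all made up for the example, and the script later in this post uses the error-string cross-reference instead of a mnemonic signature.

```python
def find_pattern(mnemonics, pattern):
    """Return the index of the first occurrence of `pattern` in `mnemonics`, or -1."""
    n, m = len(mnemonics), len(pattern)
    for start in range(n - m + 1):
        if mnemonics[start:start + m] == pattern:
            return start
    return -1

def locate_data_reference(instructions, pattern, ref_index):
    """Find `pattern` by mnemonic only, then return the data address
    referenced by the instruction at position `ref_index` in the match.

    `instructions` is a list of (mnemonic, data_ref_or_None) tuples.
    """
    mnemonics = [mnem for mnem, _ in instructions]
    start = find_pattern(mnemonics, pattern)
    if start < 0:
        return None
    return instructions[start + ref_index][1]

# Hypothetical disassembly of two builds: same instruction order,
# different data addresses.
build_a = [("push", None), ("mov", 0x1000), ("call", None),
           ("mov", 0x4020), ("test", None), ("jz", None)]
build_b = [("push", None), ("mov", 0x1800), ("call", None),
           ("mov", 0x5A40), ("test", None), ("jz", None)]

signature = ["call", "mov", "test"]
print(hex(locate_data_reference(build_a, signature, 1)))
print(hex(locate_data_reference(build_b, signature, 1)))
```

The same signature yields a different address in each build, which is exactly the property described above: the offsets move, the instruction order does not.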

Now it is time to bring out the mighty leaked IDA Pro 6.5 (if you don't have it, search Baidu yourself; the pediy resource section has it).

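We know that the function contains the constant string [mkbundle: Error %d decompressing data for %s\n] (depending on the Windows or Mac compiler, the leading "mkbundle: " is sometimes absent), and usually only one function in the whole program references it. That gives us mono_mkbundle_init, which an IDAPython script can find. The first reference to the data segment inside that function is the reference to the compressed variable. Here is the code: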

#!/usr/bin/env python
# coding=gbk


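# Supports unpacking mtouch, mtouch-64, mtouch.exe and mandroid.exe
# Open the target file in IDA, wait for the analysis to finish, then run this script;
# it creates a temporary folder next to the target file and decompresses the files into it
# by BinSys 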


import os
import zlib
import StringIO, gzip
import struct
import io
import sys

import idaapi
import idc
import idautils
# Only datetime.now() is used below; importing it alone avoids shadowing the time module
from datetime import datetime



InputFileType_EXE = 11
InputFileType_MachO = 25
InputFileType = -1


Is64Bit = False



string_type_map = {
    0 : "ASCSTR_C",       #              C-string, zero terminated
    1 : "ASCSTR_PASCAL",  #              Pascal-style ASCII string (length byte)
    2 : "ASCSTR_LEN2",    #              Pascal-style, length is 2 bytes
    3 : "ASCSTR_UNICODE", #              Unicode string
    4 : "ASCSTR_LEN4",    #              Delphi string, length is 4 bytes
    5 : "ASCSTR_ULEN2",   #              Pascal-style Unicode, length is 2 bytes
    6 : "ASCSTR_ULEN4",   #              Pascal-style Unicode, length is 4 bytes
}




filetype_t_map = {
     0 : "f_EXE_old",            # MS DOS EXE File
     1 : "f_COM_old",            # MS DOS COM File
     2 : "f_BIN",                # Binary File
     3 : "f_DRV",                # MS DOS Driver
     4 : "f_WIN",                # New Executable (NE)
     5 : "f_HEX",                # Intel Hex Object File
     6 : "f_MEX",                # MOS Technology Hex Object File
     7 : "f_LX",                 # Linear Executable (LX)
     8 : "f_LE",                 # Linear Executable (LE)
     9 : "f_NLM",                # Netware Loadable Module (NLM)
    10 : "f_COFF",               # Common Object File Format (COFF)
    11 : "f_PE",                 # Portable Executable (PE)
    12 : "f_OMF",                # Object Module Format
    13 : "f_SREC",               # R-records
    14 : "f_ZIP",                # ZIP file (this file is never loaded to IDA database)
    15 : "f_OMFLIB",             # Library of OMF Modules
    16 : "f_AR",                 # ar library
    17 : "f_LOADER",             # file is loaded using LOADER DLL
    18 : "f_ELF",                # Executable and Linkable Format (ELF)
    19 : "f_W32RUN",             # Watcom DOS32 Extender (W32RUN)
    20 : "f_AOUT",               # Linux a.out (AOUT)
    21 : "f_PRC",                # PalmPilot program file
    22 : "f_EXE",                # MS DOS EXE File
    23 : "f_COM",                # MS DOS COM File
    24 : "f_AIXAR",              # AIX ar library
    25 : "f_MACHO",              # Mac OS X
}

def FindStringEA():
    searchstr = str("mkbundle: Error %d decompressing data for %s\n")
    searchstr2 = str("Error %d decompresing data for %s\n")
    
    #Do not use default set up, we'll call setup().
    s = idautils.Strings(default_setup = False)
    # we want C & Unicode strings, and *only* existing strings.
    s.setup(strtypes=idautils.Strings.STR_C | idautils.Strings.STR_UNICODE, ignore_instructions = True, display_only_existing_strings = True)

    #loop through strings
    for i, v in enumerate(s):                
        if not v:
            # Skip an entry IDA failed to retrieve instead of aborting the whole search
            continue
        else:
            #print("[{}] ea: {:#x} ; length: {}; type: {}; '{}'".format(i, v.ea, v.length, string_type_map.get(v.type, None), str(v)))
            if str(v) == searchstr or str(v) == searchstr2:
                return v.ea
                
    return -1


def FindUnFunction(StringEA):
    for ref in DataRefsTo(StringEA):
        f = idaapi.get_func(ref)

        if f:
            return f
    return None
    
def FindDataOffset(FuncEA):
    for funcitem in FuncItems(FuncEA):
        #print hex(funcitem)
        for dataref in DataRefsFrom(funcitem):
            return dataref
            #print "    " + hex(dataref)
    return None
    
def GetStructOffsetList(DataOffset):

    global Is64Bit


    if Is64Bit == True:
        addv = 8
        mf=MakeQword
        vf=Qword
    else:
        mf=MakeDword
        addv = 4
        vf=Dword
    
    AsmListStructListOffset = DataOffset
    
    currentoffset = AsmListStructListOffset



    mf(currentoffset)
    currentvalue = vf(currentoffset)
    currentoffset+=addv


    AsmListStructListOffsetList = []
    AsmListStructListOffsetList.append(currentvalue)
    
    while currentvalue!= 0:


        mf(currentoffset)
        currentvalue = vf(currentoffset)
        
        if currentvalue!=0:
            AsmListStructListOffsetList.append(currentvalue)
        currentoffset+=addv
        
    return AsmListStructListOffsetList
        
    #print len(AsmListStructListOffsetList)
    
    #for vv in AsmListStructListOffsetList:
        #print hex(vv)
        
def MakeFileItemStruct(FileItemStructOffset):

    global Is64Bit


    if Is64Bit == True:
        addv = 8
        mf=MakeQword
        vf=Qword
    else:
        mf=MakeDword
        addv = 4
        vf=Dword
    
    offset = FileItemStructOffset


    
    mf(offset)
    FileNameOffset = vf(offset)
    FileName = idc.GetString(FileNameOffset)
    offset+=addv
    
    mf(offset)
    FileDataOffset = vf(offset)
    offset+=addv
    
    mf(offset)
    FileSize = vf(offset)
    FileSizeOffset = offset
    offset+=addv
    
    
    
    mf(offset)
    FileCompressedSize = vf(offset)
    FileCompressedSizeOffset = offset
    offset+=addv
    
    IsGZip = 0
    
    FileDataCompressed = idc.GetManyBytes(FileDataOffset,FileCompressedSize)
    
    b1,b2,b3 = struct.unpack('ccc', FileDataCompressed[0:3])
    if b1 == '\x1f' and b2 == '\x8b' and b3 == '\x08':
        IsGZip = 1
    else:
        IsGZip = 0
    

    
    return {\
        "FileItemStructOffset":FileItemStructOffset, \
        "FileNameOffset":FileNameOffset,\
        "FileName":FileName,\
        "FileDataOffset":FileDataOffset,\
        "FileSize":FileSize,\
        "FileSizeOffset":FileSizeOffset,\
        "FileCompressedSizeOffset":FileCompressedSizeOffset,\
        "FileCompressedSize":FileCompressedSize,\
        "IsGZip":IsGZip,\
        "FileDataCompressed":FileDataCompressed\
         }




# Python Cookbook: a friendlier makedir function than the one that ships with the system
#from: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/82465
def _mkdir(newdir):
    """works the way a good mkdir should :)
        - already exists, silently complete
        - regular file in the way, raise an exception
        - parent directory(ies) does not exist, make them as well
    """
    if os.path.isdir(newdir):
        pass
    elif os.path.isfile(newdir):
        raise OSError("a file with the same name as the desired " \
                      "dir, '%s', already exists." % newdir)
    else:
        head, tail = os.path.split(newdir)
        if head and not os.path.isdir(head):
            _mkdir(head)
        #print "_mkdir %s" % repr(newdir)
        if tail:
            os.mkdir(newdir)

def DecompressZLib(Data,Path):

    #compressedstream = StringIO.StringIO(Data) 
    data2 = zlib.decompress(Data)
    f = open(Path, 'wb')
    f.write(data2)
    f.close()
    pass

def DecompressGzipTo(Data,Path):

    compressedstream = StringIO.StringIO(Data)  
    gziper = gzip.GzipFile(fileobj=compressedstream)    
    data2 = gziper.read()   # read the decompressed data

    f = open(Path, 'wb')
    f.write(data2)
    f.close()
    pass

def DecompressFileTo(FileItem,OutputDir):
    # os.path.join picks the right separator on both Windows and Mac
    newpath = os.path.join(OutputDir, FileItem["FileName"])
    #print newpath


    if FileItem["IsGZip"] == 1:
        DecompressGzipTo(FileItem["FileDataCompressed"],newpath)
        pass
    else:
        DecompressZLib(FileItem["FileDataCompressed"],newpath)
        pass

    pass


def main():
    global Is64Bit
    global InputFileType
    print("Input File:{}".format(GetInputFile()))
    print("Input File Path:{}".format(GetInputFilePath()))
    print("Idb File Path:{}".format(GetIdbPath()))
    print("cpu_name:{}".format(idc.GetShortPrm(idc.INF_PROCNAME).lower()))
    
    InputFileType = idc.GetShortPrm(idc.INF_FILETYPE)
    #ida.hpp filetype_t f_PE=11 f_MACHO=25
    
    print("InputFileType:{}".format(filetype_t_map.get(InputFileType, None)))
    
    
    if InputFileType != InputFileType_EXE and InputFileType != InputFileType_MachO:
        print "Error: input file type must be PE or Mach-O!"
        return
        
    
    
    
    if (idc.GetShortPrm(idc.INF_LFLAGS) & idc.LFLG_64BIT) == idc.LFLG_64BIT:
        Is64Bit = True
    else:
        Is64Bit = False
        
    print("Is64Bit:{}".format(Is64Bit))
    
    
    OutputDir = '{}_{:%Y%m%d%H%M%S%f}'.format(GetInputFilePath(), datetime.now())
    _mkdir(OutputDir)
    print("OutputDir:{}".format(OutputDir))
    
    

    
    StringEA = FindStringEA()
    if StringEA == -1:
        print "Can't find StringEA!"
        return
    
    Func = FindUnFunction(StringEA)
    
    if not Func:
        print "Can't find Func!"
        return
        
        
    FuncName = idc.GetFunctionName(Func.startEA)
    
    print "Found Data Function:" + FuncName
        
    DataOffset = FindDataOffset(Func.startEA)
    if not DataOffset:
        print "Can't find DataOffset!"
        return
    
    print("DataOffset:0x{:016X}".format(DataOffset));
    
    
    
    StructOffsetList = GetStructOffsetList(DataOffset)
    
    if len(StructOffsetList) == 0:
        print "Can't find StructOffsetList!"
        return
    
    
    
    FileItems = []
    
    for StructOffsetItem in StructOffsetList:
        FileItemStruct = MakeFileItemStruct(StructOffsetItem)
        FileItems.append(FileItemStruct)
    
    for FileItem in FileItems:
        
        print("FileItemStructOffset:{:016X} FileNameOffset:{:016X} FileDataOffset:{:016X} FileSize:{:016X} FileCompressedSize:{:016X} IsGZip:{} FileName:{}"\
            .format( \
            FileItem["FileItemStructOffset"] , \
            FileItem["FileNameOffset"],\
            FileItem["FileDataOffset"],\
            FileItem["FileSize"],\
            FileItem["FileCompressedSize"],\
            FileItem["IsGZip"],\
            FileItem["FileName"]))

        DecompressFileTo(FileItem,OutputDir)
        
if __name__ == "__main__":
    main()

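The compressed data comes in two formats, which differ between newer and older versions; the first few bytes of the data tell you which compression format is in use.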
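Outside IDA, the same header check can be written in a few lines of Python 3. This is a small sketch, not part of the script above: `decompress_assembly` is a hypothetical helper name, and it assumes, as the script does, that any blob not starting with the gzip magic bytes 1f 8b 08 is a zlib stream.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip ID bytes followed by the deflate method byte

def decompress_assembly(blob):
    """Decompress one bundled assembly, choosing the format from its header."""
    if blob[:3] == GZIP_MAGIC:
        return gzip.decompress(blob)
    # Assumption: everything else is a raw zlib stream
    return zlib.decompress(blob)

# Round-trip check with made-up payload bytes
payload = b"MZ" + b"\x00" * 100
assert decompress_assembly(gzip.compress(payload)) == payload
assert decompress_assembly(zlib.compress(payload)) == payload
```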
