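Mainstream flagship Android phones have all moved to 64-bit, and the kernel image zImage has changed accordingly. Reverse engineering an arm64 phone kernel in IDA Pro, and in particular loading the kernel symbols, takes some effort.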
Extract boot.img from /dev/block or from the ROM package, then unpack it with abootimg -x to get zImage.
If zImage is gzip-compressed, decompress it with gzip -d to get kernel.
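When gzip -d rejects the file because the gzip stream does not start at offset 0, the stream can be located by its magic bytes. The following is a minimal sketch under that assumption; the function name `extract_gzip_kernel` is hypothetical and not part of any tool mentioned here.

```python
import zlib

# gzip magic (1f 8b) followed by the deflate method byte (08)
GZIP_MAGIC = b"\x1f\x8b\x08"


def extract_gzip_kernel(image):
    """Return the first gzip stream in image that decompresses completely, or None."""
    pos = image.find(GZIP_MAGIC)
    while pos != -1:
        # wbits = 16 + MAX_WBITS makes zlib expect a gzip header and trailer
        decomp = zlib.decompressobj(16 + zlib.MAX_WBITS)
        try:
            data = decomp.decompress(image[pos:])
            if decomp.eof:
                return data
        except zlib.error:
            pass  # false positive on the magic bytes, keep searching
        pos = image.find(GZIP_MAGIC, pos + 1)
    return None
```

Bytes that follow the end of the gzip stream are ignored, so appended data such as a device tree does not break the extraction.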
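Both steps above are routine. The main task is to extract from kernel the symbols that would normally appear under /proc/kallsyms, which makes the analysis in IDA Pro much easier. Following the 32-bit kernel symbol extraction method from the Bits, Please! article, a 64-bit solution comes quickly:
First, the virtual address at which the kernel is loaded is needed. One shortcut is to boot the phone and run: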
```
shell@surabaya:/ $ dmesg
...
[ 0.000000] Virtual kernel memory layout:
[ 0.000000] vmalloc : 0xffffff8000000000 - 0xffffffbdbfff0000 ( 246 GB)
[ 0.000000] vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000 ( 8 GB maximum)
[ 0.000000] PCI I/O : 0xffffffbffa000000 - 0xffffffbffb000000 ( 16 MB)
[ 0.000000] fixed : 0xffffffbffbdfe000 - 0xffffffbffbdff000 ( 4 KB)
[ 0.000000] modules : 0xffffffbffc000000 - 0xffffffc000000000 ( 64 MB)
[ 0.000000] memory : 0xffffffc000000000 - 0xffffffc0fe550000 ( 4069 MB)
[ 0.000000] .init : 0xffffffc001600000 - 0xffffffc001813000 ( 2124 KB)
[ 0.000000] .text : 0xffffffc000080000 - 0xffffffc001600000 ( 22016 KB)
[ 0.000000] .data : 0xffffffc00181d000 - 0xffffffc001995f80 ( 1508 KB)
...
```
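The .text line is the one that matters here. As a rough sketch, assuming the dmesg output has been saved to a string and keeps the layout shown above, the start address can be pulled out with a regular expression. The helper name `kernel_text_range` is made up for this illustration.

```python
import re

# Matches e.g. ".text : 0xffffffc000080000 - 0xffffffc001600000"
TEXT_LINE = re.compile(r"\.text\s*:\s*(0x[0-9a-fA-F]+)\s*-\s*(0x[0-9a-fA-F]+)")


def kernel_text_range(dmesg_output):
    """Return (start, end) of the kernel .text region, or None if the line is absent."""
    match = TEXT_LINE.search(dmesg_output)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)
```

The first value of the returned pair is the address that can be passed to a symbol extraction script as the kernel text start.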
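Since phones currently do not have KASLR enabled, the base address is almost always 0xffffffc000080000, and with it the symbol table can be located inside kernel. The first exported symbols, stext, _text and so on, always point to 0xffffffc000080000, so searching for two consecutive copies of 0xffffffc000080000 finds the symbol table. From there the Bits, Please! method exports all the symbols. The only things to watch when going from 32-bit to 64-bit are that addresses are now 8 bytes long and that the table alignment changes from 0x10 to 0x100. I modified the original Python script into one that parses arm64 symbols: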
```python
# Written for Python 3
import sys
import struct

# The default address at which the kernel text segment is loaded
DEFAULT_KERNEL_TEXT_START = 0xffffffc000080000

# Sizes of the little-endian integer types used by the kallsyms tables
QWORD_SIZE = struct.calcsize("Q")
DWORD_SIZE = struct.calcsize("I")
WORD_SIZE = struct.calcsize("H")

# The alignment of labels in the resulting kernel file
LABEL_ALIGN = 0x100

# The minimal number of repeating addresses pointing to the kernel's text start address
# which are used as a heuristic in order to find the beginning of the kernel's symbol
# table. Since usually there are at least two symbols pointing to the beginning of the
# text segment ("stext", "_text"), the minimal number for the heuristic is 2.
KALLSYMS_ADDRESSES_MIN_HEURISTIC = 2


def read_qword(kernel_data, offset):
    """Reads a QWORD from the given offset within the kernel data"""
    return struct.unpack_from("<Q", kernel_data, offset)[0]


def read_dword(kernel_data, offset):
    """Reads a DWORD from the given offset within the kernel data"""
    return struct.unpack_from("<I", kernel_data, offset)[0]


def read_word(kernel_data, offset):
    """Reads a WORD from the given offset within the kernel data"""
    return struct.unpack_from("<H", kernel_data, offset)[0]


def read_byte(kernel_data, offset):
    """Reads an unsigned byte from the given offset within the kernel data"""
    return struct.unpack_from("<B", kernel_data, offset)[0]


def read_c_string(kernel_data, offset):
    """Reads a NUL-delimited C-string from the given offset"""
    end = kernel_data.index(b"\x00", offset)
    return kernel_data[offset:end].decode("latin-1")


def label_align(address):
    """Aligns the given value down to the closest label output boundary"""
    return address & ~(LABEL_ALIGN - 1)


def find_kallsyms_addresses(kernel_data, kernel_text_start):
    """
    Searches for the beginning of the kernel's symbol table.
    Returns the offset of the kernel's symbol table, or -1 if the
    symbol table could not be found
    """
    search_str = struct.pack("<Q", kernel_text_start) * KALLSYMS_ADDRESSES_MIN_HEURISTIC
    return kernel_data.find(search_str)


def get_kernel_symbol_table(kernel_data, kernel_text_start):
    """Retrieves the kernel's symbol table from the given kernel file"""
    # Getting the beginning and end of the kallsyms_addresses table
    kallsyms_addresses_off = find_kallsyms_addresses(kernel_data, kernel_text_start)
    if kallsyms_addresses_off == -1:
        print("[-] Could not find the symbol table")
        return None
    kallsyms_addresses_end_off = kernel_data.find(struct.pack("<Q", 0), kallsyms_addresses_off)
    num_symbols = (kallsyms_addresses_end_off - kallsyms_addresses_off) // QWORD_SIZE

    # Making sure that kallsyms_num_syms matches the table size
    kallsyms_num_syms_off = label_align(kallsyms_addresses_end_off + LABEL_ALIGN)
    kallsyms_num_syms = read_qword(kernel_data, kallsyms_num_syms_off)
    if kallsyms_num_syms != num_symbols:
        print("[-] Actual symbol table size: %d, read symbol table size: %d" % (num_symbols, kallsyms_num_syms))
        return None

    # Calculating the location of the markers table
    kallsyms_names_off = label_align(kallsyms_num_syms_off + LABEL_ALIGN)
    current_offset = kallsyms_names_off
    for i in range(num_symbols):
        current_offset += read_byte(kernel_data, current_offset) + 1
    kallsyms_markers_off = label_align(current_offset + LABEL_ALIGN)

    # Reading the token table.
    # Not sure if this can be a universal solution: the end of the markers table
    # is found by searching for two zero QWORDs instead of computing its size.
    kallsyms_token_table_off = label_align(
        kernel_data.find(struct.pack("<Q", 0) * 2, kallsyms_markers_off) + LABEL_ALIGN)
    ## kallsyms_token_table_off = label_align(kallsyms_markers_off + (((num_symbols + 255) >> 8) * QWORD_SIZE))
    current_offset = kallsyms_token_table_off
    for i in range(256):
        token_str = read_c_string(kernel_data, current_offset)
        current_offset += len(token_str) + 1
    kallsyms_token_index_off = label_align(current_offset + LABEL_ALIGN)

    # Creating the token table
    token_table = []
    for i in range(256):
        index = read_word(kernel_data, kallsyms_token_index_off + i * WORD_SIZE)
        token_table.append(read_c_string(kernel_data, kallsyms_token_table_off + index))

    # Decompressing the symbol table using the token table
    offset = kallsyms_names_off
    symbol_table = []
    for i in range(num_symbols):
        num_tokens = read_byte(kernel_data, offset)
        offset += 1
        symbol_name = ""
        for j in range(num_tokens):
            token_table_idx = read_byte(kernel_data, offset)
            symbol_name += token_table[token_table_idx]
            offset += 1
        symbol_address = read_qword(kernel_data, kallsyms_addresses_off + i * QWORD_SIZE)
        # The first decoded character is the symbol type, the rest is the name
        symbol_table.append((symbol_address, symbol_name[0], symbol_name[1:]))
    return symbol_table


def main():
    # Verifying the arguments
    if len(sys.argv) < 2:
        print("USAGE: %s: <KERNEL_FILE> [optional: <0xKERNEL_TEXT_START>]" % sys.argv[0])
        return
    with open(sys.argv[1], "rb") as kernel_file:
        kernel_data = kernel_file.read()
    kernel_text_start = int(sys.argv[2], 16) if len(sys.argv) >= 3 else DEFAULT_KERNEL_TEXT_START

    # Getting the kernel symbol table
    symbol_table = get_kernel_symbol_table(kernel_data, kernel_text_start)
    if symbol_table is None:
        return
    with open("syms", "w") as fp:
        for symbol in symbol_table:
            print("%016X %s %s" % symbol)
            fp.write("%016X %s %s\n" % symbol)


if __name__ == "__main__":
    main()
```
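The name decompression step is easier to follow on a toy table. In the sketch below the five-entry token table and the two encoded names are invented for illustration (a real kernel has 256 tokens). Each entry is a length byte followed by that many token indexes, and the first decoded character is the symbol type.

```python
def decode_name(names, offset, tokens):
    """Decode one compressed entry; return (type, name, offset of the next entry)."""
    length = names[offset]
    indexes = names[offset + 1 : offset + 1 + length]
    expanded = "".join(tokens[i] for i in indexes)
    return expanded[0], expanded[1:], offset + 1 + length


# Hypothetical token table, not taken from a real kernel
tokens = ["T", "_", "te", "xt", "s"]

# Two entries: "T" + "_" + "te" + "xt" and "T" + "s" + "te" + "xt"
names = bytes([4, 0, 1, 2, 3,
               4, 0, 4, 2, 3])

first = decode_name(names, 0, tokens)
second = decode_name(names, first[2], tokens)
print(first[:2], second[:2])  # ('T', '_text') ('T', 'stext')
```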
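The symbols are printed in /proc/kallsyms format and also written to a file named syms in the current directory. The next step is to get IDA Pro to use the syms file. My approach is to try renaming the address of each symbol and, if that fails, undefine the item and try once more; every function in the code section is also run through makecode again: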
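```python
# Run inside IDA Pro: MakeNameEx, MakeUnkn, MakeCode and SN_NOWARN come from IDAPython's idc module
for line in open("syms"):
    fields = line.split()
    if len(fields) != 3:
        continue  # skip blank or malformed lines
    addr, sym_type, name = fields
    ea = int(addr, 16)
    # If renaming fails, undefine the item and try once more
    if not MakeNameEx(ea, name, SN_NOWARN):
        MakeUnkn(ea, 1)
        MakeNameEx(ea, name, SN_NOWARN)
    # Re-create code at every text-section symbol
    if sym_type in ("t", "T"):
        MakeUnkn(ea, 1)
        MakeCode(ea)
```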
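Newer IDA releases rename these idc functions. The sketch below shows the same loop under the assumption that the idc module provides set_name, del_items, create_insn and the DELIT_EXPAND flag (as I understand it, IDA 7.4 and later). It has not been checked against a specific IDA version, and the helper names `parse_syms` and `apply_symbol` are made up here. Parsing is kept separate from the IDA calls so that it can be tried outside IDA.

```python
try:
    import idc  # only importable inside IDA
except ImportError:
    idc = None


def parse_syms(text):
    """Yield (address, type, name) for every well-formed line of a syms file."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        yield int(parts[0], 16), parts[1], parts[2]


def apply_symbol(ea, sym_type, name):
    """Rename ea, undefining first if needed; assumes the newer idc names."""
    if not idc.set_name(ea, name, idc.SN_NOWARN):
        idc.del_items(ea, idc.DELIT_EXPAND, 1)
        idc.set_name(ea, name, idc.SN_NOWARN)
    if sym_type in ("t", "T"):
        idc.del_items(ea, idc.DELIT_EXPAND, 1)
        idc.create_insn(ea)


if idc is not None:
    with open("syms") as fp:
        for ea, sym_type, name in parse_syms(fp.read()):
            apply_symbol(ea, sym_type, name)
```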