逆向ARM64内核zImage

主流的旗舰Android手机已经尽数升级到64位,相应的,内核镜像zImage也发生了改变。如果想要用IDA Pro逆向分析arm64的手机内核,特别是完成内核符号的加载,着实需要折腾一番功夫。

从/dev/block或ROM包中提取boot.img,然后用abootimg -x解开得到zImage

如果zImage是gzip压缩的,就gzip -d解压得到kernel

以上两部都是常规项目,下面重点是要从kernel中提取本应显示在/proc/kallsyms下的内核符号,这样IDA Pro加载分析时才更得心应手。参考Bits, Please!的文章中32位的kernel符号提取方法,可以很快想到64位的解决方案:

首先要知道内核加载时的虚拟地址,一种投机的方法是,手机开机后执行:

shell@surabaya:/ $ dmesg
...
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vmalloc : 0xffffff8000000000 - 0xffffffbdbfff0000   (   246 GB)
[    0.000000]     vmemmap : 0xffffffbdc0000000 - 0xffffffbfc0000000   (     8 GB maximum)
[    0.000000]     PCI I/O : 0xffffffbffa000000 - 0xffffffbffb000000   (    16 MB)
[    0.000000]     fixed   : 0xffffffbffbdfe000 - 0xffffffbffbdff000   (     4 KB)
[    0.000000]     modules : 0xffffffbffc000000 - 0xffffffc000000000   (    64 MB)
[    0.000000]     memory  : 0xffffffc000000000 - 0xffffffc0fe550000   (  4069 MB)
[    0.000000]       .init : 0xffffffc001600000 - 0xffffffc001813000   (  2124 KB)
[    0.000000]       .text : 0xffffffc000080000 - 0xffffffc001600000   ( 22016 KB)
[    0.000000]       .data : 0xffffffc00181d000 - 0xffffffc001995f80   (  1508 KB)
...

由于现在手机还没有开启KASLR,所以基地址基本上总是0xffffffc000080000,有了这个地址就可以从kernel中找到symbol table了。内核导出的前两个符号stext,_text等总是指向0xffffffc000080000,所以搜索连续的两个0xffffffc000080000就能找到symbol table。之后按照Bits, Please!的方法就可以导出所有符号了,唯一要注意的是32位到64位,地址长度变成了8字节,内存对齐也从0x10变成了0x100。修改原来的Python脚本,开发了一个arm64解析符号的脚本:

import sys
import struct

#The default address at which the kernel text segment is loaded
DEFAULT_KERNEL_TEXT_START = 0xffffffc000080000

#The size of the QWORD in a 64-bit architecture
QWORD_SIZE = struct.calcsize("Q")

#The size of the DWORD in a 32-bit architecture
DWORD_SIZE = struct.calcsize("I")

#The size of the WORD in a 32-bit architecture
WORD_SIZE = struct.calcsize("H")

#The alignment of labels in the resulting kernel file
LABEL_ALIGN = 0x100

#The minimal number of repeating addresses pointing to the kernel's text start address
#which are used as a heuristic in order to find the beginning of the kernel's symbol
#table. Since usually there are at least two symbols pointing to the beginning of the
#text segment ("stext", "_text"), the minimal number for the heuristic is 2.
KALLSYMS_ADDRESSES_MIN_HEURISTIC = 2

def read_qword(kernel_data, offset):
	'''
	Reads a DWORD from the given offset within the kernel data
	'''
	return struct.unpack("<Q", kernel_data[offset : offset + QWORD_SIZE])[0]

def read_dword(kernel_data, offset):
	'''
	Reads a DWORD from the given offset within the kernel data
	'''
	return struct.unpack("<I", kernel_data[offset : offset + DWORD_SIZE])[0]

def read_word(kernel_data, offset):
	'''
	Reads a WORD from the given offset within the kernel data
	'''
	return struct.unpack("<H", kernel_data[offset : offset + WORD_SIZE])[0]

def read_byte(kernel_data, offset):
	'''
	Reads an unsigned byte from the given offset within the kernel data
	'''
	return struct.unpack("<B", kernel_data[offset : offset + 1])[0]

def read_c_string(kernel_data, offset):
	'''
	Reads a NUL-delimited C-string from the given offset
	'''
	current_offset = offset
	result_str = ""
	while kernel_data[current_offset] != '\x00':
		result_str += kernel_data[current_offset]
		current_offset += 1
	return result_str

def label_align(address):
	'''
	Aligns the given value to the closest label output boundry
	'''
	return address & ~(LABEL_ALIGN-1)

def find_kallsyms_addresses(kernel_data, kernel_text_start):
	'''
	Searching for the beginning of the kernel's symbol table
	Returns the offset of the kernel's symbol table, or -1 if the symbol table could not be found
	'''
	search_str = struct.pack("<Q", DEFAULT_KERNEL_TEXT_START) * KALLSYMS_ADDRESSES_MIN_HEURISTIC
	return kernel_data.find(search_str)

def get_kernel_symbol_table(kernel_data, kernel_text_start):	
	'''
	Retrieves the kernel's symbol table from the given kernel file
	'''

	#Getting the beginning and end of the kallsyms_addresses table
	kallsyms_addresses_off = find_kallsyms_addresses(kernel_data, kernel_text_start)
	kallsyms_addresses_end_off = kernel_data.find(struct.pack("<Q", 0), kallsyms_addresses_off)
	num_symbols = (kallsyms_addresses_end_off - kallsyms_addresses_off) / QWORD_SIZE

	#Making sure that kallsyms_num_syms matches the table size
	kallsyms_num_syms_off = label_align(kallsyms_addresses_end_off + LABEL_ALIGN)
	kallsyms_num_syms = read_qword(kernel_data, kallsyms_num_syms_off)
	if kallsyms_num_syms != num_symbols:
		print "[-] Actual symbol table size: %d, read symbol table size: %d" % (num_symbols, kallsyms_num_syms)
		return None	

	#Calculating the location of the markers table
	kallsyms_names_off = label_align(kallsyms_num_syms_off + LABEL_ALIGN)
	current_offset = kallsyms_names_off
	for i in range(0, num_symbols):
		current_offset += read_byte(kernel_data, current_offset) + 1
	kallsyms_markers_off = label_align(current_offset + LABEL_ALIGN)

	#Reading the token table
	'''
        Not sure if this can be a universal solution
        '''
	kallsyms_token_table_off = label_align(kernel_data.find(struct.pack("<Q", 0)*2, kallsyms_markers_off)+LABEL_ALIGN)
##	kallsyms_token_table_off = label_align(kallsyms_markers_off + (((num_symbols + 255) >> 8) * QWORD_SIZE))
	current_offset = kallsyms_token_table_off
	for i in range(0, 256):
		token_str = read_c_string(kernel_data, current_offset)
		current_offset += len(token_str) + 1
	kallsyms_token_index_off = label_align(current_offset + LABEL_ALIGN)

	#Creating the token table
	token_table = []
	for i in range(0, 256):
		index = read_word(kernel_data, kallsyms_token_index_off + i * WORD_SIZE)
		token_table.append(read_c_string(kernel_data, kallsyms_token_table_off + index))

	#Decompressing the symbol table using the token table
	offset = kallsyms_names_off
	symbol_table = []
	for i in range(0, num_symbols):
		num_tokens = read_byte(kernel_data, offset)
		offset += 1
		symbol_name = ""
		for j in range(num_tokens, 0, -1):
			token_table_idx = read_byte(kernel_data, offset)
			symbol_name += token_table[token_table_idx]
			offset += 1

		symbol_address = read_qword(kernel_data, kallsyms_addresses_off + i * QWORD_SIZE)
		symbol_table.append((symbol_address, symbol_name[0], symbol_name[1:]))
		
	return symbol_table

def main():

	#Verifying the arguments
	if len(sys.argv) < 2:
		print "USAGE: %s: <KERNEL_FILE> [optional: <0xKERNEL_TEXT_START>]" % sys.argv[0]
		return
	kernel_data = open(sys.argv[1], "rb").read()
	kernel_text_start = int(sys.argv[2], 16) if len(sys.argv) == 3 else DEFAULT_KERNEL_TEXT_START

	
	#Getting the kernel symbol table
	symbol_table = get_kernel_symbol_table(kernel_data, kernel_text_start)
	fp = open("syms","wb")
	for symbol in symbol_table:
		print "%016X %s %s" % symbol
		fp.write("%016X %s %s\n" % symbol)
	fp.close()
		
if __name__ == "__main__":
	main()

输出的符号会按照/proc/kallsyms打印出来,同时会写入当前目录syms文件。接下来就是让IDA Pro识别syms文件了,我的做法是针对每个符号尝试给特定地址重命名,如果失败就undefine以后再试一次,对于代码段的函数都重新makecode一次:

lines = open("syms","rb").read().split("\n")
for line in lines:
    [addr, type, name] = line.split(" ")
    if not MakeNameEx(int(addr,16), name, SN_NOWARN):
        MakeUnkn(int(addr,16),1)
        MakeNameEx(int(addr,16), name, SN_NOWARN)
    if type == "t" or type=="T":
        MakeUnkn(int(addr,16),1)
        MakeCode(int(addr,16))

发表回复

您的电子邮箱地址不会被公开。 必填项已用 * 标注