Author: 程序喵大人, 2020-11-11 08:25:45
This article is reposted from the WeChat official account 「程序喵大人」, written by 程序喵大人. To repost it, please contact the 程序喵大人 official account.
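Exploring virtual memory through the /proc filesystem
We will use the /proc filesystem to find the virtual memory address of a string in a running process, then change the string by overwriting the contents of that address. This should give you a more concrete understanding of virtual memory. First, a definition.
Virtual memory
Virtual memory is a memory management technique implemented jointly by hardware and the operating system. It maps the memory addresses a program uses (virtual addresses) to physical addresses in the machine's memory. This frees applications from the tedious job of managing memory space themselves and improves security through memory isolation. A process usually sees its virtual memory as one contiguous address space, which is controlled by the operating system's memory management module; physical memory is assigned to virtual memory through paging when a page fault is triggered. On a 64-bit machine the virtual address space is far larger than the physical memory, so a process can address more memory than is physically installed.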
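To make the page-based mapping above more concrete, here is a minimal Python sketch that splits a virtual address into a page number and an offset within that page. The helper name split_address, the example address, and the 4 KiB page size passed in the example are assumptions for illustration; the real page size of a machine is available as mmap.PAGESIZE and is not always 4 KiB.

```python
import mmap


def split_address(addr, page_size=mmap.PAGESIZE):
    """Split a virtual address into (page number, offset inside the page)."""
    return divmod(addr, page_size)


# Example address, split under the assumption of 4 KiB (4096-byte) pages.
page, offset = split_address(0x88f010, page_size=4096)
print(hex(page), hex(offset))
```

Address translation works on the page number only: the operating system and the MMU map it to a physical frame, and the offset is carried over unchanged.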
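Before digging into virtual memory, a few key points are worth noting:
virtual_memory.png
The figure above is not a particularly detailed memory map: the high addresses also hold kernel space, among other things, but that is not the topic of this article. The figure shows that the high addresses store the command-line arguments and environment variables, followed by the stack, the heap, and the executable. The stack grows downward and the heap grows upward. Heap space is obtained with malloc and is part of dynamically allocated memory.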
First, let's explore virtual memory with a simple C program.

#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - use strdup to create a copy of a string; strdup calls malloc
 * internally and returns the address of the new space, which the caller
 * must release with free
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    char *s;

    s = strdup("test_memory");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    printf("%p\n", (void *)s);
    return (EXIT_SUCCESS);
}

Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
Output: 0x88f010
 
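My machine is 64-bit, so a process's virtual memory runs from 0x0 at the low end to 0xffffffffffffffff at the high end. 0x88f010 is far smaller than 0xffffffffffffffff, so we can tentatively infer that the copied string (a heap address) sits near the low end of the address space. The /proc filesystem lets us verify this.
Running ls /proc shows many entries; the ones that matter here are /proc/[pid]/mem and /proc/[pid]/maps.
mem & maps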
man proc

/proc/[pid]/mem
    This file can be used to access the pages of a process's memory through open(2), read(2), and lseek(2).

/proc/[pid]/maps
    A file containing the currently mapped memory regions and their access permissions.
    See mmap(2) for some further information about memory mappings.

    The format of the file is:

    address perms offset dev inode pathname
    00400000-00452000 r-xp 00000000 08:02 173521 /usr/bin/dbus-daemon
    00651000-00652000 r--p 00051000 08:02 173521 /usr/bin/dbus-daemon
    00652000-00655000 rw-p 00052000 08:02 173521 /usr/bin/dbus-daemon
    00e03000-00e24000 rw-p 00000000 00:00 0 [heap]
    00e24000-011f7000 rw-p 00000000 00:00 0 [heap]
    ...
    35b1800000-35b1820000 r-xp 00000000 08:02 135522 /usr/lib64/ld-2.15.so
    35b1a1f000-35b1a20000 r--p 0001f000 08:02 135522 /usr/lib64/ld-2.15.so
    35b1a20000-35b1a21000 rw-p 00020000 08:02 135522 /usr/lib64/ld-2.15.so
    35b1a21000-35b1a22000 rw-p 00000000 00:00 0
    35b1c00000-35b1dac000 r-xp 00000000 08:02 135870 /usr/lib64/libc-2.15.so
    35b1dac000-35b1fac000 ---p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
    35b1fac000-35b1fb0000 r--p 001ac000 08:02 135870 /usr/lib64/libc-2.15.so
    35b1fb0000-35b1fb2000 rw-p 001b0000 08:02 135870 /usr/lib64/libc-2.15.so
    ...
    f2c6ff8c000-7f2c7078c000 rw-p 00000000 00:00 0 [stack:986]
    ...
    7fffb2c0d000-7fffb2c2e000 rw-p 00000000 00:00 0 [stack]
    7fffb2d48000-7fffb2d49000 r-xp 00000000 00:00 0 [vdso]

    The address field is the address space in the process that the mapping occupies.
    The perms field is a set of permissions:

        r = read
        w = write
        x = execute
        s = shared
        p = private (copy on write)

    The offset field is the offset into the file/whatever;
    dev is the device (major:minor); inode is the inode on that device. 0 indicates
    that no inode is associated with the memory region,
    as would be the case with BSS (uninitialized data).

    The pathname field will usually be the file that is backing the mapping.
    For ELF files, you can easily coordinate with the offset field
    by looking at the Offset field in the ELF program headers (readelf -l).

    There are additional helpful pseudo-paths:

        [stack]
            The initial process's (also known as the main thread's) stack.
        [stack:<tid>] (since Linux 3.4)
            A thread's stack (where the <tid> is a thread ID).
            It corresponds to the /proc/[pid]/task/[tid]/ path.
        [vdso]
            The virtual dynamically linked shared object.
        [heap]
            The process's heap.

    If the pathname field is blank, this is an anonymous mapping as obtained via the mmap(2) function.
    There is no easy way to coordinate
    this back to a process's source, short of running it through gdb(1), strace(1), or similar.

    Under Linux 2.0 there is no field giving pathname.
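The field layout described in the man page excerpt can be parsed mechanically. The sketch below is one possible way to do it in Python; Mapping and parse_maps_line are hypothetical names chosen here, and the sample line is taken from the excerpt above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int      # first address of the region
    end: int        # first address past the region
    perms: str      # e.g. "rw-p"
    offset: int     # offset into the backing file
    dev: str        # major:minor
    inode: int      # 0 when no file backs the region
    pathname: str   # "" for anonymous mappings


def parse_maps_line(line):
    """Parse one line of /proc/[pid]/maps into a Mapping."""
    # The pathname is optional and may contain spaces, so split at most 5 times.
    fields = line.split(None, 5)
    addr, perms, offset, dev, inode = fields[:5]
    pathname = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), pathname)


heap = parse_maps_line("00e03000-00e24000 rw-p 00000000 00:00 0 [heap]\n")
print(heap.pathname, hex(heap.start), hex(heap.end), heap.perms)
```

Splitting on arbitrary whitespace rather than on a single space keeps the parser independent of how many padding spaces the kernel puts before the pathname.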
 
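Through the mem file we can read and modify the memory pages of a process, and through maps we can see the regions currently mapped into the process, along with their addresses, access permissions, offsets, and so on. The maps excerpt shows the heap at low addresses and the stack at high addresses. It also shows that the heap's permissions are rw, meaning it is writable. So we can locate the string from the previous example inside the heap range and change it by overwriting the corresponding address in the mem file. The program: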
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/**
 * main - uses strdup to create a new string, then loops forever
 *
 * Return: EXIT_FAILURE if malloc failed. Otherwise never returns
 */
int main(void)
{
    char *s;
    unsigned long int i;

    s = strdup("test_memory");
    if (s == NULL)
    {
        fprintf(stderr, "Can't allocate mem with malloc\n");
        return (EXIT_FAILURE);
    }
    i = 0;
    while (s)
    {
        printf("[%lu] %s (%p)\n", i, s, (void *)s);
        sleep(1);
        i++;
    }
    return (EXIT_SUCCESS);
}

Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o loop; ./loop
Output:
[0] test_memory (0x21dc010)
[1] test_memory (0x21dc010)
[2] test_memory (0x21dc010)
[3] test_memory (0x21dc010)
[4] test_memory (0x21dc010)
[5] test_memory (0x21dc010)
[6] test_memory (0x21dc010)
...
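Before modifying another process, the mem file can be tried out on the current process, which usually needs no extra privileges on Linux. The sketch below is an illustration rather than part of the original experiment: read_own_memory is a hypothetical helper, ctypes.create_string_buffer stands in for strdup to obtain a buffer with a known address, and the function returns None when /proc/self/mem is unavailable or the read fails.

```python
import ctypes
import os


def read_own_memory(addr, size):
    """Return `size` bytes at virtual address `addr` of this process, or None."""
    try:
        fd = os.open("/proc/self/mem", os.O_RDONLY)
    except OSError:
        return None  # no /proc (not Linux) or access denied
    try:
        # The file offset in /proc/[pid]/mem is the virtual address itself.
        return os.pread(fd, size, addr)
    except OSError:
        return None  # address not mapped or not readable
    finally:
        os.close(fd)


# A buffer with a known address, playing the role of the strdup() result.
buf = ctypes.create_string_buffer(b"test_memory")
data = read_own_memory(ctypes.addressof(buf), len(buf.value))
print(data)
```

Reading at an explicit offset with os.pread avoids Python's buffered I/O, which could otherwise try to read past the end of a mapped region.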
 
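We can now write a script that uses the /proc filesystem to find the string and modify its contents; the output of loop will change accordingly.
First, find the process ID:

ps aux | grep ./loop | grep -v grep
zjucad 2542 0.0 0.0 4352 636 pts/3 S+ 12:28 0:00 ./loop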
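The same lookup can be done without ps by scanning the numeric directories under /proc and reading each process's comm file, which holds the command name (truncated by the kernel to 15 characters). The sketch below is only an illustration: pids_by_name is a hypothetical helper, and it is exercised against a temporary fake /proc tree so that it runs on any machine.

```python
import os
import tempfile


def pids_by_name(name, proc_root="/proc"):
    """Return the sorted PIDs whose <proc_root>/<pid>/comm equals `name`."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories describe processes
        try:
            with open(os.path.join(proc_root, entry, "comm")) as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # the process exited while we were scanning
    return sorted(pids)


# Demonstration on a fake /proc tree (an assumption made so the sketch is portable).
with tempfile.TemporaryDirectory() as fake_proc:
    os.makedirs(os.path.join(fake_proc, "2542"))
    with open(os.path.join(fake_proc, "2542", "comm"), "w") as f:
        f.write("loop\n")
    found = pids_by_name("loop", proc_root=fake_proc)
print(found)
```

On a real system, calling pids_by_name("loop") with the default proc_root would scan the live process list instead.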
 
2542 is the PID of the loop program. Running cat /proc/2542/maps gives:

00400000-00401000 r-xp 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
00600000-00601000 r--p 00000000 08:01 811716 /home/zjucad/wangzhiqiang/loop
00601000-00602000 rw-p 00001000 08:01 811716 /home/zjucad/wangzhiqiang/loop
021dc000-021fd000 rw-p 00000000 00:00 0 [heap]
7f2adae2a000-7f2adafea000 r-xp 00000000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
7f2adafea000-7f2adb1ea000 ---p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ea000-7f2adb1ee000 r--p 001c0000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1ee000-7f2adb1f0000 rw-p 001c4000 08:01 8661324 /lib/x86_64-linux-gnu/libc-2.23.so
7f2adb1f0000-7f2adb1f4000 rw-p 00000000 00:00 0
7f2adb1f4000-7f2adb21a000 r-xp 00000000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb3fa000-7f2adb3fd000 rw-p 00000000 00:00 0
7f2adb419000-7f2adb41a000 r--p 00025000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41a000-7f2adb41b000 rw-p 00026000 08:01 8661310 /lib/x86_64-linux-gnu/ld-2.23.so
7f2adb41b000-7f2adb41c000 rw-p 00000000 00:00 0
7ffd51bb3000-7ffd51bd4000 rw-p 00000000 00:00 0 [stack]
7ffd51bdd000-7ffd51be0000 r--p 00000000 00:00 0 [vvar]
7ffd51be0000-7ffd51be2000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
 
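The heap spans 021dc000-021fd000 and is both readable and writable. Since 0x021dc000 < 0x21dc010 < 0x021fd000, the string's address is confirmed to lie in the heap, at offset 0x10 from its start (why 0x10 is explained later). We can now go to address 0x21dc010 through the mem file and overwrite the contents there; the string printed by loop will change with it. The following Python script does this.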
#!/usr/bin/env python3
'''
Locates and replaces the first occurrence of a string in the heap
of a process

Usage: ./read_write_heap.py PID search_string replace_by_string
Where:
- PID is the pid of the target process
- search_string is the ASCII string you are looking to overwrite
- replace_by_string is the ASCII string you want to replace
  search_string with
'''

import sys


def print_usage_and_exit():
    print('Usage: {} pid search write'.format(sys.argv[0]))
    sys.exit(1)


# check usage
if len(sys.argv) != 4:
    print_usage_and_exit()

# get the pid from args
pid = int(sys.argv[1])
if pid <= 0:
    print_usage_and_exit()
search_string = str(sys.argv[2])
if search_string == "":
    print_usage_and_exit()
write_string = str(sys.argv[3])
if write_string == "":
    print_usage_and_exit()

# open the maps and mem files of the process
maps_filename = "/proc/{}/maps".format(pid)
print("[*] maps: {}".format(maps_filename))
mem_filename = "/proc/{}/mem".format(pid)
print("[*] mem: {}".format(mem_filename))

# try opening the maps file
try:
    maps_file = open(maps_filename, 'r')
except IOError as e:
    print("[ERROR] Can not open file {}:".format(maps_filename))
    print("        I/O error({}): {}".format(e.errno, e.strerror))
    sys.exit(1)

for line in maps_file:
    sline = line.split(' ')
    # check if we found the heap
    if sline[-1][:-1] != "[heap]":
        continue
    print("[*] Found [heap]:")

    # parse line
    addr = sline[0]
    perm = sline[1]
    offset = sline[2]
    device = sline[3]
    inode = sline[4]
    pathname = sline[-1][:-1]
    print("\tpathname = {}".format(pathname))
    print("\taddresses = {}".format(addr))
    print("\tpermisions = {}".format(perm))
    print("\toffset = {}".format(offset))
    print("\tinode = {}".format(inode))

    # check if there is read and write permission
    if perm[0] != 'r' or perm[1] != 'w':
        print("[*] {} does not have read/write permission".format(pathname))
        maps_file.close()
        sys.exit(0)

    # get start and end of the heap in the virtual memory
    addr = addr.split("-")
    if len(addr) != 2:  # never trust anyone, not even your OS :)
        print("[*] Wrong addr format")
        maps_file.close()
        sys.exit(1)
    addr_start = int(addr[0], 16)
    addr_end = int(addr[1], 16)
    print("\tAddr start [{:x}] | end [{:x}]".format(addr_start, addr_end))

    # open and read mem
    try:
        mem_file = open(mem_filename, 'rb+')
    except IOError as e:
        print("[ERROR] Can not open file {}:".format(mem_filename))
        print("        I/O error({}): {}".format(e.errno, e.strerror))
        maps_file.close()
        sys.exit(1)

    # read heap
    mem_file.seek(addr_start)
    heap = mem_file.read(addr_end - addr_start)

    # find string
    try:
        i = heap.index(bytes(search_string, "ASCII"))
    except Exception:
        print("Can't find '{}'".format(search_string))
        maps_file.close()
        mem_file.close()
        sys.exit(0)
    print("[*] Found '{}' at {:x}".format(search_string, i))

    # write the new string
    print("[*] Writing '{}' at {:x}".format(write_string, addr_start + i))
    mem_file.seek(addr_start + i)
    mem_file.write(bytes(write_string, "ASCII"))

    # close files
    maps_file.close()
    mem_file.close()

    # there is only one heap in our example
    break
 
Run the Python script:

zjucad@zjucad-ONDA-H110-MINI-V3-01:~/wangzhiqiang$ sudo ./loop.py 2542 test_memory test_hello
[*] maps: /proc/2542/maps
[*] mem: /proc/2542/mem
[*] Found [heap]:
    pathname = [heap]
    addresses = 021dc000-021fd000
    permisions = rw-p
    offset = 00000000
    inode = 0
    Addr start [21dc000] | end [21fd000]
[*] Found 'test_memory' at 10
[*] Writing 'test_hello' at 21dc010
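The offset and the absolute address reported by the script follow from simple arithmetic on the [heap] range. A minimal sketch, using the addresses from the run above (offset_in_region is a hypothetical helper name):

```python
heap_start, heap_end = 0x021dc000, 0x021fd000  # [heap] line of /proc/2542/maps
string_addr = 0x21dc010                        # address printed by ./loop


def offset_in_region(addr, start, end):
    """Return addr - start if start <= addr < end, otherwise None."""
    if start <= addr < end:
        return addr - start
    return None


offset = offset_in_region(string_addr, heap_start, heap_end)
print(hex(offset))               # offset of the string inside the heap
print(hex(heap_start + offset))  # absolute address to seek to in the mem file
```

The end address in maps is exclusive, which is why the check uses a strict comparison on the upper bound.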
 
At the same time, the string printed by loop has changed:

[633] test_memory (0x21dc010)
[634] test_memory (0x21dc010)
[635] test_memory (0x21dc010)
[636] test_memory (0x21dc010)
[637] test_memory (0x21dc010)
[638] test_memory (0x21dc010)
[639] test_memory (0x21dc010)
[640] test_helloy (0x21dc010)
[641] test_helloy (0x21dc010)
[642] test_helloy (0x21dc010)
[643] test_helloy (0x21dc010)
[644] test_helloy (0x21dc010)
[645] test_helloy (0x21dc010)
 
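The experiment works. The trailing y is left over because test_hello is one byte shorter than test_memory and the script writes no terminating '\0', so the last character of the old string stays in place.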
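The trailing y in test_helloy can be reproduced without touching a real process. The sketch below models the heap as a bytearray, with 16 bytes of padding so that the string sits at offset 0x10 as in the run above; overwrite and c_string_at are hypothetical helpers, and the zero padding is an assumption, not what the allocator actually stores there.

```python
def overwrite(memory, offset, new_bytes):
    """Write new_bytes into memory at offset, leaving every other byte untouched."""
    memory[offset:offset + len(new_bytes)] = new_bytes
    return memory


def c_string_at(memory, offset):
    """Read from offset up to the first NUL byte, the way printf("%s") would."""
    end = memory.index(0, offset)
    return bytes(memory[offset:end])


# Assumed layout: 16 padding bytes, then the NUL-terminated string.
heap = bytearray(16) + bytearray(b"test_memory\x00")
overwrite(heap, 0x10, b"test_hello")
print(c_string_at(heap, 0x10))

# Writing the terminator as well removes the leftover character.
fixed = bytearray(16) + bytearray(b"test_memory\x00")
overwrite(fixed, 0x10, b"test_hello\x00")
print(c_string_at(fixed, 0x10))
```

With the real script, a replacement longer than the original string would overwrite the old terminator and whatever follows it in the heap, so the replacement should be kept no longer than the original.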
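Drawing the virtual memory layout through experiments
Here is the memory layout diagram again:
Most people have at least a rough idea of how virtual memory is laid out. How can we verify it? The following sections show how.
Stack and heap
First, verify the position of the stack. In C, local variables are stored on the stack and memory allocated by malloc is stored on the heap, so printing the address of a local variable and the address returned by malloc shows where the stack and the heap sit within the virtual address space.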
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    return (EXIT_SUCCESS);
}

Compile and run: gcc -Wall -Wextra -pedantic -Werror main.c -o test; ./test
Output:
Address of a: 0x7ffedde9c7fc
Allocated space in the heap: 0x55ca5b360670
 
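The output shows that the heap lies below the stack in the address space, as summarized in the figure:
The executable
The executable is also mapped into virtual memory. Printing the address of the main function and comparing it with the heap and stack addresses shows where the executable sits relative to them.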
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(void)
{
    int a;
    void *p;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    return (EXIT_SUCCESS);
}

Compile and run: gcc main.c -o test; ./test
Output:
Address of a: 0x7ffed846de2c
Allocated space in the heap: 0x561b9ee8c670
Address of function main: 0x561b9deb378a
 
Since main(0x561b9deb378a) < heap(0x561b9ee8c670) < stack(0x7ffed846de2c), the layout can be drawn as follows:
virtual_memory_stack_heap_executable.png
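The ordering can also be checked programmatically. A small sketch, using the three addresses printed by the run above (the dictionary keys are labels chosen here for illustration):

```python
# Addresses copied from the program output above.
addresses = {
    "stack": 0x7ffed846de2c,  # local variable a
    "heap": 0x561b9ee8c670,   # pointer returned by malloc
    "main": 0x561b9deb378a,   # code of the executable
}

# Sort the labels from the lowest address to the highest.
layout = sorted(addresses, key=addresses.get)
print(" < ".join(layout))

# Distance between neighbouring regions, in bytes.
for low, high in zip(layout, layout[1:]):
    print(low, "->", high, hex(addresses[high] - addresses[low]))
```

The gap between the heap and the stack is many orders of magnitude larger than the gap between the executable and the heap, which is what leaves both regions room to grow toward each other.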
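Command-line arguments and environment variables
The program's entry point, main, can take parameters: the argument count (ac), the argument array (av), and the environment array (env), as in the signature used below.
The following program shows where these elements sit in virtual memory: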
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(int ac, char **av, char **env)
{
    int a;
    void *p;
    int i;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    printf("First bytes of the main function:\n\t");
    for (i = 0; i < 15; i++)
    {
        printf("%02x ", ((unsigned char *)main)[i]);
    }
    printf("\n");
    printf("Address of the array of arguments: %p\n", (void *)av);
    printf("Addresses of the arguments:\n\t");
    for (i = 0; i < ac; i++)
    {
        /* %p expects a void pointer, so cast explicitly */
        printf("[%s]:%p ", av[i], (void *)av[i]);
    }
    printf("\n");
    printf("Address of the array of environment variables: %p\n", (void *)env);
    printf("Address of the first environment variable: %p\n", (void *)(env[0]));
    return (EXIT_SUCCESS);
}

Compile and run: gcc main.c -o test; ./test nihao hello
Output:
Address of a: 0x7ffcc154a748
Allocated space in the heap: 0x559bd1bee670
Address of function main: 0x559bd09807ca
First bytes of the main function:
    55 48 89 e5 48 83 ec 40 89 7d dc 48 89 75 d0
Address of the array of arguments: 0x7ffcc154a848
Addresses of the arguments:
    [./test]:0x7ffcc154b94f [nihao]:0x7ffcc154b956 [hello]:0x7ffcc154b95c
Address of the array of environment variables: 0x7ffcc154a868
Address of the first environment variable: 0x7ffcc154b962
 
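The result:
main(0x559bd09807ca) < heap(0x559bd1bee670) < stack(0x7ffcc154a748) < argv(0x7ffcc154a848) < env(0x7ffcc154a868) < arguments(0x7ffcc154b94f -> 0x7ffcc154b95c + 6, where 6 is the length of hello plus 1 for the terminating '\0') < env first(0x7ffcc154b962)
All the command-line arguments are adjacent in memory, and the environment variables follow immediately after them.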
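The adjacency claim can be checked with a little arithmetic: each string occupies its length plus one byte for the terminating '\0', so the next string should start exactly there. A sketch using the addresses printed above; next_address is a hypothetical helper, and it assumes single-byte (ASCII) characters.

```python
# (string, address) pairs copied from the program output above.
args = [
    ("./test", 0x7ffcc154b94f),
    ("nihao", 0x7ffcc154b956),
    ("hello", 0x7ffcc154b95c),
]
first_env = 0x7ffcc154b962  # address of the first environment variable


def next_address(string, addr):
    """Address just past a NUL-terminated ASCII string stored at addr."""
    return addr + len(string) + 1


# Where each string ends versus where the following item actually starts.
predicted = [next_address(s, a) for s, a in args]
observed = [addr for _, addr in args[1:]] + [first_env]
contiguous = predicted == observed
print(contiguous)
```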
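Are the argv and env arrays adjacent?
In the example above argv has 4 elements: the three command-line arguments plus a NULL pointer that marks the end of the array. Each pointer is 8 bytes and 8 * 4 = 32, so argv(0x7ffcc154a848) + 32(0x20) = env(0x7ffcc154a868). The argv and env pointer arrays are therefore adjacent.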
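The same calculation as a small sketch. array_end is a hypothetical helper, and the 8-byte pointer size is taken from the 64-bit machine used in this article:

```python
POINTER_SIZE = 8  # bytes per pointer on the article's 64-bit machine


def array_end(start, count, pointer_size=POINTER_SIZE):
    """Address just past a pointer array holding `count` entries plus a NULL."""
    return start + (count + 1) * pointer_size


argv_addr = 0x7ffcc154a848
env_addr = 0x7ffcc154a868
ac = 3  # ./test, nihao, hello

print(hex(array_end(argv_addr, ac)))
print(array_end(argv_addr, ac) == env_addr)
```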
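Are the command-line arguments stored right after the environment array?
First we need the size of the environment array. The array is terminated by NULL, so we can walk through env, checking each entry against NULL, to get its length. The code is as follows: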
#include <stdlib.h>
#include <stdio.h>
#include <string.h>

/**
 * main - print locations of various elements
 *
 * Return: EXIT_FAILURE if something failed. Otherwise EXIT_SUCCESS
 */
int main(int ac, char **av, char **env)
{
    int a;
    void *p;
    int i;
    int size;

    printf("Address of a: %p\n", (void *)&a);
    p = malloc(98);
    if (p == NULL)
    {
        fprintf(stderr, "Can't malloc\n");
        return (EXIT_FAILURE);
    }
    printf("Allocated space in the heap: %p\n", p);
    printf("Address of function main: %p\n", (void *)main);
    printf("First bytes of the main function:\n\t");
    for (i
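The counting step described above, walking a NULL-terminated pointer array until the terminator is reached, can be sketched in Python with ctypes. This is an illustration of the idea, not the article's C program: count_until_null is a hypothetical helper, and the three environment strings are made up.

```python
import ctypes


def count_until_null(array):
    """Count the entries that come before the first NULL pointer."""
    count = 0
    while array[count] is not None:  # ctypes returns None for a NULL char pointer
        count += 1
    return count


# Hypothetical environment array: three strings followed by the NULL terminator.
EnvArray = ctypes.c_char_p * 4
env = EnvArray(b"HOME=/root", b"LANG=C", b"TERM=xterm", None)

n = count_until_null(env)
# Bytes occupied by the pointer array itself, terminator included.
size_in_bytes = (n + 1) * ctypes.sizeof(ctypes.c_char_p)
print(n, size_in_bytes)
```

Adding size_in_bytes to the address of env gives the first address past the array, which can then be compared with the address of the first argument string.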
Title: 10 figures and 22 code listings: a long-form guide to the virtual memory model and the internals of malloc