How The Kernel Manages Your Memory

 

Reposted from: http://duartes.org/gustavo/blog/post/how-the-kernel-manages-your-memory


After examining the virtual address layout of a process, we turn to the kernel and its mechanisms for managing user memory. Here is gonzo again:

Linux kernel mm_struct

Linux processes are implemented in the kernel as instances of task_struct, the process descriptor. The mm field in task_struct points to the memory descriptor, mm_struct, which is an executive summary of a program's memory. It stores the start and end of memory segments as shown above, the number of physical memory pages used by the process (rss stands for Resident Set Size), the amount of virtual address space used, and other tidbits. Within the memory descriptor we also find the two workhorses for managing program memory: the set of virtual memory areas and the page tables. Gonzo's memory areas are shown below:

Kernel memory descriptor and memory areas

Each virtual memory area (VMA) is a contiguous range of virtual addresses; these areas never overlap. An instance of vm_area_struct fully describes a memory area, including its start and end addresses, flags to determine access rights and behaviors, and the vm_file field to specify which file is being mapped by the area, if any. A VMA that does not map a file is anonymous. Each memory segment above (e.g., heap, stack) corresponds to a single VMA, with the exception of the memory mapping segment. This is not a requirement, though it is usual on x86 machines. VMAs do not care which segment they are in.

A program's VMAs are stored in its memory descriptor both as a linked list in the mmap field, ordered by starting virtual address, and as a red-black tree rooted at the mm_rb field. The red-black tree allows the kernel to search quickly for the memory area covering a given virtual address. When you read the file /proc/pid_of_process/maps, the kernel is simply going through the linked list of VMAs for the process and printing each one.
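As a rough user-space illustration of that lookup, the sketch below parses lines in the maps format into VMA records and searches them by address. It is not kernel code: the `Vma` class, `parse_maps_line`, and this simplified `find_vma` are hypothetical names, the sample addresses are made up, and a sorted Python list with binary search stands in for the kernel's red-black tree.

```python
import bisect
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Vma:
    start: int   # first virtual address in the area (inclusive)
    end: int     # first address past the area (exclusive)
    perms: str   # e.g. "rw-p", as printed in the maps file
    path: str    # mapped file or pseudo-name such as [heap]; "" if none


def parse_maps_line(line: str) -> Vma:
    """Parse one line in the /proc/<pid>/maps format into a Vma."""
    fields = line.split(None, 5)
    start_hex, end_hex = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return Vma(int(start_hex, 16), int(end_hex, 16), fields[1], path)


def find_vma(vmas: List[Vma], addr: int) -> Optional[Vma]:
    """Return the area covering addr, or None. vmas must be sorted by start."""
    starts = [v.start for v in vmas]
    i = bisect.bisect_right(starts, addr) - 1
    if i >= 0 and vmas[i].start <= addr < vmas[i].end:
        return vmas[i]
    return None


# Made-up sample in the maps format (addresses are illustrative only).
SAMPLE = """\
08048000-08049000 r-xp 00000000 08:01 1234 /bin/gonzo
08049000-0804a000 rw-p 00000000 08:01 1234 /bin/gonzo
0804a000-0806b000 rw-p 00000000 00:00 0 [heap]
bffdf000-c0000000 rw-p 00000000 00:00 0 [stack]
"""
vmas = [parse_maps_line(line) for line in SAMPLE.splitlines()]
```

With this sample, `find_vma(vmas, 0x0804b000)` returns the heap area, while an address in the gap between heap and stack returns `None`. Note that this sketch only reports an area that actually contains the address, which is a simplification chosen for clarity.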

In Windows, the EPROCESS block is roughly a mix of task_struct and mm_struct. The Windows analog to a VMA is the Virtual Address Descriptor, or VAD; they are stored in an AVL tree. You know what the funniest thing about Windows and Linux is? It's the little differences.

The 4GB virtual address space is divided into pages. x86 processors in 32-bit mode support page sizes of 4KB, 2MB, and 4MB. Both Linux and Windows map the user portion of the virtual address space using 4KB pages. Bytes 0-4095 fall in page 0, bytes 4096-8191 fall in page 1, and so on. The size of a VMA must be a multiple of page size. Here's 3GB of user space in 4KB pages:
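The page arithmetic above can be checked with a few lines of Python. This is only a sketch; the helper names (`page_number`, `page_offset`, `page_align_up`) are made up for illustration and assume 4KB pages.

```python
PAGE_SHIFT = 12                 # log2 of the page size
PAGE_SIZE = 1 << PAGE_SHIFT     # 4096 bytes


def page_number(addr: int) -> int:
    """Index of the 4KB page that contains addr."""
    return addr >> PAGE_SHIFT


def page_offset(addr: int) -> int:
    """Byte offset of addr within its page."""
    return addr & (PAGE_SIZE - 1)


def page_align_up(length: int) -> int:
    """Round a length up to a whole number of pages, as a VMA size must be."""
    return (length + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)


# Number of 4KB pages in the 3GB user portion of the address space.
USER_SPACE_BYTES = 3 * 1024 ** 3
user_pages = USER_SPACE_BYTES // PAGE_SIZE
```

Here `page_number(4095)` is 0 and `page_number(4096)` is 1, matching the byte ranges in the text, and `user_pages` works out to 786,432 pages.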

4KB Pages Virtual User Space

The processor consults page tables to translate a virtual address into a physical memory address. Each process has its own set of page tables; whenever a process switch occurs, page tables for user space are switched as well. Linux stores a pointer to a process's page tables in the pgd field of the memory descriptor. To each virtual page there corresponds one page table entry (PTE) in the page tables, which in regular x86 paging is a simple 4-byte record shown below:
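To make the translation step concrete, here is a toy model of a page table walk. It assumes the classic two-level layout of 32-bit x86 paging without PAE (10 bits of page directory index, 10 bits of page table index, 12 bits of offset), a detail the text above does not spell out. Python dictionaries stand in for the in-memory tables, and `split_vaddr` and `translate` are hypothetical names.

```python
from typing import Dict, Optional, Tuple

PTE_PRESENT = 0x1            # bit 0 of a PTE: page is present
FRAME_MASK = 0xFFFFF000      # upper 20 bits: physical frame address


def split_vaddr(vaddr: int) -> Tuple[int, int, int]:
    """Split a 32-bit virtual address into (directory, table, offset)."""
    return (vaddr >> 22) & 0x3FF, (vaddr >> 12) & 0x3FF, vaddr & 0xFFF


def translate(page_directory: Dict[int, Dict[int, int]],
              vaddr: int) -> Optional[int]:
    """Return the physical address for vaddr, or None if it would fault."""
    pd_index, pt_index, offset = split_vaddr(vaddr)
    page_table = page_directory.get(pd_index)
    if page_table is None:
        return None                      # no page table for this region
    pte = page_table.get(pt_index, 0)
    if not pte & PTE_PRESENT:
        return None                      # page not present: page fault
    return (pte & FRAME_MASK) | offset


# One mapped page: virtual 0x08049000 -> physical frame at 0x00ABC000.
toy_directory = {32: {73: 0x00ABC000 | PTE_PRESENT}}
```

In this toy setup `translate(toy_directory, 0x08049123)` yields `0x00ABC123`: the frame address comes from the PTE and the low 12 bits pass through unchanged.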

x86 Page Table Entry (PTE) for 4KB page

Linux has functions to read and set each flag in a PTE. Bit P tells the processor whether the virtual page is present in physical memory. If clear (equal to 0), accessing the page triggers a page fault. Keep in mind that when this bit is zero, the kernel can do whatever it pleases with the remaining fields. The R/W flag stands for read/write; if clear, the page is read-only. Flag U/S stands for user/supervisor; if clear, then the page can only be accessed by the kernel. These flags are used to implement the read-only memory and protected kernel space we saw before.

Bits D and A are for dirty and accessed. A dirty page has had a write, while an accessed page has had a write or read. Both flags are sticky: the processor only sets them; they must be cleared by the kernel. Finally, the PTE stores the starting physical address that corresponds to this page, aligned to 4KB. This naive-looking field is the source of some pain, for it limits addressable physical memory to 4 GB. The other PTE fields are for another day, as is Physical Address Extension.
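A small decoder makes the flag layout easier to see. The bit positions below follow the standard 32-bit x86 PTE format for 4KB pages (P in bit 0, R/W in bit 1, U/S in bit 2, A in bit 5, D in bit 6, frame address in bits 12-31). The function name `decode_pte` and the returned dictionary are illustrative choices, not the kernel's own accessors.

```python
PTE_PRESENT = 1 << 0     # P:   page is present in physical memory
PTE_RW = 1 << 1          # R/W: clear means read-only
PTE_USER = 1 << 2        # U/S: clear means kernel-only
PTE_ACCESSED = 1 << 5    # A:   page has been read or written
PTE_DIRTY = 1 << 6       # D:   page has been written
PTE_FRAME_MASK = 0xFFFFF000   # bits 12-31: 4KB-aligned physical address


def decode_pte(pte: int) -> dict:
    """Decode the fields of a 32-bit PTE discussed in the text."""
    if not pte & PTE_PRESENT:
        # With P clear the hardware ignores the rest of the entry,
        # so the remaining bits belong to the kernel.
        return {"present": False, "raw": pte}
    return {
        "present": True,
        "writable": bool(pte & PTE_RW),
        "user": bool(pte & PTE_USER),
        "accessed": bool(pte & PTE_ACCESSED),
        "dirty": bool(pte & PTE_DIRTY),
        "frame_addr": pte & PTE_FRAME_MASK,
    }


# A 20-bit frame number times 4KB per frame gives the 4 GB ceiling.
max_physical_bytes = ((PTE_FRAME_MASK >> 12) + 1) * 4096
```

For example, `decode_pte(0x12345067)` describes a present, writable, user-accessible page that has been both read and written, backed by the frame at physical address `0x12345000`.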

A virtual page is the unit of memory protection because all of its bytes share the U/S and R/W flags. However, the same physical memory could be mapped by different pages, possibly with different protection flags. Notice that execute permissions are nowhere to be seen in the PTE. This is why classic x86 paging allows code on the stack to be executed, making it easier to exploit stack buffer overflows (it's still possible to exploit non-executable stacks using return-to-libc and other techniques). This lack of a PTE no-execute flag illustrates a broader fact: permission flags in a VMA may or may not translate cleanly into hardware protection. The kernel does what it can, but ultimately the architecture limits what is possible.

Virtual memory doesn't store anything; it simply maps a program's address space onto the underlying physical memory, which is accessed by the processor as a large block called the physical address space. While memory operations on the bus are somewhat involved, we can ignore that here and assume that physical addresses range from zero to the top of available memory in one-byte increments. This physical address space is broken down by the kernel into page frames. The processor doesn't know or care about frames, yet they are crucial to the kernel because the page frame is the unit of physical memory management. Both Linux and Windows use 4KB page frames in 32-bit mode; here is an example of a machine with 2GB of RAM:

Physical Address Space

In Linux each page frame is tracked by a descriptor and several flags. Together these descriptors track the entire physical memory in the computer; the precise state of each page frame is always known. Physical memory is managed with the buddy memory allocation technique, hence a page frame is free if it's available for allocation via the buddy system. An allocated page frame might be anonymous, holding program data, or it might be in the page cache, holding data stored in a file or block device. There are other exotic page frame uses, but we'll leave them alone for now. Windows has an analogous Page Frame Number (PFN) database to track physical memory.
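The core trick of buddy allocation is that a block of 2^order page frames and its "buddy" differ in exactly one bit of their starting frame number, so finding a merge partner is a single XOR. The sketch below shows only that arithmetic, under the assumption that blocks are aligned to their own size; it is a generic illustration of the technique and not the kernel's allocator, and the function names are made up.

```python
FRAME_SIZE = 4096


def buddy_of(pfn: int, order: int) -> int:
    """Starting frame number of the buddy of the block at pfn.

    pfn must be aligned to 2**order frames.
    """
    return pfn ^ (1 << order)


def merged_start(pfn: int, order: int) -> int:
    """Start of the order+1 block formed when pfn merges with its buddy."""
    return pfn & ~(1 << order)


# Page frames on the 2GB example machine from the text.
total_frames = 2 * 1024 ** 3 // FRAME_SIZE
```

For instance, the 4-frame block starting at frame 8 has its buddy at frame 12, and merging them produces an 8-frame block starting at frame 8. On the 2GB machine there are 524,288 frames to track.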

Let's put together virtual memory areas, page table entries and page frames to understand how this all works. Below is an example of a user heap:

Physical Address Space

Blue rectangles represent pages in the VMA range, while arrows represent page table entries mapping pages onto page frames. Some virtual pages lack arrows; this means their corresponding PTEs have the Present flag clear. This could be because the pages have never been touched or because their contents have been swapped out. In either case access to these pages will lead to page faults, even though they are within the VMA. It may seem strange for the VMA and the page tables to disagree, yet this often happens.

A VMA is like a contract between your program and the kernel. You ask for something to be done (memory allocated, a file mapped, etc.), the kernel says “sure”, and it creates or updates the appropriate VMA. But it does not actually honor the request right away; it waits until a page fault happens to do real work. The kernel is a lazy, deceitful sack of scum; this is the fundamental principle of virtual memory. It applies in most situations, some familiar and some surprising, but the rule is that VMAs record what has been agreed upon, while PTEs reflect what has actually been done by the lazy kernel. These two data structures together manage a program's memory; both play a role in resolving page faults, freeing memory, swapping memory out, and so on. Let's take the simple case of memory allocation:

Example of demand paging and memory allocation

When the program asks for more memory via the brk() system call, the kernel simply updates the heap VMA and calls it good. No page frames are actually allocated at this point and the new pages are not present in physical memory. Once the program tries to access the pages, the processor page faults and do_page_fault() is called. It searches for the VMA covering the faulted virtual address using find_vma(). If found, the permissions on the VMA are also checked against the attempted access (read or write). If there's no suitable VMA, no contract covers the attempted memory access and the process is punished by Segmentation Fault.

When a VMA is found the kernel must handle the fault by looking at the PTE contents and the type of VMA. In our case, the PTE shows the page is not present. In fact, our PTE is completely blank (all zeros), which in Linux means the virtual page has never been mapped. Since this is an anonymous VMA, we have a purely RAM affair that must be handled by do_anonymous_page(), which allocates a page frame and makes a PTE to map the faulted virtual page onto the freshly allocated frame.
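The brk-then-fault sequence can be mimicked with a toy model. Everything here is hypothetical scaffolding (the `ToyProcess` class, its dictionary page table, and a counter that hands out frame numbers); it only aims to show that growing the heap touches the VMA alone, and that frames appear on first access.

```python
PAGE_SIZE = 4096


class SegmentationFault(Exception):
    """Raised when no VMA covers the faulting address."""


class ToyProcess:
    def __init__(self, heap_start: int) -> None:
        self.heap_start = heap_start
        self.heap_end = heap_start      # heap VMA is [heap_start, heap_end)
        self.page_table = {}            # virtual page number -> frame number
        self.next_free_frame = 0        # stand-in for a real frame allocator

    def brk(self, new_end: int) -> None:
        """Grow the heap: only the VMA changes, no frame is allocated."""
        self.heap_end = new_end

    def access(self, addr: int) -> int:
        """Touch addr and return the frame backing it, faulting if needed."""
        vpn = addr // PAGE_SIZE
        if vpn in self.page_table:
            return self.page_table[vpn]           # already mapped, no fault
        # Page fault: look for a VMA covering the address.
        if not (self.heap_start <= addr < self.heap_end):
            raise SegmentationFault(hex(addr))
        # Anonymous page never touched before: allocate a frame and map it.
        frame = self.next_free_frame
        self.next_free_frame += 1
        self.page_table[vpn] = frame
        return frame
```

After `p = ToyProcess(0x0804A000)` and `p.brk(0x0804A000 + 3 * PAGE_SIZE)`, the page table is still empty. The first `p.access(0x0804A010)` maps one page, a second access to the same page causes no further allocation, and an access beyond the heap end raises `SegmentationFault`.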

Things could have been different. The PTE for a swapped out page, for example, has 0 in the Present flag but is not blank. Instead, it stores the swap location holding the page contents, which must be read from disk and loaded into a page frame by do_swap_page() in what is called a major fault.
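The cases described so far can be summarized as a small decision function. This is a deliberately reduced sketch: `pick_handler` is a made-up name, it returns handler names as strings rather than calling anything, and it covers only the anonymous-memory paths discussed in this post.

```python
PTE_PRESENT = 0x1


def pick_handler(pte: int, vma_is_anonymous: bool) -> str:
    """Name the path a fault on a missing page would take in this post's cases."""
    if pte & PTE_PRESENT:
        return "page is present"          # no missing-page fault to resolve
    if pte == 0:
        # Blank PTE: the virtual page has never been mapped.
        if vma_is_anonymous:
            return "do_anonymous_page"
        return "file-backed fault (not covered here)"
    # Not present but not blank: the PTE records a swap location.
    return "do_swap_page"
```

So `pick_handler(0, True)` gives `"do_anonymous_page"`, while a non-zero PTE with the Present bit clear, such as `0x00012300`, gives `"do_swap_page"`.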

This concludes the first half of our tour through the kernel's user memory management. In the next post, we'll throw files into the mix to build a complete picture of memory fundamentals, including consequences for performance.




