Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

Excellent, excellent article! I have a question though.

> Couldn't we rearrange the memory to get a block of 6 contiguous bytes? Some sort of defragmentation process?

> Sadly not. Remember earlier we talked about how the return value of malloc is the address of a byte in memory? Moving allocations won't change the pointers we have already returned from malloc. We would change the value those pointers are pointed at, effectively breaking them. This is one of the downsides of the malloc/free API.

But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new adresses transparently for the programs? Or would it be too much overhead for too little benefit?



Most OSes today do that "transparently" with virtual memory, but usually with a coarse granularity (e.g. 4k). A page table is just a translation of "pointers" to "memory addresses"; the OS can rearrange physical memory as it sees fit without the program having to update its pointers.

In OSes without virtual memory, one option is to do the same non-transparently: instead of returning pointers, malloc and friends work with "pointers to pointers" (called handles), so there is one extra level of indirection, and now the OS is free to rearrange this 2nd level as it sees fit. Whenever a program wants to read/write the data behind a handle, it must dereference the handle to get to the real pointer, but it also must let the OS know that it is currently using the real pointer -- this is to avoid the OS moving it around. This is usually called "locking/unlocking" the handle.

Some classic examples are Windows 3.x, Mac OS (toolbox), PalmOS, etc.

https://en.wikipedia.org/wiki/Classic_Mac_OS_memory_manageme...


> Some kind of virtual memory driver that would map old pointers to new adresses transparently for the programs

You would need hardware support for this, since the hardware is what decides what gets returned when a program attempts to read from a memory location.

Hardware already does support virtual memory but the granularity is the page (which are a minimum of 4KiB in most OSs).


To do that you either need a structure that you update every time a pointer is created, copied, moved or deleted (too much overhead), or you need a way to scan the entire memory and get all the pointers. And at the point where you have a piece of code that knows where every pointer is, you already know which pointers aren't used anywhere anymore so it's a waste to not have it also call free() for you.

Once you have it call free() for you, your piece of code is now a compacting GC, like Java's for example.


> But why not? Couldn't we store information about old pointers somewhere and match them with new addresses when defragmenting? Some kind of virtual memory driver that would map old pointers to new adresses transparently for the programs? Or would it be too much overhead for too little benefit?

In languages where memory is managed for you, you can absolutely do this, since the runtime can find every single pointer to an object and rewrite it.

Virtual memory can let you do this, but would require a separate page for each allocation (since virtual memory operates on a page-level). Given that the smallest page on modern architectures is 4k, this would mean using 4k of ram for each allocation (and rounding up each allocation to a 4k page boundary).

On top of that, it's an OS system call to map and unmap pages, which means you incur a system-call on every allocation and deallocation, which is much slower than using a user-space allocator.


You’d need cooperation between your malloc implementation and the application code. It’s possible, but tricky. If your malloc returned a pointer to a pointer, and promised to keep the first level pointer up to date, and was able to lock your application whenever it moved things around, it could work.

Someone else already mentioned, but garbage collected languages do actually do this. Because they’re fully in control of memory (the language exposes no raw pointers), they’re able to create the layer of indirection you’ve suggested and move things around as they please. I know at minimum the JVM does this. The term to search for is “heap compaction.”

There’s also the weird and wonderful work of Emery Berger et al with their “Mesh” malloc implementation, which blows my mind: https://youtu.be/XRAP3lBivYM.


In a language like C that isn't really possible because the language can't keep track of all of the places that memory address is stored.

If malloc were to return something like an address that holds the address of memory allocated there is nothing preventing the program from reading that address, doing math on it, and storing it somewhere else.


Early MacOS did this with handles (basically pointers to pointers.) You'd lock them to read/write and when they were unlocked, the OS was free to move the underlying memory and change the pointer in the handle.


Well, that's what some GCs do, and they do indeed defragment the heap.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: