
Writing Into a NumPy Memmap Still Loads Into RAM

I'm testing NumPy's memmap in an IPython Notebook with the following code:

Ymap = np.memmap('Y.dat', dtype='float32', mode='w+', shape=(5e6, 4e4))

As you can see, Ymap's shape i…

Solution 1:

A (non-anonymous) mmap is a link between a file and RAM that, roughly, guarantees that when the mmap'ed pages can no longer stay in RAM, they are paged out to the given file instead of to the swap disk/file, and that when you msync or munmap the region, the whole thing is written out to the file. Operating systems typically follow a lazy strategy with respect to disk accesses (and an eager one with respect to RAM): data stays in memory as long as it fits. This means a process with large mmaps will eat up as much RAM as it can/needs before spilling the rest over to disk.
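The file-to-RAM link above can be sketched with Python's stdlib mmap module (a minimal illustration, not the question's code; the filename demo.dat is made up): a write to the mapping is an ordinary memory write, and flush() is what forces it out to the file (msync).

```python
import mmap
import os

# Create a small backing file, map it, write through memory, flush to disk.
path = "demo.dat"
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # the backing file must already be the mapped size

with open(path, "r+b") as f:
    mm = mmap.mmap(f.fileno(), 4096)  # map the whole file into RAM
    mm[:5] = b"hello"                 # an ordinary in-memory write...
    mm.flush()                        # ...forced out to the file (msync)
    mm.close()

with open(path, "rb") as f:
    data = f.read(5)                  # the bytes are now in the file itself
os.remove(path)
```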

So you're right that an np.memmap array is an out-of-core array, but it is one that will grab as much RAM cache as it can.
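A small sketch of that out-of-core behavior, using a deliberately tiny shape (the question's (5e6, 4e4) float32 array would need roughly 800 GB of backing file; also note that modern NumPy expects integer shape entries, so the floats 5e6 and 4e4 would have to be written as ints). The filename Y_small.dat is made up for the example.

```python
import os
import numpy as np

# An np.memmap behaves like an ndarray but is backed by a file on disk.
y = np.memmap("Y_small.dat", dtype="float32", mode="w+", shape=(1000, 40))
y[0, :] = 1.0   # writes land in the page cache, counted as RAM until evicted
y.flush()       # push dirty pages out to the file
del y           # drop the mapping; the file keeps the data

# Reopen read-only: nothing is loaded until the pages are actually touched.
y2 = np.memmap("Y_small.dat", dtype="float32", mode="r", shape=(1000, 40))
val = float(y2[0, 0])
del y2
os.remove("Y_small.dat")
```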

Solution 2:

As the docs say:

Memory-mapped files are used for accessing small segments of large files on disk, without reading the entire file into memory.

There's no true magic in computers ;-) If you access very little of a giant array, a memmap gimmick will require very little RAM; if you access very much of a giant array, a memmap gimmick will require very much RAM.
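To illustrate the "access very little, need very little RAM" point, here is a hedged sketch (filename big.dat is invented): only the pages holding the slice you touch are faulted in, not the whole file.

```python
import os
import numpy as np

# A ~40 MB file-backed array; we then read back only a single row.
big = np.memmap("big.dat", dtype="float32", mode="w+", shape=(10_000, 1_000))
big[123, :] = np.arange(1_000, dtype="float32")
big.flush()
del big

ro = np.memmap("big.dat", dtype="float32", mode="r", shape=(10_000, 1_000))
row = np.array(ro[123])    # copies one ~4 KB row out of the mapping
total = float(row.sum())   # 0 + 1 + ... + 999
del ro
os.remove("big.dat")
```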

One workaround that may or may not be helpful in your specific code: create new mmap objects periodically (and get rid of old ones), at logical points in your workflow. Then the amount of RAM needed should be roughly proportional to the number of array items you touch between such steps. Against that, it takes time to create and destroy new mmap objects. So it's a balancing act.
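The workaround above might be sketched like this (an assumed setup, not the asker's actual workflow): preallocate the backing file once, then map only one chunk at a time via np.memmap's offset parameter, dropping each mapping when the chunk is done so its cached pages become reclaimable.

```python
import os
import numpy as np

rows, cols = 8_000, 1_000
itemsize = np.dtype("float32").itemsize
chunk = 2_000  # rows processed per step

# Preallocate the whole backing file once.
np.memmap("out.dat", dtype="float32", mode="w+", shape=(rows, cols)).flush()

for start in range(0, rows, chunk):
    # Map only this chunk's slice of the file, work on it, then drop it.
    m = np.memmap("out.dat", dtype="float32", mode="r+",
                  shape=(chunk, cols), offset=start * cols * itemsize)
    m[:] = start            # placeholder "work" on this chunk
    m.flush()
    del m                   # RAM use stays ~ one chunk, not the whole array

check = np.memmap("out.dat", dtype="float32", mode="r", shape=(rows, cols))
val = float(check[6_000, 0])  # row 6000 belongs to the chunk starting at 6000
del check
os.remove("out.dat")
```

As the answer says, this trades the cost of repeatedly creating and destroying mappings for a roughly chunk-sized RAM footprint.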
