FREAD vs MMAP

Recently I attended a tech talk where the speaker described how their company benefited from using MMAP instead of more traditional methods of reading files like FREAD.

One of the questions that struck a chord with me was “When not to use Memory Mapped Files”. There were a few articles on the internet, but none of them convinced me. So I took some time to write about the topic myself in detail. Before diving deep into this topic, let’s first understand the basics.

Understanding READ() System call

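import java.io.{File, FileInputStream}
val filePath = "/tmp/file1.txt"
val file = new File(filePath)
val in = new FileInputStream(file) // opens the file
val byte = try in.read() finally in.close() // one byte (0-255), or -1 at end of file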

In the above code sample, we can clearly see:

  • We have a file located at “/tmp/file1.txt”
  • We create a File object for that path
  • We create an input stream on it, which opens the file handle, and then read a byte from the input stream

Now let's look at what actually happens within the kernel when we issue the read request.

On Issuing the READ REQUEST

  • When the user space program asks to read from the InputStream, the user space thread issues a READ() System Call
  • The User Space Thread goes into blocking mode
  • If the requested data is not already in Kernel Memory, the Kernel issues a DMA Request to the Block Device


On DMA Request Completion

  • DMA transfers the relevant bytes to the Kernel Memory, aka the OS Disk Cache
  • After the DMA is complete, the processor copies the bytes from the Kernel Memory to the Process Memory. E.g. if the file contains “Sheldon”, those bytes are copied first from the file to the Kernel Memory and then to User Memory.

So the READ System Call copies the content twice:

  • Device to Kernel Memory via DMA
  • Kernel Memory to User Memory
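
To make the second copy a bit more concrete, here is a minimal sketch in Scala. The object name ReadSketch and the helpers readPrefix and demo are made up for this post and are not part of any library. The byte array is the user memory that the data finally lands in after being copied out of the OS Disk Cache (the JVM may add a copy of its own on top, which is ignored here).

```scala
import java.io.{File, FileInputStream}
import java.nio.charset.StandardCharsets
import java.nio.file.Files

object ReadSketch {
  // Reads up to `maxBytes` from the start of `file` with a single read() call.
  // `buffer` lives in the JVM process (user memory); the bytes end up here
  // after being copied out of the kernel's disk cache.
  def readPrefix(file: File, maxBytes: Int): Array[Byte] = {
    val in = new FileInputStream(file)
    try {
      val buffer = new Array[Byte](maxBytes)
      val n = in.read(buffer) // bytes actually read, or -1 at end of file
      if (n <= 0) Array.emptyByteArray else buffer.take(n)
    } finally in.close()
  }

  // Hypothetical demo: writes "Sheldon" to a temp file and reads it back.
  def demo(): String = {
    val tmp = Files.createTempFile("read-sketch", ".txt")
    try {
      Files.write(tmp, "Sheldon".getBytes(StandardCharsets.UTF_8))
      new String(readPrefix(tmp.toFile, 16), StandardCharsets.UTF_8)
    } finally Files.delete(tmp)
  }
}
```

Calling ReadSketch.demo() goes through both hops: the file contents reach the Kernel Memory first, and read(buffer) then fills the array in the process.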

Understanding MMAP() System call

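import java.io.{File, RandomAccessFile}
import java.nio.channels.FileChannel
val filePath = "/tmp/path"
val file = new File(filePath)
// A READ_WRITE mapping requires the file to be opened in "rw" mode
val randomAccessFile = new RandomAccessFile(file, "rw")
// Map the first 100 bytes of the file into the process address space
val memoryMappedByteBuffer = randomAccessFile.getChannel.map(FileChannel.MapMode.READ_WRITE, 0, 100)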

In the above code sample, we can see:

  • We open a RandomAccessFile handle to the file.
  • After opening the handle, we map 100 bytes of the file, starting at offset 0, into memory. The contents are read only when we access the mapped buffer.

On First Access to the Mapped Memory

  • When the user space program reads from the mapped buffer, no READ() System Call is issued. If the page is not in memory yet, the access triggers a page fault instead
  • The User Space Thread blocks while the page fault is handled
  • The Kernel issues a DMA Request to the Block Device


  • After the kernel issues a DMA request, it also creates a page table entry in the process address space and points it to the memory page which will contain the data once the DMA request is completed. NOTE: This only creates a pointer within the process address space via page table magic. The page content is not copied from the kernel address space to the process address space.


On DMA Request Completion

  • DMA transfers the relevant bytes to the Kernel Memory, aka the OS Disk Cache
  • After the DMA is complete, the processor does not copy the bytes from the Kernel Memory to the Process Memory. Instead, the mapping in the virtual memory of the process simply points to the kernel memory which contains the actual bytes from the device.

So with the MMAP System Call the content is copied just once, from the device to the kernel memory.
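
For comparison with the READ path, here is a small Scala sketch of reading a file through a mapping. The names MmapSketch, readMapped and demo are invented for this post. Notice that no byte array is handed to a read() call: the bytes are decoded straight out of the mapped region.

```scala
import java.io.RandomAccessFile
import java.nio.channels.FileChannel
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path}

object MmapSketch {
  // Maps the whole file read-only and returns its bytes decoded as UTF-8.
  def readMapped(path: Path): String = {
    val raf = new RandomAccessFile(path.toFile, "r")
    try {
      val channel = raf.getChannel
      val mapped = channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size())
      // decode() walks the mapped region directly; there is no read() call here
      StandardCharsets.UTF_8.decode(mapped).toString
    } finally raf.close()
  }

  // Hypothetical demo: writes "Sheldon" to a temp file and reads it via the mapping.
  def demo(): String = {
    val tmp = Files.createTempFile("mmap-sketch", ".txt")
    // Deleting a file that is still mapped can fail on some platforms,
    // so the cleanup is left to JVM exit.
    tmp.toFile.deleteOnExit()
    Files.write(tmp, "Sheldon".getBytes(StandardCharsets.UTF_8))
    readMapped(tmp)
  }
}
```

The mapping stays valid after the file handle is closed, which is why readMapped can close the RandomAccessFile in its finally block.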

Reading 1 GB file

Let’s try reading a 1 GB file via both system calls and understand the differences in execution.

With READ System Call

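import java.io.{File, FileInputStream}

val numBytes = 1 << 30 // 1 GiB
// FileInputStream is unbuffered: each read() below is one READ system call
val in = new FileInputStream(new File("testfile.txt"))
(1 to numBytes).foreach(_ => in.read())
in.close()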

Let's go through the operations that happen under the hood:

  • The first System Call creates a file descriptor. This system call creates all the necessary kernel data structures which will be used for communicating with this file.
  • After the file descriptor is created, every time the process issues a READ on the file descriptor, it issues a SYSTEM CALL which essentially does all the steps which we talked about previously in Section: “Understanding READ() System Call”
    • This is a costly operation in itself because we are issuing a SYSTEM CALL for each of the bytes which we are reading. The data also travels twice: from the Device to Kernel Memory (once per page, since later reads of the same page are typically served from the OS Disk Cache) and then from Kernel Memory to User Memory on every call.
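
To get a feel for how the number of system calls depends on how much we ask for per call, here is a rough sketch. ReadCallCount and countReadCalls are names made up for this post, and the chunk sizes are only for illustration. It counts how many read() calls are needed to consume a whole file for a given chunk size.

```scala
import java.io.{File, FileInputStream}

object ReadCallCount {
  // Reads the whole file `chunkSize` bytes at a time and returns how many
  // read() calls were issued, including the final one that reports end of file.
  def countReadCalls(file: File, chunkSize: Int): Long = {
    require(chunkSize > 0, "chunkSize must be positive")
    val in = new FileInputStream(file)
    try {
      val buffer = new Array[Byte](chunkSize)
      var calls = 0L
      var n = 0
      while (n != -1) {
        n = in.read(buffer) // one call into the kernel per iteration
        calls += 1
      }
      calls
    } finally in.close()
  }
}
```

With chunkSize = 1 the count grows with the file size, one call per byte plus one for end of file. A larger chunk size brings the count down sharply, although each call still copies its bytes from Kernel Memory to User Memory.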

With MMAP System Call

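import java.io.RandomAccessFile
import java.nio.channels.FileChannel

val numBytes = 1 << 30 // 1 GiB; a single mapping is limited to Integer.MAX_VALUE bytes
val file = new RandomAccessFile("testfile.txt", "r")
// A file opened in "r" mode can only be mapped READ_ONLY
val out = file.getChannel.map(FileChannel.MapMode.READ_ONLY, 0, numBytes)

// Absolute get(index): valid indices are 0 until numBytes
(0 until numBytes).foreach(i => out.get(i))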

Let's go through the operations that happen under the hood:

  • The first System Call creates a Memory Mapping from the file to a particular region in the Virtual Memory of the process. This system call sets up the kernel data structures needed to later back that region with physical memory pages filled from the file via DMA.
  • After the mapping is complete, we issue a GET on a particular memory location within the process virtual memory. This is not a SYSTEM CALL. However, if the page is not available yet (for example, because the data has not yet been transferred from the mapped file to memory via DMA), the access causes a page fault, which may block the current call.
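
A quick back-of-the-envelope calculation shows why this matters for a 1 GB file. The sketch below assumes a page size of 4 KiB, which is common but not universal, and FaultEstimate and pagesSpanned are names made up for this post.

```scala
object FaultEstimate {
  // Assumed page size; 4 KiB is common, but the real value depends on the system.
  val AssumedPageSize: Long = 4096L

  // Number of pages needed to cover `numBytes`, rounding up.
  def pagesSpanned(numBytes: Long, pageSize: Long = AssumedPageSize): Long =
    (numBytes + pageSize - 1) / pageSize
}
```

Under that assumption, FaultEstimate.pagesSpanned(1L << 30) gives 262144 pages, so the byte-by-byte loop over the mapping can hit at most 262,144 page faults, compared with 1,073,741,824 READ system calls when reading one byte per call. The real number of faults is usually lower, because the kernel tends to bring in several pages at a time.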

Conclusions

So all in all, we now understand the basic differences between the MMAP and READ system calls, and why the MMAP system call might be more performant than the READ system call. In the next blog post, we will try to understand why we shouldn't blindly use MMAP everywhere and what the downsides of MMAP might be.

Thanks everyone for reading this post!
