Slicing direct buffers to work around 4K overhead

Direct ByteBuffers can have an unexpectedly massive overhead: every allocation is padded by a number of bytes equal to the system page size (typically 4K). The reason for this is that direct buffers must be page-aligned, which MappedByteBuffers require. The code used behind the scenes in ByteBuffer.allocateDirect(int size) looks a little something like this:

private static ByteBuffer malloc(int size) {
   int pageSize = unsafe.pageSize(); // typically 4096
   // over-allocate by a full page so the base address can be aligned
   long pointer = unsafe.allocateMemory(size + pageSize);
   // round the pointer up to the next page boundary
   long base = (pointer + (pageSize - 1)) / pageSize * pageSize;
   ByteBuffer buffer = unsafe.createBufferAt(base); // pseudocode: wrap the aligned address
   return buffer;
}

Code like this will have a massive overhead, since each 64-byte buffer actually reserves more than 4K of native memory:

for (int i = 0; i < count; i++)
   buffers[i] = ByteBuffer.allocateDirect(64);
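To put a rough number on the waste, here is a back-of-the-envelope sketch assuming a 4096-byte page and 10,000 allocations of 64 bytes each (the counts are made up for illustration):

```java
public class OverheadDemo {
    public static void main(String[] args) {
        int count = 10_000;
        int payload = 64;    // bytes actually requested per buffer
        int pageSize = 4096; // typical system page size

        // every allocateDirect() call pads its allocation by one page
        long naiveBytes = (long) count * (payload + pageSize);

        // slicing the same payloads out of shared 1 MiB pool buffers
        long pooledBytes = (long) Math.ceil(
                (double) count * payload / (1024 * 1024)) * (1024 * 1024);

        System.out.println("naive:  " + naiveBytes);  // 41600000 (~40 MiB)
        System.out.println("pooled: " + pooledBytes); // 1048576 (1 MiB)
    }
}
```

Under these assumptions the naive loop reserves roughly forty times the native memory of the pooled approach.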

Instead, use an approach like the one below, which carves small buffers out of a single large allocation with slice():

public class DirectBufferProvider {
   private static final int ALLOCATION_SIZE = 1024 * 1024;

   private ByteBuffer currentBuffer = null;

   public ByteBuffer allocate(int size) {
      // large requests are not worth pooling; the page overhead is relatively small
      if (size >= ALLOCATION_SIZE)
         return ByteBuffer.allocateDirect(size);

      if (currentBuffer == null || size > currentBuffer.remaining())
         currentBuffer = ByteBuffer.allocateDirect(ALLOCATION_SIZE);

      currentBuffer.limit(currentBuffer.position() + size);
      ByteBuffer result = currentBuffer.slice();
      // advance past the slice and restore the limit, so the
      // next allocation gets fresh bytes instead of overlapping this one
      currentBuffer.position(currentBuffer.limit());
      currentBuffer.limit(currentBuffer.capacity());
      return result;
   }

   private static final DirectBufferProvider global_synced = new DirectBufferProvider();

   public static synchronized ByteBuffer allocateDirect(int size) {
      return global_synced.allocate(size);
   }
}
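The slicing trick itself can be seen in isolation with just the public ByteBuffer API. This minimal sketch carves two 64-byte slices out of one pooled allocation and shows that they share the pool's backing memory:

```java
import java.nio.ByteBuffer;

public class SliceDemo {
    public static void main(String[] args) {
        // one big direct allocation pays the page padding only once
        ByteBuffer pool = ByteBuffer.allocateDirect(1024 * 1024);

        // carve a 64-byte slice: set the limit, slice, then advance past it
        pool.limit(pool.position() + 64);
        ByteBuffer a = pool.slice();
        pool.position(pool.limit());
        pool.limit(pool.capacity());

        // a second slice starts where the first one ended
        pool.limit(pool.position() + 64);
        ByteBuffer b = pool.slice();
        pool.position(pool.limit());
        pool.limit(pool.capacity());

        System.out.println(a.capacity()); // 64
        System.out.println(b.capacity()); // 64

        // slices are views of the pool: a write through 'a' is
        // visible through the pool at the corresponding offset
        a.put(0, (byte) 42);
        System.out.println(pool.get(0)); // 42
    }
}
```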

Note that unlike ByteBuffer.allocateDirect(), DirectBufferProvider.allocate() is not thread-safe. The static convenience method is synchronized and should therefore not be used from heavily multithreaded code. Using ThreadLocals is even worse performance-wise, so simply create new DirectBufferProvider instances where you need them.
