- Pipeline optimizations. write() does not make syscalls; flush() does, pushing all previous writes out to the socket. Limit flushes as much as possible, but also limit writes, since each one needs to traverse the pipeline.
- GC pressure. Use VoidChannelPromise to reduce object creation if you are not interested in the future's result and no ChannelOutboundHandler needs to add a listener to it.
- Write correctly with respect to slow receivers. Use Channel.isWritable() to avoid out-of-memory errors.
- Configure low and high write watermarks.
- Pass custom events through pipeline. Good fit for handshake notifications and more.
- Prefer ByteToMessageDecoder over ReplayingDecoder. ReplayingDecoder is slower because its methods carry more overhead and it has to handle ReplayingError. Use ByteToMessageDecoder whenever that is possible without complicating the code.
- Use pooled direct buffers.
- Write direct buffers… always.
- Use ByteBufProcessor (superseded by ByteProcessor in Netty 4.1) when you need to find a pattern in a ByteBuf. It is faster because it can eliminate range checks, can be created once and shared, and is easier for the JIT to inline.
- Prefer alloc() over Unpooled.
- Prefer slice() and duplicate() over copy(), since they do not create an extra copy of the buffer.
- Prefer bulk operations over loops, because a loop needs a range check on each get.
- Use DefaultByteBufHolder for messages with a payload. You get reference counting and resource release for free.
- File transfer. Use zero-memory-copy transfer with DefaultFileRegion to send raw file content efficiently.
- Never block or perform computationally intensive operations on the EventLoop.
- EventLoop extends ScheduledExecutorService, so use it! Schedule and execute tasks in the EventLoop.
- Reuse the EventLoopGroup if you can. Sharing the same EventLoopGroup keeps resource usage (such as thread usage) to a minimum.
- Share the EventLoop in proxy-like applications to reduce context switching.
- Combine operations when calling from outside the EventLoop, to reduce the overhead of wakeups and object creation.
- Operations from inside a ChannelHandler. Use the shortest path through the pipeline if possible.
- Share ChannelHandlers if stateless.
- Remove a ChannelHandler once it is no longer needed. This keeps the pipeline as short as possible and so eliminates as much traversal overhead as possible.
- Use proper buffer type in MessageToByteEncoder. This saves extra byte copies.
- Use the auto-read flag to control flow. This can also be quite useful when writing proxy-like applications.
- Don't use the JDK's SSLEngine if performance matters. Use Twitter's OpenSSL-based SSLEngine. Netty will ship its own.
- Prefer static nested classes over anonymous classes for channel future listeners. You never know when the ChannelFuture will be notified, so an anonymous listener, which can hold a reference to its enclosing instance, may prevent objects from being garbage collected.
- Native transport (epoll) for less GC and lower latency. Only works on Linux, as only epoll is supported at the moment.
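
Several of the write-path tips above (batching flushes, passing a void promise, checking Channel.isWritable(), and configuring watermarks) can be combined in one place. The following is a minimal sketch, assuming the Netty 4.1 API (WriteBufferWaterMark does not exist in 4.0). EmbeddedChannel stands in for a real socket channel so the example runs without a network; the class name WritabilitySketch, the helper writeWhileWritable, and the 8 KiB / 32 KiB watermark values are hypothetical and chosen only for illustration.

```java
import io.netty.buffer.ByteBuf;
import io.netty.channel.Channel;
import io.netty.channel.WriteBufferWaterMark;
import io.netty.channel.embedded.EmbeddedChannel;

final class WritabilitySketch {

    /**
     * Hypothetical helper: queues up to maxMessages payloads, stops as soon as
     * the channel reports it is no longer writable, then flushes once.
     * Returns how many messages were queued.
     */
    static int writeWhileWritable(Channel ch, int maxMessages, int payloadSize) {
        int written = 0;
        // isWritable() turns false once pending bytes exceed the high watermark,
        // which is the signal to stop producing for a slow receiver.
        while (written < maxMessages && ch.isWritable()) {
            // Allocate from the channel's allocator rather than Unpooled.
            ByteBuf buf = ch.alloc().directBuffer(payloadSize).writeZero(payloadSize);
            // The void promise avoids creating a promise object per write;
            // only use it when nobody needs the write result.
            ch.write(buf, ch.voidPromise());
            written++;
        }
        // One flush for the whole batch instead of one per write.
        ch.flush();
        return written;
    }

    public static void main(String[] args) {
        EmbeddedChannel ch = new EmbeddedChannel();
        // Illustrative values: low = 8 KiB, high = 32 KiB.
        ch.config().setWriteBufferWaterMark(new WriteBufferWaterMark(8 * 1024, 32 * 1024));

        int written = writeWhileWritable(ch, 1000, 1024);
        System.out.println("queued " + written + " messages before a single flush");

        // Release the buffers that the embedded channel captured on flush.
        ch.finishAndReleaseAll();
    }
}
```

In a real handler, the producer would typically resume from channelWritabilityChanged() once the pending bytes drop below the low watermark, rather than looping again immediately.
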
References: