Controller for ModelMesh


ModelMesh Serving

ModelMesh Serving is the Controller for managing ModelMesh, a general-purpose model serving management/routing layer.

Getting Started

To quickly get started with ModelMesh Serving, check out the Quick Start Guide.

For help, please open an issue in this repository.

Components and their Repositories

ModelMesh Serving currently comprises components spread over a number of repositories. The supported versions for the latest release are documented here.

(Architecture diagram)

Issues across all components are tracked centrally in this repo.

Core Components

Runtime Adapters

  • modelmesh-runtime-adapter - the containers which run in each model serving pod and act as an intermediary between ModelMesh and third-party model-server containers. Its build produces a single "multi-purpose" image which can be used as an adapter to work with each of the out-of-the-box supported model servers. It also incorporates the "puller" logic which is responsible for retrieving the models from storage before handing over to the respective adapter logic to load the model (and to delete after unloading). This image is also used for a container in the load/unload path of custom ServingRuntime Pods, as a "standalone" puller.
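The pull-then-load flow described above can be sketched conceptually. This is a hypothetical illustration only: the function names and storage interface here are made up for the sketch and are not the adapter's actual API.

```python
import os

def pull_model(model_id, storage_get, local_root):
    """Hypothetical puller step: fetch model bytes from storage to local disk."""
    local_dir = os.path.join(local_root, model_id)
    os.makedirs(local_dir, exist_ok=True)
    with open(os.path.join(local_dir, "model.bin"), "wb") as f:
        f.write(storage_get(model_id))
    return local_dir

def load_model(model_id, storage_get, local_root, server_load):
    """Hypothetical adapter load path: pull first, then hand off to the model server."""
    local_dir = pull_model(model_id, storage_get, local_root)
    server_load(model_id, local_dir)
    return local_dir
```

Unloading would mirror this: the server is asked to unload the model, then the puller deletes the local copy.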

Model Serving runtimes

ModelMesh Serving provides out-of-the-box integration with the following model servers.

ServingRuntime custom resources can be used to add support for other existing or custom-built model servers; see the docs on implementing a custom Serving Runtime.
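As a rough sketch only (the names and image are illustrative placeholders, not a tested runtime definition), a custom ServingRuntime resource has approximately this shape:

```yaml
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: my-custom-runtime          # hypothetical name
spec:
  supportedModelFormats:
    - name: custom-format          # hypothetical model format
      autoSelect: true
  multiModel: true                 # required for ModelMesh compatibility
  grpcDataEndpoint: port:8085      # where the runtime serves inference gRPC
  containers:
    - name: my-server              # hypothetical server container
      image: example.com/my-server:latest
```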


  • KServe V2 REST Proxy - a reverse-proxy server which translates a RESTful HTTP API into gRPC. This allows sending inference requests using the KServe V2 REST Predict Protocol to ModelMesh models which currently only support the V2 gRPC Predict Protocol.
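For illustration, a V2 REST request body has the following shape; the input name, tensor values, and helper function are placeholders, and a real deployment would POST this JSON to the proxy's /v2/models/<model-name>/infer path:

```python
import json

def build_v2_infer_request(name, shape, datatype, data):
    # Build a KServe V2 REST Predict Protocol request body.
    return {"inputs": [{"name": name, "shape": shape, "datatype": datatype, "data": data}]}

body = build_v2_infer_request("predict", [1, 2], "FP32", [0.1, 0.2])
print(json.dumps(body))
```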


Libraries

These are helper Java libraries used by the ModelMesh component.

  • kv-utils - Useful KV store recipes abstracted over etcd and Zookeeper
  • litelinks-core - RPC/service discovery library based on Apache Thrift, used only for communications internal to ModelMesh.


Contributing

Please read our contributing guide for details.

Building Images

# Build develop image
make build.develop

# After building the develop image, build the runtime image
make build
  • "code":3,"message":"json: cannot unmarshal string into Go value of type main.RESTRequest"

    Getting {"code":3,"message":"json: cannot unmarshal string into Go value of type main.RESTRequest"} when sending an inference request. Code:

    import pandas as pd
    import requests
    from mlserver.codecs.pandas import PandasCodec  # requires mlserver>=1.1.0

    payloads = "./IDA_en.ndjson"
    df = pd.read_json(payloads, lines=True)
    df = df.fillna('')
    payload = PandasCodec.encode_request(df, use_bytes=False)
    response = requests.post("http://localhost:8008/v2/models/muc-en-aa-predictor/infer", json=payload.json())

    I have never encountered this error before. What's wrong?

    opened by MLHafizur 13
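One likely cause (an assumption, not confirmed in the thread above): the json= argument of requests serializes its value, so passing the already-serialized string from payload.json() sends a double-encoded JSON *string* rather than a JSON object, which the Go proxy cannot unmarshal into its request struct. A minimal stdlib demonstration:

```python
import json

body = {"inputs": []}                          # an object, as the proxy expects
double_encoded = json.dumps(json.dumps(body))  # what json=<already-a-string> produces
print(double_encoded)                          # a quoted JSON string, not an object

# The usual fix is one of (hypothetical usage, matching the snippet above):
#   requests.post(url, data=payload.json(), headers={"Content-Type": "application/json"})
#   requests.post(url, json=payload.dict())
```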
  • Higher payload size not working as described in doc

    Higher payload size not working as described in doc

    I have deployed a model using custom MLServer runtime. The gRPC inferencing is working as expected with small size payload.

    I modified the configuration to make it work with large payload sizes:

    1. In the runtime configuration:

       value: "300000000"

    2. In the global configmap:

       grpcMaxMessageSizeBytes: 300000000

    But it still gives an error showing that the default limit is exceeded:

    io.grpc.StatusRuntimeException: RESOURCE_EXHAUSTED: gRPC message exceeds maximum size 16777216: 65844251
        at io.grpc.Status.asRuntimeException(...)
        at io.grpc.internal.MessageDeframer.processHeader(...)
        at io.grpc.internal.MessageDeframer.deliver(...)
        at io.grpc.internal.MessageDeframer.request(...)
        at io.grpc.internal.AbstractStream$TransportState...
        at io.grpc.netty.NettyServerStream$TransportState...
        at io.netty.util.concurrent.AbstractEventExecutor.runTask(...)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(...)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(...)
        at io.netty.util.concurrent.SingleThreadEventExecutor...
        at io.netty.util.internal.ThreadExecutorMap...
        at java.base/...

    Client side error:

    Traceback (most recent call last):
      File "/temp/docker/", line 66, in <module>
        response = grpc_stub.ModelInfer(inference_request_g)
      File "/usr/local/lib/python3.9/site-packages/grpc/", line 946, in __call__
        return _end_unary_response_blocking(state, call, False, None)
      File "/usr/local/lib/python3.9/site-packages/grpc/", line 849, in _end_unary_response_blocking
        raise _InactiveRpcError(state)
    grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
            status = StatusCode.CANCELLED
            details = "Received RST_STREAM with error code 8"
            debug_error_string = "UNKNOWN:Error received from peer ipv4: {grpc_message:"Received RST_STREAM with error code 8", grpc_status:1, created_time:"2022-11-03T18:47:46.142301977+00:00"}"

    How can I overcome this situation?

    opened by MLHafizur 9
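For reference, the global setting mentioned above lives in the model-serving-config ConfigMap. A sketch, with the size value taken from the issue and everything else (namespace, surrounding structure) assumed:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: model-serving-config
  namespace: modelmesh-serving   # assumed namespace
data:
  config.yaml: |
    grpcMaxMessageSizeBytes: 300000000
```

Note that the client must also raise its own gRPC maximum message size; the error above shows the default 16 MiB limit (16777216 bytes) still being enforced somewhere on the request path.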
  • feat: TorchServe support

    feat: TorchServe support


    The Triton runtime can be used with model-mesh to serve PyTorch torchscript models, but it does not support arbitrary PyTorch models i.e. eager mode. KServe "classic" has integration with TorchServe but it would be good to have integration with model-mesh too so that these kinds of models can be used in distributed multi-model serving contexts.


    The bulk of the required changes are to the adapter image, covered by PR

    This PR contains the minimal controller changes needed to enable the support:

    • TorchServe ServingRuntime spec
    • Add "torchserve" to the list of supported built-in runtime types
    • Add "ID extraction" entry for TorchServe's gRPC Predictions RPC so that model-mesh will automatically extract the model name from corresponding request messages

    Note the supported model format is advertised as "pytorch-mar" to distinguish from the existing "pytorch" format that refers to raw TorchScript .pt files as supported by Triton.


    TorchServe can be used seamlessly with ModelMesh Serving to serve PyTorch models, including eager mode.

    Resolves #63

    opened by njhill 9
  • feat: storage phase 1 for inference service reconciler

    feat: storage phase 1 for inference service reconciler


    Rebase #32 onto the new InferenceService reconciler for ModelMesh.

    For Storage Spec details, please refer to the design doc:

    Support for additional storage types/parameters will come in phase 2.



    lgtm approved 
    opened by Tomcli 9
  • chore: Automatically set kube context in development container

    chore: Automatically set kube context in development container


    When using the containerized development environment (make develop) to run FVT tests, one needs to configure access to a Kubernetes or OpenShift cluster from inside the container, and this has to be done for every make develop session. It can be tricky when cloud-provider-specific CLI tools are needed to connect and authenticate to a cluster.

    Currently there is a short paragraph in the FVT README about how to export a minified kubeconfig file and create it inside the container. It is tedious to repeat those steps for each make develop session and, depending on OS, shell environment, editors, and possible text encoding issues, it is also error-prone.


    This PR proposes to automatically create the kubeconfig file in a local and git-ignored directory inside the local project and automatically mount it to the develop container. All the user then has to do is connect and authenticate to the cluster in the shell that will be running make develop.


    Kubernetes context is ready inside the development container.

    # shell environment, outside the develop container has access to K8s cluster
    [modelmesh-serving_ckadner]$ kubectl get pods
    NAME                                        READY   STATUS    RESTARTS   AGE
    pod/etcd                                    1/1     Running   0          17m
    pod/minio                                   1/1     Running   0          17m
    pod/modelmesh-controller-387aef25be-ftyqu   1/1     Running   0          17m
    [modelmesh-serving_ckadner]$ make develop
    Pulling dev image kserve/modelmesh-controller-develop:6be58b09c25833c1...
    Building dev image kserve/modelmesh-controller-develop:6be58b09c25833c1...
    Image kserve/modelmesh-controller-develop:6be58b09c25833c1 has 14 layers
    Tagging dev image kserve/modelmesh-controller-develop:6be58b09c25833c1 as latest
    [root@17c121286549 workspace]# kubectl get pods
    NAME                                        READY   STATUS    RESTARTS   AGE
    pod/etcd                                    1/1     Running   0          18m
    pod/minio                                   1/1     Running   0          18m
    pod/modelmesh-controller-387aef25be-ftyqu   1/1     Running   0          18m
    [root@17c121286549 workspace]# 

    /cc @njhill

    lgtm approved test development 
    opened by ckadner 8
  • chore: Update GH Action workflows

    chore: Update GH Action workflows

    Both workflows for PRs and pushes were cleaned up. On a push (which includes when PRs merge), the code base is linted and tested first before building and publishing.

    lgtm approved 
    opened by pvaneck 8
  • fix: Ensure script works properly with implicit namespace

    fix: Ensure script works properly with implicit namespace


    The cleanup script currently doesn't require a namespace to be provided via the -n option and uses the current kubectl context's namespace otherwise. However, the $namespace variable wasn't set in this case, meaning later parts of the script might not work as intended.


    Ensure $namespace variable is set correctly either way.

    Result: the script works properly when the -n option isn't used.

    lgtm approved 
    opened by njhill 7
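The shape of the fix can be sketched in isolation; the kubectl lookup is stubbed out here, and the real script would query the current context instead:

```shell
# Ensure $namespace is always set: use -n's value when given, otherwise
# fall back to the current context's namespace (stubbed as "default" here).
namespace=""   # would be populated by option parsing when -n is passed

current_context_namespace() {
  # stand-in for: kubectl config view --minify -o jsonpath='{..namespace}'
  echo "default"
}

if [ -z "$namespace" ]; then
  namespace="$(current_context_namespace)"
fi
echo "namespace=$namespace"
```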
  • fvt: a bunch of FVT improvements

    fvt: a bunch of FVT improvements


    A bunch of improvements to the FVT framework coming from our internal fork.

    After parallelizing and expanding the FVT suite to support testing the REST proxy internally, we had issues with the consistency of the FVTs. It took a few iterations of improvements to get them back to stable while we continued to add support for more tests. With the "fixes" and "features" coupled in a few different PRs the changes cannot be easily disentangled. This is a big mess of a PR, but the final result should be in a good place with all of our internal improvements.


    Test Parallelization:

    • split FVTs into separate suites (go packages)
      • ginkgo can parallelize within a suite, but runs the suites sequentially
    • refactors to enable sharing of code across the FVT suites
    • support parallelization by using ginkgo CLI to execute the tests instead of go test
    • use Ordered/Serial decorators on groups of tests that require it
      • this can help to speed up "inference" tests by creating the predictor once and using it across multiple specs, but it does mean some specs are not independent
      • TLS tests are marked as Serial because they require roll outs of the runtime pods
    • to help debugging, print inference services on failure in Predictor FVTs
    • avoid a nil pointer dereference that can occur if FVTs error during initialization while running in parallel
    • remove sleep in AfterEach of TLS tests
    • update port-forwards to select a pod directly from the Endpoints object corresponding to the service
      • when port-forwarding to a Service, there is no guard against selecting a Terminating pod

    Config and Secrets:

    • specify the full DefaultConfig in code instead of in the user-configmap.yaml file
    • allow TLS config maps to be overlayed on the base config (instead of template string in YAML)
    • generate TLS certificates for each run of the FVTs instead of using hard-coded certs

    REST Proxy Tests:

    • enable the REST proxy for FVTs and add inference tests using proxy
    • have the FVT Client manage port-forwards for each of REST and gRPC


    • a faster and more efficient FVT suite with parallelization
    • improved FVT stability and extensibility to support future changes
    lgtm approved 
    opened by tjohnson31415 7
  • feat: Update InferenceService reconciliation logic

    feat: Update InferenceService reconciliation logic

    The ModelMesh controller can now reconcile InferenceServices using the new Model Spec in the predictor. Also fixed: a leading slash in the model path when an InferenceService StorageURI was parsed, which was causing issues in the adapter and preventing models from being loaded, as seen in #97.

    Closes: #96, #97 Related: #90


    Users can now successfully deploy an InferenceService using the predictor Model Spec such as the following:

    apiVersion: serving.kserve.io/v1beta1
    kind: InferenceService
    metadata:
      name: example-sklearn-isvc
      annotations:
        serving.kserve.io/deploymentMode: ModelMesh
        serving.kserve.io/secretKey: localMinIO
    spec:
      predictor:
        model:
          modelFormat:
            name: sklearn
          storageUri: s3://modelmesh-example-models/sklearn/mnist-svm.joblib
    lgtm approved 
    opened by pvaneck 7
  • ModelMesh Release Tracker for KServe v0.7.0

    ModelMesh Release Tracker for KServe v0.7.0

    The plan is to cut the KServe 0.7 release mid next week. For this release, ModelMesh will be loosely integrated with KServe.

    Action Items:

    • [x] ModelMesh InferenceService CRD Support
      • [x]
      • [x] Documentation on using InferenceService CR with ModelMesh
    • [x] ModelMesh REST Proxy Sidecar Support
      • [x]
      • [x] Documentation on using REST inferencing
    • [x] Add KServe ModelMesh DeploymentMode Annotation checker
    • [ ] Update KServe hack/ to include ModelMesh-Serving as part of installation.
    • [x] Documentation updates
      • [x] Refresh External ModelMesh Documentation
      • [x] Update KServe Website with ModelMesh Documentation
        • Can view website here:
    • [x] Assemble release process items
      • Tag release for version v0.7.0 to follow suit with KServe.
      • [x] GitHub workflow for tagged release
      • [x] Release process documentation
        • Create a document outlining the process of creating a release branch and tagging from a commit in that branch. KServe should already have a document like this.
    opened by pvaneck 7
  • Adjust FVT GH-Actions workflow

    Adjust FVT GH-Actions workflow


    Decrease the flakiness of FVT runs that occur when certain tests are run back to back.


    The rollingUpdate strategy is adjusted in a preprocessing step of the FVT GitHub Actions workflow to allow better stability in low-resource environments. The defaultTimeout was increased to account for the changes in strategy, since we ran into intermittent failures due to timeouts when the deployment doesn't become ready in time.


    Less flakiness in FVT runs.

    lgtm approved 
    opened by pvaneck 7
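A sketch of the kind of rollingUpdate adjustment described above; the exact values used by the workflow are assumptions here, not taken from the PR:

```yaml
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0         # assumed: don't spin up extra pods in low-resource CI
      maxUnavailable: 1   # assumed: replace pods one at a time in place
```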
  • test: Add TorchServe FVT

    test: Add TorchServe FVT


    Support for TorchServe was added in #250, and a test should be added for it as well.


    • Adds basic FVT for load/inference with a TorchServe MAR model using the native TorchServe gRPC API


    Closes #280

    opened by rafvasq 1
  • Remove residual

    Remove residual "Watson" references

    Model-mesh was originally developed as part of IBM Watson. Now that it is part of KServe we should scrub any remaining places that "watson" is used in the codebase, at least starting with those that are straightforward to change.

    opened by njhill 0
  • Update Dockerfile packages

    Update Dockerfile packages


    Fix vulnerabilities in the ModelMesh Controller image.


    1. Removed dependencies on any packages in the runtime.
    2. Updated the Go version to the latest release.


    The change is functionally working; I have tested it on my own cluster.

    opened by JasmondL 2
  • OutOfDirectMemoryError on setting higher grpc input size

    OutOfDirectMemoryError on setting higher grpc input size

    Describe the bug

    Followed this doc and set grpcMaxMessageSizeBytes to 400000000.
    Here's my config.yaml from the model-serving-config ConfigMap:

    podsPerRuntime: 1
      enabled: true
    grpcMaxMessageSizeBytes: 400000000

    When I make a grpc call, it throws

    Dec 01, 2022 8:15:15 PM io.grpc.netty.NettyServerTransport notifyTerminated
    INFO: Transport failed
    io.netty.util.internal.OutOfDirectMemoryError: failed to allocate 4194304 byte(s) of direct memory (used: 75497758, max: 76546048)
        at io.netty.util.internal.PlatformDependent.incrementMemoryCounter(...)
        at io.netty.util.internal.PlatformDependent.allocateDirectNoCleaner(...)
        at io.netty.buffer.PoolArena$DirectArena.allocateDirect(...)
        at io.netty.buffer.PoolArena$DirectArena.newChunk(...)
        at io.netty.buffer.PoolArena.allocateNormal(...)
        at io.netty.buffer.PoolArena.tcacheAllocateNormal(...)
        at io.netty.buffer.PoolArena.allocate(...)
        at io.netty.buffer.PoolArena.allocate(...)
        at io.netty.buffer.PooledByteBufAllocator.newDirectBuffer(...)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(...)
        at io.netty.buffer.AbstractByteBufAllocator.directBuffer(...)
        at io.netty.buffer.AbstractByteBufAllocator.buffer(...)
        at io.netty.handler.codec.ByteToMessageDecoder.expandCumulation(...)
        at io.netty.handler.codec.ByteToMessageDecoder$1.cumulate(...)
        at io.netty.handler.codec.ByteToMessageDecoder.channelRead(...)

    In Triton, I can see that the model is working fine; the logs below also show the input shape and that the model released the response:

    I am not sure what other changes I need to make to get this to work.
    I suppose this is where it's being set; I tried to backtrack it to some setting but couldn't.

    I1201 20:15:13.885755 1] Process for ModelInferHandler, rpc_ok=1, 15 step START
    I1201 20:15:13.885773 1] New request handler for ModelInferHandler, 17
    I1201 20:15:13.885778 1] GetModel() '6230834ea7f575001e824ce9__isvc-14826e7e9a' version -1
    I1201 20:15:13.885783 1] GetModel() '6230834ea7f575001e824ce9__isvc-14826e7e9a' version -1
    I1201 20:15:13.885792 1] prepared: [0x0x7f3bc0007b50] request id: , model: 6230834ea7f575001e824ce9__isvc-14826e7e9a, requested version: -1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
    original inputs:
    [0x0x7f3bc0007e58] input: CLIP, type: UINT8, original shape: [1,32,589,617,3], batch + shape: [1,32,589,617,3], shape: [32,589,617,3]
    override inputs:
    [0x0x7f3bc0007e58] input: CLIP, type: UINT8, original shape: [1,32,589,617,3], batch + shape: [1,32,589,617,3], shape: [32,589,617,3]
    original requested outputs:
    requested outputs:
    I1201 20:15:13.885874 1] model 6230834ea7f575001e824ce9__isvc-14826e7e9a, instance 6230834ea7f575001e824ce9__isvc-14826e7e9a, executing 1 requests
    I1201 20:15:13.947333 1] add response output: output: classes, type: BYTES, shape: [1,1]
    I1201 20:15:13.947355 1] GRPC: using buffer for 'classes', size: 18, addr: 0x7f3aec004b90
    I1201 20:15:13.947360 1] add response output: output: scores, type: FP32, shape: [1,1]
    I1201 20:15:13.947363 1] GRPC: using buffer for 'scores', size: 4, addr: 0x7f3aec004d70
    I1201 20:15:13.947367 1] ModelInferHandler::InferResponseComplete, 15 step ISSUED
    I1201 20:15:13.947375 1] GRPC free: size 18, addr 0x7f3aec004b90
    I1201 20:15:13.947379 1] GRPC free: size 4, addr 0x7f3aec004d70
    I1201 20:15:13.947442 1] ModelInferHandler::InferRequestComplete
    I1201 20:15:13.947451 1] TRITONBACKEND_ModelInstanceExecute: model instance name 6230834ea7f575001e824ce9__isvc-14826e7e9a released 1 requests
    I1201 20:15:13.947455 1] Process for ModelInferHandler, rpc_ok=1, 15 step COMPLETE

    Additional context

    The model being used is a video sequence classification model and its input is a sequence of 32 cropped frames, hence the huge input size. I did try encoding the cropped sequence into h.264 and decoding it before inference, but that adds a lot of overhead on inference speed, hence I am trying to infer using the large input tensor.

    opened by dumbPy 1
  • Payload logging/events

    Payload logging/events

    For various reasons including monitoring by external system for things like drift / outlier detection etc.

    It should support CloudEvents and be compatible with the logger in KServe "classic", so that it can be used in a similar way, as illustrated in the KServe logger samples.

    Some considerations / possible complications:

    • In KServe the logger can be configured per InferenceService. We need to decide whether we support this with model-mesh, or a simpler global configuration, or both. Another possibility could be allowing a logging destination to be configured globally and enabled/disabled per model.
    • Model-mesh doesn't really touch the payloads currently, and in any case it only routes gRPC/protobuf. So we could emit the raw protobuf messages, but this would differ from the existing KServe case and so would not necessarily be compatible with the same integrations. We could transcode to JSON on the fly, but this would introduce processing overhead that may be undesirable and affect data-path performance.
    • The KServe examples are based on the V1 API; we should check whether the existing logger works with the V2 API, since the runtimes supported by model-mesh are primarily V2-based.

    cc @rafvasq

    opened by njhill 1
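For context, a CloudEvents v1.0 structured-mode envelope has the following shape. The event type string and the data fields below are illustrative assumptions, not the logger's confirmed schema:

```python
import json
import uuid

def make_cloudevent(event_type, source, data):
    # CloudEvents v1.0 required attributes plus a JSON data payload.
    return {
        "specversion": "1.0",
        "id": str(uuid.uuid4()),
        "source": source,
        "type": event_type,
        "datacontenttype": "application/json",
        "data": data,
    }

event = make_cloudevent(
    "org.kubeflow.serving.inference.request",  # assumed to match KServe's logger
    "modelmesh-serving",
    {"model": "example-model", "payload_bytes": 1024},
)
print(json.dumps(event))
```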
  • Create isolation between serving runtimes

    Create isolation between serving runtimes

    Is your feature request related to a problem? If so, please describe.

    In my team’s use case, we are currently using the KServe V2 Inference Protocol REST API for sending inference requests. On top of this protocol, we also make use of multiple virtual services to direct traffic to the modelmesh-serving service such that each serving runtime should be mapped to one virtual service.

    Our use case does not allow us to match the /v2/models/<model-id>/infer in our virtual service, and this creates a problem for us because requests sent to the virtual service for serving runtime X can end up reaching models loaded in serving runtime Y due to the fact that:

    • All serving runtimes share the same modelmesh-serving service
    • Users can set the model id to any existing inference service name in the request path

    Describe your proposed solution

    Since this is an unwanted behaviour for my team, we have two possible solutions.

    1. Support mm-vmodel-id header in the REST API and allow it to take precedence over the model id specified in the V2 inference path
    2. Create a dedicated service per serving runtime instead of having all serving runtimes share the same service


    opened by xvnyv 1
  • v0.9.0(Jul 21, 2022)

    :warning: What's Changed

    • ModelMesh Serving now directly imports KServe types for ServingRuntimes and InferenceServices. (#140, #146)
    • InferenceService CRD now copied from KServe and included as part of standalone ModelMesh Serving installation by default.
    • Renamed role/rolebinding names to include a modelmesh prefix. (#181)
    • ModelMesh now uses Java 17 (kserve/modelmesh#33) and G1 garbage collector. (kserve/modelmesh#41)
    • ModelMesh logging improvements. (kserve/modelmesh#41)
    • InferenceService CRD now included in default standalone mm-serving installation. (#166)
    • Many dependencies including etcd (updated to v3.5.3) were bumped. (#145)

    :rainbow: What's New?

    • Added support for OpenVINO Model Server ServingRuntime. (#141)
    • OpenVINO Model Server adapter implemented. (kserve/modelmesh-runtime-adapter#18)
    • TotalCopies is now available in the Predictor and InferenceService statuses. (#142)
    • Users can now set labels and annotations for ServingRuntime pods via the model-serving-config ConfigMap. (#144)
    • Users can override adapter environment variables added by the controller. (#149)
    • ServingRuntime matching based on protocolVersion is now supported. (#154)
    • ModelMetadata endpoint now enabled for Triton and MLServer ServingRuntimes. (#164)
    • Azure Blob Storage now added as a supported storage provider. (#174, kserve/modelmesh-runtime-adapter#23)
    • Add ModelMesh metrics for inference request/response payload sizes. (kserve/modelmesh#37)

    :lady_beetle: Fixes

    • Fixed possible nil pointer dereferences and minor log improvements. (#160)
    • Fixed potential eviction deadlock in ModelMesh. (kserve/modelmesh#25)
    • Disabled FIPS for Java in ModelMesh. (kserve/modelmesh#35)
    • Repair invalid ModelRecord lastUsed values in registry. (kserve/modelmesh#36)
    • Quickstart minio and etcd pods were converted to Deployment resources. (#157)

    :page_facing_up: Documentation

    • OpenVINO ServingRuntime documentation added. (#167)
    • Rest proxy documentation added. (#177)
    • Monitoring and metrics documentation added. (#175)
    • TLS configuration documentation added. (#176)
    • InferenceService CRD now documented as the primary interface for interacting with ModelMesh. (#190)

    :otter: Other

    • Upgrade tests to use Ginkgo V2. (#133)
    • Add performance test to E2E toolchain. (#139)
    • Quickstart etcd version updated to v3.5.4. (#151)

    Full Changelog:

    Source code(tar.gz)
    Source code(zip)
    config-v0.9.0.tar.gz(45.76 KB)
    modelmesh-quickstart-dependencies.yaml(2.76 KB)
    modelmesh-runtimes.yaml(4.11 KB)
    modelmesh.yaml(648.97 KB)
  • v0.8.0(Feb 12, 2022)

    :warning: What's Changed

    • Removed support for KServe TrainedModel CRD (#54)
    • MLServer ServingRuntime updated to use 0.5.2 (#61)
    • Go version updated to 1.17 along with other tooling updates
    • MLServer ServingRuntime now has an increased gRPC max message size (#85)
    • In the ServingRuntime CRD, SupportedModelTypes now goes by SupportedModelFormats (#100)
    • The max gRPC response message size via the REST-proxy has been increased to 16MiB

    :rainbow: What's New?

    • Multi-namespace support for the ModelMesh controller was introduced (#84)
      • Kube resolver can now work with multiple namespaces for multi-namespace capability (#73)
      • ModelMeshEventStream component can now support multiple namespaces (#76)
      • ServingRuntime controller now works across multiple namespaces (#77)
      • Service Controller is now namespace-aware (#82)
    • Default RBAC is now cluster-scoped instead of namespace-scoped (#88)
    • Users can now configure environment variables for the model-mesh containers in ServingRuntime deployments
    • Reconciliation logic added for new storage spec in InferenceServices and Predictors (#56, #83)
    • A multiModel field added to the ServingRuntime spec for denoting if a ServingRuntime is compatible with ModelMesh or not (#89)
    • The controller can now reconcile InferenceServices using the new Model Spec in the predictor (#101)
    • autoSelect field introduced to ServingRuntime CRD supportedModelTypes spec (#100)
    • Logic was added to have ModelMesh only consider ServingRuntimes whose supported model format has autoSelect set to true when finding compatible runtimes (#108)
    • Install script now allows passing in a URL to a config archive (#118)
    • Models hosted using GCS or HTTP(S) can now be used with ModelMesh through InferenceServices (#121)
    • REST input payloads through the REST-proxy can now be multi-dimensional

    :lady_beetle: Fixes

    • Fix code errors reported by golangci-lint (#57)
    • Fixed a bug where invalid vModel specs led to a nil pointer dereference
    • Fixed a bug where the ServingRuntime controller would loop over empty reconcile events
    • Events from plugged-in Predictor sources are now transformed properly when setting up the ServingRuntime controller
    • Fixed install issues on Mac (#114, #119)

    :page_facing_up: Documentation

    • Added developer documentation (#59)
    • Added notes about debug flags in custom MLServer runtimes
    • Added Keras docs and example (#109)
    • Change install instructions to install from a release branch (#117)

    :otter: Other

    • Some controller code was cleaned up and optimized
    • Script for setting up a user namespace for ModelMesh was added (#112)

    Full Changelog:

    Source code(tar.gz)
    Source code(zip)
    config-v0.8.0.tar.gz(28.16 KB)
    modelmesh-quickstart-dependencies.yaml(2.40 KB)
    modelmesh-runtimes.yaml(3.01 KB)
    modelmesh.yaml(105.73 KB)
  • v0.7.0(Oct 12, 2021)
