“Why is it so slow even though I have a GPU?” I’d like to share my three-week struggle, which began with this single question.

Introduction

While developing the Vision AI service, I chose NVIDIA Triton Inference Server as the framework for model serving. Its features, such as multi-framework support, dynamic batching, and ensemble pipelines, were excellent, and I was particularly drawn to its ab