High-performance LLM inference engine in C++ and CUDA.
Tiny-vLLM is an inference engine, written in C++ and CUDA, for running Large Language Models (LLMs) efficiently. It prioritizes high performance and low resource usage, which makes it suitable for a range of deployment scenarios.