Advanced Neural Network Applications
External reference: https://openalex.org/T10036
-
Liger+ dynamically balances latency and throughput in large model inference. It is a distributed inference system that uses interleaved parallelism to balance latency-throughput trade-offs through task-aware batch management and strategic kernel scheduling across multiple GPUs.
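
To make the idea of task-aware batch management concrete, here is a minimal single-process sketch. It is not the Liger+ implementation: the `Request` type, the `latency_sensitive` flag, the `next_batch` function, and the `max_batch` parameter are all hypothetical names chosen for illustration, and the sketch ignores GPUs, kernel scheduling, and interleaved parallelism entirely. It only shows one simple way a scheduler might trade latency against throughput: latency-sensitive requests are dispatched alone, while other requests are grouped into larger batches.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    request_id: int
    latency_sensitive: bool  # hypothetical task label, not taken from the source


def next_batch(queue: deque, max_batch: int = 8) -> list:
    """Pop the next batch from the front of the queue.

    Assumed policy (illustrative only): a latency-sensitive request at the
    head of the queue is dispatched immediately as a batch of one; otherwise
    consecutive throughput-oriented requests are grouped, up to max_batch.
    """
    if not queue:
        return []
    if queue[0].latency_sensitive:
        return [queue.popleft()]
    batch = []
    while queue and len(batch) < max_batch and not queue[0].latency_sensitive:
        batch.append(queue.popleft())
    return batch


if __name__ == "__main__":
    pending = deque([
        Request(0, True),
        Request(1, False),
        Request(2, False),
        Request(3, True),
    ])
    while pending:
        print([r.request_id for r in next_batch(pending)])
```

Under this assumed policy, batch size adapts to the mix of tasks in the queue rather than being fixed in advance. A real system would also need to account for per-GPU occupancy and kernel-level scheduling, which this sketch leaves out.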

