Data parallelism
-
Liger+ dynamically balances latency and throughput in large model inference
A distributed inference system that uses interleaved parallelism to balance the latency-throughput trade-off dynamically, through task-aware batch management and strategic kernel scheduling across multiple GPUs.

