Data parallelism

  1. Liger+ dynamically balances latency and throughput in large model inference
    A distributed inference system that uses interleaved parallelism to balance latency against throughput dynamically, through task-aware batch management and kernel scheduling across multiple GPUs.