-
Replay-as-a-Service reduces tail latency in storage-disaggregated databases
Presents Replay-as-a-Service, a technique that reduces tail latency in storage-disaggregated OLTP databases by decoupling log replay from the storage engine, achieving a 40% latency reduction.
-
Liger+ dynamically balances latency and throughput in large model inference
Distributed inference system using interleaved parallelism to dynamically balance latency-throughput trade-offs via task-aware batch management and strategic kernel scheduling across multiple GPUs.
-
MLOps optimizations for high-load recommendation systems
Engineering optimizations to MLOps processes for high-load recommendation systems, integrating streaming features, parameter servers, and online training, targeting latency and quality at scale.