Dram
-
PAT: Accelerating LLM Decoding via P refix- A ware A t tention with Resource Efficient Multi-Tile Kernel
PAT optimizes LLM decode-phase attention by exploiting shared request prefixes and adaptive kernel tiling, reducing memory bandwidth bottlenecks in multi-request serving scenarios.

