-
Thoth: Uncovering Data-Dependent Memory Access Patterns via Annotation-Directed Load Sampling
The Thoth hardware prefetcher improves performance on sparse data structures by tracking producer-consumer load pairs and using annotation-directed sampling to capture complex memory access patterns.
-
PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
PAT optimizes LLM decode-phase attention by exploiting shared request prefixes and adaptive kernel tiling, reducing memory bandwidth bottlenecks in multi-request serving scenarios.
-
Angular query orchestration reduced redundant GraphQL requests
Framework-aware query orchestration for Angular micro-frontends optimizes GraphQL data fetching through compile-time type safety and runtime deduplication, reducing API calls by 62%.
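
To make the runtime deduplication idea concrete, here is a minimal sketch of one common way to do it: identical requests issued while an earlier one is still in flight share a single promise. This is an assumption about the general technique, not the implementation from the work summarized above. The names `Fetcher`, `createDedupedFetcher`, and `fakeFetch` are hypothetical, and a real Angular setup would wrap its own GraphQL client rather than a plain function.

```typescript
// Hypothetical fetcher signature: takes a GraphQL query string and variables.
type Fetcher<T> = (query: string, variables: Record<string, unknown>) => Promise<T>;

// Wraps a fetcher so that identical requests made while one is still
// in flight reuse the same promise instead of hitting the API again.
function createDedupedFetcher<T>(fetcher: Fetcher<T>): Fetcher<T> {
  const inFlight = new Map<string, Promise<T>>();

  return (query, variables) => {
    // Assumption: JSON.stringify is a good enough cache key. It is sensitive
    // to property order, so { a: 1, b: 2 } and { b: 2, a: 1 } differ.
    const key = query + "|" + JSON.stringify(variables);

    const pending = inFlight.get(key);
    if (pending !== undefined) {
      return pending;
    }

    // Remove the entry once the request settles, whether it succeeds or fails,
    // so later calls trigger a fresh request.
    const request = fetcher(query, variables).then(
      (value) => {
        inFlight.delete(key);
        return value;
      },
      (error) => {
        inFlight.delete(key);
        throw error;
      }
    );

    inFlight.set(key, request);
    return request;
  };
}

// Usage: two components asking for the same data trigger one underlying call.
let calls = 0;
const fakeFetch: Fetcher<string> = (query) => {
  calls += 1;
  return Promise.resolve("result for " + query);
};

const dedupedFetch = createDedupedFetcher(fakeFetch);
const first = dedupedFetch("{ user { id } }", { id: 1 });
const second = dedupedFetch("{ user { id } }", { id: 1 });
```

In this sketch only concurrent duplicates are collapsed; caching results across time would be a separate design decision with its own invalidation concerns.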