Cache

  1. Thoth: Uncovering Data-Dependent Memory Access Patterns via Annotation-Directed Load Sampling
    The Thoth hardware prefetcher improves performance on sparse data structures by tracking producer-consumer load pairs and using annotation-directed sampling to capture complex memory access patterns.
  2. PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
    PAT optimizes LLM decode-phase attention by exploiting shared request prefixes and adaptive kernel tiling, reducing memory bandwidth bottlenecks in multi-request serving scenarios.
  3. Angular query orchestration reduced redundant GraphQL requests
    Framework-aware query orchestration for Angular micro-frontends optimizes GraphQL data fetching through compile-time type safety and runtime deduplication, reducing API calls by 62%.