Parallel Computing and Optimization Techniques

External reference: https://openalex.org/T10054

Adaptive CPU frequency scaling for energy-efficient and sustainable edge computing under renewable energy uncertainty
Deep reinforcement learning improves CPU frequency scaling for edge computing systems powered by renewable energy, reducing prediction error by 35% and optimizing the energy-latency tradeoff.
Thoth: Uncovering Data-Dependent Memory Access Patterns via Annotation-Directed Load Sampling
The Thoth hardware prefetcher improves performance on sparse data structures by tracking producer-consumer load pairs and using annotation-directed sampling to capture complex memory access patterns.
CXL-SpecKV: A Disaggregated FPGA Speculative KV-Cache for Datacenter LLM Serving
System offloads key-value caches to remote FPGA memory using CXL interconnects, achieving 3.2× throughput gains and 2.8× memory cost reduction for datacenter LLM serving.
It’s about Time: Temporal Abstractions for Asynchronous GPU Tensor Computations
Framework for temporal abstractions that simplify coordination of asynchronous GPU tensor computations, reducing complexity and hardware-dependent errors in specialized concurrent execution.
SLAWS: Spatial Locality Analysis and Workload Orchestration for Sparse Matrix Multiplication
SLAWS framework enhances sparse matrix multiplication by analyzing data locality patterns and orchestrating workloads adaptively, overcoming limitations of fixed-architecture accelerators.
PAT: Accelerating LLM Decoding via Prefix-Aware Attention with Resource Efficient Multi-Tile Kernel
PAT optimizes LLM decode-phase attention by exploiting shared request prefixes and adaptive kernel tiling, reducing memory bandwidth bottlenecks in multi-request serving scenarios.
Queueing model reduces energy use in ternary optical computers
Study proposes a queueing-based service model that uses threshold-based scheduling to optimize energy consumption and performance in ternary optical computers.
Liger+ dynamically balances latency and throughput in large model inference
Distributed inference system using interleaved parallelism to dynamically balance latency-throughput trade-offs via task-aware batch management and strategic kernel scheduling across multiple GPUs.
WaSC decouples WASM system access with low startup and memory use
WaSC hardens WebAssembly sandboxes through system interface decoupling, achieving machine-level isolation while maintaining WASM performance advantages for serverless computing environments.
Integrating Quantum Software Tools with(in) MLIR
A practical guide for integrating quantum software tools using MLIR infrastructure, demonstrated through a case study connecting PennyLane and Munich Quantum Toolkit.