A record of what I study.
GitHub : https://github.com/trytoYH
Email : [email protected]
<aside> 👉 Contents
Prompt Compression
ORCA: A Distributed Serving System for Transformer-Based Generative Models
Fast Inference from Transformers via Speculative Decoding
RPC
</aside>