Hypura: Storage-Tier-Aware LLM Inference Scheduler for Apple Silicon
Hypura is an open-source LLM inference scheduler that enables running large language models that exceed physical memory on Apple Silicon Macs. It intelligently distributes model tensors across GPU, RAM, and NVMe storage tiers based on access patterns and bandwidth costs.
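The tier-placement idea can be sketched as a greedy assignment: put the most frequently accessed tensors in the fastest tier until it fills, then spill to the next one. This is a minimal illustrative sketch, not Hypura's actual algorithm; the tier names, bandwidths, capacities, and tensor figures below are all hypothetical.

```python
# Hypothetical sketch of storage-tier-aware placement: greedily assign the
# hottest tensors to the fastest tier that still has room.
# All numbers below are illustrative, not Hypura's real values.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    bandwidth_gbps: float   # higher = cheaper to stream from
    capacity_gb: float
    used_gb: float = 0.0

@dataclass
class Tensor:
    name: str
    size_gb: float
    access_freq: float      # estimated accesses per generated token

def place(tensors, tiers):
    """Assign each tensor to the fastest tier with room, hottest first."""
    placement = {}
    # Hottest-first so frequently accessed tensors land in fast memory.
    for t in sorted(tensors, key=lambda x: x.access_freq, reverse=True):
        for tier in sorted(tiers, key=lambda x: x.bandwidth_gbps, reverse=True):
            if tier.used_gb + t.size_gb <= tier.capacity_gb:
                tier.used_gb += t.size_gb
                placement[t.name] = tier.name
                break
    return placement

tiers = [Tier("gpu", 400.0, 8.0), Tier("ram", 100.0, 16.0), Tier("nvme", 7.0, 64.0)]
tensors = [
    Tensor("attn.0", 4.0, 1.0),    # touched every token
    Tensor("ffn.0", 6.0, 0.8),
    Tensor("expert.3", 12.0, 0.1), # rarely routed to
]
print(place(tensors, tiers))
# → {'attn.0': 'gpu', 'ffn.0': 'ram', 'expert.3': 'nvme'}
```

A real scheduler would also weigh transfer cost against tensor size (a huge, lukewarm tensor may not be worth a fast-tier slot), but the greedy version captures the core trade-off.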
The project solves a critical limitation: a 32GB M1 Max cannot naively load a 40GB model without the OS swap-thrashing until the OOM killer intervenes. Hypura makes previously impossible inference scenarios usable, running Mixtral 8x7B at 2.2 tokens/second and Llama 70B at 0.3 tokens/second on hardware where llama.cpp simply crashes.
Key features include expert-streaming mode for MoE models like Mixtral with 99.5% cache hit rate via neuron caching, dense FFN-streaming for non-MoE models like Llama 70B, an Ollama-compatible HTTP API, and zero overhead for models that fit in memory.
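The expert-streaming idea relies on routing being skewed: if the same experts keep getting selected, keeping recently used experts resident avoids most NVMe reads. A simple way to picture this is an LRU cache over experts; note this is only an illustrative sketch, and Hypura's actual neuron caching is reportedly finer-grained than whole-expert LRU.

```python
# Hypothetical sketch of an expert cache for MoE streaming: keep recently
# used experts resident, evict the least-recently-used one when the memory
# budget is exceeded. Cache size and routing pattern are illustrative.
from collections import OrderedDict

class ExpertCache:
    def __init__(self, max_experts):
        self.max_experts = max_experts
        self._cache = OrderedDict()   # expert_id -> weights (stub)
        self.hits = 0
        self.misses = 0

    def get(self, expert_id, load_fn):
        if expert_id in self._cache:
            self._cache.move_to_end(expert_id)   # mark as recently used
            self.hits += 1
            return self._cache[expert_id]
        self.misses += 1
        weights = load_fn(expert_id)             # stream from NVMe on a miss
        self._cache[expert_id] = weights
        if len(self._cache) > self.max_experts:
            self._cache.popitem(last=False)      # evict the LRU expert
        return weights

cache = ExpertCache(max_experts=2)
load = lambda eid: f"weights-{eid}"              # stand-in for a disk read
for eid in [0, 1, 0, 0, 2, 0]:                   # skewed routing favors expert 0
    cache.get(eid, load)
print(cache.hits, cache.misses)
# → 3 3
```

With real routing traces the skew is far stronger than this toy sequence, which is how hit rates like the reported 99.5% become plausible: the working set of hot experts fits in memory even when the full model does not.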
Created on March 13, 2026, Hypura is trending on Hacker News with 194 points and has gained 346 stars on GitHub. It represents a meaningful step toward democratizing large model inference on consumer Apple hardware.
GitHub: https://github.com/t8/hypura