DEV Community

# gpu

Posts

From one model to seven — what it took to make TurboQuant model-portable

3 min read
Mixture of Experts

3 min read
124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

5 min read
MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected

5 min read
The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling

7 min read
PyRadiomics Inefficiency in Large-Scale Studies Addressed by GPU Acceleration for Faster Processing

8 min read
Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

4 min read
I Tested TurboQuant KV Cache Compression on Consumer GPUs. Here's What Actually Happened.

6 min read
GPU Server for AI Inference: Bare Metal vs. Cloud vGPU

2 min read
I fused 1,500 GPU dispatches into one. Here's what happened.

2 min read
Nvidia GreenBoost Lets You Fake More VRAM — And It Actually Kind of Works

4 min read
Boost Local LLMs: TurboQuant KV Cache, Fast Cold Starts, & Rust GPU Dev

4 min read
Fix Zombie VRAM: Clear GPU Memory Without Rebooting

4 min read
I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

3 min read
Local LLM Power-Ups: Voxtral TTS, TurboQuant, & Sub-Second Cold Starts

3 min read