Understanding: GPU | Knowledge Hub

Alberto Nieto

Apr 1

From one model to seven — what it took to make TurboQuant model-portable

#python #vllm #gpu #triton

3 min read

Krun_pro

Apr 1

Mixture of Experts

#mixture #moe #gpu

3 min read

Ingero

Apr 1

124x Slower: What PyTorch DataLoader Actually Does at the Kernel Level

#pytorch #gpu #python #cuda

5 min read

plasmon

Mar 31

MoE Beat Dense 27B by 2.4x on 8GB VRAM — The 35B-A3B Benchmark Nobody Expected

#llm #machinelearning #ai #gpu

5 min read

plasmon

Mar 31

The Memory Bandwidth Gap Is 49x and Growing — Why Local LLMs Hit a Ceiling

#hardware #ai #machinelearning #gpu

7 min read

Roman Dubrovin

Mar 31

PyRadiomics Inefficiency in Large-Scale Studies Addressed by GPU Acceleration for Faster Processing

#radiomics #gpu #pytorch #medicalimaging

8 min read

Ingero

Mar 31

Tracing a 13x PyTorch Slowdown to a Hidden NumPy Synchronization

#pytorch #cuda #python #gpu

4 min read

Christopher Maher

Mar 30

I Tested TurboQuant KV Cache Compression on Consumer GPUs. Here's What Actually Happened.

#llm #kubernetes #gpu #ai

6 min read

INGATE GmbH

Mar 31

GPU Server for AI Inference: Bare Metal vs. Cloud vGPU

#ai #gpu #cloudcomputing #infrastructure

2 min read

Ahmet Barış Günaydın

Mar 30

I fused 1,500 GPU dispatches into one. Here's what happened.

#webgpu #javascript #performance #gpu

2 min read

TechPulse Lab

Mar 29

Nvidia GreenBoost Lets You Fake More VRAM — And It Actually Kind of Works

#nvidia #opensource #ai #gpu

4 min read

soy

Mar 28

Boost Local LLMs: TurboQuant KV Cache, Fast Cold Starts, & Rust GPU Dev

#gpu #ai #performance

4 min read

Jakson Tate

Mar 28

Fix Zombie VRAM: Clear GPU Memory Without Rebooting

#linux #gpu #docker #devops

4 min read

Alberto Nieto

Mar 27

I shipped Google's TurboQuant as a vLLM plugin 72 hours after the paper — here's what nobody else tested

#ai #python #machinelearning #gpu

3 min read

soy

Mar 27

Local LLM Power-Ups: Voxtral TTS, TurboQuant, & Sub-Second Cold Starts

#gpu #ai #performance

3 min read

DEV Community

# gpu
