Run DeepSeek R1 Locally with all 671 Billion Parameters


Recently, I showed how to easily run distilled versions of the DeepSeek R1 model locally. A distilled model is a compressed version of a larger language model, where knowledge from the larger model is transferred to a smaller one to reduce resource usage without losing too much performance. These models are based on the Llama and Qwen architectures and come in sizes ranging from 1.5 to 70 billion parameters.
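
To make the distillation idea concrete, here is a minimal sketch of a knowledge-distillation loss in Python. This is an illustration of the general technique, not DeepSeek's actual training pipeline; the function name and temperature value are assumptions for the example.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between the teacher's and student's softened distributions.

    The temperature flattens both distributions so the student also learns
    from the teacher's relative preferences among non-top tokens.
    """
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # Scale by T^2 to keep gradient magnitudes comparable across temperatures.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature**2
```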

Some pointed out that this is not the REAL DeepSeek R1 and that it is impossible to run the full model locally without several hundred GB of memory. That sounded like a challenge - I had to try!

First Attempt - Warming Up with a 1.58-bit Quantized Version of DeepSeek R1 671b in Llama.cpp

The developers behind Unsloth dynamically quantized DeepSeek R1 so that it can run in as little as 130GB of memory while still benefiting from all 671 billion parameters.
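
As a sketch of how you might fetch only the 1.58-bit files, the snippet below uses the huggingface_hub library. The repo id and filename pattern are assumptions based on Unsloth's published quants - check the model card for the exact names before running.

```python
from huggingface_hub import snapshot_download

# Download just the 1.58-bit dynamic-quant GGUF files, not the whole repo.
snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",   # assumed repo id
    local_dir="DeepSeek-R1-GGUF",
    allow_patterns=["*UD-IQ1_S*"],        # assumed pattern for the 1.58-bit variant
)
```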

A quantized LLM is an LLM whose parameters are stored in lower-precision formats (e.g., 8-bit or 4-bit instead of the standard 16-bit floating point). This greatly reduces memory usage at the cost of some precision.
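
To make that trade-off concrete, here is a toy symmetric 8-bit quantization in Python. It is far simpler than the dynamic 1.58-bit scheme Unsloth uses, but it shows the basic mechanism: weights are mapped to low-precision integers plus a scale factor, halving (or better) the memory per parameter.

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Map float weights to int8 values plus a per-tensor scale factor."""
    scale = np.abs(weights).max() / 127.0  # largest magnitude maps to 127
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from the int8 values."""
    return q.astype(np.float32) * scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, s)).max())
```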