Recent Experience with DeepSeek Ollama

During the Spring Festival, DeepSeek was almost entirely unavailable for well-known reasons. The API in particular often went down for entire days. Even though the web page was restored this morning, when I opened Cline in VS Code it just kept spinning with no response. This forced me to buy a third-party API instead, and I also took the opportunity to test a local deployment of DeepSeek with Ollama.

DeepSeek Service Status

Deployment Methods

After DeepSeek V3 and R1 were open-sourced, they quickly became the hottest large models around. Riding this wave, I deployed a large model locally for the first time. I had previously assumed that only high-end GPUs could handle it, but to my surprise it also runs without a GPU.

Oracle Cloud Deployment

My first test was on an ARM server on Oracle Cloud: the free 4C24G (4-core, 24GB RAM) ARM instance.

Since I already run many applications on this server through Docker managed by 1Panel, I simply downloaded and installed the ollama and ollama-webui applications from the app store. (The latter turned out to be not very useful, though.)
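For reference, outside of 1Panel the same setup with plain Docker would look roughly like this, following the volume and port documented for the official ollama/ollama image:

```bash
# Run ollama in Docker, persisting models in a named volume
# and exposing the default API port 11434
docker run -d --name ollama \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  ollama/ollama

# Then pull and run a model inside the container
docker exec -it ollama ollama run deepseek-r1:1.5b
```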

1Panel Applications

Almost no setup was required. After installing ollama, I simply entered the following command in a terminal inside the container:

```bash
ollama run deepseek-r1:1.5b
```

Oracle Cloud Deployment of DeepSeek 1.5b

Thanks to Oracle Cloud’s high-speed network, pulling the 4.7GB deepseek-r1:7b took only about 15 seconds, and it was ready to use immediately.

Oracle Cloud Pull Speed for DeepSeek

DeepSeek R1 currently has seven versions on ollama, from small to large: 1.5b, 7b, 8b, 14b, 32b, 70b, and 671b. The corresponding repository sizes are as follows.

| Model | 1.5b | 7b | 8b | 14b | 32b | 70b | 671b |
|---|---|---|---|---|---|---|---|
| Repository Size | 1.1GB | 4.7GB | 4.9GB | 9.0GB | 20GB | 43GB | 404GB |

Additionally, there are other tags of the DeepSeek R1 repositories on ollama that are several times larger than the default ones (presumably higher-precision, less-quantized builds). However, I didn’t notice much difference.
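To try one of those larger builds, you pull it by its full tag. The tag below is only an illustration of the naming scheme; check the deepseek-r1 page on the ollama library for the exact tags:

```bash
# Pull a higher-precision build by its full tag
# (illustrative tag; verify on ollama.com/library/deepseek-r1)
ollama pull deepseek-r1:7b-qwen-distill-fp16
```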

Oracle Cloud Deployment of DeepSeek 7b

The repository size is a critical metric because it largely determines whether the model can run on a device: the whole file has to fit in memory or VRAM. For example, the 1.5b version is a 1.1GB file and needs roughly that much free memory or VRAM to load, while the 70b version, at 43GB, needs around 40GB or more.
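A quick way to sanity-check this on Linux, using standard tools and ollama’s own commands:

```bash
# Check available RAM before pulling a model (Linux)
free -h

# Once a model is loaded, show how much memory it actually occupies
ollama ps
```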

ollama-webui

This is an application in the 1Panel app store. As the name suggests, it exposes the ollama API through a web chat page. Configuration is also very simple: you just fill in the address of the ollama API during installation.

With Docker, if ollama runs on the same server you can fill in its Docker-internal network IP directly. For a domain with a reverse proxy, a remote server IP, or HTTPS, refer to the 1Panel documentation.

However, note that the ollama address requires the http:// prefix; filling in just ip:port won’t work.
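You can verify the address before saving it. The IP below is a hypothetical Docker-internal address, so substitute your own container’s:

```bash
# /api/tags lists the models the ollama instance has pulled.
# 172.18.0.2 is a hypothetical Docker-internal IP; use your container's.
curl http://172.18.0.2:11434/api/tags
```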

ollama-webui Settings

ollama-webui Interface


Windows PC Deployment

After deploying two small DeepSeek models on Oracle, I found the whole thing fascinating: no intimidating GPU, just CPU + memory. Running DeepSeek 7b on the weak free Oracle VPS was admittedly painful, at only 2 tokens/s, but the 1.5b version managed a respectable 11 tokens/s. So I thought I’d install the 7b version on my own computer for fun; after all, according to the official benchmarks, this 7b version even outperforms GPT-4o in some respects.

| Model | AIME 2024 (pass@1) | AIME 2024 (cons@64) | MATH-500 | GPQA Diamond | LiveCodeBench | CodeForces |
|---|---|---|---|---|---|---|
| GPT-4o | 9.3 | 13.4 | 74.6 | 49.9 | 32.9 | 759 |
| Claude-3.5 | 16.0 | 26.7 | 78.3 | 65.0 | 38.9 | 717 |
| o1-mini | 63.6 | 80.0 | 90.0 | 60.0 | 53.8 | 1820 |
| QwQ-32B | 44.0 | 60.0 | 90.6 | 54.5 | 41.9 | 1316 |
| DeepSeek-R1-1.5B | 28.9 | 52.7 | 83.9 | 33.8 | 16.9 | 954 |
| DeepSeek-R1-7B | 55.5 | 83.3 | 92.8 | 49.1 | 37.6 | 1189 |
| DeepSeek-R1-14B | 69.7 | 80.0 | 93.9 | 59.1 | 53.1 | 1481 |
| DeepSeek-R1-32B | 72.6 | 83.3 | 94.3 | 62.1 | 57.2 | 1691 |
| DeepSeek-R1-8B | 50.4 | 80.0 | 89.1 | 49.0 | 39.6 | 1205 |
| DeepSeek-R1-70B | 70.0 | 86.7 | 94.5 | 65.2 | 57.5 | 1633 |

Installation Steps

  1. Download and install the ollama software from the official website (Official Download Link). After installation, a small llama icon appears in the system tray at the bottom right of the desktop.

ollama windows

  2. Press Windows+R to open the Run dialog, then enter cmd to open the command line.

CMD

  3. Enter ollama run deepseek-r1:1.5b, which will automatically download and run the DeepSeek 1.5b version (change the tag for other sizes). It is ready to use as soon as the download finishes; a few more useful ollama commands follow the screenshot below.

Running DeepSeek Local Version
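These standard ollama commands are handy for managing the downloaded models afterwards:

```bash
# Download a model without starting a chat session
ollama pull deepseek-r1:7b

# List pulled models and their on-disk sizes
ollama list

# Delete a model you no longer need
ollama rm deepseek-r1:1.5b
```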

Some Test Results

I have only tested it on two of my weak laptops. One has no dedicated GPU, and the other does. Here, I’ll mainly talk about the one without a dedicated GPU.

  1. The laptop CPU decodes DeepSeek 1.5b at about 22 tokens/s, which is quite fast.

11th Gen Intel i5 Laptop CPU Decoding DeepSeek 1.5b

  2. The laptop CPU decodes DeepSeek 7b at about 7 tokens/s, which is acceptable.

11th Gen Intel i5 Laptop CPU Decoding DeepSeek 7b

  3. Testing with a specified number of CPU cores. Since my CPU has 4 cores and 8 threads, I tested how changing the core count affects decoding DeepSeek 7b. The default setting turned out to be best, perhaps because my CPU is too weak.

Specifying 8 Cores for Full Power Decoding DeepSeek 7b, Speed Decreases Instead

Specifying 2 Cores for Forced Decoding DeepSeek 7b, Speed Drops by 40%

To specify the core count, enter /set parameter num_thread 8 in the ollama session (here the number refers to physical cores, not CPU threads). If you run ollama in Docker, you can instead cap the container’s maximum CPU allowance; for example, limiting a 4-core server to 3.5 cores keeps the CPU from being pinned at 100%.
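Concretely, the relevant knobs look like this; --verbose is one way to get the tokens/s figures, and the container name below is a placeholder:

```bash
# Print timing stats (prompt eval rate and eval rate in tokens/s)
# after each response
ollama run deepseek-r1:7b --verbose

# Inside the interactive session, pin the thread count to your
# physical core count (not hyper-threads):
#   /set parameter num_thread 4

# If ollama runs in Docker, cap the container's CPU share instead
# ("ollama" is a placeholder container name)
docker update --cpus 3.5 ollama
```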

Conclusion

The local version of DeepSeek is just a supplement. Unless you have a very powerful server that can host the 671b version outright, third-party APIs are still more practical. I estimate the 671b version would need at least 384GB of memory to run without a GPU, and to feed that much memory you would probably want a CPU with at least 32 threads. I haven’t played with hardware at that level, though, so I can only look on in envy.

Finally, one advantage of the local version is that it is far less heavily filtered than the online version, especially the web version on the DeepSeek official site: many political terms that the official web version rejects work normally in the local version.
