Consider this example: An amazing new software tool emerges suddenly, turning technology industry expectations on their heads by delivering unprecedented performance at a fraction of the existing cost. The only catch? Its backstory is shrouded in mystery, and it comes from a region that is, for better or worse, in the media spotlight.

If you’re reading between the lines, you of course know that I’m talking about DeepSeek, a large language model (LLM) that uses an innovative training technique to perform as well as (if not better than) similar models for a purported fraction of the typical training cost. But there are well-founded concerns around the model, both geopolitical (the startup is China-based) and technological (Was its training data legitimate? How accurate is that cost figure?).

Some might say that the various concerns around DeepSeek, many of which center on privacy, are overblown. Others, including organizations, states, and even entire countries, have gone so far as to ban downloads of DeepSeek’s models.

Me? I just wanted to test the model’s crazy performance claims and understand how it works—even if it had bias, even if it was kind of weird, even if it was indoctrinating me into its subversive philosophy (that’s a joke, people). I was willing to take the risk to see how DeepSeek’s advances might be used today and how they might influence AI moving forward. That said, I certainly didn’t want to download DeepSeek to my phone or to any other network-connected device. I didn’t want to sign up for its service, hand over my credentials, or leak my prompts to a web service.

So, I decided to run the model locally using RamaLama.

Spinning up DeepSeek with RamaLama

RamaLama is an open source project that facilitates local management and serving of AI models through the use of container technology. The RamaLama project is all about reducing friction in AI workflows. By using OCI containers as the foundation for deploying LLMs, RamaLama aims to mitigate or even eliminate issues related to dependency management, environment setup, and operational inconsistencies.

Upon launch, RamaLama inspects your system for GPU support. If no GPUs are detected, it falls back to the CPU. RamaLama then uses a container engine such as Podman or Docker to download an image that includes all of the software necessary to run an AI model for your system’s setup. Once the container image is in place, RamaLama pulls the specified AI model from a model registry. At this point, it launches a container, mounts the AI model as a data volume, and starts either a chatbot or a REST API endpoint (depending on what you want).
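For example, here is a minimal sketch of serving a model as a REST API rather than a chatbot. The serve subcommand, the default port of 8080, and the OpenAI-style endpoint shape reflect my understanding of RamaLama’s CLI, so check ramalama --help on your version:

# Serve the model over HTTP instead of opening an interactive chat
ramalama serve ollama://deepseek-r1:7b

# In another terminal, query the endpoint (port and API shape assumed)
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one sentence."}]}'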

A single command!

That part still makes me super-excited. So excited, in fact, that I recently sent an email to some of my colleagues encouraging them to try it for themselves as a way to (more safely and easily) test DeepSeek.

Here, for context, is what I said:

I want to show you how easy it is to test deepseek-r1. It’s a single command. I know nothing about DeepSeek or how to set it up. I don’t want to. But I want to get my hands on it so that I can understand it better. RamaLama can help!

Just type:


ramalama run ollama://deepseek-r1:7b

When the model is finished downloading, type the same thing you typed with granite or merlin, and you can compare how the models perform by looking at their results. It’s interesting how DeepSeek tells itself what to include in the story before it writes the story. It’s also interesting how it confidently says things that are wrong 🙂
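If you haven’t already pulled a model to compare against, doing so is just as simple. This is only an illustration; the granite3-dense name and tag are my assumption about what’s published on the Ollama registry, so substitute any model you like:

# Pull and chat with a second model for a side-by-side comparison
# (model name and tag assumed; adjust to whatever you want to compare)
ramalama run ollama://granite3-dense:8b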

What DeepSeek thinks

I included in my email the results of a query I entered into DeepSeek. I asked it to write a story about a certain open source-forward software company.

DeepSeek returned an interesting narrative, not all of which was accurate, but what was really cool was the way that DeepSeek “thought” about its own “thinking” — in an eerily human and transparent way. Before generating the story, DeepSeek—which, like OpenAI o1, is a reasoning model—spent a few moments muddling through how it would put the story together. And it showed its thinking, 760-plus words’ worth. For example, it reasoned that the story should have a beginning, a middle, and an end. It should be technical, but not too technical. It should talk about products and how they are being used by businesses. It should have a positive conclusion, and so on. 

This process was like a writer and editor talking through a story. Or healthcare professionals collaborating on a patient’s care plan. Or development and security teams discussing how to work together to protect an application. I can see DeepSeek being used as a tool in these and other collaborations, but I certainly don’t want it to replace them.

Indeed, based on my trial run of DeepSeek with RamaLama, I determined that I would feel comfortable using the LLM for tasks such as generating config files, or in situations where inputs and outputs are pretty well packaged up—like, “Hey, analyze this cluster and tell me whether Kubernetes is healthy.” However, the glaring hallucinations in DeepSeek’s narrative about the aforementioned open source company convinced me that DeepSeek should not be treated as an authority on open-ended questions whose answers have real ramifications.

And, honestly, I would say that today about any public LLM.

The value of RamaLama

I think that’s where the value proposition of RamaLama comes in. You can do this kind of testing and iterating on AI models without compromising your own data. When you’re done running the model locally, you can simply delete it. This is something that Ollama also does, but RamaLama’s ability to containerize models provides portability across runtimes and lets you leverage existing infrastructure, including container registries and CI/CD workflows. RamaLama also optimizes software for specific GPU configurations and generates a Podman Quadlet file that makes it easier for developers to eventually move from experimentation to production.
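To make that concrete, here is a rough sketch of the cleanup-and-promotion flow. The list and rm subcommands and the --generate quadlet option match my reading of RamaLama’s CLI, so verify them with ramalama --help:

# See which models are stored locally
ramalama list

# Delete the model once you’re done experimenting
ramalama rm ollama://deepseek-r1:7b

# Emit a Podman Quadlet file for running the model as a systemd-managed service
ramalama serve --generate quadlet ollama://deepseek-r1:7b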

These kinds of capabilities will be increasingly important as more companies invest more time, money, and trust in AI.

Indeed, DeepSeek raises a host of potential issues, but it has challenged conventional wisdom, and that alone has the potential to move AI thinking and applications forward. Curiosity mixed with a healthy dose of caution should drive our work with new technology, so it will be important to keep using and developing safe spaces such as RamaLama.
