What should you consider when selecting a standalone model?
- Where you want to run it – in short, the compute requirement. For example, if you plan to run the model on an IoT device, a smartphone, or a laptop, aim for models in the 1–11B parameter range.
- What use case it is good for. Not all models are equal: what a model is good at depends largely on the data set it was trained on.
- The license. Regardless of functionality, the license still needs to be scrutinized: it may bar specific use cases or restrict fine-tuning on your own data.
For example, OpenAI does not allow the use of Whisper in high-stakes decisions:
“Our usage policies prohibit use in certain high-stakes decision-making contexts, and our model card for open-source use includes recommendations against use in high-risk domains”
(which basically rules out applicability in a wide range of contexts without human supervision)
Models to aim for – the short summary
- For a balance between capability and speed: Llama 3.1 8B is the best choice.
- For handling large documents and complex reasoning: aim for Mistral NeMo 12B, but keep in mind that it requires more resources and runs slower.
- For coding tasks in Java, Python, and Rust, consider Zamba2. For coding tasks in C#, consider DeepSeek-Coder.
General notes
2B–8B models are smaller and faster and can handle simpler tasks. They are lightweight and well suited to small, real-time workloads.
10B–70B models are larger and slower but better at complex, nuanced tasks. They offer stronger language understanding and generation, but are resource-intensive.
90B–405B models are for a server farm with high-end GPUs.
LLaVA (Large Language and Vision Assistant) is a multimodal model that aims at general vision-and-language understanding by combining a visual encoder with a large language model.
Quantization reduces the memory usage of a model by lowering the number of bits used to store its weights. The trade-off is accuracy: fewer bits mean less memory but also less precision. As a rule of thumb, aim for Q4 or higher.
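As a rough back-of-the-envelope sketch of why quantization matters for the compute requirement, weight memory is approximately parameter count times bits per weight (this ignores the KV cache, activations, and runtime overhead, so treat the result as a lower bound):

```python
def approx_weight_memory_gb(params_billion: float, bits_per_weight: int) -> float:
    """Rough estimate of weight storage: parameters x bits, converted to GB.

    Ignores KV cache, activations, and runtime overhead, so the real
    memory requirement is somewhat higher than this figure.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# An 8B model at FP16 (16 bits per weight) vs. Q4 (~4 bits per weight):
print(approx_weight_memory_gb(8, 16))  # 16.0 GB
print(approx_weight_memory_gb(8, 4))   # 4.0 GB
```

This is why a Q4 quantization of an 8B model fits on a laptop with 16 GB of RAM, while the FP16 version of the same model generally does not.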
Standalone AI Models
| Model | Parameters | Quantization | Best for |
| --- | --- | --- | --- |
| Llama 3.2 | 1B, 3B, 11B, 90B | Q8, FP16 | Advanced tools and large-scale tasks. The 1B and 3B models are text-only; 11B and above can reason over high-resolution images. |
| Gemma 2 | 2B, 9B, 27B | Q8 | Efficient text generation and language tasks |
| Mistral NeMo | 12B | Q4 | Long-context tasks, multilingual support. Available under Apache 2.0. |
| LLaVA-Phi-3 | 3.8B | | Visual recognition and chart interpretation; trained on additional document, chart, and diagram data sets. A fine-tuned LLaVA model with benchmark performance on par with the original LLaVA. |
| Nvidia NVLM 1.0 | 72B | | Visual tasks, such as summarizing handwritten notes and interpreting images. |
| Qwen2 | 1.5B, 7B, 72B | Q4, Q8 | Text processing, general AI tasks |
| DeepSeek-Coder | 16B, 236B | Q8, FP16 | Code generation, fill-in-the-middle tasks |
| CodeGemma | 2B, 7B | Q8 | Code generation, instruction-following tasks |
| Zamba2 (instruction-tuned) | 1.2B, 2.7B | | Coding in Rust, Python, and Java, plus chat. A low-memory-footprint model reported to beat Gemma 2 and Mistral 7B. Apache 2.0 licensed. |
How can I take a standalone model for a spin?
If you would like to try out a standalone model on your own computer without coding in Python, use one of these three tools.
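To give a sense of how little setup such tools require, here is what the exercise looks like with Ollama, a popular command-line runner (named here only as an illustration; it may or may not be among the three tools). This assumes Ollama is installed from ollama.com and its background service is running:

```shell
# Download a small Llama 3.2 model (~2 GB at the default quantization).
ollama pull llama3.2:3b

# Ask it a one-shot question; omit the quoted prompt for an interactive chat.
ollama run llama3.2:3b "Summarize the trade-offs of 4-bit quantization."
```

Graphical tools wrap essentially the same pull-then-run workflow behind a point-and-click interface.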