llamanator-project/ollama-bench: Benchmarking tool for Ollama >=0.1.33 using the Ollama Python Library
Ollama Bench 🦙📈
Ollama Bench is a benchmarking program that I (meaning me, ChatGPT and Claude) developed on top of the Ollama Python Library to test the performance of the parallel request and multiple model loading capabilities released in Ollama 0.1.33.
Tools Used
Requirements
- Mac/Linux
- Python3
- Ollama >=0.1.33
Usage
- Clone this repo
  git clone https://github.com/llamanator-project/ollama-bench.git and cd ollama-bench
- Add a bank of prompts as a somefile.txt format with each prompt on a new line separated by a comma. Refer to the files in the ./sample-prompts dir for the format, or just use one of the samples for your run
- Install the required python packages
  pip install -r requirements.txt
- (Optional) Set your Ollama host URL as an environment variable
  export OLLAMA_HOST=http://127.0.0.1:11434 or your Ollama host that you want to test against
- Run python3 ollama-bench.py -f ./your-question-file.txt and you will be prompted for some input
- Fill in the inputs and confirm to start the run
  - Enter Ollama host URL: Enter the target of your Ollama server like http://127.0.0.1:11434
  - Available models: {model list} Select up to 10 models by number (e.g., 1, 3, 5): The list of available models on your server will be queried and presented. Select the models you want to use like 1, 3, 5 etc. This will test the multiple model loading capabilities of Ollama 0.1.33. The selected models will be used in a round-robin method to query throughout the run
  - How many concurrent requests would you like to run? Select how many times you want to prompt the models. If your prompt list is shorter than the number you put here, the program will loop over your prompt list to complete the number requested
  - How much time in seconds should wait between requests? This is how long to wait in between prompts. Lowest is 1 second.
  - Enter the timeout duration in seconds (after which a request will be marked as running for too long): Choose a value in seconds for how long before a request is automatically stopped. This comes in handy as some models start looping and generating nonsense forever. The program will monitor how long the run takes and automatically kill the request after the defined time.
  - Please confirm the details of the run by typing 'yes': You will be prompted to confirm the details of the run by typing yes
- View the results in the ./ollama-benchmark-results/{current_datetime}-benchmark dir for your specific run
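
The round-robin model selection and the looping over a short prompt list described above can be pictured with a small sketch. This is only an illustration of the idea, not the code in ollama-bench.py: the function name build_schedule and the model names are made up for the example.

```python
from itertools import cycle, islice


def build_schedule(prompts, models, total_requests):
    """Pair every request with a prompt and a model.

    Prompts start over from the top when the list is shorter than
    total_requests, and models rotate in round-robin order.
    (Hypothetical helper, not taken from ollama-bench.py.)
    """
    if not prompts or not models:
        raise ValueError("need at least one prompt and one model")
    pairs = zip(cycle(prompts), cycle(models))
    return list(islice(pairs, total_requests))


# Placeholder prompts and model names, for illustration only.
schedule = build_schedule(
    prompts=["Why is the sky blue?", "What is 2 + 2?"],
    models=["model-a", "model-b", "model-c"],
    total_requests=5,
)
for prompt, model in schedule:
    print(f"{model}: {prompt}")
```

With two prompts, three models and five requests, the fourth request reuses the second prompt and goes back to the first model.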
Outputs
- During the run, all successful requests and errors will be put into individual .csv files in the ./ollama-benchmark-results dir under a dir called {current_datetime}-benchmark for each run.
- At the end of the run, or if you hit ctrl + c one time, the results will be compiled into benchmark_results.csv and error_log.csv
- The benchmark_results.csv contains all of the prompts, answers, model and stats about each request. At the bottom of the csv, the results are summarized with the following:
  - Total Tokens Received in Response: How many tokens were received across all requests
  - Highest Tokens per Second: Fastest tokens per second response time
  - Lowest Tokens per Second: Slowest tokens per second response time
  - Averages: Average tokens per second across all requests
  - Model Usage Summary: The number of times each model selected for this run was used
- The
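
As a rough sketch of how the summary figures at the bottom of benchmark_results.csv relate to the per-request data, the snippet below computes them from a list of records. The record keys (model, tokens, seconds) are assumptions made for this example and may not match the real CSV columns, and the average is assumed to be the mean of the per-request rates.

```python
from collections import Counter


def summarize(results):
    """Compute run-level summary figures from per-request records.

    Each record is a dict with 'model', 'tokens' and 'seconds' keys.
    These key names are hypothetical, not the tool's CSV columns.
    """
    rates = [r["tokens"] / r["seconds"] for r in results if r["seconds"] > 0]
    if not rates:
        raise ValueError("no requests with a measurable duration")
    return {
        "total_tokens": sum(r["tokens"] for r in results),
        "highest_tps": max(rates),
        "lowest_tps": min(rates),
        # Assumption: the average is the mean of per-request rates.
        "average_tps": sum(rates) / len(rates),
        "model_usage": Counter(r["model"] for r in results),
    }


# Made-up sample records, for illustration only.
sample = [
    {"model": "model-a", "tokens": 100, "seconds": 4.0},
    {"model": "model-b", "tokens": 60, "seconds": 3.0},
    {"model": "model-a", "tokens": 90, "seconds": 3.0},
]
summary = summarize(sample)
print(summary)
```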
Ollama Setup
To fully test this, you will need Ollama 0.1.33 or newer, with the following options set for your OS:
- OLLAMA_NUM_PARALLEL=x: This sets the number of parallel requests Ollama will accept. An error will be sent when you exceed the number set here.
- OLLAMA_MAX_LOADED_MODELS=x: This sets the number of models that Ollama will be allowed to load at one time. This is useful for testing the multi-model loading capabilities of Ollama.
You will need to set these and tune them to determine the best results for your particular system.
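
One way to try different values without editing your shell profile is to build the environment in Python and hand it to the server process. This is a minimal sketch under some assumptions: ollama_server_env is a hypothetical helper, the values 4 and 2 are arbitrary starting points, and the commented-out launch assumes the ollama binary is on your PATH.

```python
import os


def ollama_server_env(num_parallel, max_loaded_models, base_env=None):
    """Return a copy of the environment with the two Ollama options set.

    Hypothetical helper for illustration; it does not start anything.
    """
    if num_parallel < 1 or max_loaded_models < 1:
        raise ValueError("both values must be at least 1")
    env = dict(os.environ if base_env is None else base_env)
    env["OLLAMA_NUM_PARALLEL"] = str(num_parallel)
    env["OLLAMA_MAX_LOADED_MODELS"] = str(max_loaded_models)
    return env


# Arbitrary example values; tune them for your own system.
env = ollama_server_env(num_parallel=4, max_loaded_models=2)
print(env["OLLAMA_NUM_PARALLEL"], env["OLLAMA_MAX_LOADED_MODELS"])

# To start a server with these settings (assumes `ollama` is on PATH):
# import subprocess
# server = subprocess.Popen(["ollama", "serve"], env=env)
```

Restart the server with new values between benchmark runs to compare results for your hardware.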
Issues and Improvements
There will likely be issues. I am not a developer and I fully leveraged ChatGPT and Claude to write this.
If you find an issue, please feel free to open an issue on the Issues tab or submit a PR for the fix.
If you want to improve the script feel free to submit a PR and I will review and merge.