Slide 10
Michaels-Mac-Studio:llama.cpp $ ./main -m ./models/llama-2-7b-chat.ggmlv3.q4_0.bin --temp 0.1 -p "### Instruction: What is
the height of Mount Fuji?
### Response:" -b 512 -ngl 32
main: build = 944 (8183159)
main: seed = 1691074574
llama.cpp: loading model from ./models/llama-2-7b-chat.ggmlv3.q4_0.bin
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 512
…
ggml_metal_init: recommendedMaxWorkingSetSize = 49152.00 MB
ggml_metal_init: hasUnifiedMemory = true
ggml_metal_init: maxTransferRate = built-in GPU
llama_new_context_with_model: max tensor size = 70.31 MB
…
system_info: n_threads = 16 / 20 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 |
ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 |
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k =
40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.100000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent
= 5.000000
generate: n_ctx = 512, n_batch = 512, n_predict = -1, n_keep = 0
### Instruction: What is the height of Mount Fuji?
### Response: The height of Mount Fuji is 3,776 meters (12,421 feet) above sea level. [end of text]
llama_print_timings: load time = 4943.32 ms
llama_print_timings: sample time = 20.34 ms / 28 runs ( 0.73 ms per token, 1376.33 tokens per second)
llama_print_timings: prompt eval time = 503.32 ms / 19 tokens ( 26.49 ms per token, 37.75 tokens per second)
llama_print_timings: eval time = 372.99 ms / 27 runs ( 13.81 ms per token, 72.39 tokens per second)
llama_print_timings: total time = 899.05 ms
ggml_metal_free: deallocating
Let's run it!
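As a sanity check on the log above, the throughput figures that llama_print_timings reports can be reproduced from the raw numbers: tokens per second is simply the token (or run) count divided by the elapsed time in seconds. A minimal Python sketch (the regex and variable names here are illustrative, not part of llama.cpp):

```python
import re

# Two timing lines copied from the run above.
log = """llama_print_timings: prompt eval time = 503.32 ms / 19 tokens
llama_print_timings: eval time = 372.99 ms / 27 runs"""

# Extract the phase name, elapsed milliseconds, and token/run count.
pattern = re.compile(
    r"(?P<name>prompt eval|eval) time\s*=\s*(?P<ms>[\d.]+) ms /\s*(?P<n>\d+)"
)

results = {}
for line in log.splitlines():
    m = pattern.search(line)
    if m:
        total_ms = float(m.group("ms"))
        n = int(m.group("n"))
        # tokens per second = count / (elapsed ms / 1000)
        results[m.group("name")] = round(n / (total_ms / 1000.0), 2)

print(results)
```

Running this recovers the same figures the log prints: about 37.75 tokens/sec for prompt evaluation and 72.39 tokens/sec for generation, confirming how the two numbers in each timing line relate.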