§ feed · storyline

whisper.cpp v1.6.0

whisper.cpp v1.6.0 adds optional Flash Attention support for faster inference on CUDA and Metal devices and fixes a slowdown bug in the main binary.

May 15 · 09:13:56 · primary fetch · 1 source · updated May 15 · 09:13:56

Overview

- Can optionally enable Flash Attention for faster processing on CUDA and Metal devices (#2152)
- Faster ppc64 performance (40aeeeecc4b8700b2a7e50cbcfa5c5412f2626ab) (not tested)
- Fix `main` slowdown bug (#2070)

Shoutout to @JohannesGaessler for contributing efficient FA CUDA kernels.

Some performance numbers for this release:

M1 Pro

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M1 Pro | METAL | tiny | 1 | 0 | 39.21 | 1.74 | 0.61 | 0.04 | 22c96b4 |
| M1 Pro | METAL | base | 1 | 0 | 70.76 | 2.60 | 0.93 | 0.06 | 22c96b4 |
| M1 Pro | METAL | small | 1 | 0 | 217.28 | 6.42 | 2.14 | 0.17 | 22c96b4 |
| M1 Pro | METAL | medium | 1 | 0 | 596.74 | 14.43 | 4.75 | 0.45 | 22c96b4 |

| CPU | Config | Model | Th | FA | Enc. | Dec. | Bch5 | PP | Commit |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| M1 Pro | METAL | tiny | 1 | 1 | 30.77 | 1.59 | 0.54 | 0.03 | 22c96b4 |
| M1 Pro | METAL | base | 1 | 1 | 60.42 | 2.29 | 0.81 | 0.05 | 22c96b4 |
| M1 Pro | METAL | small | 1 | 1 | 183.82 | 5.12 | 1.81 | 0.14 | 22c96b4 |
| M1 Pro | METAL | medium | 1 | 1 | 517.92 | 11.60 | 4.01 | 0.38 | 22c96b4 |

…
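The two M1 Pro tables above differ only in the FA column (0 versus 1), so the effect of Flash Attention can be read off as a per-column ratio. The following is a minimal sketch of that arithmetic, assuming the Enc., Dec., Bch5 and PP columns are timings where lower is better (the excerpt does not state a unit). The names `FA_OFF`, `FA_ON` and `speedups` are illustrative and not part of whisper.cpp; the numbers are copied from the rows shown here, and any rows past the truncation are not included.

```python
# Rows copied from the two M1 Pro / METAL tables at commit 22c96b4.
# Tuple order follows the table columns: Enc., Dec., Bch5, PP.
# Assumption: the values are timings, so a smaller number is better.
COLUMNS = ("Enc.", "Dec.", "Bch5", "PP")

FA_OFF = {  # rows with FA = 0
    "tiny":   (39.21, 1.74, 0.61, 0.04),
    "base":   (70.76, 2.60, 0.93, 0.06),
    "small":  (217.28, 6.42, 2.14, 0.17),
    "medium": (596.74, 14.43, 4.75, 0.45),
}

FA_ON = {  # rows with FA = 1
    "tiny":   (30.77, 1.59, 0.54, 0.03),
    "base":   (60.42, 2.29, 0.81, 0.05),
    "small":  (183.82, 5.12, 1.81, 0.14),
    "medium": (517.92, 11.60, 4.01, 0.38),
}


def speedups(off, on):
    """Return {model: {column: off / on}}; a ratio above 1 means the FA=1 value is smaller."""
    return {
        model: {col: off[model][i] / on[model][i] for i, col in enumerate(COLUMNS)}
        for model in off
    }


if __name__ == "__main__":
    for model, ratios in speedups(FA_OFF, FA_ON).items():
        cells = "  ".join(f"{col} {ratio:.2f}x" for col, ratio in ratios.items())
        print(f"{model:<7} {cells}")
```

The Bch5 and PP columns are reported to only two decimal places, so their ratios are coarse; the Enc. and Dec. ratios are the more reliable comparison.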

read full article on github.com ↗

§ sources · 1 publication · timeline below

github.com · whisper.cpp v1.6.0 · primary · 09:13:56