Wordchipper 0.9: Fastest Gun in the West
wordchipper 0.9 is out!
wordchipper is a high-performance Rust byte-pair encoding (BPE) tokenizer for the OpenAI GPT-2 tokenizer family.
In Rust, on a 64-core machine, it achieves throughput speedups over tiktoken-rs of ~4.3x-5.7x (using 4 to 64 cores)
for general regex BPE vocabularies, and ~6.9x-9.2x when using custom DFA lexers for specific OpenAI
vocabularies. Through the Python wrappers, we see speedups of ~2x-4x (4 to 64 cores)
over tiktoken.
We’re publishing a paper on this work: