A benchmark of Thai word tokenizers written in various programming languages

Written on 2017-06-04

The origin post was at https://veer66.wordpress.com/2017/01/19/benchmark-thai-word-tokenizers/ posted on 2017/01/19.

I wonder about speed of programs written in different languages. For example, I wonder whether one written in Kotlin and ran on JVM is slower than one written in Go. Although there are several existing benchmarks, this is one may be still important at least for me, because Thai word tokenizer is my real task.

So @iporsut and me wrote some programs in different programming languages and tried to optimize them.

I conducted the experiment on my laptop computer, which has Intel® Core™ i3-4030U CPU @ 1.90GHz × 4, on a 20MB Thai text corpus.

  • Rust #1: 3.366 #2: 3.247 #3 3.241 #Avg: 3.284
  • Go #1: 5.415 #2: 5.405 #3 5.416 #Avg: 5.412
  • Crystal #1: 5.637 #2: 5.679 #3 5.649 #Avg: 5.655
  • Kotlin+Clojure #1: 6.547 #2: 6.743 #3 6.628 #Avg: 6.639
  • Julia #1: 38.316 #2: 38.112 #3 38.237 #Avg: 38.221
  • JavaScript #1: 49.349 #2: 49.084 #3 49.901 #Avg: 49.445
  • Python #1: 50.624 #2: 50.803 #3 50.869 #Avg: 50.765
  • Clojure+Kotlin #1: 63.502 #2: 67.561 #3 67.303 #Avg: 66.122

Additional setup

Future work

@iporsut has already written multicore versions, so maybe next month I will conduct another experiment.


Unless otherwise credited all material Creative Commons License by Vee Satayamas