langlab.core.multi-stemmers
This module contains stemming algorithms that return multiple results for a word.
make-multi-stem-hunspell
(make-multi-stem-hunspell aff-fname-or-stream dic-fname-or-stream)
Creates a Hunspell stemming function based on the dictionaries read from aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. It returns a stemming function that takes a String and returns a collection of Strings.
Note. Since Hunspell is a dictionary stemmer, the created function returns the original word if it encounters an unknown term. The complementary function make-multi-stem-hunspell-raw returns an empty collection in this case.
make-multi-stem-hunspell-raw
(make-multi-stem-hunspell-raw aff-fname-or-stream dic-fname-or-stream)
Creates a Hunspell stemming function based on the dictionaries read from aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. It returns a stemming function that takes a String and returns a collection of Strings.
Note. Since Hunspell is a dictionary stemmer, the created function returns an empty collection if it encounters an unknown term. The complementary function make-multi-stem-hunspell returns the original word in this case.
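The difference between the two variants can be illustrated without Hunspell itself. The sketch below is not langlab's implementation: toy-raw-multi-stem is a hypothetical stand-in for a function created by make-multi-stem-hunspell-raw, backed by a small map instead of .aff/.dic dictionaries, and with-original-fallback is a hypothetical wrapper that mimics the fallback described in the notes above. Returning the original word wrapped in a one-element vector is an assumption; the source only says that the original word is returned.

```clojure
;; Hypothetical toy dictionary standing in for real Hunspell data.
(def toy-dictionary
  {"leaves" ["leaf" "leave"]
   "cats"   ["cat"]})

(defn toy-raw-multi-stem
  "Stand-in for a raw multi-stemmer: unknown words yield an empty collection."
  [word]
  (get toy-dictionary word []))

(defn with-original-fallback
  "Wraps a raw multi-stemmer so that unknown words come back unchanged.
  Assumption: the original word is returned as a one-element vector."
  [raw-multi-stem]
  (fn [word]
    (let [stems (raw-multi-stem word)]
      (if (empty? stems) [word] stems))))

(def toy-multi-stem (with-original-fallback toy-raw-multi-stem))

(toy-raw-multi-stem "leaves") ; known word: both candidate stems
(toy-raw-multi-stem "xyzzy")  ; unknown word: empty collection
(toy-multi-stem "xyzzy")      ; unknown word: the original word is kept
```

The raw variant is convenient when unknown words should be detected or filtered out; the fallback variant is convenient when every input must produce some output.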
merge-multiple-words
(merge-multiple-words words sep)
Sorts words and merges them into one string, with the separator sep between them.
Can be used to merge multiple stems into one word and hence convert a multi-stemmer into a stemmer.
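As a rough illustration of the described behaviour, the sketch below sorts a collection of stems and joins them with a separator, then composes that step with a multi-stemmer to obtain a single-result stemmer. Both merge-words-sketch and fake-multi-stem are hypothetical names used only for this example; they are not part of langlab, and the real merge-multiple-words may differ in details such as sort order for non-ASCII text.

```clojure
(require '[clojure.string :as str])

(defn merge-words-sketch
  "Sorts words and joins them into one string with sep in between."
  [words sep]
  (str/join sep (sort words)))

;; Hypothetical multi-stemmer used only for this example.
(defn fake-multi-stem [word]
  (if (= word "leaves")
    ["leave" "leaf"]
    [word]))

;; Composing the two yields a function String -> String.
(def fake-stem
  (comp #(merge-words-sketch % "|") fake-multi-stem))

(merge-words-sketch ["leave" "leaf"] "|") ; => "leaf|leave"
(fake-stem "leaves")                      ; => "leaf|leave"
(fake-stem "tree")                        ; => "tree"
```

Sorting before joining makes the merged string independent of the order in which the stems were produced, so the same set of stems always maps to the same token.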
pl-multi-stem-morfologik
(pl-multi-stem-morfologik word)
Returns a seq of stems for word generated by the Polish Morfologik stemmer.
select-longest-word
(select-longest-word words)
Selects the longest string out of words.
Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
select-shortest-word
(select-shortest-word words)
Selects the shortest string out of words.
Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
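The two selection functions can be sketched with max-key and min-key from clojure.core. The names longest-word-sketch, shortest-word-sketch and demo-multi-stem are hypothetical and exist only in this example. How the real functions behave on ties or on an empty collection is not stated above; this sketch returns nil for an empty collection and leaves ties to max-key and min-key.

```clojure
(defn longest-word-sketch
  "Returns the longest string in words, or nil if words is empty."
  [words]
  (when (seq words)
    (apply max-key count words)))

(defn shortest-word-sketch
  "Returns the shortest string in words, or nil if words is empty."
  [words]
  (when (seq words)
    (apply min-key count words)))

;; Hypothetical multi-stemmer used only for this example.
(defn demo-multi-stem [word]
  (if (= word "leaves")
    ["leaf" "leave"]
    [word]))

;; Composing a selector with a multi-stemmer gives a single-result stemmer.
(def longest-stem  (comp longest-word-sketch demo-multi-stem))
(def shortest-stem (comp shortest-word-sketch demo-multi-stem))

(longest-stem "leaves")  ; => "leave"
(shortest-stem "leaves") ; => "leaf"
```

Picking the shortest stem tends to favour the more aggressive reduction, while picking the longest keeps more of the original word; which is preferable depends on the application.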