langlab.core.multi-stemmers
This module contains stemming algorithms that return multiple results.
make-multi-stem-hunspell
(make-multi-stem-hunspell aff-fname-or-stream dic-fname-or-stream)
Creates a Hunspell stemming function based on the dictionaries read from aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. It returns a stemming function with signature String -> collection of Strings.
Note: Since Hunspell is a dictionary stemmer, the created function returns the original word if it encounters an unknown term. The complementary function make-multi-stem-hunspell-raw returns an empty collection in this case.
make-multi-stem-hunspell-raw
(make-multi-stem-hunspell-raw aff-fname-or-stream dic-fname-or-stream)
Creates a Hunspell stemming function based on the dictionaries read from aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. It returns a stemming function with signature String -> collection of Strings.
Note: Since Hunspell is a dictionary stemmer, the created function returns an empty collection if it encounters an unknown term. The complementary function make-multi-stem-hunspell returns the original word in this case.
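The difference between the two variants can be sketched with a toy dictionary. This is only an illustration: toy-dictionary, raw-multi-stem and multi-stem are hypothetical names, no Hunspell files are read, and wrapping the original word in a one-element vector is an assumption about the fallback's return shape.

```clojure
;; Hypothetical stand-in for a dictionary lookup; this is NOT langlab's implementation.
(def toy-dictionary
  {"cats"    ["cat"]
   "running" ["run" "running"]})

(defn raw-multi-stem
  "Returns the stems found in the toy dictionary,
  or an empty vector when the word is unknown."
  [word]
  (get toy-dictionary word []))

(defn multi-stem
  "Like raw-multi-stem, but falls back to the original word
  (assumed here to be wrapped in a vector) when the word is unknown."
  [word]
  (let [stems (raw-multi-stem word)]
    (if (empty? stems) [word] stems)))

(raw-multi-stem "cats")   ;; => ["cat"]
(raw-multi-stem "xyzzy")  ;; => []
(multi-stem "xyzzy")      ;; => ["xyzzy"]
```

The raw variant is the more convenient one when unknown terms should be detected or filtered out, because an empty result is unambiguous.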
merge-multiple-words
(merge-multiple-words words sep)
Sorts words and merges them into one string with the separator sep in between.
Can be used to merge multiple stems into one word and hence convert a multi-stemmer into a stemmer.
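The described behaviour (sort, then join) can be approximated with a minimal sketch. The name merge-words-sketch is hypothetical, and the default string ordering of sort is an assumption; the library's own implementation may differ.

```clojure
(require '[clojure.string :as str])

(defn merge-words-sketch
  "Sorts words in natural string order and joins them with sep."
  [words sep]
  (str/join sep (sort words)))

(merge-words-sketch ["run" "running" "ran"] "|")
;; => "ran|run|running"
```

Sorting before joining makes the merged string independent of the order in which the stems were produced.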
pl-multi-stem-morfologik
(pl-multi-stem-morfologik word)
Returns a seq of stems for word generated by the Polish Morfologik stemmer.
select-longest-word
(select-longest-word words)
Selects the longest string out of words.
Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
select-shortest-word
(select-shortest-word words)
Selects the shortest string out of words.
Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
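Converting a multi-stemmer into a stemmer amounts to composing it with a selector. The sketch below uses hypothetical names (longest-word-sketch, shortest-word-sketch, toy-multi-stem) instead of the library functions, and it makes no claim about how the library handles ties or empty input; here an empty collection yields nil.

```clojure
(defn longest-word-sketch
  "Returns the longest string in words, or nil for an empty collection."
  [words]
  (when (seq words)
    (apply max-key count words)))

(defn shortest-word-sketch
  "Returns the shortest string in words, or nil for an empty collection."
  [words]
  (when (seq words)
    (apply min-key count words)))

;; Hypothetical multi-stemmer that ignores its input and returns fixed stems.
(defn toy-multi-stem [_word]
  ["be" "been" "being"])

;; Composition turns String -> collection into String -> String.
(def longest-stem  (comp longest-word-sketch toy-multi-stem))
(def shortest-stem (comp shortest-word-sketch toy-multi-stem))

(longest-stem "is")   ;; => "being"
(shortest-stem "is")  ;; => "be"
```

The same pattern should apply to any function in this module that returns a collection of stems, with the selector swapped for whichever strategy suits the application.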