langlab.core.multi-stemmers

This module contains stemming algorithms that return multiple results.

make-multi-stem-hunspell

(make-multi-stem-hunspell aff-fname-or-stream dic-fname-or-stream)

Creates a Hunspell stemming function based on the dictionaries given by aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. Because this is a multi-stemmer, the returned function takes a String and returns a collection of stems rather than a single String.

Note. Since Hunspell is a dictionary stemmer, the created function returns the original word if it encounters an unknown term. The complementary function make-multi-stem-hunspell-raw returns an empty collection in this case.

make-multi-stem-hunspell-raw

(make-multi-stem-hunspell-raw aff-fname-or-stream dic-fname-or-stream)

Creates a Hunspell stemming function based on the dictionaries read from aff-fname-or-stream and dic-fname-or-stream. Each parameter can be either a file name or a stream. Because this is a multi-stemmer, the returned function takes a String and returns a collection of stems rather than a single String.

Note. Since Hunspell is a dictionary stemmer, the created function returns an empty collection if it encounters an unknown term. The complementary function make-multi-stem-hunspell returns the original word in this case.
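The difference between the two Hunspell variants can be illustrated without real dictionary files. The following is a minimal sketch, not the library's implementation: toy-dictionary, toy-multi-stem-raw and with-original-fallback are hypothetical names, and it assumes the non-raw variant hands back the unknown word as a one-element collection.

```clojure
;; Toy dictionary standing in for the Hunspell .aff/.dic data (hypothetical).
(def toy-dictionary
  {"saw"   ["saw" "see"]
   "walks" ["walk"]})

;; Raw behaviour: unknown words yield an empty collection.
(defn toy-multi-stem-raw [word]
  (get toy-dictionary word []))

;; Wraps a raw multi-stemmer so that unknown words come back unchanged.
;; Assumption: the fallback is a one-element collection holding the word.
(defn with-original-fallback [raw-stemmer]
  (fn [word]
    (let [stems (raw-stemmer word)]
      (if (seq stems) stems [word]))))

(def toy-multi-stem (with-original-fallback toy-multi-stem-raw))

(toy-multi-stem-raw "saw")    ; => ["saw" "see"]
(toy-multi-stem-raw "qwerty") ; => []
(toy-multi-stem "qwerty")     ; => ["qwerty"]
```

The raw variant is the more convenient one when the caller needs to detect out-of-dictionary words, since an empty result is unambiguous.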

merge-multiple-words

(merge-multiple-words words sep)

Sorts words and merges them into one string, with the separator sep between them.

Can be used to merge multiple stems into one word and hence convert a multi-stemmer into a stemmer.
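The described behaviour, and the conversion from multi-stemmer to stemmer, might look roughly like this. It is a sketch under assumptions: merge-words-sketch is a hypothetical stand-in written from the description above (it assumes the default string ordering of sort), and toy-multi-stem is a made-up multi-stemmer used only for the demonstration.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in: sort the words, then join them with sep.
(defn merge-words-sketch [words sep]
  (str/join sep (sort words)))

;; Made-up multi-stemmer: a map lookup that falls back to the word itself.
(defn toy-multi-stem [word]
  (get {"saw" ["see" "saw"]} word [word]))

;; Composing the two yields a function from one word to one string.
(def toy-stem (comp #(merge-words-sketch % "|") toy-multi-stem))

(merge-words-sketch ["see" "saw"] "|") ; => "saw|see"
(toy-stem "saw")                       ; => "saw|see"
(toy-stem "tree")                      ; => "tree"
```

Sorting before joining makes the merged string independent of the order in which the stems were produced, so equal sets of stems always map to the same token.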

pl-multi-stem-morfologik

(pl-multi-stem-morfologik word)

Returns a seq of stems for word, generated by the Polish Morfologik stemmer.

select-longest-word

(select-longest-word words)

Selects the longest string out of words.

Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
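As an illustration of the selection step, here is a minimal sketch written from the description above. select-longest-sketch is a hypothetical name, not the library function; how the real function treats ties or an empty collection is not documented here, so the nil result for empty input is an assumption of this sketch.

```clojure
;; Hypothetical stand-in: pick the word with the greatest character count.
;; Returns nil for an empty collection (an assumption of this sketch).
(defn select-longest-sketch [words]
  (when (seq words)
    (apply max-key count words)))

(select-longest-sketch ["go" "going" "goes"]) ; => "going"
(select-longest-sketch ["walk"])              ; => "walk"
(select-longest-sketch [])                    ; => nil
```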

select-shortest-word

(select-shortest-word words)

Selects the shortest string out of words.

Can be used to select one of the multiple stems and hence convert a multi-stemmer into a stemmer.
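Composing a selector with a multi-stemmer gives a function that returns a single stem per word. The sketch below shows the idea with hypothetical names only: select-shortest-sketch is written from the description above, and toy-lookup-stems is a made-up multi-stemmer rather than one of the stemmers in this module.

```clojure
;; Hypothetical stand-in: pick the word with the smallest character count.
;; Returns nil for an empty collection (an assumption of this sketch).
(defn select-shortest-sketch [words]
  (when (seq words)
    (apply min-key count words)))

;; Made-up multi-stemmer: a map lookup that falls back to the word itself.
(defn toy-lookup-stems [word]
  (get {"walking" ["walking" "walk"]} word [word]))

;; Multi-stemmer followed by a selector behaves like a plain stemmer.
(def toy-shortest-stem (comp select-shortest-sketch toy-lookup-stems))

(toy-shortest-stem "walking") ; => "walk"
(toy-shortest-stem "tree")    ; => "tree"
```

Choosing the shortest stem tends to favour the most reduced form, whereas merging all stems keeps the ambiguity visible in the resulting token.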