langlab.core.stopwords

This module contains predefined sets of stopwords and articles for various languages, along with functions that operate on them.

All constants are lowercase.
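
Because the constants are lowercase, tokens usually need to be lowercased before they are looked up. The following is a minimal sketch of that idea, not part of the library: `remove-stopwords` is a hypothetical helper, and the literal set stands in for the value returned by a function such as `(en-get-stopwords-lucene)`, assuming that value behaves like an ordinary Clojure set.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for a set returned by this namespace,
;; e.g. (en-get-stopwords-lucene). The real set is much larger.
(def stopwords #{"the" "a" "of" "and"})

(defn remove-stopwords
  "Returns the tokens whose lowercased form is not in the stopword set.
  Hypothetical helper, shown for illustration only."
  [stopwords tokens]
  (remove #(contains? stopwords (str/lower-case %)) tokens))

(remove-stopwords stopwords ["The" "history" "of" "Rome"])
;; => ("history" "Rome")
```

Lowercasing only for the lookup keeps the original casing of the tokens that survive the filter.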

ar-get-stopwords-lucene

(ar-get-stopwords-lucene)

Returns default Arabic stopword set used by Lucene.

bg-get-stopwords-clef

(bg-get-stopwords-clef)

Returns Bulgarian stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

bg-get-stopwords-lucene

(bg-get-stopwords-lucene)

Returns default Bulgarian stopword set used by Lucene.

br-get-stopwords-lucene

(br-get-stopwords-lucene)

Returns default Brazilian Portuguese stopword set used by Lucene.

ca-get-stopwords-lucene

(ca-get-stopwords-lucene)

Returns default Catalan stopword set used by Lucene.

ca-get-stopwords-ranks

(ca-get-stopwords-ranks)

Returns Catalan stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

cz-get-stopwords-clef

(cz-get-stopwords-clef)

Returns Czech stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

cz-get-stopwords-lucene

(cz-get-stopwords-lucene)

Returns default Czech stopword set used by Lucene.

cz-get-stopwords-ranks

(cz-get-stopwords-ranks)

Returns Czech stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

da-get-stopwords-lucene

(da-get-stopwords-lucene)

Returns default Danish stopword set used by Lucene.

da-get-stopwords-ranks

(da-get-stopwords-ranks)

Returns Danish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

de-get-articles

(de-get-articles)

Returns German article set.

de-get-stopwords-clef

(de-get-stopwords-clef)

Returns German stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

de-get-stopwords-lucene

(de-get-stopwords-lucene)

Returns default German stopword set used by Lucene.

de-get-stopwords-ranks

(de-get-stopwords-ranks)

Returns German stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

el-get-stopwords-lucene

(el-get-stopwords-lucene)

Returns default Greek stopword set used by Lucene.

en-get-articles

(en-get-articles)

Returns English article set.

en-get-stopwords-clef

(en-get-stopwords-clef)

Returns English stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

en-get-stopwords-lucene

(en-get-stopwords-lucene)

Returns default English stopword set used by Lucene.

en-get-stopwords-ranks-long

(en-get-stopwords-ranks-long)

Returns English stopword set (longer list). Taken from http://www.ranks.nl/resources/stopwords.html.

en-get-stopwords-ranks-short

(en-get-stopwords-ranks-short)

Returns English stopword set (shorter list). Taken from http://www.ranks.nl/resources/stopwords.html.

en-get-stopwords-ranks-vlong

(en-get-stopwords-ranks-vlong)

Returns English stopword set (very long list). Taken from http://www.ranks.nl/resources/stopwords.html.
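
Several lists are available for the same language, so it can be useful to merge or compare them. The sketch below assumes the returned values are ordinary Clojure sets. The two literal sets are hypothetical stand-ins, not the actual contents of any list in this namespace.

```clojure
(require '[clojure.set :as set])

;; Hypothetical stand-ins for two English lists, e.g. the results of
;; (en-get-stopwords-ranks-short) and (en-get-stopwords-lucene).
(def short-list #{"a" "the" "of"})
(def lucene-list #{"a" "the" "and" "but"})

;; Every word that appears in either list.
(def combined (set/union short-list lucene-list))

;; Words found only in the second list.
(def only-lucene (set/difference lucene-list short-list))
```

A merged set removes more words from the text, while the difference shows which entries one source adds over the other.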

es-get-articles

(es-get-articles)

Returns Spanish article set.

es-get-stopwords-long-clef

(es-get-stopwords-long-clef)

Returns Spanish stopword set (longer version). Taken from http://members.unine.ch/jacques.savoy/clef/

es-get-stopwords-lucene

(es-get-stopwords-lucene)

Returns default Spanish stopword set used by Lucene.

es-get-stopwords-ranks

(es-get-stopwords-ranks)

Returns Spanish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

es-get-stopwords-short-clef

(es-get-stopwords-short-clef)

Returns Spanish stopword set (shorter version). Taken from http://members.unine.ch/jacques.savoy/clef/

eu-get-stopwords-lucene

(eu-get-stopwords-lucene)

Returns default Basque stopword set used by Lucene.

fa-get-stopwords-lucene

(fa-get-stopwords-lucene)

Returns default Persian stopword set used by Lucene.

fi-get-stopwords-lucene

(fi-get-stopwords-lucene)

Returns default Finnish stopword set used by Lucene.

fi-get-stopwords-ranks

(fi-get-stopwords-ranks)

Returns Finnish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

fr-get-articles

(fr-get-articles)

Returns French article set.

fr-get-stopwords-clef

(fr-get-stopwords-clef)

Returns French stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

fr-get-stopwords-lucene

(fr-get-stopwords-lucene)

Returns default French stopword set used by Lucene.

fr-get-stopwords-ranks

(fr-get-stopwords-ranks)

Returns French stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

ga-get-stopwords-lucene

(ga-get-stopwords-lucene)

Returns default Irish stopword set used by Lucene.

gl-get-stopwords-lucene

(gl-get-stopwords-lucene)

Returns default Galician stopword set used by Lucene.

hi-get-stopwords-lucene

(hi-get-stopwords-lucene)

Returns default Hindi stopword set used by Lucene.

hu-get-stopwords-clef

(hu-get-stopwords-clef)

Returns Hungarian stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

hu-get-stopwords-lucene

(hu-get-stopwords-lucene)

Returns default Hungarian stopword set used by Lucene.

hu-get-stopwords-ranks

(hu-get-stopwords-ranks)

Returns Hungarian stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

hy-get-stopwords-lucene

(hy-get-stopwords-lucene)

Returns default Armenian stopword set used by Lucene.

id-get-stopwords-lucene

(id-get-stopwords-lucene)

Returns default Indonesian stopword set used by Lucene.

it-get-articles

(it-get-articles)

Returns Italian article set.

it-get-stopwords-clef

(it-get-stopwords-clef)

Returns Italian stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

it-get-stopwords-lucene

(it-get-stopwords-lucene)

Returns default Italian stopword set used by Lucene.

it-get-stopwords-ranks

(it-get-stopwords-ranks)

Returns Italian stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

lv-get-stopwords-lucene

(lv-get-stopwords-lucene)

Returns default Latvian stopword set used by Lucene.

nl-get-articles

(nl-get-articles)

Returns Dutch article set.

nl-get-stopwords-ranks

(nl-get-stopwords-ranks)

Returns Dutch stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

no-get-stopwords-lucene

(no-get-stopwords-lucene)

Returns default Norwegian stopword set used by Lucene.

no-get-stopwords-ranks

(no-get-stopwords-ranks)

Returns Norwegian stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

pl-get-stopwords-long-clef

(pl-get-stopwords-long-clef)

Returns Polish stopword set (longer version). Taken from http://members.unine.ch/jacques.savoy/clef/

pl-get-stopwords-lucene

(pl-get-stopwords-lucene)

Returns default Polish stopword set used by Lucene.

pl-get-stopwords-ranks

(pl-get-stopwords-ranks)

Returns Polish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

pl-get-stopwords-short-clef

(pl-get-stopwords-short-clef)

Returns Polish stopword set (shorter version). Taken from http://members.unine.ch/jacques.savoy/clef/

pl-get-stopwords-wiki-long

(pl-get-stopwords-wiki-long)

Returns Polish stopword set. Taken from http://pl.wikipedia.org/wiki/Wikipedia:Stopwords.

pl-get-stopwords-wiki-short

(pl-get-stopwords-wiki-short)

Returns Polish stopword set. Taken from http://pl.wikipedia.org/wiki/Wikipedia:Stopwords. Extended with the polysemous words "bez" and "go".

pt-get-articles

(pt-get-articles)

Returns Portuguese article set.

pt-get-stopwords-long-clef

(pt-get-stopwords-long-clef)

Returns Portuguese stopword set (longer version). Taken from http://members.unine.ch/jacques.savoy/clef/

pt-get-stopwords-lucene

(pt-get-stopwords-lucene)

Returns default Portuguese stopword set used by Lucene.

pt-get-stopwords-ranks

(pt-get-stopwords-ranks)

Returns Portuguese stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

pt-get-stopwords-short-clef

(pt-get-stopwords-short-clef)

Returns Portuguese stopword set (shorter version). Taken from http://members.unine.ch/jacques.savoy/clef/

ro-get-stopwords-clef

(ro-get-stopwords-clef)

Returns Romanian stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

ro-get-stopwords-lucene

(ro-get-stopwords-lucene)

Returns default Romanian stopword set used by Lucene.

ru-get-stopwords-clef

(ru-get-stopwords-clef)

Returns Russian stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

ru-get-stopwords-lucene

(ru-get-stopwords-lucene)

Returns default Russian stopword set used by Lucene.

ru-get-stopwords-ranks

(ru-get-stopwords-ranks)

Returns Russian stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

sv-get-stopwords-clef

(sv-get-stopwords-clef)

Returns Swedish stopword set. Taken from http://members.unine.ch/jacques.savoy/clef/

sv-get-stopwords-lucene

(sv-get-stopwords-lucene)

Returns default Swedish stopword set used by Lucene.

sv-get-stopwords-ranks

(sv-get-stopwords-ranks)

Returns Swedish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.

th-get-stopwords-lucene

(th-get-stopwords-lucene)

Returns default Thai stopword set used by Lucene.

tr-get-stopwords-lucene

(tr-get-stopwords-lucene)

Returns default Turkish stopword set used by Lucene.

tr-get-stopwords-ranks

(tr-get-stopwords-ranks)

Returns Turkish stopword set. Taken from http://www.ranks.nl/resources/stopwords.html.