langlab.core.transformers
This module contains utilities for transforming sequences of tokens.
merge-tokens-with-space
(merge-tokens-with-space tokens)
Creates a string from the tokens seq by inserting a space between them.
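The described behaviour can be illustrated with a minimal sketch. The name merge-tokens-sketch is hypothetical and the body is an assumption based only on the description above, not the library's source.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for merge-tokens-with-space:
;; joins the tokens with a single space between neighbours.
(defn merge-tokens-sketch
  [tokens]
  (str/join " " tokens))

(merge-tokens-sketch ["The" "quick" "fox"])
;; => "The quick fox"
```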
trans-drop-punct
(trans-drop-punct tokens)
Drops all items from tokens that contain only punctuation characters.
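A minimal sketch of this filtering, assuming that "punctuation" means the ASCII punctuation matched by the Java regex class \p{Punct}; the library's own definition may differ. The name drop-punct-sketch is hypothetical.

```clojure
;; Hypothetical stand-in for trans-drop-punct.
;; Assumption: a punctuation token consists solely of \p{Punct} characters.
(defn drop-punct-sketch
  [tokens]
  (remove #(re-matches #"\p{Punct}+" %) tokens))

(drop-punct-sketch ["Hello" "," "world" "!!" "don't"])
;; => ("Hello" "world" "don't")
```

Tokens that merely contain punctuation, such as "don't", are kept, because re-matches requires the whole token to match.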
trans-drop-punct-lower
(trans-drop-punct-lower tokens)
Drops all punctuation tokens and lowercases the remaining tokens.
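As a rough sketch, the two steps can be chained in a threading pipeline. The name drop-punct-lower-sketch is hypothetical, and the use of \p{Punct} to detect punctuation tokens is an assumption.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for trans-drop-punct-lower.
;; Assumption: punctuation tokens are those matching \p{Punct}+ in full.
(defn drop-punct-lower-sketch
  [tokens]
  (->> tokens
       (remove #(re-matches #"\p{Punct}+" %)) ; drop punctuation-only tokens
       (map str/lower-case)))                 ; lowercase what is left

(drop-punct-lower-sketch ["Hello" "," "World" "!"])
;; => ("hello" "world")
```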
trans-drop-set
(trans-drop-set drop-set tokens)
Drops all elements of tokens that are included in drop-set. To generate drop-set, one of the functions returning stopwords or articles from the core.stopwords module can be used.
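The following sketch shows the idea with a hand-written drop-set; in practice the set would come from core.stopwords. The name drop-set-sketch and the literal set are hypothetical, and the exact-match (case-sensitive) comparison is an assumption drawn from the description.

```clojure
;; Hypothetical stand-in for trans-drop-set.
;; drop-set is coerced to a set so that any collection of strings works.
(defn drop-set-sketch
  [drop-set tokens]
  (remove (set drop-set) tokens))

(drop-set-sketch #{"the" "a"} ["the" "cat" "ate" "a" "fish" "The"])
;; => ("cat" "ate" "fish" "The")
```

Note that "The" survives here because the comparison is case-sensitive.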
trans-drop-set-all-case
(trans-drop-set-all-case drop-set tokens)
Drops all elements of tokens that are included in drop-set, ignoring case. To generate drop-set, one of the functions returning stopwords or articles from the core.stopwords module can be used.
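A case-insensitive variant might look like the sketch below, which lowercases both the drop-set and each token before comparing. The name drop-set-all-case-sketch is hypothetical, and whether the library normalises via lowercasing is an assumption.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for trans-drop-set-all-case.
;; Assumption: case is ignored by comparing lowercased forms.
(defn drop-set-all-case-sketch
  [drop-set tokens]
  (let [lowered (set (map str/lower-case drop-set))]
    (remove #(contains? lowered (str/lower-case %)) tokens)))

(drop-set-all-case-sketch #{"the" "a"} ["The" "cat" "ate" "A" "fish"])
;; => ("cat" "ate" "fish")
```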
trans-drop-whitespace
(trans-drop-whitespace tokens)
Removes all entries from the tokens seq that contain only whitespace.
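One way to sketch this uses clojure.string/blank?. The name drop-whitespace-sketch is hypothetical; note that blank? also treats the empty string as blank, which is an assumption about how the library handles that edge case.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for trans-drop-whitespace.
;; Assumption: empty strings are dropped along with whitespace-only ones.
(defn drop-whitespace-sketch
  [tokens]
  (remove str/blank? tokens))

(drop-whitespace-sketch ["a" " " "b" "\t\n" "c d"])
;; => ("a" "b" "c d")
```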
trans-keep-letters-or-digits
(trans-keep-letters-or-digits tokens)
Drops all items from tokens that contain characters other than letters or digits.
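A minimal sketch, assuming "letters or digits" means Unicode letters (\p{L}) and decimal digits (\p{Nd}); the library may use a different test. The name keep-letters-or-digits-sketch is hypothetical.

```clojure
;; Hypothetical stand-in for trans-keep-letters-or-digits.
;; Keeps only tokens made up entirely of letters and/or digits.
(defn keep-letters-or-digits-sketch
  [tokens]
  (filter #(re-matches #"[\p{L}\p{Nd}]+" %) tokens))

(keep-letters-or-digits-sketch ["abc" "a1" "don't" "42" "x-y" "?"])
;; => ("abc" "a1" "42")
```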
trans-merge-punct
(trans-merge-punct tokens)
In the tokens seq, merges each run of adjacent punctuation-only tokens into a single token.
(trans-merge-punct [ "Wow" "!" "!" "!" ])
[ "Wow" "!!!" ]
Inverse of trans-split-punct.
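The merging step can be sketched with partition-by, which groups consecutive tokens by whether they are punctuation-only. The name merge-punct-sketch is hypothetical; the use of \p{Punct} and the choice to merge mixed punctuation (such as "?" followed by "!") are assumptions.

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for trans-merge-punct.
(defn merge-punct-sketch
  [tokens]
  (let [punct? #(boolean (re-matches #"\p{Punct}+" %))]
    (->> tokens
         (partition-by punct?)            ; runs of punct / non-punct tokens
         (mapcat (fn [group]
                   (if (punct? (first group))
                     [(str/join group)]   ; collapse a punctuation run
                     group)))             ; leave other tokens untouched
         vec)))

(merge-punct-sketch ["Wow" "!" "!" "!"])
;; => ["Wow" "!!!"]
```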
trans-split-punct
(trans-split-punct tokens)
Splits all punctuation tokens in tokens into separate characters.
(trans-split-punct [ "Wow" "!!!" ])
[ "Wow" "!" "!" "!" ]
Inverse of trans-merge-punct.
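Splitting can be sketched by mapping each punctuation-only token to its individual characters. The name split-punct-sketch is hypothetical, and the use of \p{Punct} to detect punctuation tokens is an assumption.

```clojure
;; Hypothetical stand-in for trans-split-punct.
(defn split-punct-sketch
  [tokens]
  (vec (mapcat (fn [token]
                 (if (re-matches #"\p{Punct}+" token)
                   (map str token) ; "!!!" becomes "!" "!" "!"
                   [token]))       ; other tokens pass through unchanged
               tokens)))

(split-punct-sketch ["Wow" "!!!"])
;; => ["Wow" "!" "!" "!"]
```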