langlab.core.transformers
This module contains utilities for transforming sequences of tokens.
merge-tokens-with-space
(merge-tokens-with-space tokens)
Creates a string from the tokens seq by inserting a space between consecutive tokens.
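The behaviour described above can be approximated with clojure.string/join. The function name below is a hypothetical stand-in written for illustration, not the library's source:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for merge-tokens-with-space (an assumption,
;; not taken from langlab): join tokens with a single space.
(defn merge-with-space-sketch
  [tokens]
  (str/join " " tokens))

(merge-with-space-sketch ["Hello" "world" "!"])
;; => "Hello world !"
```

Note that a plain join also puts a space before punctuation tokens, as the result shows.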
trans-drop-punct
(trans-drop-punct tokens)
Drops all items from tokens that contain only punctuation.
trans-drop-punct-lower
(trans-drop-punct-lower tokens)
Drops all punctuation tokens and lowercases the remaining tokens.
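A minimal sketch of the two punctuation filters above. It assumes that "punctuation" means the ASCII characters matched by the Java regex class \p{Punct}; the library may use a different definition, and all function names here are hypothetical:

```clojure
(require '[clojure.string :as str])

;; Assumption: a token counts as punctuation when every character
;; matches \p{Punct} (ASCII punctuation only).
(defn punct-only?
  [token]
  (boolean (re-matches #"\p{Punct}+" token)))

;; Hypothetical stand-in for trans-drop-punct.
(defn drop-punct-sketch
  [tokens]
  (remove punct-only? tokens))

;; Hypothetical stand-in for trans-drop-punct-lower.
(defn drop-punct-lower-sketch
  [tokens]
  (map str/lower-case (drop-punct-sketch tokens)))

(drop-punct-sketch ["Wow" "!" "It" "works" "..."])
;; => ("Wow" "It" "works")

(drop-punct-lower-sketch ["Wow" "!" "It" "works" "..."])
;; => ("wow" "it" "works")
```

Under this definition a token such as "don't" is kept, because it contains letters as well as punctuation.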
trans-drop-set
(trans-drop-set drop-set tokens)
Drops all elements of tokens that are included in drop-set. To generate drop-set, use one of the functions in module core.stopwords that return stopwords or articles.
trans-drop-set-all-case
(trans-drop-set-all-case drop-set tokens)
Drops all elements of tokens that are included in drop-set, ignoring case. To generate drop-set, use one of the functions in module core.stopwords that return stopwords or articles.
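The difference between the case-sensitive and case-insensitive variants can be illustrated as follows. The drop-set is hand-written here (in practice it would come from core.stopwords), and both function names are hypothetical stand-ins:

```clojure
(require '[clojure.string :as str])

;; Hand-written drop-set for illustration only.
(def articles #{"a" "an" "the"})

;; Hypothetical stand-in for trans-drop-set: exact, case-sensitive match.
(defn drop-set-sketch
  [drop-set tokens]
  (remove #(contains? drop-set %) tokens))

;; Hypothetical stand-in for trans-drop-set-all-case: both sides are
;; lowercased before comparison, so "The" matches "the".
(defn drop-set-all-case-sketch
  [drop-set tokens]
  (let [lowered (set (map str/lower-case drop-set))]
    (remove #(contains? lowered (str/lower-case %)) tokens)))

(drop-set-sketch articles ["The" "cat" "and" "the" "dog"])
;; => ("The" "cat" "and" "dog")

(drop-set-all-case-sketch articles ["The" "cat" "and" "the" "dog"])
;; => ("cat" "and" "dog")
```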
trans-drop-whitespace
(trans-drop-whitespace tokens)
Removes from the tokens seq all entries that contain only whitespace.
trans-keep-letters-or-digits
(trans-keep-letters-or-digits tokens)
Drops all items from tokens that contain characters other than letters or digits.
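The whitespace filter and the letters-or-digits filter can be sketched as below. The Unicode classes \p{L} and \p{Nd} are an assumption about what counts as a letter or digit, and the function names are hypothetical:

```clojure
(require '[clojure.string :as str])

;; Hypothetical stand-in for trans-drop-whitespace.
;; Note: str/blank? is also true for the empty string, so "" is dropped too.
(defn drop-whitespace-sketch
  [tokens]
  (remove str/blank? tokens))

;; Hypothetical stand-in for trans-keep-letters-or-digits.
;; Assumption: a token is kept only if every character is a Unicode
;; letter (\p{L}) or decimal digit (\p{Nd}).
(defn keep-letters-or-digits-sketch
  [tokens]
  (filter #(re-matches #"[\p{L}\p{Nd}]+" %) tokens))

(drop-whitespace-sketch ["a" " " "\t\n" "b"])
;; => ("a" "b")

(keep-letters-or-digits-sketch ["abc" "x1" "don't" "42" "-"])
;; => ("abc" "x1" "42")
```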
trans-merge-punct
(trans-merge-punct tokens)
Merges each run of consecutive punctuation-only tokens in the tokens seq into a single token.
(trans-merge-punct [ "Wow" "!" "!" "!" ])
[ "Wow" "!!!" ]
Inverse of trans-split-punct.
trans-split-punct
(trans-split-punct tokens)
Splits every punctuation token in tokens into separate single-character tokens.
(trans-split-punct [ "Wow" "!!!" ])
[ "Wow" "!" "!" "!" ]
Inverse of trans-merge-punct.
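The round trip between splitting and merging punctuation can be sketched with partition-by and mapcat. This is an illustration under the assumption that punctuation means ASCII \p{Punct}; the names are hypothetical and the library's implementation may differ:

```clojure
(require '[clojure.string :as str])

;; Assumption: punctuation tokens consist solely of \p{Punct} characters.
(defn punct-only?
  [token]
  (boolean (re-matches #"\p{Punct}+" token)))

;; Hypothetical stand-in for trans-split-punct:
;; each punctuation token becomes one token per character.
(defn split-punct-sketch
  [tokens]
  (mapcat (fn [t] (if (punct-only? t) (map str t) [t])) tokens))

;; Hypothetical stand-in for trans-merge-punct:
;; each run of adjacent punctuation tokens is joined into one token.
(defn merge-punct-sketch
  [tokens]
  (mapcat (fn [group]
            (if (punct-only? (first group))
              [(str/join group)]
              group))
          (partition-by punct-only? tokens)))

(split-punct-sketch ["Wow" "!!!"])
;; => ("Wow" "!" "!" "!")

(merge-punct-sketch ["Wow" "!" "!" "!"])
;; => ("Wow" "!!!")
```

In this sketch the two functions are inverses only on inputs that are already fully split or fully merged. For example, ["!!" "?"] merges to ["!!?"], and splitting that yields ["!" "!" "?"], not the original input.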