langlab.algs.readability

This module contains functions for computing readability indices.

calc-automated-readability-index

(calc-automated-readability-index stats)

Calculates the Automated Readability Index (ARI) from the given stats. The stats map should include the keys :n-words, :n-hard-words, :n-sentences and :n-chars.
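A minimal sketch of the textbook ARI formula, 4.71 × (chars / words) + 0.5 × (words / sentences) − 21.43, which may help when checking results. The function name `ari-sketch` is hypothetical, and the library's own implementation may differ in rounding or edge-case handling. The textbook formula does not use :n-hard-words.

```clojure
(defn ari-sketch
  "Textbook Automated Readability Index from a stats map.
  Hypothetical helper, not the library's calc-automated-readability-index."
  [{:keys [n-chars n-words n-sentences]}]
  ;; Guard against integer division by zero on empty input.
  {:pre [(pos? n-words) (pos? n-sentences)]}
  (+ (* 4.71 (/ n-chars n-words))   ; average characters per word
     (* 0.5 (/ n-words n-sentences)) ; average words per sentence
     -21.43))
```

For example, `(ari-sketch {:n-chars 400 :n-words 100 :n-sentences 5})` evaluates to roughly 7.41.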

calc-coleman-liau-index

(calc-coleman-liau-index stats)

Calculates the Coleman-Liau Index from the given stats. The stats map should include the keys :n-words, :n-hard-words, :n-sentences and :n-chars.
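A minimal sketch of the textbook Coleman-Liau formula, 0.0588 × L − 0.296 × S − 15.8, where L is the average number of letters per 100 words and S the average number of sentences per 100 words. The name `coleman-liau-sketch` is hypothetical; the library's function may compute the index differently.

```clojure
(defn coleman-liau-sketch
  "Textbook Coleman-Liau Index from a stats map.
  Hypothetical helper, not the library's calc-coleman-liau-index."
  [{:keys [n-chars n-words n-sentences]}]
  {:pre [(pos? n-words)]}
  (let [l (* 100.0 (/ n-chars n-words))      ; letters per 100 words
        s (* 100.0 (/ n-sentences n-words))] ; sentences per 100 words
    (- (* 0.0588 l) (* 0.296 s) 15.8)))
```

With 400 letters, 100 words and 5 sentences, L = 400 and S = 5, so the sketch returns roughly 6.24.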

calc-gunning-fog-index

(calc-gunning-fog-index stats)

Calculates the Gunning Fog Index from the given stats. The stats map should include the keys :n-words, :n-hard-words and :n-sentences.
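A minimal sketch of the textbook Gunning Fog formula, 0.4 × (words / sentences + 100 × hard words / words), assuming :n-hard-words plays the role of the "complex words" count. The name `gunning-fog-sketch` is hypothetical and may not match how the library computes the index.

```clojure
(defn gunning-fog-sketch
  "Textbook Gunning Fog Index from a stats map.
  Hypothetical helper, not the library's calc-gunning-fog-index."
  [{:keys [n-words n-hard-words n-sentences]}]
  {:pre [(pos? n-words) (pos? n-sentences)]}
  (* 0.4 (+ (/ n-words n-sentences)              ; average sentence length
            (* 100.0 (/ n-hard-words n-words)))))  ; percentage of hard words
```

For 100 words, 10 hard words and 5 sentences, the sketch gives 0.4 × (20 + 10) = 12.0.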

calc-text-stats

(calc-text-stats s env)

Calculates statistics of text s. The env supports the following keys, each mapping to a function:

  • :split-sentences-f - splits text into sentences (mandatory),
  • :split-tokens-f - splits text into tokens (mandatory),
  • :trans-drop-punct-f - removes all non-word tokens (default: trans-drop-punct),
  • :count-chars-f - counts characters in a string (default: en-count-chars-bi),
  • :is-hard-word-f - checks whether a token is a hard word (default: count-latin-vowel-groups-without-final>2).

The result is a map with the following fields:

  • :n-chars - total number of letters in words,
  • :n-words - number of words,
  • :n-hard-words - number of hard words according to is-hard-word-f,
  • :n-sentences - number of sentences.
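The env-driven design lets callers plug in language-specific splitters. The sketch below illustrates that contract with naive regex-based stand-ins. Every function here is hypothetical: the library's real defaults (trans-drop-punct, en-count-chars-bi, count-latin-vowel-groups-without-final>2) are not reproduced, and the hard-word rule ("more than 2 vowel groups after dropping a final e") is only an assumed approximation.

```clojure
(require '[clojure.string :as str])

;; Hypothetical env functions; the library's defaults may behave differently.
(defn naive-split-sentences
  "Splits after ., ! or ? followed by whitespace."
  [s]
  (remove str/blank? (str/split s #"(?<=[.!?])\s+")))

(defn naive-split-tokens
  "Returns runs of word characters and single punctuation marks as tokens."
  [s]
  (re-seq #"\w+|[^\w\s]" s))

(defn naive-drop-punct
  "Keeps only tokens that contain a letter or digit."
  [tokens]
  (filter #(re-find #"[\p{L}\p{N}]" %) tokens))

(defn naive-count-letters
  "Counts Unicode letters in a string."
  [w]
  (count (re-seq #"\p{L}" w)))

(defn naive-hard-word?
  "Assumed heuristic: more than 2 vowel groups after dropping a final \"e\"."
  [w]
  (let [stem (str/replace (str/lower-case w) #"e$" "")]
    (> (count (re-seq #"[aeiouy]+" stem)) 2)))

(defn text-stats-sketch
  "Builds a map with :n-chars, :n-words, :n-hard-words and :n-sentences.
  Hypothetical stand-in for calc-text-stats."
  [s env]
  (let [{:keys [split-sentences-f split-tokens-f trans-drop-punct-f
                count-chars-f is-hard-word-f]
         :or {trans-drop-punct-f naive-drop-punct
              count-chars-f naive-count-letters
              is-hard-word-f naive-hard-word?}} env
        words (trans-drop-punct-f (split-tokens-f s))]
    {:n-chars (reduce + 0 (map count-chars-f words))
     :n-words (count words)
     :n-hard-words (count (filter is-hard-word-f words))
     :n-sentences (count (split-sentences-f s))}))
```

Only the two mandatory keys are supplied in this call, so the sketch's own defaults fill in the rest:

```clojure
(text-stats-sketch "The cat sat. Everybody celebrated happily!"
                   {:split-sentences-f naive-split-sentences
                    :split-tokens-f naive-split-tokens})
;; => {:n-chars 35, :n-words 6, :n-hard-words 3, :n-sentences 2}
```

The resulting map has the shape the calc-*-index functions expect.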

count-sentences

(count-sentences s env)

Counts the number of sentences in s based on the provided env. The env supports the key :split-sentences-f (mandatory).
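A minimal sketch of this contract, assuming sentence counting amounts to counting the pieces returned by :split-sentences-f. Both `count-sentences-sketch` and the regex splitter are hypothetical. The :pre check makes a missing mandatory key fail immediately.

```clojure
(require '[clojure.string :as str])

(defn count-sentences-sketch
  "Counts sentences in s using the mandatory :split-sentences-f from env."
  [s {:keys [split-sentences-f]}]
  {:pre [(ifn? split-sentences-f)]} ; fail fast if the mandatory key is missing
  (count (split-sentences-f s)))

;; Hypothetical splitter: breaks after ., ! or ? followed by whitespace.
(defn naive-split-sentences [s]
  (remove str/blank? (str/split s #"(?<=[.!?])\s+")))
```

For example, `(count-sentences-sketch "One. Two! Three?" {:split-sentences-f naive-split-sentences})` returns 3.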

count-words

(count-words s env)

Counts the number of words in s based on the provided env. The env supports the following keys:

  • :split-tokens-f (mandatory),
  • :trans-drop-punct-f (default: trans-drop-punct).
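A minimal sketch of how the optional :trans-drop-punct-f could default when absent. `count-words-sketch` and `drop-punct-default` are hypothetical. The latter only approximates what trans-drop-punct might do, by keeping tokens that contain a letter or digit.

```clojure
(defn drop-punct-default
  "Hypothetical stand-in for trans-drop-punct: keeps tokens containing a letter or digit."
  [tokens]
  (filter #(re-find #"[\p{L}\p{N}]" %) tokens))

(defn count-words-sketch
  "Counts word tokens in s; :split-tokens-f is mandatory, :trans-drop-punct-f optional."
  [s {:keys [split-tokens-f trans-drop-punct-f]
      :or {trans-drop-punct-f drop-punct-default}}]
  {:pre [(ifn? split-tokens-f)]}
  (count (trans-drop-punct-f (split-tokens-f s))))
```

With a tokenizer that emits punctuation as separate tokens, `(count-words-sketch "Hello, world!" {:split-tokens-f #(re-seq #"\w+|[^\w\s]" %)})` returns 2. Passing `:trans-drop-punct-f identity` would count all 4 tokens instead.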