Full API documentation¶
Once a Linguistica object (such as lxa_object below with the Brown corpus)
is initialized, various methods and attributes are available for automatic
linguistic analysis:
>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist() # using wordlist()
Basic information¶
number_of_word_tokens() | Return the number of word tokens.
number_of_word_types() | Return the number of word types.
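As a small illustration of how the two counts combine, the sketch below computes a type-token ratio. The helper name `type_token_ratio` is hypothetical and not part of Linguistica; it only calls the two methods listed above on whatever object is passed in.

```python
def type_token_ratio(lexicon):
    """Return word types divided by word tokens (hypothetical helper).

    `lexicon` is assumed to expose number_of_word_types() and
    number_of_word_tokens(), as a Linguistica object does.
    """
    n_tokens = lexicon.number_of_word_tokens()
    if n_tokens == 0:
        return 0.0  # avoid dividing by zero on an empty corpus
    return lexicon.number_of_word_types() / n_tokens
```

With a Linguistica object in hand, this would be called as `type_token_ratio(lxa_object)`.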
Word ngrams¶
Parameter: max_word_tokens
wordlist() | Return a wordlist sorted by word frequency in descending order.
word_unigram_counter() | Return a dict of words with their counts.
word_bigram_counter() | Return a dict of word bigrams with their counts.
word_trigram_counter() | Return a dict of word trigrams with their counts.
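To make the shape of these return values concrete, here is a minimal standard-library sketch that builds comparable objects from a toy token list. It is not Linguistica's implementation; the `ngram_counter` helper and the alphabetical tie-breaking in the wordlist are assumptions of this example.

```python
from collections import Counter

tokens = "the cat saw the dog and the cat ran".split()

def ngram_counter(tokens, n):
    """Count n-grams; unigrams are plain strings, longer n-grams are tuples."""
    if n == 1:
        return Counter(tokens)
    # zip over n staggered views of the token list to get sliding windows
    return Counter(zip(*(tokens[i:] for i in range(n))))

unigrams = ngram_counter(tokens, 1)   # shaped like dict(str: int)
bigrams = ngram_counter(tokens, 2)    # shaped like dict(tuple(str): int)
trigrams = ngram_counter(tokens, 3)   # shaped like dict(tuple(str): int)

# Descending frequency; ties broken alphabetically (an assumption of this sketch)
wordlist = sorted(unigrams, key=lambda w: (-unigrams[w], w))
```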
Morphological signatures¶
Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing
signatures() | Return a set of morphological signatures.
stems() | Return a set of stems.
affixes() | Return a set of affixes.
signatures_to_stems() | Return a dict of morphological signatures to stems.
signatures_to_words() | Return a dict of morphological signatures to words.
affixes_to_signatures() | Return a dict of affixes to morphological signatures.
stems_to_signatures() | Return a dict of stems to morphological signatures.
stems_to_words() | Return a dict of stems to words.
words_in_signatures() | Return a set of words that are in at least one morphological signature.
words_to_signatures() | Return a dict of words to morphological signatures.
words_to_sigtransforms() | Return a dict of words to signature transforms.
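The mappings above are easier to picture on a small example. The sketch below uses invented toy data and is not Linguistica's signature-finding algorithm. It only shows how a dict shaped like the output of signatures_to_stems() relates to one shaped like stems_to_signatures(), and how words could be rebuilt in the suffixing case. The use of 'NULL' for the empty affix and the helper names `invert` and `words_for` are assumptions of this example.

```python
from collections import defaultdict

# Toy data shaped like signatures_to_stems(): dict(tuple(str): set(str))
sig_to_stems = {
    ("NULL", "ed", "ing"): {"walk", "jump"},
    ("NULL", "s"): {"walk", "dog"},
}

def invert(sig_to_stems):
    """Build a stems-to-signatures dict: dict(str: set(tuple(str)))."""
    stem_to_sigs = defaultdict(set)
    for sig, stems in sig_to_stems.items():
        for stem in stems:
            stem_to_sigs[stem].add(sig)
    return dict(stem_to_sigs)

def words_for(sig, stems):
    """Attach each suffix to each stem; 'NULL' is treated as the empty suffix."""
    return {stem + ("" if affix == "NULL" else affix)
            for stem in stems for affix in sig}

stem_to_sigs = invert(sig_to_stems)
words = words_for(("NULL", "s"), sig_to_stems[("NULL", "s")])
```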
Word manifolds and syntactic word neighborhood¶
Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors
words_to_neighbors() | Return a dict of words to syntactic neighbors.
neighbor_graph() | Return the syntactic word neighborhood graph.
words_to_contexts() | Return a dict of words to contexts with counts.
contexts_to_words() | Return a dict of contexts to words with counts.
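One possible way to use the neighbor mapping is to check which syntactic neighbors two words have in common. The helper below is a hypothetical sketch, not part of Linguistica; it relies only on words_to_neighbors() returning a dict from words to lists of neighbor words.

```python
def shared_neighbors(lexicon, word_a, word_b):
    """Return the set of neighbors common to both words (hypothetical helper).

    Words missing from the mapping are treated as having no neighbors.
    """
    neighbors = lexicon.words_to_neighbors()
    return set(neighbors.get(word_a, [])) & set(neighbors.get(word_b, []))
```

A call such as `shared_neighbors(lxa_object, 'would', 'could')` would then return a possibly empty set of words.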
Phonology¶
phone_unigram_counter() | Return a dict of phone unigrams with counts.
phone_bigram_counter() | Return a dict of phone bigrams with counts.
phone_trigram_counter() | Return a dict of phone trigrams with counts.
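The Phone objects documented under phone_dict() expose frequency() and plog(). As a rough illustration of how such values could follow from raw counts, the sketch below works on toy data shaped like the output of phone_unigram_counter(). It assumes that plog means the positive log, -log2(p); this page does not confirm that definition, and the helper names are hypothetical.

```python
import math
from collections import Counter

# Toy counts shaped like phone_unigram_counter(): dict(str: int)
phone_counts = Counter({"a": 4, "t": 2, "k": 2})

def frequencies(counts):
    """Convert raw counts to relative frequencies."""
    total = sum(counts.values())
    return {phone: count / total for phone, count in counts.items()}

def plog(probability):
    """Positive log, -log2(p). Assumed definition, not confirmed by this page."""
    return -math.log2(probability)

freqs = frequencies(phone_counts)
plogs = {phone: plog(p) for phone, p in freqs.items()}
```

Under this definition, rarer phones receive larger plog values.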
Tries¶
Parameter: min_stem_length
broken_words_left_to_right() | Return a dict of words to their left-to-right broken form.
broken_words_right_to_left() | Return a dict of words to their right-to-left broken form.
successors() | Return a dict of word (sub)strings to their successors.
predecessors() | Return a dict of word (sub)strings to their predecessors.
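The idea of successors and predecessors can be sketched on a toy word list. This is an illustration under stated assumptions, not Linguistica's trie code: it takes "successors" to mean the letters that can follow a word-initial substring, and "predecessors" the letters that can precede a word-final substring. The `min_length` argument is a hypothetical stand-in for a minimum substring length.

```python
from collections import defaultdict

def successors(words, min_length=1):
    """Map each word-initial substring to the set of letters that follow it."""
    table = defaultdict(set)
    for word in words:
        for i in range(min_length, len(word)):
            table[word[:i]].add(word[i])
    return dict(table)

def predecessors(words, min_length=1):
    """Mirror image: map each word-final substring to the letters before it."""
    reversed_table = successors([w[::-1] for w in words], min_length)
    return {suffix[::-1]: letters for suffix, letters in reversed_table.items()}

words = ["walk", "walks", "walked", "wall"]
succ = successors(words)   # shaped like dict(str: set(str))
pred = predecessors(words)
```

A substring followed by many different letters, such as "walk" here, is a plausible place for a morpheme boundary.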
Other methods and attributes¶
parameters() | Return the parameter dict.
change_parameters(**kwargs) | Change parameters specified by kwargs.
use_default_parameters() | Reset parameters to their default values.
reset() | Reset the Linguistica object.
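The parameter methods can be combined to try a setting temporarily. The helper below is a hypothetical sketch that uses only the methods listed on this page. Whether results computed earlier are invalidated by a parameter change is not stated here, so that behaviour should be verified before relying on it.

```python
def signatures_with(lexicon, **overrides):
    """Return (parameters used, signatures) under temporary overrides.

    Hypothetical helper; `lexicon` is assumed to be a Linguistica object.
    Defaults are restored afterwards, even if an error occurs.
    """
    lexicon.change_parameters(**overrides)
    try:
        used = dict(lexicon.parameters())  # snapshot before restoring defaults
        return used, lexicon.signatures()
    finally:
        lexicon.use_default_parameters()
```

For example, `signatures_with(lxa_object, min_stem_length=3)` would apply one of the parameters named in the signature section above.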
-
class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)¶ A class for a Linguistica object.
-
affixes()¶ Return a set of affixes.
Return type: set(str)
-
affixes_to_signatures()¶ Return a dict of affixes to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
biphone_dict()¶ Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().
Return type: dict((str, str): Biphone instance)
-
broken_words_left_to_right()¶ Return a dict of words to their left-to-right broken form.
Return type: dict(str: list(str))
-
broken_words_right_to_left()¶ Return a dict of words to their right-to-left broken form.
Return type: dict(str: list(str))
-
change_parameters(**kwargs)¶ Change parameters specified by kwargs.
Parameters: kwargs – keyword arguments for parameters and their new values
-
contexts_to_words()¶ Return a dict of contexts to words with counts.
Return type: dict(tuple(str): dict(str: int))
-
neighbor_graph()¶ Return the syntactic word neighborhood graph.
Return type: networkx undirected graph
-
number_of_word_tokens()¶ Return the number of word tokens.
Return type: int
-
number_of_word_types()¶ Return the number of word types.
Return type: int
-
output_all_results(directory=None, verbose=False, test=False)¶ Output all Linguistica results to directory.
Parameters: directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().
-
parameters()¶ Return the parameter dict.
Return type: dict(str: int)
-
phone_bigram_counter()¶ Return a dict of phone bigrams with counts.
Return type: dict(tuple(str): int)
-
phone_dict()¶ Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().
Return type: dict(str: Phone instance)
-
phone_trigram_counter()¶ Return a dict of phone trigrams with counts.
Return type: dict(tuple(str): int)
-
phone_unigram_counter()¶ Return a dict of phone unigrams with counts.
Return type: dict(str: int)
-
predecessors()¶ Return a dict of word (sub)strings to their predecessors.
Return type: dict(str: set(str))
-
reset()¶ Reset the Linguistica object. While the file path information is retained, all computed objects (ngrams, signatures, word neighbors, etc.) are reset to NULL; if they are called again, they are re-computed.
-
run_all_modules(verbose=False)¶ Run all modules.
-
run_manifold_module(verbose=False)¶ Run the manifold module.
-
run_ngram_module(verbose=False)¶ Run the ngram module.
-
run_phon_module(verbose=False)¶ Run the phon module.
-
run_signature_module(verbose=False)¶ Run the signature module.
-
run_trie_module(verbose=False)¶ Run the trie module.
-
signatures()¶ Return a set of morphological signatures.
Return type: set(tuple(str))
-
signatures_to_stems()¶ Return a dict of morphological signatures to stems.
Return type: dict(tuple(str): set(str))
-
signatures_to_words()¶ Return a dict of morphological signatures to words.
Return type: dict(tuple(str): set(str))
-
stems()¶ Return a set of stems.
Return type: set(str)
-
stems_to_signatures()¶ Return a dict of stems to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
stems_to_words()¶ Return a dict of stems to words.
Return type: dict(str: set(str))
-
successors()¶ Return a dict of word (sub)strings to their successors.
Return type: dict(str: set(str))
-
use_default_parameters()¶ Reset parameters to their default values.
-
word_bigram_counter()¶ Return a dict of word bigrams with their counts.
Return type: dict(tuple(str): int)
-
word_phonology_dict()¶ Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().
Return type: dict(str: Word instance)
-
word_trigram_counter()¶ Return a dict of word trigrams with their counts.
Return type: dict(tuple(str): int)
-
word_unigram_counter()¶ Return a dict of words with their counts.
Return type: dict(str: int)
-
wordlist()¶ Return a wordlist sorted by word frequency in descending order. (So “the” will most likely be the first word for written English.)
Return type: list(str)
-
words_in_signatures()¶ Return a set of words that are in at least one morphological signature.
Return type: set(str)
-
words_to_contexts()¶ Return a dict of words to contexts with counts.
Return type: dict(str: dict(tuple(str): int))
-
words_to_neighbors()¶ Return a dict of words to syntactic neighbors.
Return type: dict(str: list(str))
-
words_to_phones()¶ Return a dict of words with their phones.
Return type: dict(str: list(str))
-
words_to_signatures()¶ Return a dict of words to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
words_to_sigtransforms()¶ Return a dict of words to signature transforms.
Return type: dict(str: set(tuple(tuple(str), str)))