Full API documentation

Once a Linguistica object (such as lxa_object below with the Brown corpus) is initialized, various methods and attributes are available for automatic linguistic analysis:

>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # using wordlist()

Basic information

number_of_word_tokens() Return the number of word tokens.
number_of_word_types() Return the number of word types.

Word ngrams

Parameter: max_word_tokens

wordlist() Return a wordlist sorted by word frequency in descending order.
word_unigram_counter() Return a dict of words with their counts.
word_bigram_counter() Return a dict of word bigrams with their counts.
word_trigram_counter() Return a dict of word trigrams with their counts.
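
The shapes of these return values can be illustrated on a toy corpus. The following is a minimal sketch built on collections.Counter, not Linguistica's own implementation; the token list and variable names are made up for illustration, and the alphabetical tie-breaking in the sorted wordlist is an assumption made only to keep the output deterministic.

```python
from collections import Counter

# Toy corpus (hypothetical); a real corpus would be read from a file.
tokens = "the cat saw the dog and the dog saw the cat".split()

# Word unigrams: str -> int
unigram_counter = Counter(tokens)

# Word bigrams and trigrams: tuple(str) -> int
bigram_counter = Counter(zip(tokens, tokens[1:]))
trigram_counter = Counter(zip(tokens, tokens[1:], tokens[2:]))

# Wordlist sorted by descending frequency. Ties are broken alphabetically,
# which is an assumption of this sketch.
wordlist = sorted(unigram_counter, key=lambda w: (-unigram_counter[w], w))

print(wordlist[0])                     # the most frequent word
print(bigram_counter[("the", "dog")])  # count of the bigram "the dog"
```

Bigram and trigram keys are tuples of words, which matches the dict(tuple(str): int) return types listed for the corresponding methods.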

Morphological signatures

Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing

signatures() Return a set of morphological signatures.
stems() Return a set of stems.
affixes() Return a set of affixes.
signatures_to_stems() Return a dict of morphological signatures to stems.
signatures_to_words() Return a dict of morphological signatures to words.
affixes_to_signatures() Return a dict of affixes to morphological signatures.
stems_to_signatures() Return a dict of stems to morphological signatures.
stems_to_words() Return a dict of stems to words.
words_in_signatures() Return a set of words that are in at least one morphological signature.
words_to_signatures() Return a dict of words to morphological signatures.
words_to_sigtransforms() Return a dict of words to signature transforms.
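
The mappings above are different views of one analysis. As a rough illustration, the sketch below starts from a hand-written signatures-to-stems dict and derives the other views from it. The data is hypothetical, the language is assumed to be suffixing, and the use of "NULL" for the empty affix is an assumption of this sketch; none of this is Linguistica's own code.

```python
from collections import defaultdict

# Hypothetical analysis result: signature (tuple of suffixes) -> stems.
# "NULL" stands for the empty suffix here; this notation is an assumption.
signatures_to_stems = {
    ("NULL", "ed", "ing"): {"jump", "walk"},
    ("NULL", "s"): {"dog", "walk"},
}

stems_to_signatures = defaultdict(set)
stems_to_words = defaultdict(set)
signatures_to_words = defaultdict(set)
words_to_signatures = defaultdict(set)

for sig, stems in signatures_to_stems.items():
    for stem in stems:
        stems_to_signatures[stem].add(sig)
        for suffix in sig:
            # Rebuild each word by attaching the suffix to the stem.
            word = stem if suffix == "NULL" else stem + suffix
            stems_to_words[stem].add(word)
            signatures_to_words[sig].add(word)
            words_to_signatures[word].add(sig)

# Words covered by at least one signature, and the set of all affixes.
words_in_signatures = set(words_to_signatures)
affixes = {affix for sig in signatures_to_stems for affix in sig}
```

In this toy data the stem "walk" belongs to two signatures, so the word "walk" maps to both of them.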

Word manifolds and syntactic word neighborhood

Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors

words_to_neighbors() Return a dict of words to syntactic neighbors.
neighbor_graph() Return the syntactic word neighborhood graph.
words_to_contexts() Return a dict of words to contexts with counts.
contexts_to_words() Return a dict of contexts to words with counts.
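
The two context dicts are inverses of each other. The sketch below shows one way such dicts could be built from word trigrams. It assumes that a context is the trigram with the target word replaced by "_"; that shape is an assumption of this example and may differ from what Linguistica actually produces.

```python
from collections import Counter, defaultdict

# Toy corpus (hypothetical).
tokens = "the cat sat on the mat".split()
trigram_counter = Counter(zip(tokens, tokens[1:], tokens[2:]))

words_to_contexts = defaultdict(Counter)  # str -> {context: count}
contexts_to_words = defaultdict(Counter)  # context -> {str: count}

for (w1, w2, w3), count in trigram_counter.items():
    # Assumed context shape: the trigram with the target word blanked out.
    for word, context in (
        (w1, ("_", w2, w3)),
        (w2, (w1, "_", w3)),
        (w3, (w1, w2, "_")),
    ):
        words_to_contexts[word][context] += count
        contexts_to_words[context][word] += count

print(dict(words_to_contexts["cat"]))
```

Words that share many contexts are natural candidates for being syntactic neighbors, which is the intuition behind the neighbor methods in this group.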

Phonology

phone_unigram_counter() Return a dict of phone unigrams with counts.
phone_bigram_counter() Return a dict of phone bigrams with counts.
phone_trigram_counter() Return a dict of phone trigrams with counts.

Tries

Parameter: min_stem_length

broken_words_left_to_right() Return a dict of words to their left-to-right broken form.
broken_words_right_to_left() Return a dict of words to their right-to-left broken form.
successors() Return a dict of word (sub)strings to their successors.
predecessors() Return a dict of word (sub)strings to their predecessors.
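
To make the successor and predecessor dicts concrete, here is a minimal sketch over a toy wordlist. It assumes that the successors of a word-initial substring are the letters that can follow it, and that the predecessors of a word-final substring are the letters that can precede it; this reading, the wordlist, and the variable names are assumptions of this example, not a description of Linguistica's internals.

```python
from collections import defaultdict

# Toy wordlist (hypothetical).
words = ["walk", "walks", "walked", "wall"]

successors = defaultdict(set)    # prefix -> letters that can follow it
predecessors = defaultdict(set)  # suffix -> letters that can precede it

for word in words:
    for i in range(1, len(word)):
        successors[word[:i]].add(word[i])
        predecessors[word[i:]].add(word[i - 1])

print(sorted(successors["walk"]))  # letters seen after the prefix "walk"
```

A prefix followed by several distinct letters, such as "walk" in this toy data, may mark a point where a word can be broken.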

Other methods and attributes

parameters() Return the parameter dict.
change_parameters(**kwargs) Change parameters specified by kwargs.
use_default_parameters() Reset parameters to their default values.
reset() Reset the Linguistica object.

class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)

A class for a Linguistica object.

affixes()

Return a set of affixes.

Return type:set(str)
affixes_to_signatures()

Return a dict of affixes to morphological signatures.

Return type:dict(str: set(tuple(str)))
biphone_dict()

Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().

Return type:dict((str, str): Biphone instance)
broken_words_left_to_right()

Return a dict of words to their left-to-right broken form.

Return type:dict(str: list(str))
broken_words_right_to_left()

Return a dict of words to their right-to-left broken form.

Return type:dict(str: list(str))
change_parameters(**kwargs)

Change parameters specified by kwargs.

Parameters:kwargs – keyword arguments for parameters and their new values
contexts_to_words()

Return a dict of contexts to words with counts.

Return type:dict(tuple(str): dict(str: int))
neighbor_graph()

Return the syntactic word neighborhood graph.

Return type:networkx undirected graph
number_of_word_tokens()

Return the number of word tokens.

Return type:int
number_of_word_types()

Return the number of word types.

Return type:int
output_all_results(directory=None, verbose=False, test=False)

Output all Linguistica results to directory.

Parameters:directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().
parameters()

Return the parameter dict.

Return type:dict(str: int)
phone_bigram_counter()

Return a dict of phone bigrams with counts.

Return type:dict(tuple(str): int)
phone_dict()

Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().

Return type:dict(str: Phone instance)
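
As an illustration of the quantities that count(), frequency(), and plog() refer to, the sketch below computes them for a toy phone inventory. It assumes that plog ("positive log") is the negative base-2 logarithm of a phone's relative frequency and that phones are counted once per word type; both are assumptions of this example, and the data is hypothetical.

```python
import math
from collections import Counter

# Hypothetical data in the shape of words_to_phones(): word -> list of phones.
words_to_phones = {
    "cat": ["k", "ae", "t"],
    "tack": ["t", "ae", "k"],
    "at": ["ae", "t"],
}

# Count each phone once per occurrence in a word type.
phone_counter = Counter(p for phones in words_to_phones.values() for p in phones)
total = sum(phone_counter.values())

# Relative frequency of each phone.
phone_frequency = {p: c / total for p, c in phone_counter.items()}

# Assumption: plog is -log2 of the relative frequency.
phone_plog = {p: -math.log2(f) for p, f in phone_frequency.items()}

print(phone_counter["k"], phone_frequency["k"], phone_plog["k"])
```

Under this assumption, rarer phones receive larger plog values.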
phone_trigram_counter()

Return a dict of phone trigrams with counts.

Return type:dict(tuple(str): int)
phone_unigram_counter()

Return a dict of phone unigrams with counts.

Return type:dict(str: int)
predecessors()

Return a dict of word (sub)strings to their predecessors.

Return type:dict(str: set(str))
reset()

Reset the Linguistica object. The file path information is retained, but all computed objects (ngrams, signatures, word neighbors, etc.) are reset to None; they are recomputed the next time they are requested.
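
The compute-on-demand behavior described here can be sketched with a small stand-in class. LazyResults and its attributes are hypothetical names invented for this example; the sketch illustrates the general caching pattern and is not the Lexicon implementation.

```python
class LazyResults:
    """Toy stand-in for an object that computes results on demand."""

    def __init__(self, tokens):
        self.tokens = tokens     # input is retained across resets
        self._word_types = None  # computed result, initially unset
        self.compute_calls = 0   # bookkeeping for this sketch only

    def word_types(self):
        # Compute only if no cached result is available.
        if self._word_types is None:
            self.compute_calls += 1
            self._word_types = set(self.tokens)
        return self._word_types

    def reset(self):
        # Drop cached results; the input itself is kept.
        self._word_types = None


obj = LazyResults(["a", "b", "a"])
obj.word_types()
obj.word_types()  # served from the cache, no recomputation
obj.reset()
obj.word_types()  # recomputed after the reset
print(obj.compute_calls)
```

In this sketch the computation runs twice: once on first access and once after the reset.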

run_all_modules(verbose=False)

Run all modules.

run_manifold_module(verbose=False)

Run the manifold module.

run_ngram_module(verbose=False)

Run the ngram module.

run_phon_module(verbose=False)

Run the phon module.

run_signature_module(verbose=False)

Run the signature module.

run_trie_module(verbose=False)

Run the trie module.

signatures()

Return a set of morphological signatures.

Return type:set(tuple(str))
signatures_to_stems()

Return a dict of morphological signatures to stems.

Return type:dict(tuple(str): set(str))
signatures_to_words()

Return a dict of morphological signatures to words.

Return type:dict(tuple(str): set(str))
stems()

Return a set of stems.

Return type:set(str)
stems_to_signatures()

Return a dict of stems to morphological signatures.

Return type:dict(str: set(tuple(str)))
stems_to_words()

Return a dict of stems to words.

Return type:dict(str: set(str))
successors()

Return a dict of word (sub)strings to their successors.

Return type:dict(str: set(str))
use_default_parameters()

Reset parameters to their default values.

word_bigram_counter()

Return a dict of word bigrams with their counts.

Return type:dict(tuple(str): int)
word_phonology_dict()

Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().

Return type:dict(str: Word instance)
word_trigram_counter()

Return a dict of word trigrams with their counts.

Return type:dict(tuple(str): int)
word_unigram_counter()

Return a dict of words with their counts.

Return type:dict(str: int)
wordlist()

Return a wordlist sorted by word frequency in descending order. (So “the” will most likely be the first word for written English.)

Return type:list(str)
words_in_signatures()

Return a set of words that are in at least one morphological signature.

Return type:set(str)
words_to_contexts()

Return a dict of words to contexts with counts.

Return type:dict(str: dict(tuple(str): int))
words_to_neighbors()

Return a dict of words to syntactic neighbors.

Return type:dict(str: list(str))
words_to_phones()

Return a dict of words with their phones.

Return type:dict(str: list(str))
words_to_signatures()

Return a dict of words to morphological signatures.

Return type:dict(str: set(tuple(str)))
words_to_sigtransforms()

Return a dict of words to signature transforms.

Return type:dict(str: set(tuple(tuple(str), str)))