Full API documentation
Once a Linguistica object (such as lxa_object below, built from the Brown corpus) is initialized, various methods and attributes are available for automatic linguistic analysis:
>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # word types sorted by frequency, most frequent first
Basic information
number_of_word_tokens() | Return the number of word tokens.
number_of_word_types() | Return the number of word types.
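As a rough illustration of the difference between the two counts, the sketch below works on a toy list of tokens. It assumes the corpus has already been split on whitespace, which is a simplification; Linguistica's own tokenization is not described here.

```python
# Toy illustration of word tokens vs. word types (not Linguistica's code).
tokens = "the cat saw the dog and the dog saw the cat".split()

number_of_word_tokens = len(tokens)      # every running word counts
number_of_word_types = len(set(tokens))  # each distinct word counts once

print(number_of_word_tokens, number_of_word_types)
```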
Word ngrams
Parameter: max_word_tokens
wordlist() | Return a wordlist sorted by word frequency in descending order.
word_unigram_counter() | Return a dict of words with their counts.
word_bigram_counter() | Return a dict of word bigrams with their counts.
word_trigram_counter() | Return a dict of word trigrams with their counts.
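The shapes of these return values can be illustrated with collections.Counter. This is a minimal sketch, assuming n-grams are counted over one continuous token sequence; it does not model max_word_tokens or any sentence-boundary handling Linguistica may apply, and the helper ngram_counter is hypothetical.

```python
from collections import Counter

def ngram_counter(tokens, n):
    """Count contiguous n-grams; keys are tuples of n words (hypothetical helper)."""
    return dict(Counter(zip(*(tokens[i:] for i in range(n)))))

tokens = "the cat saw the dog and the dog saw the cat".split()

word_unigram_counter = dict(Counter(tokens))     # dict(str: int)
word_bigram_counter = ngram_counter(tokens, 2)   # dict(tuple(str): int)
word_trigram_counter = ngram_counter(tokens, 3)  # dict(tuple(str): int)

# Word types by descending frequency; in this sketch, ties keep first-seen order.
wordlist = [word for word, _ in Counter(tokens).most_common()]
```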
Morphological signatures
Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing
signatures() | Return a set of morphological signatures.
stems() | Return a set of stems.
affixes() | Return a set of affixes.
signatures_to_stems() | Return a dict of morphological signatures to stems.
signatures_to_words() | Return a dict of morphological signatures to words.
affixes_to_signatures() | Return a dict of affixes to morphological signatures.
stems_to_signatures() | Return a dict of stems to morphological signatures.
stems_to_words() | Return a dict of stems to words.
words_in_signatures() | Return a set of words that are in at least one morphological signature.
words_to_signatures() | Return a dict of words to morphological signatures.
words_to_sigtransforms() | Return a dict of words to signature transforms.
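To show what a signature is, here is a deliberately naive sketch that groups stems by the set of suffixes they occur with. It is not Linguistica's algorithm. The function names are hypothetical, the parameters only loosely mirror min_stem_length, max_affix_length, and min_sig_count, only suffixing is modeled, and the string "NULL" is assumed here to stand for the empty suffix.

```python
from collections import defaultdict

def toy_signatures_to_stems(words, min_stem_length=3, max_affix_length=3, min_sig_count=2):
    """Naive suffix signatures (hypothetical): shows the data shape, not Linguistica's method."""
    stem_to_affixes = defaultdict(set)
    for word in set(words):
        # Try every cut that leaves a long-enough stem and a short-enough suffix.
        for cut in range(min_stem_length, len(word) + 1):
            stem, affix = word[:cut], word[cut:]
            if len(affix) <= max_affix_length:
                stem_to_affixes[stem].add(affix or "NULL")  # "NULL" marks the empty suffix
    sig_to_stems = defaultdict(set)
    for stem, affixes in stem_to_affixes.items():
        if len(affixes) >= 2:  # a signature needs at least two affixes
            sig_to_stems[tuple(sorted(affixes))].add(stem)
    # Keep only signatures shared by at least min_sig_count stems.
    return {sig: stems for sig, stems in sig_to_stems.items() if len(stems) >= min_sig_count}

def toy_signatures_to_words(sig_to_stems):
    """Rebuild the words covered by each signature (stem + affix, "NULL" -> "")."""
    return {
        sig: {stem + ("" if affix == "NULL" else affix) for stem in stems for affix in sig}
        for sig, stems in sig_to_stems.items()
    }

words = ["jump", "jumps", "jumped", "walk", "walks", "walked"]
sigs = toy_signatures_to_stems(words)
print(sigs)  # {('NULL', 'ed', 's'): {'jump', 'walk'}} (set order may vary)
```

Spurious stems such as "jum" (with the suffixes "p", "ps", "ped") each form a signature of their own and are dropped by the min_sig_count filter, which is the intuition behind requiring a signature to be shared by several stems.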
Word manifolds and syntactic word neighborhood
Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors
words_to_neighbors() | Return a dict of words to syntactic neighbors.
neighbor_graph() | Return the syntactic word neighborhood graph.
words_to_contexts() | Return a dict of words to contexts with counts.
contexts_to_words() | Return a dict of contexts to words with counts.
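The relationship between words_to_contexts() and contexts_to_words() amounts to inverting one nested dict. The sketch below assumes a hypothetical context encoding: a 2-tuple with "_" in the target word's slot, using only one word of context on each side. Linguistica's actual contexts, the min_context_count filter, and the eigenvector-based neighbor computation are not reproduced.

```python
from collections import defaultdict

def toy_words_to_contexts(tokens):
    """Map each word to its one-word left/right contexts with counts.

    ("the", "_") means "preceded by 'the'"; ("_", "sat") means "followed by 'sat'".
    This encoding is hypothetical.
    """
    result = defaultdict(lambda: defaultdict(int))
    for i, word in enumerate(tokens):
        if i > 0:
            result[word][(tokens[i - 1], "_")] += 1
        if i + 1 < len(tokens):
            result[word][("_", tokens[i + 1])] += 1
    return {word: dict(contexts) for word, contexts in result.items()}

def toy_invert_contexts(words_to_contexts):
    """Turn dict(word: dict(context: count)) into dict(context: dict(word: count))."""
    contexts_to_words = defaultdict(dict)
    for word, contexts in words_to_contexts.items():
        for context, count in contexts.items():
            contexts_to_words[context][word] = count
    return dict(contexts_to_words)

tokens = "the cat sat on the mat".split()
w2c = toy_words_to_contexts(tokens)
c2w = toy_invert_contexts(w2c)
```

Words that share many contexts (here "cat" and "mat" both follow "the") are the raw material for a notion of syntactic neighborhood.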
Phonology
phone_unigram_counter() | Return a dict of phone unigrams with counts.
phone_bigram_counter() | Return a dict of phone bigrams with counts.
phone_trigram_counter() | Return a dict of phone trigrams with counts.
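A minimal sketch of phone n-gram counting over toy phone sequences (here simply the letters of spelled words). The function names are hypothetical, and the "#" word-boundary padding on bigrams and trigrams is an assumption for illustration, not documented Linguistica behavior.

```python
from collections import Counter

BOUNDARY = "#"  # hypothetical word-boundary symbol

def toy_phone_unigram_counter(phone_sequences):
    """dict(str: int): how often each phone occurs."""
    return dict(Counter(p for phones in phone_sequences for p in phones))

def toy_phone_ngram_counter(phone_sequences, n):
    """dict(tuple(str): int): contiguous phone n-grams within boundary-padded words."""
    counts = Counter()
    for phones in phone_sequences:
        padded = [BOUNDARY] + list(phones) + [BOUNDARY]
        for i in range(len(padded) - n + 1):
            counts[tuple(padded[i:i + n])] += 1
    return dict(counts)

sequences = [list("cat"), list("cab")]
unigrams = toy_phone_unigram_counter(sequences)
bigrams = toy_phone_ngram_counter(sequences, 2)
trigrams = toy_phone_ngram_counter(sequences, 3)
```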
Tries
Parameter: min_stem_length
broken_words_left_to_right() | Return a dict of words to their left-to-right broken form.
broken_words_right_to_left() | Return a dict of words to their right-to-left broken form.
successors() | Return a dict of word (sub)strings to their successors.
predecessors() | Return a dict of word (sub)strings to their predecessors.
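Successors and predecessors can be read as branching information in a trie: which symbols follow a given prefix, or precede a given suffix. The sketch below computes both over a toy wordlist and then breaks a word wherever the set of successors branches, in the spirit of Harris-style successor variety. The "#" boundary marker, the function names, and the breaking rule are assumptions; Linguistica's own trie module and its min_stem_length handling may differ.

```python
from collections import defaultdict

BOUNDARY = "#"  # hypothetical marker: the (sub)string reaches the word edge

def toy_successors(words):
    successors = defaultdict(set)
    for word in words:
        for i in range(1, len(word) + 1):
            # Symbol after the prefix word[:i], or BOUNDARY if the prefix is the whole word.
            successors[word[:i]].add(word[i] if i < len(word) else BOUNDARY)
    return dict(successors)

def toy_predecessors(words):
    predecessors = defaultdict(set)
    for word in words:
        for i in range(len(word)):
            # Symbol before the suffix word[i:], or BOUNDARY if the suffix is the whole word.
            predecessors[word[i:]].add(word[i - 1] if i > 0 else BOUNDARY)
    return dict(predecessors)

def toy_break_left_to_right(word, successors):
    """Cut the word after every prefix at which the trie branches."""
    pieces, start = [], 0
    for i in range(1, len(word)):
        if len(successors[word[:i]]) > 1:
            pieces.append(word[start:i])
            start = i
    pieces.append(word[start:])
    return pieces

words = ["jump", "jumps", "jumped"]
succ = toy_successors(words)
pred = toy_predecessors(words)
print(toy_break_left_to_right("jumped", succ))  # ['jump', 'ed']
```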
Other methods and attributes
parameters() | Return the parameter dict.
change_parameters(**kwargs) | Change parameters specified by kwargs.
use_default_parameters() | Reset parameters to their default values.
reset() | Reset the Linguistica object.
-
class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)
A class for a Linguistica object.
-
affixes()
Return a set of affixes.
Return type: set(str)
-
affixes_to_signatures()
Return a dict of affixes to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
biphone_dict()
Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().
Return type: dict((str, str): Biphone instance)
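MI() on a biphone conventionally refers to pointwise mutual information between two adjacent phones. The sketch below uses the textbook definition, log2(p(ab) / (p(a) p(b))), with relative-frequency probabilities, and weights it by p(ab) for the weighted variant. Whether Linguistica's MI() and weighted_MI() use exactly these formulas is an assumption, and the function names are hypothetical.

```python
import math

def toy_pointwise_mi(bigram, bigram_counts, unigram_counts):
    """log2(p(ab) / (p(a) * p(b))) with probabilities as relative frequencies."""
    p_ab = bigram_counts[bigram] / sum(bigram_counts.values())
    total = sum(unigram_counts.values())
    p_a = unigram_counts[bigram[0]] / total
    p_b = unigram_counts[bigram[1]] / total
    return math.log2(p_ab / (p_a * p_b))

def toy_weighted_pointwise_mi(bigram, bigram_counts, unigram_counts):
    """PMI scaled by the bigram's own probability p(ab), one common weighting."""
    p_ab = bigram_counts[bigram] / sum(bigram_counts.values())
    return p_ab * toy_pointwise_mi(bigram, bigram_counts, unigram_counts)

unigrams = {"a": 2, "b": 2}
bigrams = {("a", "b"): 1, ("b", "a"): 1}
mi = toy_pointwise_mi(("a", "b"), bigrams, unigrams)
weighted = toy_weighted_pointwise_mi(("a", "b"), bigrams, unigrams)
```

Here p(ab) = 0.5 and p(a) = p(b) = 0.5, so the pair occurs twice as often as independence predicts and the PMI is log2(2) = 1.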
-
broken_words_left_to_right()
Return a dict of words to their left-to-right broken form.
Return type: dict(str: list(str))
-
broken_words_right_to_left()
Return a dict of words to their right-to-left broken form.
Return type: dict(str: list(str))
-
change_parameters(**kwargs)
Change parameters specified by kwargs.
Parameters: kwargs – keyword arguments for parameters and their new values
-
contexts_to_words()
Return a dict of contexts to words with counts.
Return type: dict(tuple(str): dict(str: int))
-
neighbor_graph()
Return the syntactic word neighborhood graph.
Return type: NetworkX undirected graph
-
number_of_word_tokens()
Return the number of word tokens.
Return type: int
-
number_of_word_types()
Return the number of word types.
Return type: int
-
output_all_results(directory=None, verbose=False, test=False)
Output all Linguistica results to the given directory.
Parameters: directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().
-
parameters()
Return the parameter dict.
Return type: dict(str: int)
-
phone_bigram_counter()
Return a dict of phone bigrams with counts.
Return type: dict(tuple(str): int)
-
phone_dict()
Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().
Return type: dict(str: Phone instance)
-
phone_trigram_counter()
Return a dict of phone trigrams with counts.
Return type: dict(tuple(str): int)
-
phone_unigram_counter()
Return a dict of phone unigrams with counts.
Return type: dict(str: int)
-
predecessors()
Return a dict of word (sub)strings to their predecessors.
Return type: dict(str: set(str))
-
reset()
Reset the Linguistica object. The file path information is retained, but all computed objects (ngrams, signatures, word neighbors, etc.) are cleared; they are recomputed the next time they are requested.
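The behavior described for reset() is a compute-on-demand cache. The sketch below uses a hypothetical ToyLexicon class to show the pattern. It is not Linguistica's implementation, and its attribute names are invented for illustration.

```python
class ToyLexicon:
    """Hypothetical stand-in showing compute-on-demand caching and reset()."""

    def __init__(self, tokens):
        self.tokens = tokens            # input data: survives reset()
        self._unigram_counter = None    # computed result: built lazily, cleared by reset()
        self.computations = 0           # how many times the counter has been built

    def word_unigram_counter(self):
        if self._unigram_counter is None:
            self.computations += 1
            counts = {}
            for token in self.tokens:
                counts[token] = counts.get(token, 0) + 1
            self._unigram_counter = counts
        return self._unigram_counter

    def reset(self):
        self._unigram_counter = None


lex = ToyLexicon("a b a".split())
lex.word_unigram_counter()  # computed
lex.word_unigram_counter()  # served from the cache
lex.reset()
lex.word_unigram_counter()  # recomputed after reset()
print(lex.computations)  # 2
```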
-
run_all_modules(verbose=False)
Run all modules.
-
run_manifold_module(verbose=False)
Run the manifold module.
-
run_ngram_module(verbose=False)
Run the ngram module.
-
run_phon_module(verbose=False)
Run the phon module.
-
run_signature_module(verbose=False)
Run the signature module.
-
run_trie_module(verbose=False)
Run the trie module.
-
signatures()
Return a set of morphological signatures.
Return type: set(tuple(str))
-
signatures_to_stems()
Return a dict of morphological signatures to stems.
Return type: dict(tuple(str): set(str))
-
signatures_to_words()
Return a dict of morphological signatures to words.
Return type: dict(tuple(str): set(str))
-
stems()
Return a set of stems.
Return type: set(str)
-
stems_to_signatures()
Return a dict of stems to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
stems_to_words()
Return a dict of stems to words.
Return type: dict(str: set(str))
-
successors()
Return a dict of word (sub)strings to their successors.
Return type: dict(str: set(str))
-
use_default_parameters()
Reset parameters to their default values.
-
word_bigram_counter()
Return a dict of word bigrams with their counts.
Return type: dict(tuple(str): int)
-
word_phonology_dict()
Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().
Return type: dict(str: Word instance)
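The plog methods are assumed here to follow the "positive log" convention, plog(x) = -log2(frequency of x). Under that reading, a word's unigram plog is the sum over its phones, and the average is that sum divided by the number of phones. The sketch below encodes this with hypothetical helper functions; Linguistica's exact definitions (for example, whether word boundaries are counted) are not confirmed here, and bigram_plog() is not covered.

```python
import math

def toy_plog(probability):
    """Positive log probability: -log2(p)."""
    return -math.log2(probability)

def toy_unigram_plog(phones, phone_frequency):
    """Sum of the plogs of the word's phones, treating phones as independent."""
    return sum(toy_plog(phone_frequency[p]) for p in phones)

def toy_avg_unigram_plog(phones, phone_frequency):
    """Per-phone average, so long and short words can be compared."""
    return toy_unigram_plog(phones, phone_frequency) / len(phones)

phone_frequency = {"c": 0.25, "a": 0.5, "t": 0.25}
total = toy_unigram_plog(list("cat"), phone_frequency)    # 2 + 1 + 2
average = toy_avg_unigram_plog(list("cat"), phone_frequency)
```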
-
word_trigram_counter()
Return a dict of word trigrams with their counts.
Return type: dict(tuple(str): int)
-
word_unigram_counter()
Return a dict of words with their counts.
Return type: dict(str: int)
-
wordlist()
Return a wordlist sorted by word frequency in descending order. For written English, “the” is therefore likely to be the first word.
Return type: list(str)
-
words_in_signatures()
Return a set of words that are in at least one morphological signature.
Return type: set(str)
-
words_to_contexts()
Return a dict of words to contexts with counts.
Return type: dict(str: dict(tuple(str): int))
-
words_to_neighbors()
Return a dict of words to syntactic neighbors.
Return type: dict(str: list(str))
-
words_to_phones()
Return a dict of words to their phones.
Return type: dict(str: list(str))
-
words_to_signatures()
Return a dict of words to morphological signatures.
Return type: dict(str: set(tuple(str)))
-
words_to_sigtransforms()
Return a dict of words to signature transforms.
Return type: dict(str: set(tuple(tuple(str), str)))
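Judging from the return type, a signature transform appears to pair a signature with the affix that, attached to a stem in that signature, yields the word. That interpretation is an assumption. The sketch below builds such a mapping from a hand-written signatures_to_stems dict, using "NULL" for the empty suffix; the function name is hypothetical.

```python
def toy_words_to_sigtransforms(signatures_to_stems):
    """Map each word to the (signature, affix) pairs that can generate it."""
    result = {}
    for signature, stems in signatures_to_stems.items():
        for stem in stems:
            for affix in signature:
                word = stem + ("" if affix == "NULL" else affix)
                result.setdefault(word, set()).add((signature, affix))
    return result

sig_to_stems = {("NULL", "ed", "s"): {"jump", "walk"}}
transforms = toy_words_to_sigtransforms(sig_to_stems)
print(transforms["jumped"])  # {(('NULL', 'ed', 's'), 'ed')}
```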