medacy.pipeline_components.metamap.metamap module

A utility class for running MetaMap over medical text documents. It can map a file and use the resulting output, or load and manipulate stored MetaMap output.

class medacy.pipeline_components.metamap.metamap.MetaMap(metamap_path=None, cache_output=False, cache_directory=None, convert_ascii=True)[source]

Bases: object

_convert_to_ascii(text)[source]

Takes in a text string and converts it to ASCII, keeping track of each character change

The changes are recorded in a list of objects, each object detailing the original non-ASCII character and the starting index and length of the replacement in the new string (keys original, start, and length, respectively).

Parameters

text (string) – The text to be converted

Returns

tuple containing:

text (string): The converted text

diff (list): Record of all ASCII conversions

Return type

tuple
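The bookkeeping described above can be illustrated with a minimal, standalone sketch. It is not medaCy's implementation: the function name `convert_to_ascii`, the `REPLACEMENTS` table, and the `"?"` fallback are assumptions made for illustration. Only the shape of the diff entries (keys original, start, and length) comes from the description above.

```python
# Hypothetical replacement table; the mapping medaCy actually uses is not shown here.
REPLACEMENTS = {
    "\u00b5": "u",   # micro sign
    "\u2265": ">=",  # greater-than-or-equal sign
    "\u00e9": "e",   # e with acute accent
}


def convert_to_ascii(text):
    """Replace non-ASCII characters and record every change that was made."""
    pieces = []
    diff = []
    position = 0  # current index in the converted string
    for char in text:
        if ord(char) < 128:
            replacement = char
        else:
            # Assumption: characters missing from the table become "?".
            replacement = REPLACEMENTS.get(char, "?")
            diff.append({
                "original": char,
                "start": position,
                "length": len(replacement),
            })
        pieces.append(replacement)
        position += len(replacement)
    return "".join(pieces), diff


converted, diff = convert_to_ascii("dose \u2265 5 \u00b5g")
print(converted)  # dose >= 5 ug
print(diff)
```

Because a replacement can be longer than the character it replaces, start and length refer to positions in the converted string, which is what makes the change reversible later.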

_item_generator(json_input, lookup_key)[source]
_restore_from_ascii(text, diff, metamap_dict)[source]

Takes in ASCII-converted text and the list of changes made to it by _convert_to_ascii(), as well as a dictionary of metamap taggings. Converts the text back to its original state and updates the character spans in the metamap dict to match.

Parameters
  • text (string) – Output of _convert_to_ascii()

  • diff (list) – Output of _convert_to_ascii()

  • metamap_dict (dict) – Dictionary of metamap information obtained from text

Returns

tuple containing:

text (string): The input with all of the changes listed in diff reversed

metamap_dict (dict): The input with all of its character spans updated to reflect the changes to text

Return type

tuple
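The reverse step can be sketched in the same spirit. This is an illustration under stated assumptions, not the library's code: the real method updates spans stored inside metamap_dict, whose layout is not documented here, so the sketch uses a plain list of hypothetical (start, end) tuples with an exclusive end. It also assumes that no span boundary falls inside a replaced region. The diff entries use the keys original, start, and length.

```python
def restore_from_ascii(text, diff, spans):
    """Undo recorded ASCII replacements and shift spans to match.

    `spans` is a hypothetical stand-in for the character spans that
    live inside metamap_dict: a list of (start, end) tuples, end exclusive.
    """
    def shift(position):
        # Each replacement that ends at or before `position` moves it left
        # by the difference between the replacement and the original.
        moved = sum(
            change["length"] - len(change["original"])
            for change in diff
            if change["start"] + change["length"] <= position
        )
        return position - moved

    new_spans = [(shift(start), shift(end)) for start, end in spans]

    # Apply changes right to left so earlier indices stay valid.
    for change in sorted(diff, key=lambda c: c["start"], reverse=True):
        start = change["start"]
        end = start + change["length"]
        text = text[:start] + change["original"] + text[end:]
    return text, new_spans


diff = [
    {"original": "\u2265", "start": 5, "length": 2},
    {"original": "\u00b5", "start": 10, "length": 1},
]
restored, spans = restore_from_ascii("dose >= 5 ug", diff, [(0, 4), (8, 12)])
print(restored)  # dose ≥ 5 µg
print(spans)     # [(0, 4), (7, 11)]
```

The second span moves from (8, 12) to (7, 11) because the two-character ">=" collapses back into a single character to its left.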

_run_metamap(args, document)[source]

Runs metamap through bash and feeds in the appropriate arguments

Parameters
  • args – arguments to feed into metamap

  • document – the raw text to be metamapped

extract_mapped_terms(metamap_dict)[source]

Extracts an array of term dictionaries from metamap_dict

Parameters

metamap_dict – A dictionary containing the metamap output

Returns

an array of mapped_terms

get_semantic_types_by_term(term)[source]

Returns an array of the semantic types of a given term

get_span_by_term(term)[source]

Takes a given utterance dictionary (term) and extracts the character indices of the utterance

Parameters

term – The full dictionary corresponding to a metamap term

Returns

the span of the referenced term in the document

get_term_by_semantic_type(mapped_terms, include=[], exclude=None)[source]

Returns the metamapped utterances that contain all of the semantic types listed in include

Parameters

mapped_terms – An array of candidate dictionaries

Returns

the dictionaries that contain a term with all the semantic types in include
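The filtering rule can be sketched as follows. This is a hedged illustration rather than the method itself: real mapped terms are nested MetaMap dictionaries, while the sketch uses flat hypothetical dicts with a made-up "semantic_types" key. The treatment of exclude (drop a term if it carries any excluded type) is an assumption, since the documentation above does not describe it.

```python
def filter_by_semantic_type(terms, include=(), exclude=None):
    """Keep terms that carry every type in `include` and none in `exclude`."""
    wanted = set(include)
    unwanted = set(exclude or ())  # assumption: any excluded type rejects the term
    kept = []
    for term in terms:
        types = set(term["semantic_types"])  # hypothetical flat layout
        if wanted <= types and not (unwanted & types):
            kept.append(term)
    return kept


# Hypothetical, simplified terms; "orch", "phsu", and "dsyn" are UMLS
# semantic type abbreviations.
terms = [
    {"text": "aspirin", "semantic_types": ["orch", "phsu"]},
    {"text": "fever", "semantic_types": ["sosy"]},
    {"text": "diabetes", "semantic_types": ["dsyn"]},
]

drugs = filter_by_semantic_type(terms, include=["phsu"])
print([term["text"] for term in drugs])  # ['aspirin']
```

Using set operations keeps the "all of include" requirement explicit: a term is kept only when the included types form a subset of its own types.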

load(file_to_load)[source]
map_corpus(documents, directory=None, n_job=-1)[source]

Metamaps a large number of files quickly by forking processes and using multiple cores

Parameters
  • documents – an array of documents to map

  • directory – location to map all files

  • n_job – number of cores to utilize at once while mapping - this may use a large amount of memory

Returns

map_file(file_to_map, max_prune_depth=10)[source]

Maps a given document from a file path and returns a formatted dict

Parameters
  • file_to_map – the path of the file that will be metamapped

  • max_prune_depth – set to a larger value for better results. See the metamap specs about pruning depth.

map_text(text, max_prune_depth=10)[source]
mapped_terms_to_spacy_ann(mapped_terms, entity_label=None)[source]

Transforms an array of mapped_terms into a spacy annotation object. The label for each annotation defaults to the first semantic type in the semantic_type array.

Parameters
  • mapped_terms – an array of mapped terms

  • entity_label – the label to assign to each annotation, defaults to the first semantic type of mapped_term

Returns

an annotation formatted to spacy’s specifications