medacy.pipeline_components.metamap.metamap module
A utility class to Metamap medical text documents: Metamap a file and utilize the output, or manipulate stored Metamap output.
-
class
medacy.pipeline_components.metamap.metamap.
MetaMap
(metamap_path=None, cache_output=False, cache_directory=None, convert_ascii=True) Bases:
object
-
_convert_to_ascii
(text) Takes in a text string and converts it to ASCII, keeping track of each character change.
The changes are recorded in a list of objects, each object detailing the original non-ASCII character and the starting index and length of the replacement in the new string (keys original, start, and length, respectively).
- Parameters
text (string) – The text to be converted
- Returns
tuple containing:
text (string): The converted text
diff (list): Record of all ASCII conversions
- Return type
tuple
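The conversion described above can be sketched as follows. This is a minimal illustration, not medaCy's implementation: the function name `convert_to_ascii_sketch` is hypothetical, and using `unicodedata` NFKD decomposition (with `?` as a fallback) to pick replacements is an assumption. Only the `original`, `start`, and `length` keys come from the description above.

```python
import unicodedata


def convert_to_ascii_sketch(text):
    """Hypothetical sketch: replace non-ASCII characters and record each change."""
    pieces = []
    diff = []
    position = 0  # index in the converted (ASCII) string
    for char in text:
        if ord(char) < 128:
            pieces.append(char)
            position += 1
            continue
        # Assumption: derive the replacement from the NFKD decomposition,
        # keeping only its ASCII characters; fall back to "?" if none remain.
        decomposed = unicodedata.normalize("NFKD", char)
        replacement = "".join(c for c in decomposed if ord(c) < 128) or "?"
        diff.append({"original": char, "start": position, "length": len(replacement)})
        pieces.append(replacement)
        position += len(replacement)
    return "".join(pieces), diff
```

Because `start` and `length` refer to the converted string, a replacement longer than one character (such as a ligature expanding to two letters) shifts every later index, which is why the diff has to be kept alongside the text.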
-
_restore_from_ascii
(text, diff, metamap_dict) Takes in ASCII-converted text and the list of changes made to it by _convert_to_ascii(), as well as a dictionary of metamap taggings; converts the text back to its original state and updates the character spans in the metamap dict to match
- Parameters
text (string) – Output of _convert_to_ascii()
diff (list) – Output of _convert_to_ascii()
metamap_dict (dict) – Dictionary of metamap information obtained from text
- Returns
tuple containing:
text (string): The input with all of the changes listed in diff reversed
metamap_dict (dict): The input with all of its character spans updated to reflect the changes to text
- Return type
tuple
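A rough sketch of the reversal follows, under stated assumptions. The helper names `restore_text` and `shift_span` are hypothetical, and spans are modeled as plain `(start, length)` pairs because the layout of spans inside `metamap_dict` is not documented here. The diff entries use the `original`, `start`, and `length` keys described for `_convert_to_ascii()`.

```python
def restore_text(text, diff):
    """Hypothetical sketch: undo the recorded ASCII replacements."""
    # Work from the last change to the first so earlier indices stay valid.
    for change in reversed(diff):
        start, length = change["start"], change["length"]
        text = text[:start] + change["original"] + text[start + length:]
    return text


def shift_span(start, length, diff):
    """Hypothetical sketch: map a (start, length) span from ASCII text to the original."""
    end = start + length
    new_start, new_end = start, end
    for change in diff:
        c_start = change["start"]
        c_end = c_start + change["length"]
        # How many characters the replacement added relative to the original.
        delta = change["length"] - len(change["original"])
        if c_end <= start:
            # Change lies entirely before the span: shift both ends.
            new_start -= delta
            new_end -= delta
        elif c_start < end:
            # Change overlaps the span: only the end moves.
            new_end -= delta
    return new_start, new_end - new_start
```

A change that straddles the start of a span is treated here as lying inside it, which is a simplification; a real implementation would need to decide how to handle that case.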
-
_run_metamap
(args, document) Runs metamap through bash and feeds in the appropriate arguments
- Parameters
args – arguments to feed into metamap
document – the raw text to be metamapped
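For illustration, feeding a document to an external program and capturing its output might look like the sketch below. It swaps in Python's `subprocess.run` with standard input rather than a bash invocation, and `run_external_tool` and `DEMO_COMMAND` are hypothetical names. The demo command is a stand-in that upper-cases its input; the actual metamap command line and flags are not shown here and are not assumed.

```python
import subprocess
import sys

# Stand-in command for demonstration only (NOT a metamap invocation):
# it reads stdin and writes it back upper-cased.
DEMO_COMMAND = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.write(sys.stdin.read().upper())",
]


def run_external_tool(command, document, timeout=60):
    """Hypothetical sketch: pipe `document` into `command` and return its stdout."""
    completed = subprocess.run(
        command,
        input=document,       # sent to the process on stdin
        capture_output=True,  # collect stdout and stderr
        text=True,            # work with str instead of bytes
        timeout=timeout,
        check=True,           # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout
```

Passing the command as a list avoids shell quoting problems with document text, which is one reason to prefer it over building a shell string.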
-
extract_mapped_terms
(metamap_dict) Extracts an array of term dictionaries from metamap_dict
- Parameters
metamap_dict – A dictionary containing the metamap output
- Returns
an array of mapped_terms
-
get_semantic_types_by_term
(term) Returns an array of the semantic types of a given term
-
get_span_by_term
(term) Takes a given utterance dictionary (term) and extracts the character indices of the utterance
- Parameters
term – The full dictionary corresponding to a metamap term
- Returns
the span of the referenced term in the document
-
get_term_by_semantic_type
(mapped_terms, include=[], exclude=None) Returns Metamapped utterances that all contain a given set of semantic types found in include
- Parameters
mapped_terms – An array of candidate dictionaries
- Returns
the dictionaries that contain a term with all the semantic types in include
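The filtering logic can be sketched as below. This is an assumption-laden illustration: `filter_terms_by_semantic_type` is a hypothetical name, each term is modeled as a dictionary with a hypothetical `semantic_types` key (the real metamap term layout is not shown here), and the treatment of `exclude` as "drop any term carrying one of these types" is a guess, since the description above only covers `include`.

```python
def filter_terms_by_semantic_type(mapped_terms, include=(), exclude=None):
    """Hypothetical sketch: keep terms that carry every type in `include`."""
    required = set(include)
    forbidden = set(exclude or ())
    kept = []
    for term in mapped_terms:
        # Assumption: semantic types live under a "semantic_types" key.
        sem_types = set(term.get("semantic_types", ()))
        has_all_required = required <= sem_types
        has_forbidden = bool(forbidden & sem_types)  # assumed meaning of exclude
        if has_all_required and not has_forbidden:
            kept.append(term)
    return kept
```

For example, with `include=["phsu"]` a term tagged `["phsu", "orch"]` is kept, while a term tagged only `["dsyn"]` is dropped.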
-
map_corpus
(documents, directory=None, n_job=-1) Metamaps a large number of files quickly by forking processes and utilizing multiple cores
- Parameters
documents – an array of documents to map
directory – location to map all files
n_job – number of cores to utilize at once while mapping - this may use a large amount of memory
- Returns
-
map_file
(file_to_map, max_prune_depth=10) Maps a given document from a file_path and returns a formatted dict
- Parameters
file_to_map – the path of the file that will be metamapped
max_prune_depth – set to larger for better results. See metamap specs about pruning depth.
-
mapped_terms_to_spacy_ann
(mapped_terms, entity_label=None) Transforms an array of mapped_terms into a spacy annotation object. The label for each annotation defaults to the first semantic type in the semantic_type array
- Parameters
mapped_terms – an array of mapped terms
entity_label – the label to assign to each annotation, defaults to first semantic type of mapped_term
- Returns
an annotation formatted to spacy’s specifications
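As a hedged illustration of the labeling rule, the sketch below builds a dictionary with an `entities` list of `(start, end, label)` tuples, the shape used by spaCy's v2-style training annotations. Whether this method returns exactly that shape is an assumption, `terms_to_entities` is a hypothetical name, and the input is simplified to `(start, end, semantic_types)` tuples instead of real metamap term dictionaries.

```python
def terms_to_entities(spans_with_types, entity_label=None):
    """Hypothetical sketch: build an entity annotation from simplified term tuples."""
    entities = []
    for start, end, semantic_types in spans_with_types:
        # Use the explicit label if given, else the first semantic type.
        label = entity_label if entity_label is not None else semantic_types[0]
        entities.append((start, end, label))
    return {"entities": entities}
```

Passing `entity_label` is useful when every metamapped term should be collapsed into a single entity class, regardless of its semantic types.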
-