medacy.tools.con.brat_to_con module

Converts data from brat to con. Pass the input and output directories as command-line arguments. Each ‘.ann’ file must have a ‘.txt’ file with the same base name in the same directory. Pass ‘-c’ (without quotes) as an optional final command-line argument to copy the text files used in the conversion to the output directory.

You can also import ‘convert_brat_to_con()’ directly and pass it the paths to the ann and txt files to convert a single file.

author

Steele W. Farnsworth

date

30 December, 2018

medacy.tools.con.brat_to_con.check_valid_line(item: str)[source]

Returns a boolean indicating whether a given line is in the brat format. The tests are not comprehensive.
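
A minimal sketch of what such a check might look like, assuming only text-bound (entity) annotations of the form `T1<TAB>Type start end<TAB>text`. The name `looks_like_brat_entity_line` and the regular expression are illustrative assumptions, not medaCy's implementation. Like the function described above, the test is not comprehensive: relation lines and discontinuous spans, for example, are rejected.

```python
import re

# Hypothetical pattern: ID, tab, type, space, start, space, end, tab, text.
_BRAT_ENTITY = re.compile(r"^T\d+\t\S+ \d+ \d+\t.+$")


def looks_like_brat_entity_line(item: str) -> bool:
    """Return True if item resembles a single brat entity annotation line."""
    # Drop a trailing newline so lines read straight from a file also match.
    return _BRAT_ENTITY.match(item.rstrip("\n")) is not None
```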

medacy.tools.con.brat_to_con.convert_brat_to_con(brat_file_path, text_file_path=None)[source]

Takes a path to a brat file and returns a string representation of that file converted to the con format.

Parameters
  • brat_file_path – The path to the brat file; not the file itself. If the path is not valid, the argument will be assumed to be the text of the brat file itself.

  • text_file_path – The path to the text file; if not provided, assumed to be a file with the same path as the brat file ending in ‘.txt’ instead of ‘.ann’. If neither file is found, raises an error.

Returns

A string (not a file) of the con equivalent of the brat file.

medacy.tools.con.brat_to_con.find_line_num(text_, start)[source]
Parameters
  • text_ – The text of the file, e.g. f.read()

  • start – The index at which the desired text starts

Returns

The index (starting at 0) of the line containing the given start index
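
One way to compute such a line index is to count the newlines that precede the offset. The sketch below assumes `\n` line endings; the name `line_index_of` is hypothetical and this is not necessarily how medaCy implements it.

```python
def line_index_of(text: str, start: int) -> int:
    """Return the 0-based index of the line containing character offset start."""
    # Each newline before `start` moves the offset one line further down.
    return text.count("\n", 0, start)
```

With `text = "first\nsecond\nthird"`, offset 6 (the `s` of `second`) falls on line index 1.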

medacy.tools.con.brat_to_con.get_end_word_index(data_item: str, start_index, end_index)[source]

Returns the index of the first character of the last word of data_item. All parameters shadow the corresponding names in the final for loop.

medacy.tools.con.brat_to_con.get_relative_index(text_: str, line_, absolute_index)[source]

Takes the index of a phrase (the phrase itself is not a parameter) relative to the start of its file and returns its index relative to the start of the line that it’s on. Assumes that the line_ argument is long enough, and thus specific enough, that it occurs only once.

Parameters
  • text_ – The text of the file, not separated by lines

  • line_ – The text of the line being searched for

  • absolute_index – The index of a given phrase

Returns

The index of the phrase relative to the start of the line
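
The conversion from a file-relative to a line-relative index can be sketched as a subtraction of the line's own starting offset. The name `relative_index` is hypothetical and the code is an illustration under the uniqueness assumption stated above, not medaCy's implementation. If the line occurred more than once, `str.index` would silently pick the first occurrence.

```python
def relative_index(text: str, line: str, absolute_index: int) -> int:
    """Return absolute_index expressed relative to the start of line within text."""
    # str.index raises ValueError if the line does not occur in the text.
    line_start = text.index(line)
    return absolute_index - line_start
```

For example, in `"first line\nsecond line here\n"` the word `here` starts at absolute index 23 and at index 12 within its line.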

medacy.tools.con.brat_to_con.line_to_dict(item)[source]

Converts a string that is a line in brat format to a dictionary representation of that data. Keys are: T; data_type; start_ind; end_ind; data_type.

Parameters

item – The line of brat text (str).

Returns

The dictionary containing that data.
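
A rough sketch of this kind of parsing, assuming a tab-separated entity line such as `T1<TAB>Problem 0 10<TAB>chest pain`. The function name `brat_line_to_dict` is hypothetical. The keys `T`, `data_type`, `start_ind` and `end_ind` follow the listing above; the key `text` for the annotated phrase and the conversion of the offsets to `int` are assumptions made for illustration, since the documentation does not confirm either.

```python
def brat_line_to_dict(item: str) -> dict:
    """Split one brat entity line into its fields (illustrative only)."""
    # Fields are separated by tabs: ID, "type start end", annotated text.
    ann_id, type_and_span, text = item.rstrip("\n").split("\t", 2)
    data_type, start, end = type_and_span.split(" ")
    return {
        "T": ann_id,
        "data_type": data_type,
        "start_ind": int(start),  # assumption: offsets stored as integers
        "end_ind": int(end),
        "text": text,  # assumption: hypothetical key for the phrase
    }
```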

medacy.tools.con.brat_to_con.switch_extension(name, ext)[source]

Primarily for internal use. Takes the name of a file (str) and changes its extension to the one provided (str).
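
An extension swap of this kind can be sketched with the standard library's `os.path.splitext`. The name `with_extension` is hypothetical, and accepting the extension with or without a leading dot is an assumption rather than documented behavior of `switch_extension`.

```python
import os


def with_extension(name: str, ext: str) -> str:
    """Return name with its extension replaced by ext."""
    # splitext separates "notes/doc1.ann" into ("notes/doc1", ".ann").
    root, _ = os.path.splitext(name)
    # Accept both "txt" and ".txt" for convenience.
    return root + "." + ext.lstrip(".")
```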