medacy.tools.con.con_to_brat module

Converts annotation data from the con format to the brat format. Pass the input and output directories as command-line arguments. Each ‘.con’ file must have a ‘.txt’ file with the same base name in the same directory. Pass ‘-c’ (without quotes) as an optional final command-line argument to copy the text files used in the conversion to the output directory.

The function ‘convert_con_to_brat()’ can be imported independently and run on individual files.

author

Steele W. Farnsworth

date

30 December, 2018

medacy.tools.con.con_to_brat.check_valid_line(item: str)[source]

Non-comprehensive tests to see if a given line is valid for conversion.

Parameters

item – A string that is a line of text, hopefully in the con format.

Returns

Boolean of whether or not the line appears to be in con format.

medacy.tools.con.con_to_brat.convert_con_to_brat(con_file_path, text_file_path=None)[source]

Converts a con file to a string representation of a brat file.

Parameters

con_file_path – Path to the con file being converted. If the argument is a string but not a valid path, it is parsed as if it were the contents of the con file itself.

text_file_path – Path to the text file associated with the con file. If not provided, the function looks for a text file in the same directory with the same name but with the extension switched to ‘txt’; if none is found, an error is raised. No conversion can be performed without the text file.

Returns

A string representation of the brat file, which can then be written to file if desired.
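To illustrate what the returned string contains, here is a minimal sketch of formatting one entity in brat standoff notation, where a text-bound annotation is an ID, a tab, the type with start and end character offsets, a tab, and the covered text. The helper name `to_brat_line` and its parameters are hypothetical and are not part of medaCy; the module may build its output differently.

```python
def to_brat_line(n: int, data_type: str, start: int, end: int, text: str) -> str:
    """Format one text-bound annotation line in brat standoff notation.

    Hypothetical helper for illustration only; not the medaCy implementation.
    """
    # Layout: T<id> TAB <type> <start> <end> TAB <covered text>
    return f"T{n}\t{data_type} {start} {end}\t{text}"


line = to_brat_line(1, "problem", 10, 20, "chest pain")
```

Joining one such line per entity with newlines would give a string that can be written to an ‘.ann’ file.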

medacy.tools.con.con_to_brat.get_absolute_index(txt, txt_lns, ind)[source]

Given one of the \d+:\d+ spans (digits, a colon, digits), which represent the index of a char relative to the start of the line it’s on, returns the index of that char relative to the start of the file.

Parameters

txt – The text file associated with the annotation.

txt_lns – The same text file as a list broken by lines.

ind – The string in the format \d+:\d+.

Returns

The absolute index.
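The line-relative to file-relative conversion can be sketched as follows. This is an independent illustration, not the medaCy implementation: the name `absolute_index` is hypothetical, and it assumes the first number is a 1-based line number, the second is a 0-based character offset within that line, and lines are separated by a single newline character.

```python
def absolute_index(txt_lns, ind):
    """Convert a 'line:char' span string to an offset from the start of the file.

    Assumptions (not confirmed by the source): 1-based line number,
    0-based character offset, lines joined by a single '\n'.
    """
    line_str, char_str = ind.split(":")
    line_num = int(line_str) - 1  # shift to a 0-based list index
    char_num = int(char_str)
    # Count every character on the preceding lines, plus one newline each.
    preceding = sum(len(line) + 1 for line in txt_lns[:line_num])
    return preceding + char_num


text = "first line\nsecond line\nthird"
lines = text.split("\n")
idx = absolute_index(lines, "2:7")
```

Summing line lengths, rather than searching the full text for the line's content, keeps the result correct when the same line appears more than once in the file.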

medacy.tools.con.con_to_brat.line_to_dict(item)[source]

Converts a string that is a line in con format to a dictionary representation of that data. Keys are: data_item; start_ind; end_ind; data_type.

Parameters

item – The line of con text (str).

Returns

The dictionary containing that data.
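As a rough sketch of this kind of parsing, the example below extracts the four keys with a regular expression. It assumes con lines have the shape `c="text" L:C L:C||t="type"`; that layout, the pattern `CON_FIELDS`, and the function name `parse_con_line` are assumptions made for illustration, and the actual module may parse lines differently.

```python
import re

# Assumed con line layout: c="<text>" <start> <end>||t="<type>"
CON_FIELDS = re.compile(
    r'c="(?P<data_item>.+)" '
    r'(?P<start_ind>\d+:\d+) (?P<end_ind>\d+:\d+)'
    r'\|\|t="(?P<data_type>.+)"'
)


def parse_con_line(item):
    """Return a dict with keys data_item, start_ind, end_ind, data_type."""
    match = CON_FIELDS.fullmatch(item.strip())
    if match is None:
        raise ValueError(f"Line does not look like con format: {item!r}")
    return match.groupdict()


record = parse_con_line('c="chest pain" 3:4 3:14||t="problem"')
```

The span values are left as strings here, since turning them into file-level offsets requires the associated text file.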

medacy.tools.con.con_to_brat.switch_extension(name, ext)[source]

Takes the name of a file (str) and changes the extension to the one provided (str).
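A comparable helper can be written with the standard library, as in the sketch below. The name `with_extension` is hypothetical, and accepting the new extension with or without a leading dot is a choice made here rather than documented behavior of `switch_extension`.

```python
import os


def with_extension(name, ext):
    """Return `name` with its extension replaced by `ext`.

    Illustrative stand-in only; not the medaCy implementation.
    """
    root, _ = os.path.splitext(name)  # drop the current extension, if any
    return f"{root}.{ext.lstrip('.')}"


txt_path = with_extension("notes/record1.con", "txt")
ann_path = with_extension("record1.con", ".ann")
```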