Before an Odinson index can be created, the text needs to be annotated. You may use your own annotation tools, as long as you convert your annotated output to Odinson Documents.
However, we also provide an app, built on the clulab Processors library, that annotates free text and produces this format.
The configurations are specified in
First, decide which Processor you’d like to use to annotate the text by specifying a value for odinson.extra.processorType. Available options include CluProcessor. For more information about these, see clulab Processors.
Ensure that odinson.textDir and odinson.docDir are set as intended. Text will be read from odinson.textDir, annotated, and serialized to odinson.docDir.
NOTE: We recommend a directory structure in which a single data folder contains subdirectories for the text, the annotated documents, and the index. If you do this, you can simply specify odinson.dataDir = path/to/your/dataDir, and the subfolders will be handled.
Depending on the number and size of the documents you are annotating, this step can be memory intensive. We recommend setting aside at least 8 GB of memory; with more, it will run faster. Make that memory available to the JVM before running the annotation command:
sbt "extra/runMain ai.lum.odinson.extra.AnnotateText"
This step may take a while; how long depends heavily on the length of your documents and the size of your corpus.
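The memory recommendation can be combined with the annotation command in a small shell helper. This is a minimal sketch, assuming your sbt launcher script reads JVM flags from the SBT_OPTS environment variable; annotate_text and its heap-size argument are hypothetical names used only for illustration, not part of Odinson.

```shell
# Hypothetical helper (not part of Odinson): run the annotation app
# with a chosen maximum JVM heap size.
# Assumption: the sbt launcher reads JVM flags from SBT_OPTS.
annotate_text() {
  heap="${1:-8g}"   # default to the recommended 8 GB when no size is given
  SBT_OPTS="-Xmx${heap}" sbt "extra/runMain ai.lum.odinson.extra.AnnotateText"
}
```

Run from the root of the repository, `annotate_text` would use the 8 GB default and `annotate_text 16g` would request a 16 GB heap. If the build forks a separate JVM for the run, the heap flag may need to be set in the build configuration instead.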