status: work in progress (2018-07-09)
The simplest format is plain text in which tokens are delimited by white space:
We're not laughing at you - we're laughing near you.
We ' re not laughing at you - we ' re laughing near you .
Many annotation tools accept these formats as basic input. Check each tool's documentation to see whether it expects raw or pre-tokenized text.
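As a minimal sketch of why the tokenized variant is convenient, splitting on white space is enough to recover the tokens. The function name `read_tokens` is hypothetical and chosen only for this illustration.

```python
def read_tokens(line):
    """Split a line on runs of white space and return the pieces."""
    return line.split()


raw = "We're not laughing at you - we're laughing near you."
tokenized = "We ' re not laughing at you - we ' re laughing near you ."

# On raw text, clitics and punctuation stay glued to the neighbouring word.
print(len(read_tokens(raw)))        # 10
# On pre-tokenized text, every token is its own item.
print(len(read_tokens(tokenized)))  # 15
```

The same call therefore gives different units depending on which of the two variants a file contains, so a tool needs to know which one it is reading.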
In this kind of format, annotations of various types are added, pertaining either to individual tokens or to groups of tokens. The following formats are typical output of part-of-speech taggers:
We_PRP '_' re_VB not_RB laughing_VBG at_IN you_PRP ._. We_PRP '_' re_VB laughing_VBG near_IN you_PRP ._.
We/PRP '/' re/VB not/RB laughing/VBG at/IN you/PRP ./. We/PRP '/' re/VB laughing/VBG near/IN you/PRP ./.
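Both inline formats can be read back with a few lines of code. The sketch below assumes that every item has the form token, separator, tag, and that items are separated by white space. The name `parse_tagged` and its `sep` parameter are hypothetical.

```python
def parse_tagged(line, sep="_"):
    """Return (token, tag) pairs from a line of token<sep>tag items."""
    pairs = []
    for item in line.split():
        # Split on the LAST separator, so a token that itself contains
        # the separator (or is the separator) is kept intact.
        token, tag = item.rsplit(sep, 1)
        pairs.append((token, tag))
    return pairs


underscore_line = "'_' re_VB not_RB laughing_VBG ._."
slash_line = "'/' re/VB not/RB laughing/VBG ./."

print(parse_tagged(underscore_line))
print(parse_tagged(slash_line, sep="/"))
```

Splitting from the right matters for items such as `._.` or a token like `1/2/CD`. An item that lacks the separator raises a `ValueError` in this sketch, which is usually what you want when validating tagger output.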
Document: ID=example.txt (2 sentences, 13 tokens)

Sentence #1 (7 tokens):
We're not laughing at you.

Tokens:
[Text=We CharacterOffsetBegin=0 CharacterOffsetEnd=2 PartOfSpeech=PRP]
[Text='re CharacterOffsetBegin=2 CharacterOffsetEnd=5 PartOfSpeech=VBP]
[Text=not CharacterOffsetBegin=6 CharacterOffsetEnd=9 PartOfSpeech=RB]
[Text=laughing CharacterOffsetBegin=10 CharacterOffsetEnd=18 PartOfSpeech=VBG]
[Text=at CharacterOffsetBegin=19 CharacterOffsetEnd=21 PartOfSpeech=IN]
[Text=you CharacterOffsetBegin=22 CharacterOffsetEnd=25 PartOfSpeech=PRP]
[Text=. CharacterOffsetBegin=25 CharacterOffsetEnd=26 PartOfSpeech=.]

Sentence #2 (6 tokens):
We're laughing near you.

Tokens:
[Text=We CharacterOffsetBegin=27 CharacterOffsetEnd=29 PartOfSpeech=PRP]
[Text='re CharacterOffsetBegin=29 CharacterOffsetEnd=32 PartOfSpeech=VBP]
[Text=laughing CharacterOffsetBegin=33 CharacterOffsetEnd=41 PartOfSpeech=VBG]
[Text=near CharacterOffsetBegin=42 CharacterOffsetEnd=46 PartOfSpeech=IN]
[Text=you CharacterOffsetBegin=47 CharacterOffsetEnd=50 PartOfSpeech=PRP]
[Text=. CharacterOffsetBegin=50 CharacterOffsetEnd=51 PartOfSpeech=.]
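The bracketed token records in this output carry character offsets into the original text, which makes the annotation stand-off rather than inline. A rough sketch of reading them with a regular expression follows. It assumes exactly the four fields shown above, in that order; `TOKEN_RE` and `parse_token_records` are hypothetical names, and records with other fields would need a more general pattern.

```python
import re

# Matches one record of the form
# [Text=... CharacterOffsetBegin=N CharacterOffsetEnd=N PartOfSpeech=TAG]
TOKEN_RE = re.compile(
    r"\[Text=(?P<text>\S+) "
    r"CharacterOffsetBegin=(?P<begin>\d+) "
    r"CharacterOffsetEnd=(?P<end>\d+) "
    r"PartOfSpeech=(?P<pos>[^\]\s]+)\]"
)


def parse_token_records(output):
    """Return (text, begin, end, pos) tuples for every record found."""
    return [
        (m.group("text"), int(m.group("begin")), int(m.group("end")), m.group("pos"))
        for m in TOKEN_RE.finditer(output)
    ]


raw = "We're not laughing at you. We're laughing near you."
sample = (
    "[Text=We CharacterOffsetBegin=0 CharacterOffsetEnd=2 PartOfSpeech=PRP] "
    "[Text='re CharacterOffsetBegin=2 CharacterOffsetEnd=5 PartOfSpeech=VBP] "
    "[Text=. CharacterOffsetBegin=25 CharacterOffsetEnd=26 PartOfSpeech=.]"
)

for text, begin, end, pos in parse_token_records(sample):
    # The offsets index straight into the raw text.
    print(text, pos, raw[begin:end] == text)
```

Because the offsets point back into the unmodified input, the original spacing can always be recovered, which the inline formats above do not allow.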
Read more about CoNLL here: https://universaldependencies.org/format.html
annotators: tokenize,ssplit,pos
idx | word | lemma | pos | ner | headidx | deprel |
---|---|---|---|---|---|---|
1 | We | _ | PRP | _ | _ | _ |
2 | 're | _ | VBP | _ | _ | _ |
3 | not | _ | RB | _ | _ | _ |
4 | laughing | _ | VBG | _ | _ | _ |
5 | at | _ | IN | _ | _ | _ |
6 | you | _ | PRP | _ | _ | _ |
7 | . | _ | . | _ | _ | _ |
1 | We | _ | PRP | _ | _ | _ |
2 | 're | _ | VBP | _ | _ | _ |
3 | laughing | _ | VBG | _ | _ | _ |
4 | near | _ | IN | _ | _ | _ |
5 | you | _ | PRP | _ | _ | _ |
6 | . | _ | . | _ | _ | _ |
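In the table above, the token index restarts at 1 for each sentence, and `_` marks a field that the chosen annotators did not fill. The sketch below reads the pipe-delimited rendering shown here, not a tool's native file format (CoNLL-style files are normally tab-separated with a blank line between sentences). `FIELDS` and `parse_rows` are hypothetical names, and a token that is itself `|` would break this simple split.

```python
FIELDS = ["idx", "word", "lemma", "pos", "ner", "headidx", "deprel"]


def parse_rows(lines):
    """Group table rows into sentences; an idx of 1 starts a new sentence."""
    sentences = []
    for line in lines:
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if not cells[0].isdigit():
            continue  # skip the header and the ---|--- separator row
        if cells[0] == "1" or not sentences:
            sentences.append([])
        # Map "_" (unfilled field) to None.
        row = {k: (None if v == "_" else v) for k, v in zip(FIELDS, cells)}
        sentences[-1].append(row)
    return sentences


table = """\
idx | word | lemma | pos | ner | headidx | deprel |
---|---|---|---|---|---|---|
1 | We | _ | PRP | _ | _ | _ |
2 | 're | _ | VBP | _ | _ | _ |
1 | We | _ | PRP | _ | _ | _ |
2 | 're | _ | VBP | _ | _ | _ |
3 | laughing | _ | VBG | _ | _ | _ |
"""

sentences = parse_rows(table.splitlines())
print([len(s) for s in sentences])  # [2, 3]
```

With only `tokenize,ssplit,pos` configured, the `lemma`, `ner`, `headidx` and `deprel` values all come back as `None`.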