PSB 2016 Social Media Mining Shared Task Workshop









Task 2

Task brief:

This sub-task is a Named Entity Recognition (NER) task: the aim is to automatically extract the adverse drug reaction (ADR) mentions reported in user posts, including the text span of each reported ADR. Participants may use advanced machine learning systems to extract the mentions and to distinguish ADRs correctly from similar non-ADR mentions.
The data for this sub-task includes 2000+ tweets that are fully annotated for mentions of ADRs and indications (reasons to use the drug). The set consists of a subset of the tweets from sub-task 1 that were tagged as hasADR, plus a random set of 800 nonADR tweets. The nonADR subset was annotated for mentions of indications so that participants can develop techniques for handling this confusion class, since indications are easily mistaken for ADRs. The annotations are stored in a text file that contains the following details for each annotation: tweet ID, start offset, end offset, semantic type (ADR/Indication), UMLS ID, annotated text span and the related drug.
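As an illustration of the annotation record described above, the following is a minimal sketch of reading one annotation line and checking it against the tweet text. It assumes the seven fields appear in the order listed, are separated by tabs, and that offsets are character positions with an exclusive end; none of this is confirmed here, so the ReadMe.txt accompanying the data should be treated as authoritative. The names Annotation, parse_annotation and span_matches, the sample tweet, the UMLS ID and the drug name are all made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    tweet_id: str
    start: int          # assumed: character offset where the span begins
    end: int            # assumed: character offset just past the span
    semantic_type: str  # "ADR" or "Indication"
    umls_id: str
    span_text: str
    drug: str


def parse_annotation(line: str, sep: str = "\t") -> Annotation:
    """Parse one annotation line (assumed tab-separated, seven fields)."""
    fields = line.rstrip("\n").split(sep)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    tweet_id, start, end, semantic_type, umls_id, span_text, drug = fields
    return Annotation(tweet_id, int(start), int(end),
                      semantic_type, umls_id, span_text, drug)


def span_matches(tweet_text: str, ann: Annotation) -> bool:
    """Check that the offsets select exactly the annotated text span."""
    return tweet_text[ann.start:ann.end] == ann.span_text


# Made-up example data, not taken from the task corpus.
sample_tweet = "this med gives me headaches every day"
sample_line = "t1\t18\t27\tADR\tC0000000\theadaches\texampledrug\n"
sample_ann = parse_annotation(sample_line)
```

A check such as span_matches(sample_tweet, sample_ann) is a cheap way to find out whether the offset convention assumed here agrees with the actual files before training on them.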
Participating teams must submit their results on the test set in the same format as the training set.
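Since results are to be submitted in the same format as the training set, a system's predicted mentions can be serialized with the same field layout. The sketch below assumes one tab-separated line per predicted mention, with fields in the order listed in the task brief; the separator, the function names format_prediction and write_predictions, and the sample values are assumptions made for illustration and should be checked against the ReadMe.txt.

```python
import io

# Field order as listed in the task brief; the tab separator is an assumption.
FIELD_ORDER = ("tweet_id", "start", "end", "semantic_type",
               "umls_id", "span_text", "drug")


def format_prediction(pred: dict, sep: str = "\t") -> str:
    """Render one predicted mention as a single output line (no newline)."""
    missing = [key for key in FIELD_ORDER if key not in pred]
    if missing:
        raise KeyError(f"prediction is missing fields: {missing}")
    return sep.join(str(pred[key]) for key in FIELD_ORDER)


def write_predictions(preds, handle, sep: str = "\t") -> int:
    """Write predictions to an open text handle; return the number written."""
    count = 0
    for pred in preds:
        handle.write(format_prediction(pred, sep) + "\n")
        count += 1
    return count


# Made-up prediction, not taken from the task corpus.
example_pred = {
    "tweet_id": "t1", "start": 18, "end": 27, "semantic_type": "ADR",
    "umls_id": "C0000000", "span_text": "headaches", "drug": "exampledrug",
}
buffer = io.StringIO()
written = write_predictions([example_pred], buffer)
```

In practice the handle would be a file opened for writing in text mode; an in-memory buffer is used here only to keep the example self-contained.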

Training Data:

All the training data and the necessary documentation can be found at the following link:

Download link  

Note: There are two sets of data at the above link. Both sets should be used for training in this task. Details about the data can be found in the ReadMe.txt file accompanying the data set.

Test Data:
Test data will be made available here.





© DIEGO LAB 2015 Competition Organisers.