Data

DATASETS

Bangla Word Embeddings -
Word embeddings are vector representations of words that allow machines to learn semantic and syntactic meaning by performing computations on them. Two well-known embedding models are CBOW and Skipgram. The methods proposed to evaluate the quality of embeddings are categorized into extrinsic and intrinsic evaluation methods. This paper focuses on intrinsic evaluation: evaluating the models on tasks such as analogy prediction, semantic relatedness, synonym detection, antonym detection, and concept categorization. We present intrinsic evaluations of Bangla word embeddings created with the CBOW and Skipgram models on a Bangla corpus that we built. The models are trained on more than 700,000 articles containing more than 1.3 million unique words, with different embedding dimension sizes, e.g., 300, 100, 64, and 32. We created evaluation datasets for the above-mentioned tasks and performed a comprehensive evaluation. We observe that 300-dimensional word vectors produced by the Skipgram model achieve an accuracy of 51.33% for analogy prediction, a correlation of 0.62 for semantic relatedness, and accuracies of 53.85% and 9.56% for synonym and antonym detection, respectively. Finally, for concept categorization, the accuracy is 91.02%. The corpus and evaluation datasets are made publicly available for further research.
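
As a rough illustration (not the exact training setup used for the released embeddings), CBOW and Skipgram vectors of this kind can be trained and queried with gensim; the corpus file name, window size, and minimum count below are assumptions, and the query words are placeholders:

from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# Assumed input format: one tokenized Bangla article per line
corpus = LineSentence("bangla_corpus.txt")

# sg=1 trains a Skipgram model, sg=0 a CBOW model; vector_size matches
# the dimensions reported above (300, 100, 64, 32)
skipgram = Word2Vec(corpus, vector_size=300, sg=1, window=5, min_count=5, workers=4)
cbow = Word2Vec(corpus, vector_size=300, sg=0, window=5, min_count=5, workers=4)

# Analogy prediction: "a is to b as c is to ?" (placeholder tokens)
print(skipgram.wv.most_similar(positive=["b", "c"], negative=["a"], topn=1))

# Semantic relatedness: cosine similarity between two words, to be
# correlated (e.g., Spearman) with human judgement scores
print(skipgram.wv.similarity("word1", "word2"))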

The dataset can be downloaded from here.

Citation Request:



EEG Datasets -
In this paper, we present a novel idea: we analyze EEG signals to classify what type of video a person is watching, which we believe is the first step towards a BCI-based video recommender system. For this, we set up an experiment in which 13 subjects were shown three different types of videos. To classify each of these videos from the subjects' EEG data with high accuracy, we experimented with several state-of-the-art algorithms for each submodule (pre-processing, feature extraction, feature selection, and classification) of the Signal Processing module of a BCI system, in order to find the combination of algorithms that best predicts what type of video a person is watching. We found that the best results (80.0% accuracy with an average total execution time of 32.32 ms per subject) are obtained when data from channel AF8 are used (i.e., data recorded from the electrode located over the right frontal lobe of the brain). The combination of algorithms that achieved this highest average accuracy of 80.0% is FIR Least Squares, Welch Spectrum, Principal Component Analysis, and Adaboost for the pre-processing, feature extraction, feature selection, and classification submodules, respectively.
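
As a rough sketch of this best-performing combination (FIR Least Squares pre-processing, Welch Spectrum features, PCA feature selection, Adaboost classification), the pipeline can be approximated with SciPy and scikit-learn; the sampling rate, filter band, filter order, number of components, and the shapes of the placeholder data below are assumptions, not values from the paper:

import numpy as np
from scipy.signal import firls, filtfilt, welch
from sklearn.decomposition import PCA
from sklearn.ensemble import AdaBoostClassifier
from sklearn.pipeline import make_pipeline

fs = 256                                 # assumed sampling rate (Hz)
trials = np.random.randn(60, fs * 10)    # placeholder: 60 trials of channel AF8, 10 s each
labels = np.random.randint(0, 3, 60)     # placeholder: three video categories

# Pre-processing: least-squares FIR band-pass filter (roughly 1-40 Hz here)
taps = firls(101, [0, 0.5, 1, 40, 45, fs / 2], [0, 0, 1, 1, 0, 0], fs=fs)
filtered = filtfilt(taps, [1.0], trials, axis=1)

# Feature extraction: Welch power spectral density per trial
_, psd = welch(filtered, fs=fs, nperseg=fs, axis=1)

# Feature selection + classification: PCA followed by AdaBoost
clf = make_pipeline(PCA(n_components=10), AdaBoostClassifier(n_estimators=50))
clf.fit(psd, labels)
print(clf.score(psd, labels))            # in practice, use per-subject cross-validation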

The dataset can be downloaded from here.

Citation Request:

Mutasim, A. K., Tipu, R. S., Bashar, M. R., & Amin, M. A. (2017, November). Video Category Classification Using Wireless EEG. In International Conference on Brain Informatics (pp. 39-48). Springer, Cham. Web Link

Bibtex :
@inproceedings{mutasim_tipu_bashar_amin_2017,
  title     = {Video Category Classification Using Wireless EEG},
  author    = {Mutasim, Aunnoy K and Tipu, Rayhan Sardar and Bashar, M. Raihanul and Amin, M. Ashraful},
  booktitle = {International Conference on Brain Informatics},
  pages     = {39--48},
  publisher = {Springer, Cham},
  year      = {2017},
  month     = {Nov},
  url       = {https://link.springer.com/chapter/10.1007/978-3-319-70772-3_4}
}


CLEF Datasets -
We are organizing a lab session at ImageCLEF 2015. ImageCLEF 2015 is an evaluation campaign organized as part of the CLEF Initiative labs. The results of this campaign, including selected works from the participants, will be presented at the Conference and Labs of the Evaluation Forum (CLEF) 2015, which will be held in Toulouse, France, 8-11 September 2015. For the 2015 edition, ImageCLEF will organize three main tasks with the global objective of benchmarking automatic annotation and indexing of images. The tasks tackle different aspects of the annotation problem and are aimed at supporting and promoting cutting-edge research addressing the key challenges in the field. A wide range of source images and annotation objectives are considered, such as general multi-domain images for object or concept detection, as well as domain-specific tasks such as the labelling and separation of compound figures from the biomedical literature and volumetric medical images for automated structured reporting.

The session that the Computer Vision and Cybernetics Group is organizing is called Medical Clustering. The training dataset (500 images) for the task can now be downloaded from the competition website. The test dataset (250 images) will be released in March 2015, and after the conference in September 2015 we will make all 5,000 images public. They can be downloaded from www.cvcrbd.org.