eurasip.org
[8], image labeling and retrieval [9], etc. The term multimodal fusion refers to the integration of information from multiple modalities. In this work, we fuse text-, audio-, and image-based models to estimate word semantic similarity. We employ two main fusion methods: middle fusion and late fusion.
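To make the distinction concrete, the following is a minimal sketch of one common reading of the two schemes. It is not the exact formulation used in this work. It assumes that each word has one feature vector per modality, that late fusion combines per-modality similarity scores by a weighted average, and that middle fusion concatenates the per-modality features into a joint vector before a single similarity is computed. The function names, the weights, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def late_fusion(feats_a, feats_b, weights):
    """Assumed late fusion: score each modality separately, then average the scores.

    feats_a, feats_b: dict mapping modality name -> feature vector.
    weights: dict mapping modality name -> non-negative weight (hypothetical).
    """
    total = sum(weights[m] for m in feats_a)
    scores = (weights[m] * cosine(feats_a[m], feats_b[m]) for m in feats_a)
    return sum(scores) / total


def middle_fusion(feats_a, feats_b):
    """Assumed middle fusion: concatenate modality features, then score once."""
    joint_a, joint_b = [], []
    for m in sorted(feats_a):  # fixed modality order so both vectors align
        joint_a.extend(feats_a[m])
        joint_b.extend(feats_b[m])
    return cosine(joint_a, joint_b)


# Toy, made-up feature vectors for two words (illustration only).
dog = {"text": [0.9, 0.1, 0.3], "audio": [0.2, 0.7], "image": [0.5, 0.5, 0.1]}
cat = {"text": [0.8, 0.2, 0.4], "audio": [0.1, 0.9], "image": [0.4, 0.6, 0.2]}
weights = {"text": 0.5, "audio": 0.2, "image": 0.3}  # hypothetical weights

print("late fusion:  ", late_fusion(dog, cat, weights))
print("middle fusion:", middle_fusion(dog, cat))
```

Under these assumptions, late fusion keeps the modalities independent until the final score, so the contribution of each modality is controlled explicitly by its weight. In the concatenation variant of middle fusion, modalities whose feature vectors have larger magnitude contribute more to the joint similarity, so per-modality scaling would typically be applied beforehand.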