JoBimText is an open source framework for applying Distributional Semantics using lexicalized features. It provides a software solution for automatic text expansion using contextualized distributional similarity.
The project is maintained by the Language Technology group at the TU Darmstadt and IBM Research.
The first step is the holing operation that transforms text into a term–feature representation (Jo–Bim). This text representation can be used for contextualization (text expansion, sense disambiguation) or for the calculation of a Distributional Thesaurus (DT).
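As a rough illustration of the holing operation and of how a DT can be derived from its output, the following Python sketch uses a trigram holing scheme: each inner token is the term (Jo), and its left and right neighbours, joined around an `@` placeholder, form the feature (Bim). The function names, the `_@_` feature format, and the use of a raw shared-feature count as the similarity score are assumptions made for this example, not the JoBimText API. The framework itself may apply further steps, such as weighting and pruning of features, that are omitted here.

```python
from collections import defaultdict
from itertools import combinations


def trigram_holing(tokens):
    """Turn a token list into (Jo, Bim) pairs.

    Hypothetical trigram holing: the middle token is the term (Jo) and
    the feature (Bim) is "left_@_right", with "@" marking the hole.
    """
    pairs = []
    for i in range(1, len(tokens) - 1):
        jo = tokens[i]
        bim = f"{tokens[i - 1]}_@_{tokens[i + 1]}"
        pairs.append((jo, bim))
    return pairs


def shared_feature_counts(pairs):
    """Score term pairs by the number of distinct features they share.

    A simplified stand-in for a DT computation: no weighting or pruning.
    """
    features = defaultdict(set)
    for jo, bim in pairs:
        features[jo].add(bim)
    similarities = {}
    for a, b in combinations(sorted(features), 2):
        shared = len(features[a] & features[b])
        if shared:
            similarities[(a, b)] = shared
    return similarities


if __name__ == "__main__":
    corpus = ["the cat sat here", "the dog sat here"]
    all_pairs = []
    for sentence in corpus:
        all_pairs.extend(trigram_holing(sentence.split()))
    print(all_pairs)
    print(shared_feature_counts(all_pairs))
```

In this toy corpus, "cat" and "dog" both occur in the context "the _ sat", so they end up sharing a feature and become neighbours in the resulting thesaurus.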
The DT can be used as a lexical resource and put into a database. Several database servers are supported natively (MySQL, DB2, DCA). Others can be added using an interface. For a distributed database server with strong performance, consider the DCA Server.
Furthermore, it is possible to enrich the DT with sense clusters. DT entries can be disambiguated with Chinese Whispers, an unsupervised graph clustering algorithm that determines the number of clusters automatically. The enriched DT can also be accessed from a database using the DT API.
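To make the clustering step more concrete, here is a minimal Python sketch of Chinese Whispers on a small weighted graph. Every node starts in its own class; in each pass the nodes are visited in random order and each adopts the class with the highest total edge weight among its neighbours. The function name, the edge-list input format, the fixed seed, and the tie-breaking rule are assumptions for this example and do not reflect the JoBimText implementation.

```python
import random
from collections import defaultdict


def chinese_whispers(edges, iterations=20, seed=0):
    """Cluster an undirected weighted graph given as (u, v, weight) triples.

    Returns a dict mapping each node to its cluster label. The number of
    clusters is not specified in advance; it emerges from the graph.
    """
    rng = random.Random(seed)  # fixed seed keeps this sketch reproducible
    graph = defaultdict(dict)
    for u, v, weight in edges:
        graph[u][v] = weight
        graph[v][u] = weight

    labels = {node: node for node in graph}  # each node starts in its own class
    nodes = sorted(graph)
    for _ in range(iterations):
        rng.shuffle(nodes)
        changed = False
        for node in nodes:
            scores = defaultdict(float)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            # Ties are broken by label so that the result is deterministic.
            best = max(scores, key=lambda label: (scores[label], label))
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:  # stop early once a full pass changes nothing
            break
    return labels


if __name__ == "__main__":
    # Hypothetical similarity graph among terms similar to "bank".
    edges = [
        ("money", "loan", 1.0), ("loan", "finance", 1.0), ("money", "finance", 1.0),
        ("river", "shore", 1.0), ("shore", "water", 1.0), ("river", "water", 1.0),
    ]
    clusters = defaultdict(set)
    for node, label in chinese_whispers(edges).items():
        clusters[label].add(node)
    print(list(clusters.values()))
```

In the usage example, the financial terms and the river-related terms form two disconnected groups, so they end up in two separate clusters, which can be read as two senses of the entry.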
- September 11, 2014: Wikipedia Trigram model available in the Web Demo
- July 11, 2014: JoBimText Web Demo: Multi-Word support and Medline Multi-Word model released
- May 28, 2014: Test Corpus released, 1M Sentences
- May 5, 2014: German Trigram model available in the Web Demo and the API
- April 14, 2014: New JoBimText model released: English news trigram
JoBimText is licensed under the Apache Software License 2.0. This permissive license allows you to use and modify the code, and to redistribute the code or compiled software:
- You may use this software in commercial projects.
- You may change the code to suit your needs.
- If you modify files, you must mark them with prominent notices stating that you changed them.
- You may distribute derivative works under a different license, provided the conditions of the Apache License are still met for the original code.