Original author(s) Radim Řehůřek
Developer(s) RARE Technologies Ltd.
Initial release 2009
Stable release 3.7.1 / 31 January 2019
Written in Python
Operating system Linux, Windows, macOS
Type Information retrieval
License LGPL

Gensim is a production-ready open-source library for unsupervised topic modeling and natural language processing, using modern statistical machine learning.

Gensim is implemented in Python and Cython for performance and scalability. It is designed to handle large text collections using data streaming and incremental online algorithms. This differentiates it from most other machine learning software packages, which target only in-memory processing.
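The streaming idea can be illustrated without Gensim itself. The following is a minimal sketch in plain Python, not Gensim's implementation: the hypothetical stream_documents generator yields one tokenized document at a time, and the hypothetical OnlineDocFreq class updates document-frequency counts incrementally, so the full corpus never has to be held in memory.

```python
from collections import Counter
from typing import Iterable, Iterator, List


def stream_documents(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield one tokenized document at a time (hypothetical helper).

    `lines` can be any iterable, for example an open file object,
    so only the current document is held in memory.
    """
    for line in lines:
        tokens = line.lower().split()
        if tokens:
            yield tokens


class OnlineDocFreq:
    """Incrementally updated document-frequency counts (illustrative only)."""

    def __init__(self) -> None:
        self.num_docs = 0
        self.doc_freq: Counter = Counter()

    def update(self, tokens: List[str]) -> None:
        # Count each distinct token once per document.
        self.num_docs += 1
        self.doc_freq.update(set(tokens))


# A tiny in-memory list stands in for a large file or database cursor.
raw = [
    "Human machine interface",
    "Graph of trees",
    "Graph minors survey",
]

stats = OnlineDocFreq()
for doc in stream_documents(raw):
    stats.update(doc)

print(stats.num_docs, stats.doc_freq["graph"])
```

Because the statistics are updated one document at a time, the same loop works unchanged whether the input is a three-line list or a file too large to fit in RAM.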

Main features

Gensim includes streamed, parallelized implementations of the fastText,[1] word2vec and doc2vec algorithms,[2] as well as latent semantic analysis (LSA, LSI, SVD), non-negative matrix factorization (NMF), latent Dirichlet allocation (LDA), tf-idf and random projections.[3]
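Of the techniques listed, tf-idf is the simplest to show in a few lines. The following plain-Python sketch assumes one common variant: raw term frequency multiplied by log2(N / df), followed by L2 normalization of each document vector. The function name tfidf is hypothetical and is not a Gensim API; Gensim's own implementation may differ in its weighting and normalization options.

```python
import math
from collections import Counter
from typing import Dict, List


def tfidf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """Return one L2-normalized tf-idf vector per tokenized document."""
    num_docs = len(docs)

    # Document frequency: in how many documents does each term occur?
    doc_freq: Counter = Counter()
    for doc in docs:
        doc_freq.update(set(doc))

    vectors = []
    for doc in docs:
        term_freq = Counter(doc)
        weights = {
            term: tf * math.log2(num_docs / doc_freq[term])
            for term, tf in term_freq.items()
        }
        # Terms that occur in every document get weight 0 and are dropped.
        weights = {t: w for t, w in weights.items() if w > 0.0}
        norm = math.sqrt(sum(w * w for w in weights.values()))
        vectors.append({t: w / norm for t, w in weights.items()} if norm else {})
    return vectors


docs = [
    ["graph", "trees"],
    ["graph", "minors", "survey"],
    ["human", "interface"],
]
vectors = tfidf(docs)

# "trees" occurs in one document and "graph" in two,
# so "trees" receives the larger weight in the first vector.
print(vectors[0])
```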

Some of the novel online algorithms in Gensim were also published in the 2011 PhD dissertation "Scalability of Semantic Analysis in Natural Language Processing" by Radim Řehůřek, the creator of Gensim.[4]

Uses of Gensim

As of 2018, Gensim had been used and cited in over 1400 commercial and academic applications,[5] in disciplines ranging from medicine to insurance claim analysis to patent search.[6] The software has been covered in several news articles, podcasts and interviews.[7][8][9]

Free and commercial support

The open-source code is developed and hosted on GitHub,[10] and public support forums are maintained on Google Groups[11] and Gitter.[12]

Gensim is commercially supported by its developer, RARE Technologies Ltd., which also provides student mentorships and academic thesis projects for Gensim through its Student Incubator programme.[13]


References

  1. Scalable *2vec training
  2. Deep learning with word2vec and Gensim
  3. Radim Řehůřek and Petr Sojka (2010). Software framework for topic modelling with large corpora. Proc. LREC Workshop on New Challenges for NLP Frameworks.
  4. Řehůřek, Radim (2011). "Scalability of Semantic Analysis in Natural Language Processing" (PDF). Retrieved 27 January 2015. "my open-source gensim software package that accompanies this thesis"
  5. Gensim academic citations
  6. Commercial adopters of Gensim
  7. Podcast.__init__ episode #71 on Gensim
  8. Interview with Radim Řehůřek, creator of Gensim
  9.
  10. Gensim source code on GitHub
  11. Gensim mailing list on Google Groups
  12. Gensim chat room on Gitter
  13. Gensim open source Incubator

External links

  • Official website

This page is based on the copyrighted Wikipedia article "Gensim"; it is used under the Creative Commons Attribution-ShareAlike 3.0 Unported License (CC-BY-SA). You may redistribute it, verbatim or modified, providing that you comply with the terms of the CC-BY-SA