Gensim - Using LDA Topic Model
In this chapter, we will understand how to use the Latent Dirichlet Allocation (LDA) topic model.
Viewing Topics in LDA Model
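The LDA model (lda_model) created earlier can be used to view the topics found in the documents. Each topic is reported as a weighted combination of its most significant keywords. It can be done with the help of the following script −
from pprint import pprint
# Print the keywords and their weights for each topic
pprint(lda_model.print_topics())
# Apply the model to the corpus to get the topic distribution of each document
doc_lda = lda_model[corpus]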
Output
[(0, '0.036*"go" + 0.027*"get" + 0.021*"time" + 0.017*"back" + 0.015*"good" + 0.014*"much" + 0.014*"be" + 0.013*"car" + 0.013*"well" + 0.013*"year"'),
 (1, '0.078*"screen" + 0.067*"video" + 0.052*"character" + 0.046*"normal" + 0.045*"mouse" + 0.034*"manager" + 0.034*"disease" + 0.031*"processor" + 0.028*"excuse" + 0.028*"choice"'),
 (2, '0.776*"ax" + 0.079*"_" + 0.011*"boy" + 0.008*"ticket" + 0.006*"red" + 0.004*"conservative" + 0.004*"cult" + 0.004*"amazing" + 0.003*"runner" + 0.003*"roughly"'),
 (3, '0.086*"season" + 0.078*"fan" + 0.072*"reality" + 0.065*"trade" + 0.045*"concept" + 0.040*"pen" + 0.028*"blow" + 0.025*"improve" + 0.025*"cap" + 0.021*"penguin"'),
 (4, '0.027*"group" + 0.023*"issue" + 0.016*"case" + 0.016*"cause" + 0.014*"state" + 0.012*"whole" + 0.012*"support" + 0.011*"government" + 0.010*"year" + 0.010*"rate"'),
 (5, '0.133*"evidence" + 0.047*"believe" + 0.044*"religion" + 0.042*"belief" + 0.041*"sense" + 0.041*"discussion" + 0.034*"atheist" + 0.030*"conclusion" + 0.029*"explain" + 0.029*"claim"'),
 (6, '0.083*"space" + 0.059*"science" + 0.031*"launch" + 0.030*"earth" + 0.026*"route" + 0.024*"orbit" + 0.024*"scientific" + 0.021*"mission" + 0.018*"plane" + 0.017*"satellite"'),
 (7, '0.065*"file" + 0.064*"program" + 0.048*"card" + 0.041*"window" + 0.038*"driver" + 0.037*"software" + 0.034*"run" + 0.029*"machine" + 0.029*"entry" + 0.028*"version"'),
 (8, '0.078*"publish" + 0.059*"mount" + 0.050*"turkish" + 0.043*"armenian" + 0.027*"western" + 0.026*"russian" + 0.025*"locate" + 0.024*"proceed" + 0.024*"electrical" + 0.022*"terrorism"'),
 (9, '0.023*"people" + 0.023*"child" + 0.021*"kill" + 0.020*"man" + 0.019*"death" + 0.015*"die" + 0.015*"live" + 0.014*"attack" + 0.013*"age" + 0.011*"church"'),
 (10, '0.092*"cpu" + 0.085*"black" + 0.071*"controller" + 0.039*"white" + 0.028*"water" + 0.027*"cold" + 0.025*"solid" + 0.024*"cool" + 0.024*"heat" + 0.023*"nuclear"'),
 (11, '0.071*"monitor" + 0.044*"box" + 0.042*"option" + 0.041*"generate" + 0.038*"vote" + 0.032*"battery" + 0.029*"wave" + 0.026*"tradition" + 0.026*"fairly" + 0.025*"task"'),
 (12, '0.048*"send" + 0.045*"mail" + 0.036*"list" + 0.033*"include" + 0.032*"price" + 0.031*"address" + 0.027*"email" + 0.026*"receive" + 0.024*"book" + 0.024*"sell"'),
 (13, '0.515*"drive" + 0.052*"laboratory" + 0.042*"blind" + 0.020*"investment" + 0.011*"creature" + 0.010*"loop" + 0.005*"dialog" + 0.000*"slave" + 0.000*"jumper" + 0.000*"sector"'),
 (14, '0.153*"patient" + 0.066*"treatment" + 0.062*"printer" + 0.059*"doctor" + 0.036*"medical" + 0.031*"energy" + 0.029*"study" + 0.029*"probe" + 0.024*"mph" + 0.020*"physician"'),
 (15, '0.068*"law" + 0.055*"gun" + 0.039*"government" + 0.036*"right" + 0.029*"state" + 0.026*"drug" + 0.022*"crime" + 0.019*"person" + 0.019*"citizen" + 0.019*"weapon"'),
 (16, '0.107*"team" + 0.102*"game" + 0.078*"play" + 0.055*"win" + 0.052*"player" + 0.051*"year" + 0.030*"score" + 0.025*"goal" + 0.023*"wing" + 0.023*"run"'),
 (17, '0.031*"say" + 0.026*"think" + 0.022*"people" + 0.020*"make" + 0.017*"see" + 0.016*"know" + 0.013*"come" + 0.013*"even" + 0.013*"thing" + 0.013*"give"'),
 (18, '0.039*"system" + 0.034*"use" + 0.023*"key" + 0.016*"bit" + 0.016*"also" + 0.015*"information" + 0.014*"source" + 0.013*"chip" + 0.013*"available" + 0.010*"provide"'),
 (19, '0.085*"line" + 0.073*"write" + 0.053*"article" + 0.046*"organization" + 0.034*"host" + 0.023*"be" + 0.023*"know" + 0.017*"thank" + 0.016*"want" + 0.014*"help"')]
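Each topic in the output above is a string of weight*"keyword" terms. When the weights are needed as numbers, the string can be split into pairs. The following is only a minimal sketch: parse_topic is a hypothetical helper written for this illustration and is not part of Gensim, and it assumes the weight*"word" layout shown above. Gensim's own lda_model.show_topic(topic_id) should return similar (word, weight) pairs directly from a trained model.

```python
import re

def parse_topic(topic_string):
    """Split a print_topics()-style string into (word, weight) pairs.

    Hypothetical helper for illustration only; not part of Gensim.
    """
    # Each term looks like 0.083*"space"; capture the weight and the word.
    pairs = re.findall(r'([0-9.]+)\*"([^"]+)"', topic_string)
    return [(word, float(weight)) for weight, word in pairs]

# Topic 6 from the output above, truncated to its first four terms.
topic_6 = '0.083*"space" + 0.059*"science" + 0.031*"launch" + 0.030*"earth"'
terms = parse_topic(topic_6)

for word, weight in terms:
    print(f"{word:10s} {weight:.3f}")
```

The pairs keep the order of the original string, so the first entry is the keyword with the largest weight in the topic.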
Computing Model Perplexity
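The LDA model (lda_model) created earlier can be used to estimate the model's perplexity, i.e. how well the model predicts the corpus. Note that log_perplexity() returns the per-word likelihood bound rather than the perplexity itself; the perplexity is 2^(-bound), so a lower perplexity (a bound closer to zero) indicates a better model. It can be done with the help of the following script −
# log_perplexity() returns the per-word likelihood bound for the given corpus
print('Perplexity: ', lda_model.log_perplexity(corpus))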
Output
Perplexity: -12.338664984332151
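The value printed above is a per-word likelihood bound, which is why it is negative. As a small sketch of how such a bound relates to a perplexity figure, the conversion 2^(-bound) can be applied by hand. The function name perplexity_from_bound is an assumption made for this example, not a Gensim call.

```python
def perplexity_from_bound(per_word_bound):
    """Convert a per-word likelihood bound into a perplexity value.

    Hypothetical helper for illustration; applies perplexity = 2 ** (-bound).
    """
    return 2.0 ** (-per_word_bound)

# The bound reported in the output above.
bound = -12.338664984332151
perplexity = perplexity_from_bound(bound)

print(f"Per-word bound: {bound:.3f}")
print(f"Perplexity:     {perplexity:.1f}")
```

A bound closer to zero gives a smaller perplexity, so when comparing two models on the same held-out corpus, the one with the higher (less negative) bound is the better fit by this measure.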
Computing Coherence Score
The LDA model (lda_model) created earlier can be used to compute the model's coherence score, i.e. the average or median of the pairwise word-similarity scores of the words in a topic. A higher score generally indicates topics that are easier to interpret. It can be done with the help of the following script −
from gensim.models import CoherenceModel
# Build a coherence model using the 'c_v' measure
coherence_model_lda = CoherenceModel(model=lda_model, texts=data_lemmatized, dictionary=id2word, coherence='c_v')
coherence_lda = coherence_model_lda.get_coherence()
print('Coherence Score: ', coherence_lda)
Output
Coherence Score: 0.510264381411751
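To make the idea of averaging pairwise word-similarity scores concrete, here is a toy sketch. It is not the c_v measure that CoherenceModel computes; it swaps in a simple Jaccard similarity over document co-occurrence, and the function names and the four tiny documents are made up for this illustration.

```python
from itertools import combinations
from statistics import mean, median

def jaccard(word_a, word_b, docs):
    """Toy similarity: of the documents containing either word, the share containing both."""
    docs_a = {i for i, doc in enumerate(docs) if word_a in doc}
    docs_b = {i for i, doc in enumerate(docs) if word_b in doc}
    union = docs_a | docs_b
    return len(docs_a & docs_b) / len(union) if union else 0.0

def pairwise_scores(topic_words, docs):
    """Score every unordered pair of words in a topic."""
    return [jaccard(a, b, docs) for a, b in combinations(topic_words, 2)]

# Made-up corpus: each document is represented as a set of its words.
docs = [
    {"space", "orbit", "launch"},
    {"space", "launch", "mission"},
    {"orbit", "mission", "earth"},
    {"team", "game", "player"},
]

scores = pairwise_scores(["space", "orbit", "launch"], docs)
print("mean:  ", mean(scores))
print("median:", median(scores))
```

Words that tend to appear in the same documents score close to 1, while unrelated words such as "space" and "team" score 0, so a topic made of related keywords ends up with a higher average.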
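Visualising the Topics-Keywords
The LDA model (lda_model) created earlier can be used to examine the produced topics and their associated keywords. It can be visualised by using the pyLDAvis package as follows −
import pyLDAvis
import pyLDAvis.gensim  # newer pyLDAvis releases expose this module as pyLDAvis.gensim_models
# Render the visualisation inline in a Jupyter notebook
pyLDAvis.enable_notebook()
vis = pyLDAvis.gensim.prepare(lda_model, corpus, id2word)
vis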
Output
In the above output, each bubble on the left side represents a topic, and the larger the bubble, the more prevalent that topic is. A good topic model will have fairly big, non-overlapping bubbles scattered throughout the chart, rather than many small bubbles clustered in one region.
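Bubble size reflects how much of the corpus a topic accounts for. As a rough sketch of that idea, and not the exact computation pyLDAvis performs, topic prevalence can be estimated by weighting each document's topic distribution by the document's length. The function name and the small input values below are assumptions made for illustration.

```python
def topic_prevalence(doc_topic_dists, doc_lengths):
    """Estimate the share of all tokens attributed to each topic.

    doc_topic_dists: one list of topic probabilities per document.
    doc_lengths: number of tokens in each document.
    Hypothetical helper for illustration only.
    """
    total_tokens = sum(doc_lengths)
    num_topics = len(doc_topic_dists[0])
    return [
        sum(dist[k] * n for dist, n in zip(doc_topic_dists, doc_lengths)) / total_tokens
        for k in range(num_topics)
    ]

# Made-up example: three documents, three topics.
doc_topic_dists = [
    [0.8, 0.1, 0.1],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
]
doc_lengths = [100, 50, 50]

prevalence = topic_prevalence(doc_topic_dists, doc_lengths)
for topic_id, share in enumerate(prevalence):
    print(f"topic {topic_id}: {share:.2f}")
```

In this example the first topic would get the largest bubble, because it dominates the longest document and is also well represented in another.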