Bhargavi Topic Modeling

Page history last edited by bhargavi@umail.ucsb.edu 10 years, 5 months ago

I used the GUI topic modeling tool Gabe posted and asked it to sort 'Alice in Wonderland' into 5 topics. This is what it did:

 

 

0 10 alice mock voice hatter duchess tone back hare day poor 

1 10 alice project queen time ll work tm dormouse long king 

2 10 thought gryphon rabbit thing don found great white caterpillar make 

3 10 turtle began mouse put ve alice march won half words 

4 10 alice gutenberg head round looked large replied added king things 

 

<50> LL/token: -8.9043

<60> LL/token: -8.88294

<70> LL/token: -8.88323

<80> LL/token: -8.88141

<90> LL/token: -8.87428

 

0 10 alice mock hatter voice duchess tone back cat hare day 

1 10 project queen time work ll tm dormouse long made works 

2 10 thought gryphon rabbit thing don found great caterpillar white make 

3 10 alice turtle began mouse put ve won things half words 

4 10 gutenberg king head round looked large march replied added good 

 

<100> LL/token: -8.88939

<110> LL/token: -8.86607

<120> LL/token: -8.88399

<130> LL/token: -8.86321

<140> LL/token: -8.86408

 

0 10 mock hatter voice duchess dormouse tone back hare poor day 

1 10 project queen turtle work tm ll time long made works 

2 10 thought gryphon rabbit don thing found great white make dear 

3 10 alice began mouse put ve replied things half words curious 

4 10 gutenberg king head round looked large march added good moment 

 

<150> LL/token: -8.85694

<160> LL/token: -8.86007

<170> LL/token: -8.87053

<180> LL/token: -8.85087

<190> LL/token: -8.8591

 

0 10 mock hatter voice thing duchess dormouse tone back cat hare 

1 10 project queen time turtle ll work tm long made works 

2 10 thought gryphon rabbit don found great white dear door eyes 

3 10 alice began mouse put ve replied things half find curious 

4 10 gutenberg king head round looked large march caterpillar good moment 

 

<200> LL/token: -8.84156

 

Total time: 0 seconds

Mallet Output files written in C:\Users\Bhargavi Narayanan\Downloads ---> C:\Users\Bhargavi Narayanan\Downloads\output_state.gz , C:\Users\Bhargavi Narayanan\Downloads\output_topic_keys

 

Csv Output files written in C:\Users\Bhargavi Narayanan\Downloads\output_csv

Html Output files written in C:\Users\Bhargavi Narayanan\Downloads\output_html

 

PROCESS COMPLETE

Time :8.095
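One possible reading of the log above: the `<50> LL/token: ...` lines look like iteration counters with a per-token log-likelihood, so the four word lists may be successive snapshots of one model as sampling proceeds, not four separate answers. The log itself does not say so, so this is an assumption. A minimal sketch for checking how much the lists change between snapshots, using a short excerpt of the log pasted in as a string (`LOG`, `parse_log`, and `shared_words` are names made up for this illustration):

```python
import re

# Excerpt of the log above; paste the full log here to compare all snapshots.
LOG = """\
0 10 alice mock voice hatter duchess tone back hare day poor
1 10 alice project queen time ll work tm dormouse long king
<50> LL/token: -8.9043
<60> LL/token: -8.88294
0 10 alice mock hatter voice duchess tone back cat hare day
1 10 project queen time work ll tm dormouse long made works
<100> LL/token: -8.88939
<200> LL/token: -8.84156
"""

LL_LINE = re.compile(r"^<(\d+)> LL/token: (-?\d+(?:\.\d+)?)$")
TOPIC_LINE = re.compile(r"^(\d+)\s+(\d+(?:\.\d+)?)\s+(.+)$")


def parse_log(text):
    """Return (snapshots, ll): topic word lists per block, and (iteration, LL/token) pairs."""
    snapshots, current, ll = [], {}, []
    for raw in text.splitlines():
        line = raw.strip()
        m = LL_LINE.match(line)
        if m:
            ll.append((int(m.group(1)), float(m.group(2))))
            # An LL line closes whatever block of topic lines came before it.
            if current:
                snapshots.append(current)
                current = {}
            continue
        m = TOPIC_LINE.match(line)
        if m:
            current[int(m.group(1))] = m.group(3).split()
    if current:
        snapshots.append(current)
    return snapshots, ll


def shared_words(a, b):
    """Words that appear in both top-word lists."""
    return sorted(set(a) & set(b))


snapshots, ll = parse_log(LOG)
for topic in sorted(snapshots[0]):
    kept = shared_words(snapshots[0][topic], snapshots[1][topic])
    print(f"topic {topic}: {len(kept)} of 10 words kept between snapshots")
print("LL/token went from", ll[0][1], "to", ll[-1][1])
```

If most words carry over from one block to the next while LL/token drifts slowly upward, that would fit the "same model, later iteration" reading.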

 

I did not know why it gave four different sets of topics for the same text. The output was also not very visual, which did not help at all. I thought Mallet would be different and tried that. (This was the first time I had used anything like it, and I didn't quite know what I was doing. Only after it had given the results did I figure out that it had used four texts by mistake instead of just Alice in Wonderland. Rewriting the code would have taken me the whole weekend all over again, so I decided to just go with this.) This was the result it produced.

 

Topic Number | Dirichlet Parameter | Topics
0 | 2.5 | dorian henry lord gray harry basil don things cried love life hallward sibyl art face vane beauty portrait back
1 | 2.5 | alice queen thought king turtle mock hatter gryphon rabbit mouse thing duchess ve cat dormouse tone ll march round
2 | 2.5 | back head face long don make mind hear sat half side window silence full eye low ground business sound
3 | 2.5 | time hand looked house left day hands voice heard turned word air words head matter give large kind walked
4 | 2.5 | ain thought man sunday judge river mighty big kill blue finally comfort tree fool island board job minute group
5 | 2.5 | people mr men room good young made thought lady passed women eyes mother death years bad friends night black
6 | 2.5 | world answered thing soul told gold real afraid met curious terrible fancy friend terror moments horror married memory senses
7 | 2.5 | boy nancy master em ma rose ve length round instant streets hastily observed countenance crackit nodded interposed officer disposed
8 | 2.5 | life round asked felt table suddenly suppose strange ah make past answer call white age shook fact understand feeling
9 | 2.5 | man picture dear wonderful simply woman music forget true red find painted sin artist delightful play pain blue sins
10 | 2.5 | water find told lay heart church master children moved hold hill stopped burst candle clothes breakfast fell listened run
11 | 2.5 | ll turning person poor home asked miss noise minute reply work raised loud stairs returned running drink nose tone
12 | 2.5 | tom huck ll joe boys boy becky time aunt sid ve presently injun reckon school polly potter thatcher village
13 | 2.5 | mrs great sir night manner road cold bill gate object bed dog heaven laying couldn roused violent pain iron
14 | 2.5 | door girl place rose eyes morning good light fire open dark brought corner hat deep taking part floor pretty
15 | 2.5 | mr oliver replied bumble man sikes gentleman jew fagin dear young cried brownlow monks noah woman lady doctor giles
16 | 2.5 | gutenberg project work began tm great works set white moment won feet terms eyes sort agreement idea license talking
17 | 2.5 | don night found dead long thing stood home body day lost money days hour foundation wanted book won give
18 | 2.5 | heart hope town street reached cry died hung forward added ten entered struck point strong arms devil bell account
19 | 2.5 | put made don till chapter end things didn small knew good turn poor gave high ran trouble electronic rest
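Fragments such as "ll", "ve", "don", and "couldn" in the lists above would be consistent with a preprocessing step that lowercases the text, keeps only runs of letters (so "don't" splits at the apostrophe), and then drops very common words. Whether either tool does exactly this is an assumption; the sketch below only illustrates that such a step produces this kind of fragment. The function name and the tiny stopword list are made up for the example.

```python
import re

# Hypothetical stopword list, far shorter than any real one.
STOPWORDS = frozenset({"i", "t", "s", "we", "a", "the"})


def simple_tokens(text, stopwords=STOPWORDS):
    """Lowercase the text, keep runs of letters only, and drop stopwords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in stopwords]


print(simple_tokens("I don't know, I've said we'll see. Alice couldn't!"))
```

Words like "gutenberg", "project", "tm", and "license" most likely come from the Project Gutenberg header and license text at the start and end of each file, so trimming that text before modeling would probably remove that topic.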

 

I wonder how I can create an output that is more visual, one that could show the overlap(s) between topics. I also wonder how the tool was told to categorize certain words under specific topics. All words here are in lower case (in both results), and there are even some spelling mistakes/fragments that must have occurred often enough to be treated as a 'category'... so, well, I am left with more questions than answers.
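One rough way to see overlap between topics, without any plotting, is to compare their top-word lists pairwise. The sketch below uses Jaccard similarity (shared words divided by all distinct words in the two lists) on three of the Mallet topics copied from the results above. This is a stand-in chosen for illustration, not something either tool reports, and it only looks at the top words, not at the full word distributions behind each topic.

```python
from itertools import combinations

# Top words for three topics, copied from the Mallet results above.
TOPICS = {
    0: "dorian henry lord gray harry basil don things cried love life "
       "hallward sibyl art face vane beauty portrait back".split(),
    2: "back head face long don make mind hear sat half side window "
       "silence full eye low ground business sound".split(),
    17: "don night found dead long thing stood home body day lost money "
        "days hour foundation wanted book won give".split(),
}


def jaccard(a, b):
    """Shared words divided by all distinct words across the two lists."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def overlap_table(topics):
    """One row per pair of topics: (i, j, Jaccard score, shared words)."""
    rows = []
    for i, j in combinations(sorted(topics), 2):
        shared = sorted(set(topics[i]) & set(topics[j]))
        rows.append((i, j, jaccard(topics[i], topics[j]), shared))
    return rows


for i, j, score, shared in overlap_table(TOPICS):
    print(f"topic {i} vs topic {j}: {score:.2f}  shared: {' '.join(shared)}")
```

The same rows could be fed into a heatmap or a network diagram later; the table is just the simplest place to start.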
