Dalia Bolotnikov - Topic Modeling Test Page

Page history last edited by Dalia Bolotnikov 10 years, 5 months ago

One of the input files I used when experimenting with the topic modeling tool that Gabe found was Franz Kafka's The Trial.  Here are the results:

 

Coded LDA: 8 topics, 3 topic bits, 111 topic mask

max tokens: 8

total tokens: 20815

<10> LL/token: -8.74771

<20> LL/token: -8.55507

<30> LL/token: -8.51039

<40> LL/token: -8.46275

 

0 6.25 door room court things director case head time block wanted 

1 6.25 looked lawyer judge leni bed man priest business hands called 

2 6.25 time hand make lawyer front window wanted grubach waiting day 

3 6.25 made court back sort turned mrs sitting find eyes understand 

4 6.25 ve don long miss rstner told give room didn hard 

5 6.25 asked trial uncle stood woman bank side work close important 

6 6.25 painter man good businessman round thought thing small don doorkeeper 

7 6.25 ll people office end light show words lawyers put back 

 

<50> LL/token: -8.44575

<60> LL/token: -8.42297

<70> LL/token: -8.39155

<80> LL/token: -8.39692

<90> LL/token: -8.39611

 

0 6.25 door court wanted room things director case head block waiting 

1 6.25 lawyer looked room judge leni bed priest business hands called 

2 6.25 time hand make front put window doorkeeper day words place 

3 6.25 made sort mrs turned find eyes understand arrest began hold 

4 6.25 don ve long miss give rstner told didn hard face 

5 6.25 asked trial uncle stood woman bank side important close won 

6 6.25 man painter thought good businessman round attention free usher thing 

7 6.25 back ll people work office show light lawyers end immediately 

 

<100> LL/token: -8.38281

<110> LL/token: -8.37531

<120> LL/token: -8.40103

<130> LL/token: -8.39483

<140> LL/token: -8.40828

 

0 6.25 door court room things case director head block sat opened 

1 6.25 lawyer room judge leni looked bed business hands called left 

2 6.25 time hand front doorkeeper put window make waiting grubach place 

3 6.25 made good turned sort mrs find arrest began young hold 

4 6.25 don ve long wanted miss give rstner told didn hard 

5 6.25 asked trial uncle stood woman bank open side important deputy 

6 6.25 man painter thought round looked leave answer attention usher thing 

7 6.25 back ll people work office make end light lawyers understand 

 

<150> LL/token: -8.4027

<160> LL/token: -8.38665

<170> LL/token: -8.38374

<180> LL/token: -8.38144

<190> LL/token: -8.37576

 

0 6.25 room door court director case head things block told sat 

1 6.25 lawyer judge leni bed business hands called sitting gentlemen documents 

2 6.25 time hand front doorkeeper put window waiting grubach show words 

3 6.25 made good priest sort turned eyes lawyer began arrest hold 

4 6.25 don ve long wanted miss give rstner mrs didn hard 

5 6.25 asked trial uncle stood woman bank open important close deputy 

6 6.25 man painter looked thought round businessman leave thing answer usher 

7 6.25 back ll make people end work light lot understand hear 
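
The LL/token lines in the log above give a rough picture of how the run settled down. Here is a minimal Python sketch for pulling those values out of the log; the function names are mine and are not part of the topic modeling tool, and the log text is copied from the output above. Most of the improvement happens in the first 70 iterations, after which the value moves around between about -8.41 and -8.375.

```python
import re

# LL/token lines copied from the training log above.
LOG = """\
<10> LL/token: -8.74771
<20> LL/token: -8.55507
<30> LL/token: -8.51039
<40> LL/token: -8.46275
<50> LL/token: -8.44575
<60> LL/token: -8.42297
<70> LL/token: -8.39155
<80> LL/token: -8.39692
<90> LL/token: -8.39611
<100> LL/token: -8.38281
<110> LL/token: -8.37531
<120> LL/token: -8.40103
<130> LL/token: -8.39483
<140> LL/token: -8.40828
<150> LL/token: -8.4027
<160> LL/token: -8.38665
<170> LL/token: -8.38374
<180> LL/token: -8.38144
<190> LL/token: -8.37576
"""

# Matches lines such as "<10> LL/token: -8.74771".
LINE_RE = re.compile(r"<(\d+)>\s+LL/token:\s+(-?\d+(?:\.\d+)?)")


def parse_ll(log):
    """Return a list of (iteration, log-likelihood per token) pairs."""
    return [(int(m.group(1)), float(m.group(2))) for m in LINE_RE.finditer(log)]


def best_iteration(points):
    """Return the (iteration, value) pair with the highest LL/token."""
    return max(points, key=lambda point: point[1])


points = parse_ll(LOG)
first, last = points[0], points[-1]
print(f"{len(points)} checkpoints, total change {last[1] - first[1]:+.5f}")
print("highest LL/token at iteration", best_iteration(points)[0])
```

A higher (less negative) value means the model fits the text better, so the late fluctuation suggests that running more iterations would not change the topics much.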

 

In DFR-Browser, I looked at the document view of two essays that discuss The Trial, one published in 1960, the other in 1982:

 


 

Most of the changes in topic and vocabulary are due to each paper having a different focus, but I wonder whether any of them could be attributed to a more general shift in Kafka scholarship. One difference that especially stands out to me is the number of references to particular people in the 1960 essay, as opposed to only two names in the list for the 1982 essay (Kafka and Freud, both in topic 30). Another change to consider is topic 58 (text language reading work way form relation discourse subject power another point writing makes place), which is completely absent from the list for the earlier essay but is at the top of the list for the later essay. The only two topics that stayed in the top six are 41 (king political war law england state history court great made during english power queen men) and 38 (novel story narrative fiction reader novels characters book narrator character chapter stories author real events), which makes sense for essays about a novel built around a court case.
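
The comparison above amounts to a few set operations on the two top-topic lists. Here is a small Python sketch of that idea. The sets are deliberately partial: they hold only the topic ids named in the paragraph above, because the full top-six lists are not reproduced on this page. The helper name `compare_topics` is my own.

```python
# Partial top-topic sets: only the topic ids named in the text above.
# The full top-six lists for each essay are not reproduced on this page.
topics_1960 = {41, 38}
topics_1982 = {58, 41, 38}


def compare_topics(earlier, later):
    """Return the topics shared by both essays and those unique to each."""
    return {
        "shared": sorted(earlier & later),
        "only_earlier": sorted(earlier - later),
        "only_later": sorted(later - earlier),
    }


result = compare_topics(topics_1960, topics_1982)
print(result)
```

With the complete lists from DFR-Browser filled in, `only_earlier` would show the topics that dropped out of the top six between the two essays; here it comes out empty only because the earlier list is incomplete.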

 

I also enjoyed choosing words from the all-words view, going to the topic view for topics in which the word is highly ranked, and looking through the documents where the topic has its largest proportion to see how the results differ from my expectations. For example, the topic listed in the word view for "darkness" is 62 (light old man dead eyes night face heart day earth sun white dark head saw). In the topic view, most of the essay titles are not surprising (such as "The Eve of St. Agnes and the Mysteries of Udolpho"), but others do not seem as obviously connected to the word and topic (such as "The Power of Women's Hair in the Victorian Imagination"), which makes me more interested in reading those papers and trying to understand the seemingly strange connection!
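
The word-to-topic lookup described above can be imitated on a small scale with the topic word lists quoted on this page. The following Python sketch is a simplified stand-in and not DFR-Browser's own code: it knows only the fifteen words quoted for topics 38, 41, 58, and 62, and the function name `topics_for_word` is my own.

```python
# Top words for the four topics quoted on this page, in ranked order.
TOP_WORDS = {
    38: ("novel story narrative fiction reader novels characters book "
         "narrator character chapter stories author real events").split(),
    41: ("king political war law england state history court great made "
         "during english power queen men").split(),
    58: ("text language reading work way form relation discourse subject "
         "power another point writing makes place").split(),
    62: ("light old man dead eyes night face heart day earth sun white "
         "dark head saw").split(),
}


def topics_for_word(word, top_words=TOP_WORDS):
    """Return (topic id, rank) pairs for topics listing `word`, best rank first."""
    hits = [
        (topic, words.index(word) + 1)
        for topic, words in top_words.items()
        if word in words
    ]
    return sorted(hits, key=lambda hit: hit[1])


print(topics_for_word("power"))
print(topics_for_word("dark"))
print(topics_for_word("darkness"))
```

"darkness" itself returns nothing here because it is not among the fifteen words quoted for topic 62; the browser presumably draws on a longer ranked word list for each topic than the one shown in the label.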
