Phillip Cortes - Topic Modeling Test Page

Page history last edited by Phillip Cortes 10 years, 6 months ago

For this exercise using the tool Gabe discovered, I topic modeled the text of Milton's prose tract Areopagitica. As Goldstone and Underwood put it, a topic is "neither more nor less than a pattern of co-occurring words." The co-occurrences of words like "plato," "sects," "fool," "corruption," and "god" in topic #2, for instance, make for an intriguing word cluster.

 

 

List of Topics

1. licencing left doe good sin set labour fear chief abroad
2. men god bin world end sects fool plato civill corruption
3. inquisition prelats freedom bishops pamphlet happy sitting words bin gave
4. ye order thought made give licencers honest call freely vain
5. book learning reason judgement english licence matter small sees thoughts
6. hath church doe writing conscience write laid dangerous discipline hold
7. things knowledge make free lesse hands bring body round means
8. truth light reformation open opinion generall lay hear printing speak
9. books evill reading read law unlesse state learned sort presse
10. good great life city wise whereof publick heav master wisdome
11. people nation christian put honour twenty late tis study part
12. found thing ev farre issue invention fell whereof adam manners
13. religion house till faith som find finde year heard court
14. commons schisms long heard hope commonwealth set worth worthy roman
15. liberty lords parlament england wits young authors burnt love government
16. work true care writt wherewith person beleeve divine shew unlicenc
17. times eyes matters spirit place imprimatur anough sun ready perpetuall
18. licencer hand greatest st prohibiting labours kinde passe judicious dead
19. time author printed common autority utter grave present print condemn
20. man licencing vertue opinions gods knowing reader vision fain god
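
The idea of a topic as "a pattern of co-occurring words" can be illustrated with a minimal sketch that counts how often two words turn up in the same chunk of text. This is plain co-occurrence counting, not the algorithm the topic modeling tool runs. The function name `cooccurrence_counts` and the three toy chunks are assumptions made up for illustration; the chunks borrow words from topics #2 and #9 but are not quotations from Areopagitica.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(chunks):
    """Count, for each unordered word pair, how many chunks contain both words."""
    pair_counts = Counter()
    for chunk in chunks:
        # Use each word once per chunk and sort so every pair has one fixed order.
        words = sorted(set(chunk.lower().split()))
        pair_counts.update(combinations(words, 2))
    return pair_counts


# Toy chunks invented for this sketch (not Milton's text).
chunks = [
    "men god world sects",
    "god plato men",
    "books reading law",
]

counts = cooccurrence_counts(chunks)
print(counts[("god", "men")])    # pair appears together in two chunks
print(counts[("books", "law")])  # pair appears together in one chunk
```

Pairs are stored in alphabetical order, so a lookup has to use the same order, for example `("god", "men")` rather than `("men", "god")`.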

 

What's useful is the output HTML file that the topic modeling tool takes you to. Once you open that file in a separate browser window, you'll see the list of topics above, and each topic is a link leading to another list of "top-ranked docs in this topic (#words in doc assigned to this topic)." Below is the page that the link for topic #1 leads to:

 

TOPIC : licencing left doe good sin set labour fear chief abroad ...


top-ranked docs in this topic (#words in doc assigned to this topic)

         2. (139) doc 19
         3. (46) doc 36
         4. (16) doc 22
         5. (13) doc 17
         6. (12) doc 25
         7. (11) doc 27
         8. (10) doc 24
         9. (9) doc 38
         10. (9) doc 21
         11. (8) doc 32
         12. (6) doc 2
         13. (5) doc 35
         14. (5) doc 34
         15. (4) doc 20
         16. (4) doc 6
         17. (3) doc 5
         18. (2) doc 37
         19. (2) doc 33
         20. (2) doc 31
         21. (2) doc 30
         22. (2) doc 29
         23. (2) doc 26
         24. (2) doc 23
         25. (2) doc 4
         26. (2) doc 3
         27. (1) doc 28
         28. (1) doc 14
         29. (0) doc 83
         30. (0) doc 82
         31. (0) doc 81
         32. (0) doc 80
         33. (0) doc 79
         34. (0) doc 78
         35. (0) doc 77
         36. (0) doc 76
         37. (0) doc 75
         38. (0) doc 74
         39. (0) doc 73
         40. (0) doc 72
         41. (0) doc 71
         42. (0) doc 70
         43. (0) doc 69
         44. (0) doc 68
         45. (0) doc 67
         46. (0) doc 66
         47. (0) doc 65
         48. (0) doc 64
         49. (0) doc 63
         50. (0) doc 62
         51. (0) doc 61
         52. (0) doc 60
         53. (0) doc 59
         54. (0) doc 58
         55. (0) doc 57
         56. (0) doc 56
         57. (0) doc 55
         58. (0) doc 54
         59. (0) doc 53
         60. (0) doc 52
         61. (0) doc 51
         62. (0) doc 50
         63. (0) doc 49
         64. (0) doc 48
         65. (0) doc 47
         66. (0) doc 46
         67. (0) doc 45
         68. (0) doc 44
         69. (0) doc 43
         70. (0) doc 42
         71. (0) doc 41
         72. (0) doc 40
         73. (0) doc 39
         74. (0) doc 18
         75. (0) doc 16
         76. (0) doc 15
         77. (0) doc 13
         78. (0) doc 12
         79. (0) doc 11
         80. (0) doc 10
         81. (0) doc 9
         82. (0) doc 8
         83. (0) doc 7
         84. (0) doc 1
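
A ranking like the one above can be sketched as a count-and-sort: for each doc, count the words assigned to the topic, then list the docs from highest count to lowest. The data layout (a dict mapping a doc number to the topic number assigned to each of its words), the function name `rank_docs_for_topic`, and the toy assignments are all assumptions for illustration, not the tool's actual internals or output. Breaking ties by descending doc number mirrors the order seen in the list above, but whether the tool does this deliberately is not stated anywhere on the page.

```python
from collections import Counter


def rank_docs_for_topic(assignments, topic):
    """Return (doc, count) pairs for one topic, highest count first.

    `assignments` maps a doc number to the list of topic numbers assigned
    to each word in that doc (a hypothetical layout for this sketch).
    """
    counts = {doc: Counter(topics)[topic] for doc, topics in assignments.items()}
    # Sort by count, descending; break ties by doc number, descending.
    return sorted(counts.items(), key=lambda item: (-item[1], -item[0]))


# Toy per-word topic assignments, invented for this sketch.
assignments = {
    1: [4, 8, 9],
    5: [1, 4, 1, 8, 1],
    19: [1, 1, 1, 1, 2],
    36: [1, 1, 3],
}

ranked = rank_docs_for_topic(assignments, topic=1)
for doc, count in ranked:
    print(f"({count}) doc {doc}")
```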

 

From there I could follow link #17 on the list, doc 5, which led me to this:

 

DOC : doc 5


If ye be thus resolv'd, as it were injury to think ye were not; I know not what should withhold me from presenting ye with fit instance wherein to shew both that love of truth which ye eminently professe, and that uprightnesse of your judgement which is not wont to be partiall to your selves; by judging over again that Order which ye have ordain'd to regulate Printing, That no Book, pamphlet, or paper shall be henceforth Printed, unlesse the same be first approv'd and licenc't by such, or at lea...

Top topics in this doc (% words in doc assigned to this topic)

         (22%) ye order thought made give licencers honest call freely vain ...
         (16%) truth light reformation open opinion generall lay hear printing speak ...
         (15%) books evill reading read law unlesse state learned sort presse ...
         (5%) book learning reason judgement english licence matter small sees thoughts ...
         (5%) inquisition prelats freedom bishops pamphlet happy sitting words bin gave ...

 

 

 

I was quite surprised to find that the modeling program generated these different percentages, which indicate what share of the words in the doc is assigned to each topic. It is odd, though, that these percentages do not add up to 100: the five listed come to 63. Is the program unable to account for the remaining 37%? What would the insignificance or unaccountability of these words mean? The significance of these percentages with respect to Milton's text is still a mystery to me, yet these measurements certainly offer novel ways of approaching and grappling with this already complex literary work.
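
The arithmetic behind the 37% can be checked with a short sketch. The five percentages are copied from the "Top topics in this doc" list for doc 5; the variable names are made up for illustration. One possible reading, which the output page does not confirm, is that the page lists only the top five of the twenty topics, so the remainder may be spread across the topics it leaves out rather than being unaccounted for.

```python
# Percentages shown for doc 5 under "Top topics in this doc".
top_topic_percentages = [22, 16, 15, 5, 5]

listed_total = sum(top_topic_percentages)
unlisted = 100 - listed_total

print(f"listed topics: {listed_total}%")
print(f"not shown on the page: {unlisted}%")
```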

 
