Text Analysis with online Tools

Collaborative notes taken by participants of the DH Summer School Switzerland:

Tutorial/Workshop Session 2: Susan Schreibman (@schreib100): Text Analysis with online Tools


Tools we will test
2. Google Ngram Viewer : http://books.google.com/ngrams (You cannot add your own data; it works only with the scanned books. Mainly for comparing how the usage of particular words has changed over time in those books.)
3. TextArc : http://www.textarc.org/ By Brad Paley. Developed “for fun” by a designer, independent of academia. Based on a concordance (? was she talking about this software?). Gives an overview of an entire corpus of texts.
5. IBM’s ManyEyes : http://www-958.ibm.com/software/analytics/manyeyes/ Micro-reading of texts; charts the words that follow certain words.
6. Voyant : http://disc.library.emory.edu/lincoln/voyant/ (also http://voyant-tools.org) Developed by literary scholars. Also shows you where in the text a word occurs.
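To get a feel for what a concordance-based tool computes, here is a minimal Python sketch of a keyword-in-context listing that also reports where in the text each hit occurs. It is not the code behind TextArc or Voyant; the function names `tokenize` and `kwic`, the `window` parameter and the sample sentence are made up for illustration.

```python
import re

def tokenize(text):
    # Lowercased word tokens; in Python 3, \w also matches non-ASCII letters.
    return re.findall(r"\w+", text.lower())

def kwic(text, keyword, window=3):
    """Return (token position, left context, keyword, right context) per hit."""
    tokens = tokenize(text)
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((i, left, tok, right))
    return hits

# Hypothetical sample text, only for demonstration.
sample = "The virus was isolated from a patient. The virus was then grown in culture."
for pos, left, word, right in kwic(sample, "virus", window=2):
    print(pos, "|", left, "[" + word + "]", right)
```

The position in each result is a token index, so dividing it by the total number of tokens gives a rough idea of how far into the text the word appears.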
Other tools and Resources
Bookworm Culturomics
First step : choose a text 😉 We’ll each play with one single text (personal, not too short) with several different tools
— how’s that going?
.doc prepared by copy-pasting the text from a PDF of a major paper in biology/medical sciences (the first paper on the discovery of HIV, before it was called HIV) (as in Grid?)
Trying it with Unicode text: Wordle is not able to render it properly, it shows just some squares….
Do these tools work with languages written in scripts other than Roman?
Using ManyEyes with a French .txt file is rather frustrating because it doesn’t understand UTF-8… Voyant does much better. With Voyant you can add your own limits and your own stop-word list, which is useful.
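For anyone who wants to check their own text outside the tools, here is a small Python sketch of the two points above: decoding a French text as UTF-8 and counting words with a custom stop-word list. It is only an illustration under assumptions, not how ManyEyes or Voyant work internally; the function name `word_frequencies`, the sample sentence and the stop words are invented for the example.

```python
import re
from collections import Counter

def word_frequencies(text, stop_words=()):
    # Count lowercased word tokens, skipping anything in the stop-word list.
    stop = {w.lower() for w in stop_words}
    tokens = re.findall(r"\w+", text.lower())
    return Counter(t for t in tokens if t not in stop)

# Bytes as they would be stored in a UTF-8 encoded .txt file (made-up sample).
raw = "Le virus a été isolé. Le virus a été cultivé.".encode("utf-8")

# Decoded correctly, the accented letters survive.
text = raw.decode("utf-8")

# Decoded with the wrong encoding, "é" turns into two junk characters.
# This is one common way accented text ends up garbled.
garbled = raw.decode("latin-1")

freqs = word_frequencies(text, stop_words=["le", "a", "été"])
print(freqs.most_common(3))
print(garbled)
```

When reading from an actual file, passing `encoding="utf-8"` to `open()` does the same decoding step explicitly instead of relying on the system default.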