Data mining seems like a valuable resource, especially for statistical analyses of text, for historiography, and for detecting patterns. As presently configured, however, the field seems to favor modern and contemporary projects, where sources are most likely to exist in digital format. (The programs and strategies for handling digitized print sources set in old fonts seem to be time-consuming and fraught with problems.)
What are art historians who deal with older eras to do if they wish to include sources that are not in digital form? Some issues are:
- We may have primary sources that survive only in their original form, where paleography is still an issue, or sources that have simply never been digitized
- In some cases we do not even have the several hundred sources needed to form a corpus worth analyzing
- If we have sources in more than one language, words treated as synonyms (as opposed to proper nouns) may have subtly different meanings that could skew results
However, I do see potential in using data mining to analyze student work, including exams, papers, and other written assignments. Some colleges offer the option of taking exams on computers in testing centers, and more colleges are moving toward tablets for all students to use for reading and assigned work. Some possible data mining projects include:
- Analyzing essay questions: if students were asked to discuss one (or more) artworks, artists, or scholars, in what proportions was each mentioned? This could provide some insight into which sources students rely on (textbook, outside reading, information mentioned in class) or what interests them.
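
As a rough illustration of the essay-question idea, here is a minimal sketch in Python of how mention proportions might be tallied. It assumes the essays are already available as plain text and that we supply our own list of names to look for. The function name `mention_proportions` and the sample essays are invented for illustration, and simple whole-word matching will miss misspellings and variant names, so this is a starting point, not a finished method.

```python
import re
from collections import Counter


def mention_proportions(essays, names):
    """Return, for each name, the share of essays that mention it at least once.

    Hypothetical helper: matching is case-insensitive and whole-word only,
    so misspellings and alternate forms of a name are not caught.
    """
    counts = Counter()
    for text in essays:
        lowered = text.lower()
        for name in names:
            pattern = r"\b" + re.escape(name.lower()) + r"\b"
            if re.search(pattern, lowered):
                counts[name] += 1
    total = len(essays)
    return {name: (counts[name] / total if total else 0.0) for name in names}


# Invented sample answers standing in for real student essays.
essays = [
    "Giotto's frescoes show a new interest in volume.",
    "Both Giotto and Duccio worked in the early fourteenth century.",
    "Duccio's Maesta was made for Siena Cathedral.",
    "Vasari praised Giotto above his contemporaries.",
]
names = ["Giotto", "Duccio", "Vasari"]

proportions = mention_proportions(essays, names)
for name, share in proportions.items():
    print(f"{name}: {share:.0%}")
```

Counting each essay once per name, instead of counting every occurrence, keeps one enthusiastic student from dominating the totals. Comparing the resulting list against the names that appear in the textbook or in lecture notes would be a separate, manual step.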