Hammond ’14 and Smith ’14 Create New Text Analysis Software
Vast Possibilities for Application
By Patrick Bedard ’14
Contact: Holly Foster 315-859-4068
June 21, 2012
Imagine being able to select any written document on a computer and automatically know where the writer struggled, which sections the writer breezed through, and whether the writer plagiarized – all without reading a single word of the document itself. The idea seems simple enough to realize with text-extraction programs and follow-up algorithms, but, surprisingly, no software maker has produced such a product.
Sarah Hammond ’14 and Justin Smith ’14, both computer science and math dual concentrators, have set out to fill this void and create their own tool for broad-spectrum text analysis. They are designing their program under the guidance of Associate Professor of Computer Science Mark Bailey with the specific intention of extracting and analyzing text from a computer programming terminal. Modifying the program to extract and analyze text from other applications, such as Microsoft Word, is the next logical step, they say.
Within the context of analyzing text from a programming terminal, Hammond and Smith will be examining variables such as time spent on each line of the program, modifications and edits made to each line and “hot lines,” or areas where the programmer had trouble. The students and Professor Bailey believe that this program will be highly useful in teaching programming languages to students and hope to use it in the classroom in the coming semester.
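To make those variables concrete, here is a minimal sketch in Python of how time per line, edit counts and “hot lines” might be computed from a list of recorded edits. It is an illustration only, not the students’ program: the article does not describe their data format or name their language, so the `EditEvent` record, the `line_metrics` and `hot_lines` functions, and every threshold below are assumptions made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class EditEvent:
    """One recorded edit (hypothetical format, assumed for this sketch)."""
    timestamp: float  # seconds since the session started
    line: int         # line number that was edited
    kind: str         # e.g. "insert", "delete" or "replace"


def line_metrics(events, idle_cap=60.0):
    """Return {line: {"seconds": ..., "edits": ...}} for a list of events.

    The time between one event and the next is credited to the line of the
    earlier event. Gaps longer than idle_cap are truncated so that a coffee
    break is not counted as time spent on a line (an assumed heuristic).
    """
    metrics = defaultdict(lambda: {"seconds": 0.0, "edits": 0})
    ordered = sorted(events, key=lambda e: e.timestamp)
    for current, following in zip(ordered, ordered[1:]):
        gap = min(following.timestamp - current.timestamp, idle_cap)
        metrics[current.line]["seconds"] += gap
    for event in ordered:
        metrics[event.line]["edits"] += 1
    return dict(metrics)


def hot_lines(metrics, min_seconds=30.0, min_edits=3):
    """Lines that took both a long time and many edits (assumed definition)."""
    return sorted(
        line
        for line, m in metrics.items()
        if m["seconds"] >= min_seconds and m["edits"] >= min_edits
    )
```

Under these assumed thresholds, a line is flagged only when the programmer both lingered on it and revised it repeatedly; an instructor could tune `min_seconds` and `min_edits` to suit a particular assignment.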
The students also hope that their program can be used in other classrooms across Hamilton’s campus so that professors can see which parts of an assignment give students “writer’s block” or whether any sections of a paper were copied and pasted in. Of course, the program’s usefulness is not limited to education – software companies could see which sections their programmers struggle with, businesses could identify the most and least efficiently written sections of reports, and editors and publishing houses could even get metrics on an author’s writing style and compare them with those of other authors in the field.
While the possibilities for Hammond and Smith’s project are far-reaching, the coding process itself is time-consuming and sometimes slow, especially for only two programmers. The two first had to learn a new programming language and familiarize themselves with the software with which their program will interact. Despite these hurdles, Smith remarked, “On any given day there are a number of different tracks I can take that will bring me closer to finishing. It is fun getting stuck and working through countless unanticipated problems.” The key to moving forward, according to Smith, is never to focus on one problem for too long and to change gears when the two get stuck.
Hammond and Smith first had to develop a session logger that would analyze the data (in this case the lines of code) that users input. They then selected an output format for the variables they wanted to record. The two are currently developing tools for analyzing that output. They are testing their work by writing their program within the very session logger they have developed. Modifying the program to work with different text editors likely will not happen until later in the year or during future summer research projects. The students’ ultimate goal, according to Hammond, is to “gain insight into the program development process.”
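The logger-then-output-format sequence described above can be sketched briefly in Python. This is a hypothetical illustration, not the students’ code: the article names neither their output format nor their language, so the `SessionLogger` class, the `read_log` helper, the field names and the choice of one JSON object per line are all assumptions.

```python
import json
import time
from typing import IO, Iterator


class SessionLogger:
    """Writes one JSON object per edit to a text stream (assumed format)."""

    def __init__(self, sink: IO[str], clock=time.monotonic):
        self._sink = sink
        self._clock = clock          # injectable so the logger can be tested
        self._start = clock()        # reference point for elapsed time

    def record(self, line: int, kind: str, text: str) -> None:
        """Append a single edit, stamped with seconds since the session began."""
        entry = {
            "t": round(self._clock() - self._start, 3),
            "line": line,
            "kind": kind,            # e.g. "insert" or "replace" (assumed labels)
            "text": text,
        }
        self._sink.write(json.dumps(entry) + "\n")


def read_log(source: IO[str]) -> Iterator[dict]:
    """Yield the recorded edits back as dictionaries, skipping blank lines."""
    for raw in source:
        raw = raw.strip()
        if raw:
            yield json.loads(raw)
```

Writing one self-contained record per line means a log remains readable even if a session ends abruptly, and an analysis tool can process it one line at a time. That is one reasonable design among several; the article does not say which the students chose.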
Both Hammond and Smith are avid musicians; Hammond is a member of the Hamilton College orchestra and plays in a string trio. Both hope to go on to graduate school in computer science and eventually work in software development.
Sarah Hammond is a graduate of Saratoga Springs High School (N.Y.), and Justin Smith is a graduate of Chautauqua Lake Central School (N.Y.).