Sunday July 06, 2008
Scott Thede   
 

PARE - An Automatic Text Summarizer

My research is mainly in artificial intelligence, particularly natural language processing. Lately I have focused on the problem of text summarization: compressing the information contained in one or more documents into a shorter summary.

Our system uses a word graph, following an algorithm similar to Google's PageRank. We create links between words, then weight the importance of each word based on its links to other important words. The project is currently implemented in Java.
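The word-graph idea can be illustrated with a minimal sketch. This is not the PARE code: the class name WordGraphRank, the co-occurrence window used to create links, the damping factor, and the fixed iteration count are all assumptions chosen for illustration, and the links in the real system may be built differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of PageRank-style word weighting; not the actual PARE implementation.
class WordGraphRank {

    // Assumption: two distinct words are linked when they occur within `window` positions.
    static Map<String, Set<String>> buildGraph(List<String> words, int window) {
        Map<String, Set<String>> graph = new LinkedHashMap<>();
        for (int i = 0; i < words.size(); i++) {
            String a = words.get(i);
            graph.computeIfAbsent(a, k -> new LinkedHashSet<>());
            for (int j = i + 1; j < words.size() && j <= i + window; j++) {
                String b = words.get(j);
                if (a.equals(b)) {
                    continue; // no self-links
                }
                graph.get(a).add(b);
                graph.computeIfAbsent(b, k -> new LinkedHashSet<>()).add(a);
            }
        }
        return graph;
    }

    // Each word shares its score equally among its neighbours on every iteration,
    // so a word linked to important words becomes important itself.
    static Map<String, Double> rank(Map<String, Set<String>> graph, double damping, int iterations) {
        int n = graph.size();
        Map<String, Double> score = new LinkedHashMap<>();
        for (String w : graph.keySet()) {
            score.put(w, 1.0 / n);
        }
        for (int it = 0; it < iterations; it++) {
            Map<String, Double> next = new LinkedHashMap<>();
            for (String w : graph.keySet()) {
                double sum = 0.0;
                for (String nb : graph.get(w)) {
                    // Links are undirected, so every neighbour has at least one link.
                    sum += score.get(nb) / graph.get(nb).size();
                }
                next.put(w, (1.0 - damping) / n + damping * sum);
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        List<String> words = new ArrayList<>(Arrays.asList(
                "summary", "graph", "summary", "word", "summary", "link"));
        Map<String, Double> scores = rank(buildGraph(words, 1), 0.85, 30);
        scores.forEach((w, s) -> System.out.println(w + " " + s));
    }
}
```

In this sketch a word that co-occurs with many different words ends up with a higher weight than a word with a single link. Words with no links at all keep only the small base score.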

The summaries the system produces are not yet perfect. My goals for the project in future years are:

  • Clean up the code for the project, making it more readable.
  • Make sure everything is implemented in Java and included in the same build.
  • Work on the interface, making it easier to use.
  • Expand the set of links available in the word graph.

As longer-term work, I would like to explore the possibility of a generative summarization system. We currently use sentence extraction, where the summary is produced by pulling the most interesting sentences out of the source text. A generative system instead produces new text and sentences as the summary, which is a much more challenging project.
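Sentence extraction can be sketched in a few lines once each word has a weight. The following is a minimal illustration under stated assumptions, not the PARE code: the class name SentenceExtractor, the scoring rule (average weight of the words in a sentence), and the simple tokenization are all hypothetical choices.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of extractive summarization; not the actual PARE implementation.
class SentenceExtractor {

    // Returns the k highest-scoring sentences, kept in their original document order.
    static List<String> summarize(List<String> sentences, Map<String, Double> wordWeight, int k) {
        double[] scores = new double[sentences.size()];
        for (int i = 0; i < sentences.size(); i++) {
            // Assumption: lowercase the sentence and split on non-word characters.
            String[] tokens = sentences.get(i).toLowerCase().split("\\W+");
            double total = 0.0;
            int count = 0;
            for (String t : tokens) {
                if (t.isEmpty()) {
                    continue;
                }
                total += wordWeight.getOrDefault(t, 0.0);
                count++;
            }
            // Average rather than sum, so long sentences are not favoured automatically.
            scores[i] = count == 0 ? 0.0 : total / count;
        }

        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < sentences.size(); i++) {
            order.add(i);
        }
        order.sort((a, b) -> Double.compare(scores[b], scores[a])); // best first

        int keep = Math.max(0, Math.min(k, order.size()));
        List<Integer> chosen = new ArrayList<>(order.subList(0, keep));
        Collections.sort(chosen); // restore document order

        List<String> summary = new ArrayList<>();
        for (int idx : chosen) {
            summary.add(sentences.get(idx));
        }
        return summary;
    }

    public static void main(String[] args) {
        Map<String, Double> weights = new HashMap<>();
        weights.put("graph", 0.6);
        weights.put("words", 0.4);
        List<String> doc = Arrays.asList(
                "We link words in a graph.", "It rained.", "The graph ranks words.");
        System.out.println(summarize(doc, weights, 2));
    }
}
```

Keeping the selected sentences in document order is a design choice that tends to make an extracted summary easier to read. A generative system would replace this selection step with text generation.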

 

Julian Science and Math Center

602 S. College Ave. 
Greencastle, IN 
46135-1969 

Phone (765) 658-4735  
Fax (765) 658-4732  

 
