Text Analysis and Readability Statistics in Base SAS

SGF 2012
Although SAS® provides a dedicated product for text mining (SAS Text Miner), you may be surprised by how much text analysis you can readily perform using Base SAS alone. The author introduces the topic with background on widely used readability statistics and tests, along with a brief comparison of Hemingway and Dickens. After selecting two appropriate readability tests and texts of similar length, she describes data preparation challenges, including how to handle punctuation, case, common abbreviations, and sentence segmentation. Using a few simple calculated macro variables, she develops a program that can be reused to calculate readability tests on any sample input text file. Finally, she validates her SAS output against published readability statistics from sources such as Amazon and searchlit.org.
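To make the data preparation steps concrete, here is a minimal sketch of the general idea in Python rather than SAS. It is not the author's program. The abstract does not name the two tests chosen, so the Flesch Reading Ease formula is used here purely as an assumed example of a widely used readability test. The abbreviation list, the syllable heuristic, and all function names are hypothetical illustrations.

```python
import re

# Hypothetical, deliberately short abbreviation list (an assumption for this
# sketch); a real program would need a much longer one.
ABBREVIATIONS = {"mr.", "mrs.", "dr.", "st.", "etc."}


def split_sentences(text):
    """Naive segmentation: end a sentence at a token that finishes with
    '.', '!' or '?', unless the token is a known abbreviation."""
    sentences, current = [], []
    for token in text.split():
        current.append(token)
        if token[-1] in ".!?" and token.lower() not in ABBREVIATIONS:
            sentences.append(" ".join(current))
            current = []
    if current:  # trailing text with no closing punctuation
        sentences.append(" ".join(current))
    return sentences


def extract_words(text):
    """Lowercase the text and keep alphabetic words, dropping punctuation."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word):
    """Rough heuristic: count vowel groups and discount a silent final 'e'.
    Every word is counted as having at least one syllable."""
    n = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and n > 1:
        n -= 1
    return max(n, 1)


def flesch_reading_ease(text):
    """Flesch Reading Ease:
    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)."""
    sentences = split_sentences(text)
    words = extract_words(text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * len(words) / len(sentences)
            - 84.6 * syllables / len(words))


sample = "Mr. Smith went home. He was tired!"
print(split_sentences(sample))
print(round(flesch_reading_ease(sample), 2))
```

The sample shows why abbreviations matter: without the exception list, "Mr." would be treated as a sentence boundary and inflate the sentence count. The syllable counter is only an approximation, which is one reason to validate computed scores against published figures, as the abstract describes.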
