HASUG Meeting Notes: May 2011

Northeast Utilities hosted May’s HASUG meeting in Berlin, CT. Both speakers focused on fraud detection: first from the perspective of store credit card issuer GE, and then from the perspective of property and casualty (P&C) insurance claims.

“Usage of SAS in Credit Card Fraud Detection,” presented by Clint Rickards of GE, began by introducing the most common types of credit card fraud and contrasting the challenges faced by PLCC (store card) issuers versus bank card issuers. He shared a striking statistic: half of all credit card fraud, measured in dollars, occurs in just six states (CA, TX, FL, NJ, NY, and MI). He then outlined the general architecture of GE’s Risk Assessment Platform (RAP), designed to detect fraud both in real time and post-transaction, which uses the full SAS Business Intelligence suite of products: SAS/IntrNet, Enterprise Guide, Data Integration Studio, Management Console, SAS Scalable Performance Data Server, and Platform Flow Manager/Calendar Editor. Finally, he stressed the importance of automated processes, reusable code, breaking large jobs into smaller pieces for easier debugging, and separating the testing and production environments.

Next, Janine Johnson of ISO Innovative Analytics presented “Mining Text for Suspicious P&C Claims,” describing how her firm developed an automated process in Base SAS (labor-intensive to develop, but cost-effective to run) for “mining” insurance claim adjusters’ notes, stored in an unstructured text field, to produce data for use in a predictive model. She introduced text mining as a sequence of tasks: information retrieval, natural language processing, creating structured data from unstructured text, and evaluating the structured outputs (classification, clustering, association, etc.). Before beginning this process, she emphasized the necessity of consulting a domain expert (in this case, someone in the P&C industry familiar with its jargon and non-standard abbreviations). She then organized her own project into five steps of an iterative process: cleaning the text (using the upcase, compress, translate, and compbl functions), standardizing terms (using the regular expression functions prxparse, prxposn, and prxchange, as well as scan and tranwrd), identifying words associated with suspicious claims and grouping them into concepts (“concept generation”), flagging records containing those suspicious phrases, and finally using proc freq with the chisq option to evaluate lift.
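For readers who want to see the shape of this pipeline, the five steps can be sketched end to end. The presentation used Base SAS (upcase, compbl, the prx* regex functions, and proc freq with the chisq option); the sketch below is a rough Python analogue of that logic, not Johnson's actual code, and every note, abbreviation mapping, and concept term in it is invented for illustration.

```python
import re

# Hypothetical adjuster notes paired with a "suspicious" label (all invented).
notes = [
    ("clmt  reports   whiplash; atty retained day of loss!!", 1),
    ("minor fender-bender, no injuries reported", 0),
    ("attorney retained immediately, prior claims history", 1),
    ("water damage to basement after storm", 0),
    ("soft tissue injury, atty. involved", 1),
    ("hail damage to roof, photos on file", 0),
]

def clean(text):
    """Step 1: clean the text (analogue of SAS upcase/compress/compbl)."""
    text = text.upper()                       # upcase: fold to one case
    text = re.sub(r"[^A-Z0-9 ]", " ", text)   # compress: drop punctuation
    text = re.sub(r"\s+", " ", text).strip()  # compbl: collapse multiple blanks
    return text

# Step 2: standardize jargon and abbreviations (analogue of prxchange/tranwrd).
# These mappings are hypothetical examples, not an industry-standard list.
standardize = [
    (re.compile(r"\bATTY\b"), "ATTORNEY"),
    (re.compile(r"\bCLMT\b"), "CLAIMANT"),
]

def normalize(text):
    text = clean(text)
    for pattern, replacement in standardize:
        text = pattern.sub(replacement, text)
    return text

# Steps 3-4: group suspicious phrases into concepts and flag matching records.
# The concept terms here are invented for illustration.
concept_terms = {"ATTORNEY RETAINED", "SOFT TISSUE", "PRIOR CLAIMS"}

def flag(text):
    t = normalize(text)
    return any(term in t for term in concept_terms)

# Step 5: cross-tabulate flag vs. label (what proc freq with the chisq
# option would produce in SAS) and compute chi-square and lift by hand.
a = b = c = d = 0  # a: flagged+suspicious, b: flagged only, c: suspicious only, d: neither
for text, suspicious in notes:
    if flag(text):
        a, b = (a + 1, b) if suspicious else (a, b + 1)
    else:
        c, d = (c + 1, d) if suspicious else (c, d + 1)

n = a + b + c + d
# Pearson chi-square statistic for a 2x2 table
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
# Lift: P(suspicious | flagged) / P(suspicious)
lift = (a / (a + b)) / ((a + c) / n)
```

On this toy data the flag separates the labels perfectly, so the lift is 2.0 (flagged claims are twice as likely as a random claim to be suspicious); in practice each candidate concept would be evaluated this way and weak ones dropped on the next iteration.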