direct
281.667.4200

training
888.742.2454

fax
281.652.5721

The Modeling Agency Quarterly Newsletter
2007-Q3 Release
 

August 1, 2007  |  This Edition:

1.  Training Schedule Update:  Learn how experts mine data, and what it takes to get started.  Next up: San Diego, September 24 to 28, and Las Vegas, December 3 to 7.

2.  Feature Article:  "Healthful Applications of Predictive Modeling: Fraud Management Strategies in Healthcare" by Alex Filimon, Partner, Novus Consulting Group

3.  Article:  "Advanced Statistical Modeling Trends" by Curt Hall, Senior Consultant, Cutter Consortium

4.  Announcement:  TDWI World Conference in Orlando, Florida, October 28 - November 2, 2007

5.  Newsletter Summary

 
 

1.  TRAINING SCHEDULE UPDATE 

 

  
LEARN HOW EXPERTS MINE DATA IN SAN DIEGO OR LAS VEGAS

The next offering of The Modeling Agency's vendor-neutral, application-oriented data mining courses is scheduled for September 24 to 28 in San Diego and December 3 to 7 in Las Vegas.  Participants will enjoy a balanced, broad and non-promotional presentation of predictive modeling without restriction to a particular tool, method or product.

 

Attendees will learn about data mining capabilities, limitations, best practices, strategies, methods, tools, techniques and applications while enjoying all the entertainment and seasonal weather that San Diego and Las Vegas have to offer.  Those in attendance will leave with a comprehensive binder of notes, illustrations and references to valuable resources.  Don't leave a powerful competitive advantage untapped: harness the valuable information and profits hidden in your data. 

The previous two offerings in this series sold out months in advance, and the San Diego offering is limited to just 18 seats.  Be sure to reserve your space early.  The current status of remaining seats may be viewed on TMA's training schedule page.  Submit an unofficial registration to reserve your seat today while your training request is processed.
 

CHOOSE THE TRAINING THAT'S RIGHT FOR YOU
The Modeling Agency offers three data mining courses with distinct objectives.  The courses are designed to be attended independently, or as a progressive series.  While the three levels are staged as a progression, they should not be viewed simply as "introductory, intermediate and advanced."  Refer to the table below to ensure that your experience, situation and objectives align properly with the intent, scope and depth of each offering:

Course | Focus | Scope | Geared To
--- | --- | --- | ---
Data Mining: Level I | Strategy | An intensive overview of strategy, best practices and case studies | Project Leaders, Stakeholders, Functional Managers
Data Mining: Level II | Methods | A tactical drill-down of the data mining process, methods, techniques and resources | Business Analysts, Functional Analysts, IT Professionals
Data Mining: Level III | Application | A hands-on application workshop as an extension to Data Mining: Level II | Practitioners, Model-builders, Decision Support Developers
 
 
FULL COURSE DETAILS

The course schedule featured in this section is outdated.  For current course dates, locations, pricing and detailed outlines, please visit the main training page.

web
http://www.the-modeling-agency.com/training

email
training@the-modeling-agency.com

phone
888-742-2454 (toll free)
281-667-4200 (direct)
281-652-5721 (fax)
 

 
Courses May Be Delivered At Your Site

Call (888) 742-2454 or send an email inquiry to receive a value-based spreadsheet quotation for training at your site.


Government Buyers
TMA is a CCR Registered Veteran-Owned Small Business and accepts EFT.
 

 

 

2.  FEATURE ARTICLE
 

Healthful Applications of Predictive Modeling
Fraud Management Strategies in Healthcare

by
Alex Filimon
Novus Consulting Group

 

INTRODUCTION
Healthcare insurance fraud can be defined as a deliberate act of deceiving, concealing, or misrepresenting information that results in healthcare benefits being paid to an individual or a group.  Some of the most common examples are billing for services not rendered, providing unnecessary services, up-coding services that were provided, and establishing fictitious providers and billing agents.  Although the exact dollar loss due to fraud cannot be determined, healthcare fraud is estimated to cost around $90 to $180 billion in North America annually.

The public often views insurance fraud as a white-collar crime, and it comes in many shapes and sizes.  In fact, research shows that a large percentage of people believe some forms of insurance fraud are acceptable.  Surveys reveal that one in every four adults believes it is acceptable to exaggerate a claim because of the high premiums being paid.  Many people therefore regard insurance fraud as a victimless crime and try to cheat insurance companies.  The complexity of the health system as a whole, and the utilization patterns among service users, create a challenging environment for fraud detection specialists.  Unfortunately, of the total dollars lost to healthcare insurance fraud, only 10% is discovered, and only 10% of that amount is ever recovered (roughly 1% of the total loss).

The healthcare sector faces both technical and business issues, and it requires well-defined models that detect fraud as well as effective business strategies that minimize loss.  In practice, no single technology can solve fraud detection on its own, but an intelligent combination of several technologies may be an efficient way to identify fraud.

Before discussing the different technologies available to healthcare insurers for preventing fraud, we will take a look at the overall industry challenges.
 

INDUSTRY CHALLENGES
One of the main challenges faced by healthcare insurers is that most of their efforts are retrospective.  That is, they try to recover their money after the claims have been paid out.  Unlike credit card fraud, for example, where the cardholder is a victim and has every reason to help, in healthcare fraud the member is not a victim and has no interest in working with the insurer to find out who is using his or her card fraudulently.  Therefore, in most cases, insurers have to pay the claim first and then chase the suspicious transactions.

A second challenge faced by healthcare insurers is the accuracy of predictive models, especially the percentage of false positives.  By rejecting claims that are valid, insurance companies can seriously damage their reputation and create a very difficult (financial) situation for people in need of emergency relief.

Another challenge faced by insurance providers is the lack of resources, from both a human and a technological point of view.  Human resources are scarce in this field, and to build better predictive models that prevent fraudulent transactions, insurance providers need access to more data containing fraudulent transactions.  It is ironic that, in order to better prevent fraud, modelers need more fraud to happen.  The resources employed for detecting and preventing fraud are also affected by the accuracy of the models.  If a model produces false positives, human resources are wasted chasing people who haven’t committed fraud.  On the other hand, if a model produces false negatives, the health insurance provider misses opportunities to pursue people who have committed fraud.
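The trade-off between false positives and false negatives can be made concrete by counting both at a chosen score cutoff. The sketch below is a minimal illustration, assuming each past claim already has a model fraud score between 0 and 1 and a known outcome from investigation; the scores, outcomes and cutoffs are hypothetical, not data from the article.

```python
from typing import List, Tuple

def confusion_counts(scores: List[float], is_fraud: List[bool],
                     threshold: float) -> Tuple[int, int, int, int]:
    """Return (true_pos, false_pos, true_neg, false_neg) for one score cutoff.

    A claim is flagged for investigation when its score >= threshold.
    """
    tp = fp = tn = fn = 0
    for score, fraud in zip(scores, is_fraud):
        flagged = score >= threshold
        if flagged and fraud:
            tp += 1   # fraud caught
        elif flagged and not fraud:
            fp += 1   # investigator time spent on a valid claim
        elif not flagged and fraud:
            fn += 1   # fraud missed
        else:
            tn += 1   # valid claim paid without review
    return tp, fp, tn, fn

# Hypothetical model scores and investigation outcomes for eight past claims.
scores = [0.92, 0.81, 0.67, 0.55, 0.40, 0.33, 0.21, 0.08]
labels = [True, False, True, False, True, False, False, False]

for cutoff in (0.3, 0.5, 0.7):
    tp, fp, tn, fn = confusion_counts(scores, labels, cutoff)
    print(f"threshold={cutoff}: TP={tp} FP={fp} TN={tn} FN={fn}")
```

Raising the cutoff lowers false positives (wasted investigations) but raises false negatives (missed fraud); where to set it depends on the relative cost of each error to the insurer.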

With a clear understanding of some of the major challenges, we can move on to discuss how insurance providers are addressing them.  Currently, insurers spend most of their resources on human-capital-intensive techniques such as fraud awareness training, manual red flag systems, and external database searches.  According to industry insiders, less than 25% of the effort is directed toward automated red flags, rules engines and data mining.  Unfortunately, most current tools are opportunistic: insurance providers react when a person calls a fraud hotline or when different insurance providers share information.  Somewhat more advanced tools include claim reports and drill-down analysis using OLAP tools.  Again, these tools require human operators to browse through mountains of data in search of potentially fraudulent claims.
 

RULE-BASED TOOLS
Rule-based tools have gained popularity in recent years with advancements in computing power and analytical software.  These tools define fraud indicators based on past data and set a threshold on each indicator.  If a claim exceeds a threshold, it is red-flagged and a more thorough review is started.  Rules are often used to identify suspicious claims; for instance, a simple rule could be “leg injuries are more likely to be fraudulent than neck injuries”.  Leg injuries are therefore flagged as a higher fraud risk than neck injuries.  For further validation or refinement, other rules can be added, such as “age of 45 and over”.  A rule-based approach is similar to a series of “if-then” statements.
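A minimal sketch of the threshold-and-rule idea, assuming each claim is a simple record. The field names, the $5,000 amount threshold, and the choice to send a claim for review when two rules fire together are hypothetical illustrations built around the article's leg-injury and age-45 examples, not rules from a real system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Claim:
    claim_id: str
    injury_type: str    # e.g. "leg", "neck"
    claimant_age: int
    amount: float       # billed amount in dollars (hypothetical field)

# Each rule is a (name, predicate) pair: "if predicate(claim) then raise a red flag".
Rule = Tuple[str, Callable[[Claim], bool]]

RULES: List[Rule] = [
    ("high-risk injury type", lambda c: c.injury_type == "leg"),
    ("age 45 and over", lambda c: c.claimant_age >= 45),
    ("amount over threshold", lambda c: c.amount > 5000.0),
]

def red_flags(claim: Claim, rules: List[Rule] = RULES) -> List[str]:
    """Return the names of all rules the claim triggers."""
    return [name for name, predicate in rules if predicate(claim)]

def needs_review(claim: Claim, min_flags: int = 2) -> bool:
    """Refine the screen: start a thorough review only when enough rules fire together."""
    return len(red_flags(claim)) >= min_flags

claims = [
    Claim("C1", "leg", 52, 1200.0),
    Claim("C2", "neck", 30, 800.0),
    Claim("C3", "leg", 28, 7400.0),
]
for c in claims:
    print(c.claim_id, red_flags(c), needs_review(c))
```

Each additional rule is one more predicate to tune and keep current, which is why large rule sets become hard to manage.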

The biggest issue with rule-based systems is that, when many rules are applied to the data, the system becomes cumbersome and complicated.  Maintaining a rule-based system is also expensive.  On the positive side, rule-based systems increase the adjuster's efficiency by automating decisions, and they improve consistency by applying the same rules to all incoming claims.

Rule-like technology is also more effective when used for back-end processing; that is, to review new claims.  It helps identify suspicious claims that resemble previously known fraudulent activities.  First-time offenders are often difficult to catch; however, this approach helps gather evidence once a suspect is identified.  To use it successfully, the user must know clearly what is to be discovered.  This process is helpful for running background checks against police records.  Link analysis helps match cases by finding links between claims data and known fraud activities.  When a relevant match is found, the case can be forwarded to the review process.

These systems are easy to implement and help identify repeat offenders.  The disadvantages are that a relevant match does not necessarily indicate fraud, and that new offenders are difficult to track.
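Link analysis can be sketched as looking for identifiers that an incoming claim shares with records from known fraud cases. This minimal version assumes each record carries a few identifying attributes; the phone, address and provider fields and all values are hypothetical. A shared value is a link worth forwarding for review, not proof of fraud.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

LINK_FIELDS = ("phone", "address", "provider_id")  # hypothetical linking attributes

def build_index(known_fraud: List[Dict[str, str]]) -> Dict[Tuple[str, str], Set[str]]:
    """Map each (field, value) seen in known fraud cases to the case IDs that use it."""
    index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for case in known_fraud:
        for field in LINK_FIELDS:
            value = case.get(field)
            if value:
                index[(field, value)].add(case["case_id"])
    return index

def find_links(claim: Dict[str, str],
               index: Dict[Tuple[str, str], Set[str]]) -> Dict[str, Set[str]]:
    """Return {field: known case IDs} for every attribute the claim shares with a fraud case."""
    links: Dict[str, Set[str]] = {}
    for field in LINK_FIELDS:
        value = claim.get(field)
        if value and (field, value) in index:  # membership test does not add keys
            links[field] = index[(field, value)]
    return links

known_fraud = [
    {"case_id": "F-01", "phone": "555-0101", "address": "12 Elm St", "provider_id": "P9"},
    {"case_id": "F-02", "phone": "555-0199", "address": "8 Oak Ave", "provider_id": "P9"},
]
index = build_index(known_fraud)

new_claim = {"claim_id": "C7", "phone": "555-0300", "address": "12 Elm St", "provider_id": "P9"}
links = find_links(new_claim, index)
if links:
    print(f"{new_claim['claim_id']} forwarded for review; links: {links}")
```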
 

PREDICTIVE TOOLS
The ultimate goal of using predictive tools is to select the right claim for the right investigation at the right time.  The first step in the predictive modeling process is to process the huge database that contains historical claims, first reports, medical payments, and the like.  Processing this database reveals the behavioral characteristics that help identify fraud, and these characteristics play a vital role in identifying suspect claims.  The second step is to combine these features into a fraud risk assessment: the model uses each claim's features to produce a fraud-risk score for that claim.  Model-generated scores help investigators focus on claims that require further review and minimize the time spent verifying legitimate claims.
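The two steps can be sketched as (1) summarizing each provider's claim history into behavioral features and (2) combining those features with the new claim's own values into a single risk score that orders the review queue. This is a minimal illustration under stated assumptions: the features (average billed amount, claims per patient) are hypothetical, and the weights are hand-set for the example, whereas a real model would estimate them from labeled data.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

def behavior_profile(history: List[Tuple[str, str, float]]) -> Dict[str, Dict[str, float]]:
    """Step 1: summarize historical (provider, patient, amount) claims per provider."""
    amounts: Dict[str, List[float]] = defaultdict(list)
    patients: Dict[str, Set[str]] = defaultdict(set)
    for provider, patient, amount in history:
        amounts[provider].append(amount)
        patients[provider].add(patient)
    return {
        p: {
            "avg_amount": sum(amounts[p]) / len(amounts[p]),
            "claims_per_patient": len(amounts[p]) / len(patients[p]),
        }
        for p in amounts
    }

# Hypothetical, hand-set weights for illustration only.
WEIGHTS = {"bias": -4.0, "amount_ratio": 1.5, "claims_per_patient": 0.8}

def risk_score(provider: str, amount: float, profile: Dict[str, Dict[str, float]]) -> float:
    """Step 2: combine claim and provider features into a 0-1 fraud-risk score."""
    stats = profile[provider]
    z = (WEIGHTS["bias"]
         + WEIGHTS["amount_ratio"] * (amount / stats["avg_amount"])
         + WEIGHTS["claims_per_patient"] * stats["claims_per_patient"])
    return 1.0 / (1.0 + math.exp(-z))  # logistic function maps z into (0, 1)

history = [
    ("P1", "a", 100.0), ("P1", "b", 120.0), ("P1", "c", 110.0),
    ("P2", "d", 300.0), ("P2", "d", 280.0), ("P2", "d", 320.0), ("P2", "e", 300.0),
]
profile = behavior_profile(history)

# New claims as (claim_id, provider, amount), ranked so the riskiest are reviewed first.
new_claims = [("N1", "P1", 115.0), ("N2", "P2", 900.0), ("N3", "P2", 310.0)]
queue = sorted(new_claims, key=lambda c: risk_score(c[1], c[2], profile), reverse=True)
for claim_id, provider, amount in queue:
    print(claim_id, round(risk_score(provider, amount, profile), 3))
```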

In my practice, I have used both supervised and unsupervised data mining techniques to target fraudulent healthcare insurance transactions.  People often ask why data mining tools are better than regular reports or OLAP tools.  In my opinion, one major difference is that data mining algorithms (with small variations, depending on the algorithm) show us not only whether a claim is potentially fraudulent, but also why.  It is important to understand which factors predict fraudulent claims.

In the next sections of the article, we discuss each major category of data mining techniques in more detail and present some real-life examples.
 

UNSUPERVISED DATA MINING TECHNIQUES
Among the techniques in this category, we used clustering, association rules and sequential pattern mining.  Clustering groups data objects into clusters, where each cluster is a collection of data objects that are similar to one another.  A good clustering algorithm has two main characteristics: high intra-class similarity and low inter-class similarity.  For one of our customers, we used a dataset that included medical procedures and drug prescriptions to detect fraudulent transactions.  The following are the types of results we were able to provide to our client:

  • High occurrence of unnecessary medical procedures in a certain geographical area

  • Extremely high volume of drug prescriptions in a short period of time

  • Above-average usage of expensive and unnecessary drugs from a specific pharmaceutical company
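A minimal k-means sketch of the clustering idea. k-means is one common clustering algorithm, used here as a stand-in; the article does not say which algorithm the project used. Each prescriber is summarized by two hypothetical features (prescriptions per week and average cost per prescription in hundreds of dollars), and the final check compares the average distance within clusters with the average distance between clusters, the two properties of a good clustering described above.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two 2-D points."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def kmeans(points: List[Point], centroids: List[Point], iters: int = 20) -> List[int]:
    """Plain k-means: assign points to the nearest centroid, then move centroids to cluster means."""
    labels: List[int] = []
    for _ in range(iters):
        labels = [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
                  for p in points]
        new_centroids: List[Point] = []
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                new_centroids.append((sum(m[0] for m in members) / len(members),
                                      sum(m[1] for m in members) / len(members)))
            else:
                new_centroids.append(centroids[k])  # keep the old centroid if a cluster empties
        if new_centroids == centroids:
            break  # converged: centroids (and so assignments) no longer change
        centroids = new_centroids
    return labels

def mean_intra_inter(points: List[Point], labels: List[int]) -> Tuple[float, float]:
    """Average pairwise distance within clusters and between clusters."""
    intra: List[float] = []
    inter: List[float] = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = dist(points[i], points[j])
            (intra if labels[i] == labels[j] else inter).append(d)
    return sum(intra) / len(intra), sum(inter) / len(inter)

# Hypothetical prescribers: (prescriptions per week, average cost per prescription in $100s).
prescribers = [(10, 1.0), (12, 1.2), (11, 0.9), (9, 1.1), (40, 4.0), (42, 3.8)]
labels = kmeans(prescribers, centroids=[prescribers[0], prescribers[4]])
intra, inter = mean_intra_inter(prescribers, labels)
print(labels, round(intra, 2), round(inter, 2))
```

Because the two features are on different scales, the first one dominates the distance here; standardizing features before clustering is the usual remedy.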

 
As a secondary unsupervised technique, we used association rules.  This technique is a good choice for finding patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases and other information repositories.  We found association rules extremely useful in the following situations:

  • When you don’t know a lot about the data

  • When the analysts are new at conducting analysis

  • When you need a starting point for your analysis
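Association rules are usually judged by support (the share of transactions containing both items) and confidence (how often the consequent appears when the antecedent does). The sketch below finds only one-item rules from item pairs, a simplified stand-in for a full algorithm such as Apriori; the procedure codes and the 0.4 support and 0.7 confidence thresholds are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Tuple[str, str, float, float]]:
    """One-item rules A -> B as (A, B, support, confidence), strongest first."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    pair_counts = Counter(pair for t in transactions
                          for pair in combinations(sorted(t), 2))
    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # share of claims containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = count / item_counts[lhs]  # share of lhs-claims that also contain rhs
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules, key=lambda r: (-r[3], r[0], r[1]))

# Hypothetical claims, each listed as the set of procedure codes billed together.
transactions = [
    {"office_visit", "xray"},
    {"office_visit", "xray", "mri"},
    {"office_visit", "lab_test"},
    {"xray", "mri"},
    {"office_visit", "xray", "mri"},
]
for lhs, rhs, sup, conf in pair_rules(transactions, min_support=0.4, min_confidence=0.7):
    print(f"{lhs} -> {rhs}: support={sup:.2f}, confidence={conf:.2f}")
```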

 
Related to association rules is sequential pattern mining, which looks at the sequence of transactions or events in a database.  This method requires a timestamp associated with each treatment or claim.  It is extremely useful in detecting:

  • Errors in medical treatment

  • Unusual sequences of health insurance claims
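A minimal sketch of the sequential idea: order each patient's procedures by timestamp, count how many patients show each consecutive step "A then B", and flag patients whose history contains a step that is rare across the population. The dental procedure codes, dates and the two-patient cutoff are hypothetical, and this transition-counting approach is a simplified stand-in for full sequential pattern mining algorithms.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, List, Set, Tuple

Event = Tuple[str, date, str]  # (patient_id, service date, procedure code)
Step = Tuple[str, str]         # "procedure A, then procedure B"

def patient_sequences(events: List[Event]) -> Dict[str, List[str]]:
    """Order each patient's procedures by service date (the timestamp)."""
    by_patient: Dict[str, List[Tuple[date, str]]] = defaultdict(list)
    for patient, when, code in events:
        by_patient[patient].append((when, code))
    return {p: [code for _, code in sorted(visits)] for p, visits in by_patient.items()}

def step_support(seqs: Dict[str, List[str]]) -> Dict[Step, int]:
    """Count how many patients show each consecutive step A -> B."""
    support: Dict[Step, int] = defaultdict(int)
    for seq in seqs.values():
        steps: Set[Step] = set(zip(seq, seq[1:]))  # count each step once per patient
        for step in steps:
            support[step] += 1
    return dict(support)

def unusual_sequences(seqs: Dict[str, List[str]], support: Dict[Step, int],
                      min_patients: int) -> Dict[str, List[Step]]:
    """Patients whose history contains a step seen in fewer than min_patients patients."""
    flagged: Dict[str, List[Step]] = {}
    for patient, seq in seqs.items():
        rare = [s for s in zip(seq, seq[1:]) if support[s] < min_patients]
        if rare:
            flagged[patient] = rare
    return flagged

events = [
    ("p1", date(2006, 1, 10), "exam"), ("p1", date(2006, 1, 20), "xray"),
    ("p1", date(2006, 2, 1), "filling"),
    ("p2", date(2006, 3, 5), "exam"), ("p2", date(2006, 3, 15), "xray"),
    ("p2", date(2006, 4, 1), "filling"),
    ("p3", date(2006, 5, 2), "exam"), ("p3", date(2006, 5, 10), "xray"),
    ("p3", date(2006, 6, 1), "root_canal"), ("p3", date(2006, 7, 1), "crown"),
    ("p4", date(2006, 2, 2), "crown"), ("p4", date(2006, 2, 10), "exam"),
    ("p4", date(2006, 2, 15), "xray"),
    ("p5", date(2006, 8, 3), "exam"), ("p5", date(2006, 8, 9), "xray"),
    ("p5", date(2006, 9, 1), "root_canal"), ("p5", date(2006, 10, 2), "crown"),
]
seqs = patient_sequences(events)
flagged = unusual_sequences(seqs, step_support(seqs), min_patients=2)
print(flagged)
```

A flagged sequence is a lead for a reviewer with dental domain knowledge, not a verdict.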

 
With another client, we worked on identifying the proper sequences of dental treatments.  The dataset at our disposal contained all dental procedures performed over the past seven years.  With the help of sequential pattern mining, we were able to discover many procedures that were inexplicable from a medical perspective.  This brings up another important point: without adequate domain knowledge, even a successful application of the algorithms may fall short of its objectives.  Also, a downside of sequential pattern mining and association rules is that many of the rules generated are trivial; hence, an analyst has to browse all the rules generated, eliminate the trivial ones and closely inspect the anomalies.
 

SUPERVISED LEARNING TECHNIQUES
These tools are mostly used when we know what we are looking for or what we want to predict.  Supervised methods include descriptive (or rule-based) techniques such as classification decision trees, which answer business questions such as “What are the characteristics of fraudulent claims?”, and predictive techniques such as logistic regression and neural networks, which answer business questions such as “Who is more likely to submit a fraudulent claim?”.  Of course, decision trees can also be used as predictive tools, but that subject is beyond the scope of this article.

Predictive models usually have two major stages: the model is built with existing data, and then it is applied to new data.  Classification models use specially designed algorithms to derive a description of each labeled class in a database from the features present in the training data.  These models also have two stages: the model is built, and then its rules are applied to new data.  Building these types of models requires excellent knowledge of the business.  However, once the models have been built and validated, the scoring process is quite simple and can be invoked automatically whenever a new transaction is recorded.
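The build and apply stages can be sketched with a small logistic regression fitted by batch gradient descent: the build stage learns weights from labeled historical claims, and the apply stage scores a new claim. This is a minimal illustration, not the project's model; the two features and all data values are hypothetical, and a production model would normally come from a tested statistics package and be validated on held-out data.

```python
import math
from typing import List, Sequence

def sigmoid(z: float) -> float:
    """Logistic function mapping a linear score to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X: List[Sequence[float]], y: List[int],
                 lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Build stage: batch gradient descent on the log-loss; returns [bias, w1, ..., wd]."""
    d = len(X[0])
    n = len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * g / n for wj, g in zip(w, grad)]
    return w

def score(w: List[float], x: Sequence[float]) -> float:
    """Apply stage: estimated fraud probability for a new claim."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))

# Hypothetical features: (prior claims / 10, days since policy start / 365); 1 = confirmed fraud.
X = [(0.8, 0.05), (0.9, 0.10), (0.7, 0.02), (0.6, 0.08),
     (0.1, 0.90), (0.2, 0.70), (0.0, 0.95), (0.3, 0.60)]
y = [1, 1, 1, 1, 0, 0, 0, 0]
weights = fit_logistic(X, y)

# Scoring is cheap, so it can run automatically as each new claim is recorded.
for claim in [(0.85, 0.03), (0.15, 0.80)]:
    print(claim, round(score(weights, claim), 3))
```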

With another client, we used a dataset of auto insurance policies to predict fraudulent claims.  The tools we used were decision trees and logistic regression.  Some of the project deliverables were:

  • Effective rules for identifying fraudulent claims

  • Acceleration of the claims settlement process

  • The client was able to make “pay/no pay” decisions within hours

  • Dramatically reduced percentage of false positives and false negatives

  • Real-time scoring of new claims

 
SUMMARY
We would like to emphasize some important points regarding the data mining tools used for detecting and preventing insurance claims fraud.  To conduct a successful data mining project, you need accurate data, world-class data mining tools that offer scalable techniques, and a very good understanding of the business.  Some of the challenges with these types of projects (and not only in this industry) are the lack of in-house expertise (it is very hard to find analysts who understand both the business and the algorithms), the difficulty of proving the ROI of data mining initiatives, and the number of false positives and negatives, along with their impact on the business.  However, once the models are built, the company can save substantial time and money, and customer satisfaction will increase (due to more rapid processing of valid claims).

We also have a final recommendation: build your own in-house team and keep them happy.  As a university instructor and consultant, I see the scarcity of these skills every day, as well as my clients' desire to hire more people with analytical skills.  While we have better computers and software, we have to remember that people make the final decision.

 
ABOUT THE AUTHOR
Alex Filimon is a Partner of the Novus Consulting Group, a full-service management consulting company located in Halifax, Nova Scotia, Canada.  He has also been teaching graduate-level courses in Knowledge Discovery in Databases and Marketing at Dalhousie and St. Mary’s universities in Halifax.  His extensive client list (including banks, insurance companies, retailers, biotech companies, telcos, and not-for-profit organizations) and strong academic background helped him become the only SAS Institute partner in Atlantic Canada.  Visit Novus Consulting Group’s web site to learn more about its projects and training sessions, or contact Alex at +1 (902) 489-2665 or afilimon@novusconsulting.com.
 

All Rights Reserved by Novus Consulting Group and The Modeling Agency Copyright © 2007


 

 

3.  ARTICLE

 
Advanced Statistical Modeling Trends


by
Curt Hall
Senior Consultant
Cutter Consortium
 

Although companies continue to express considerable interest in using neural networks and other advanced statistical modeling techniques for data mining and other BI applications, most organizations' BI and analytic practices rely primarily on standard reporting and multidimensional (OLAP) analysis methods. This is the finding of a March 2007 Cutter Consortium survey designed to assess the BI practices of 119 end-user organizations worldwide.

Specifically, when asked the question, "Which of the following best characterizes your organization's current BI and analytic practices?" survey participants responded as follows:

  • 72% said: "Our BI practices primarily involve the use of reporting and multidimensional analysis (i.e., OLAP) tools"

  • 5% said: "Our BI practices include the use of neural networks and other advanced statistical modeling/analytic techniques"

  • 5% said: "Both of the above"

  • 18% said: "Don't know"

What these findings indicate to me is that most organizations' BI practices have not progressed beyond basic "vanilla" BI techniques. I do not mean to suggest that these techniques are somehow obsolete. Rather, I'm saying that the evolution of BI (from basic reporting and multidimensional analysis to the broader application of more advanced analytics) is proceeding slowly. The reason is that applying neural nets and other advanced statistical modeling techniques is difficult, and most end-user organizations simply do not have regular staff experienced in developing them. For the most part, only larger companies can afford to employ people well versed in statistical analysis for such projects. In short, it's one thing to develop neural net and other advanced statistical models for research and pilot projects. However, it's quite another to develop models that you feel confident enough in to deploy in a production setting.

The bottom line is that the use of advanced statistical modeling techniques for BI applications by end-user organizations appears likely to continue at a fairly limited pace for the foreseeable future.

Is your organization using advanced statistical modeling techniques for BI applications? I'd like to hear why or why not. Send your comments to chall@cutter.com or call me at +1 510 848 7417.

 
ABOUT THE AUTHOR
This excerpt, by Cutter Consortium Consultant Curt Hall, originated from Cutter's Business Intelligence Advisory Service.  Through this subscription-based service, you are assured of expert analyses of the latest business intelligence strategies, products, and technologies. For more information or to find out how you can become a client, please visit Cutter Consortium's web site, or contact Dennis Crowley at +1 781-641-5125 or e-mail dcrowley@cutter.com.

Published with permission from Cutter Consortium. Copyright © 2007.
 
 


4.  ANNOUNCEMENT
 

TDWI World Conference
The Premier Event for Business Intelligence
and Data Warehousing Education

October 28 - November 2, 2007
Orlando, Florida

 
CONFERENCE HIGHLIGHTS
The TDWI World Conference in Orlando brings together leading industry visionaries to deliver a unique program of cutting-edge education, best practices, one-on-one consulting, peer networking, business intelligence certification, and product demos. From business intelligence fundamentals to business analytics, TDWI’s program of more than 50 full- and half-day courses offers something for your entire team.
 


CONFERENCE BENEFITS
 

  • Interact with the most knowledgeable and experienced instructors in the industry
  • Gain practical knowledge that you can apply immediately
  • Bridge the knowledge and communication gaps between business and IT
  • Network and share best practices with your peers

CONFERENCE REGISTRATION

For more information, and to register for the TDWI World Conference in Orlando, please visit:

TDWI World Conference in Orlando.
 

Produced with permission from The Data Warehousing Institute Copyright © 2007
   


 
5.  NEWSLETTER SUMMARY
 

The Modeling Agency newsletter is a quarterly publication which provides course announcements, training schedule updates and informative articles.  This newsletter may be shared in its entirety and subscriptions are free. For additional information on TMA's training, consulting services and solutions, follow the corresponding links at the top of this page.

This newsletter is shared with those who have activated a subscription or have supplied their email address to The Modeling Agency when requesting product information. If you no longer wish to receive future releases, simply send an empty email with "cancel" as the subject from the account with which you subscribed.

    address
One Oxford Centre
301 Grant St, Ste 4300
Pittsburgh, PA 15219 USA
 
phone: 281.667.4200
fax: 281.652.5721
training: 888.742.2454
Copyright © 2000 - 2008 The Modeling Agency. All rights reserved.