A Conceptual Foundation for the
Formulation
of Business Predictive Analytics Projects
by Thomas A. "Tony" Rathburn
Senior Consultant, The Modeling Agency
PREFACE
The author approaches building
models using Predictive Analytics as a strategy for playing a ‘game’. The
game is defined by the business unit. All rules, strategies, constraints,
and score keeping are defined by those playing the game. There is no
‘right’ answer. There are only decisions and impacts.
Predictive Analytics is an approach for developing quantitative models,
based on historical data, for the purpose of making ‘better’ decisions in a
business environment.
As we
are dealing with human behavior, there is no perfect model to be developed.
Instead, we are challenged to define groups of people who have a
probabilistic expectation of displaying/not-displaying a behavior.
This
article explores the development of the major approaches to these types of
problems as a foundation for conceptualizing Predictive Analytics problems
effectively.
FRAMING BUSINESS
PREDICTIVE ANALYTICS AS A GAME
A game is played by two or more
participants for the purpose of winning a prize. That prize can be
recognition, property, titles or any other thing of value to the players.
Most typically the prize is money.
Games
are played by a set of rules, within a set of boundaries, and have an
established way of keeping score… all mutually agreed upon by the players.
Predictive Analytics is the goal directed development of mathematical
models, based on historical data, to support decision making. As such, it
is especially well suited to the discovery of enhanced strategies for
playing the game we call “business”.
This
discussion specifically addresses issues related to the formulation of
projects involving decision making in a business environment. None of the
insights contained in this discussion are “right”. Rather, the author
describes a best-practices approach for utilizing predictive analytics in a
business environment. This approach can be generalized successfully to
virtually all business projects.
MATHEMATICAL MODELS AND
KEEPING SCORE
A mathematical model consists of
a set of variables, associated weights and operators to describe the
relationships between the variables. Mathematical models are developed for
the purpose of estimating the value of a variable of interest.
In and
of themselves, mathematical models have few interesting qualities. It is
only when we put them in context that they have the potential to have value.
Does
the expression $y = a + b w_1 - c w_2^2$
seem particularly exciting or provocative? Probably not. However, if I
told you that this was an exceptionally effective model for managing an
aspect of your personal finances, explained the variables involved, and the
rules for utilizing the model, you are much more likely to find this bit of
math of interest.
It is
only in the context of the game that we find mathematical models of interest
and useful. Their advantage comes from providing a reliable way of
evaluating a situation. By adding a set of rules for utilizing the model,
we have a strategy for consistently making decisions.
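As a minimal sketch of "model plus usage rule equals strategy", the snippet below evaluates the expression above (read here as having a squared second term, which is an assumption) and wraps it in a simple decision rule. The coefficient values, the threshold, and the names `model` and `decide` are hypothetical and chosen only for illustration.

```python
# Hypothetical coefficients and decision threshold (illustration only).
A, B, C = 10.0, 2.0, 0.5
THRESHOLD = 12.0


def model(w1: float, w2: float) -> float:
    """Evaluate y = a + b*w1 - c*w2**2 for one set of circumstances."""
    return A + B * w1 - C * w2 ** 2


def decide(w1: float, w2: float) -> str:
    """Usage rule: act only when the model output reaches the threshold."""
    return "act" if model(w1, w2) >= THRESHOLD else "hold"


print(model(3.0, 2.0))   # 10 + 6 - 2 = 14.0
print(decide(3.0, 2.0))  # act
print(decide(1.0, 2.0))  # 10 + 2 - 2 = 10.0, below the threshold: hold
```

The model alone only produces a number. It is the rule in `decide` that turns that number into a consistent, repeatable decision.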
That
is not to say that we have a good way of making decisions… only a reliable
and consistent way of making decisions.
The
usefulness of mathematical models must be evaluated in the context of the
game we are playing. Do they successfully fall within the constraints of
the rules and boundaries we have defined for playing the game? And, more
importantly, do they provide a strategy for playing the game that gives us a
higher level of performance -- or more success -- than we are currently
achieving?
Every
mathematical model, along with its associated usage rules, can be evaluated
in this way. It is important to note that every time we change the mix of
our variables in our models, by either adding or deleting a variable, we
have changed the level of performance we will achieve by using the model to
play our game. The same can be said for changing any of the weights or
operators in our model, and for changing any rule associated with the
implementation of the model.
Our
attraction in predictive analytics, therefore, is not to any particular
mathematical model, but to the process of searching for a combination of
variables, weights, operators, and rules for usage that improves the
decision making in the game we are playing. That improvement is evidenced
by an increase in the “score” we achieve by using the model versus using our
existing decision making approach.
USES OF MATHEMATICAL MODELS
In their simplest use, we
describe a particular set of circumstances by providing a model with a set
of values for each of its respective variables and complete the calculations
necessary to compute the expected value for our target outcome.
Unfortunately, this expected value rarely has any inherent value on its own.
Only when we combine a set of rules for the utilization of the model with
this outcome do we now have a system capable of assisting us with decision
making.
With
the development of the model, we now have a general approach to evaluating a
wide range of possibilities by varying the value of one or more of the
model’s component variables. By computing a reasonably large number of
these scenarios, and connecting the derived points, we can see the
visualization of a line in one dimensional output space, a plane in two
dimensional space, or a hyper-plane in n-dimensional space.
These
structures, whether simple or complex, have only two possible uses. We can
use the structure as a boundary between categories in a classification
problem, or we can determine our location on the structure itself in a
forecasting problem.
In our
thinking about model development -- which is nothing more than the processes
for determining the component variables, weights and operators -- we should
be strongly influenced by how to conceptualize the way we intend to use the
output from the model. Is our intention to do classification, or
forecasting?
In
general, it is much easier to develop models for classification than it is
for forecasting. This is easily demonstrated simply by considering the
precision requirements for each type of problem.
Consider
any n-dimensional space that is populated with a pattern of X’s
and O’s, and assume that our game is to find a way to build a model that defines
a boundary between the X-subspace and the O-subspace. Our score keeping
metric is ‘percent correct classification’.
In
general, our search is for a way of building that boundary that will improve
our “score”. However, it is important to note that there are a large number
of model-generated boundaries that achieve the same score. Each of these
boundaries has its own combination of shape, slope and intercept.
No one
of these models is better, or worse, than any of the others so long as it
achieves the same long-term score. The advantage of adopting a
classification approach to a problem of interest is this inherent
availability of multiple models. There are no “right” models. We simply
need to find one of the many models that outperforms the model we are
currently using.
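The point that many different boundaries can earn the same score can be illustrated with a small sketch. The eight labeled points and the three candidate lines below are made-up toy data, not taken from any real project, and the helper name `percent_correct` is hypothetical.

```python
# Toy two-dimensional data: ((x, y), label). Values are invented for illustration.
points = [
    ((1.0, 1.0), "O"), ((2.0, 1.5), "O"), ((1.5, 3.0), "O"), ((3.0, 2.0), "O"),
    ((5.0, 5.0), "X"), ((6.0, 4.5), "X"), ((4.5, 6.0), "X"), ((2.5, 2.5), "X"),
]


def percent_correct(slope: float, intercept: float) -> float:
    """Score a linear boundary: a point above y = slope*x + intercept is called 'X'."""
    hits = 0
    for (x, y), label in points:
        predicted = "X" if y > slope * x + intercept else "O"
        hits += predicted == label
    return 100.0 * hits / len(points)


# Three boundaries with different slopes and intercepts.
boundaries = [(-1.0, 8.0), (-0.5, 6.5), (-2.0, 12.0)]
for slope, intercept in boundaries:
    print(slope, intercept, percent_correct(slope, intercept))  # 87.5 each time
```

All three lines misclassify the same single point and classify the other seven correctly, so by the 'percent correct classification' metric they are interchangeable.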
In
contrast, if we approach our game with the strategy of generating a
forecast, our search for a model with a higher level of performance becomes
much more difficult. In forecasting, our precision requirements are much
higher. We are seeking an accurate value… a location on a continuous
plane. This requires accurate construction of the surface of interest.
Then we must determine where we are on that surface. This is always a much
more difficult approach. The question becomes whether that level of
precision is necessary, or is it sufficient to use a classification approach
to the problem.
PHYSICAL SYSTEMS VERSUS
HUMAN BEHAVIORAL MODELING
The use of mathematics and
computers has generated a mindset that leads us to expect precision and
explainability in all aspects of our decision making.
As
we saw above, there are only two ways in which a model may be
utilized: classification and forecasting. Likewise, there are only two types of
problems to which these models may be applied: physical systems and
behavioral systems.
In the
development of a model for physical systems, our focus is generally on
finding a way to describe the way the system works: the process. The key
characteristic of this type of model development is that there is a right
answer. A physical system works in a particular way. It may be simple, or
very complex, but it is governed by a set of characteristics, laws and
drivers that function in a consistent, reliable manner.
Human
behavior, on the other hand, is inherently messy. Individuals are
inconsistent and unreliable in the patterns of behavior they display. Two
seemingly identical individuals, based on the values of the variables we
have available, may display very different behaviors. In fact, the same
individual, in even slightly different time frames, is likely to display
differing behavior patterns.
Recognition of this characteristic has a significant impact on the model
development process. In the development of a physical systems model, we are
searching for a “right” answer. In a behavioral model, the best we can hope
to achieve is the development of an accurate probabilistic expectation that
a behavior will be displayed by a defined group of people.
It is
important to note that in a physical system, we often consider the variables
in a correct model to be drivers -- the attributes that describe and control
the process. However, in a behavioral model, no such drivers exist.
There
is no causality in the variables contained in a behavioral model. Rather,
the variables in a behavioral model are simply a set of attributes for
describing a group. By considering these attributes, and their relationship
to each other, we have defined a group that displays a particular behavior
of interest at a measurable rate.
In
both the development of our behavioral models, and in their use, it is
important to keep in mind that we cannot specifically anticipate the
behavior of any specific member of the group. Rather, we are limited to
having an expected probability of seeing the behavior, based on the
individual’s status as a member of the group.
Additionally, once a group has been defined for a particular behavior of
interest, and the expected probability of the behavior of interest has been
determined, our modeling efforts shift to determining a degree of belief as
to whether or not an individual is, or is not, a member of the group.
We may
be interested in determining whether or not an individual is part of a group
we have called ‘Respondents’. From our analysis, we have determined that we
can describe Respondents based on the relationship between a set of
variables, weights and mathematical operators. For the group described in
this way, we have determined that this group displays their behavior of
interest, responding, at a rate of 2%.
First,
the variables in our model are not the attributes that cause any member of
the group to display this behavior. They are simply a way of describing the
group.
Second, once the group is described, our attention shifts to determining
whether or not an individual is a member of the group thus described. It is
important to keep in mind that our models are not determining whether or not
the person will display the behavior of interest (responding in this case).
Rather, we are determining our belief that the individual is a member of the
group. If we do, in fact, decide that a specific individual is a member of
the group, we may only anticipate the behavior of interest being displayed
at the expected probability of the group. In other words, our model is
determining whether or not an individual is a member of the group that
displays the behavior of responding at a 2% response rate.
DISTINGUISHING BETWEEN STATISTICS AND
PREDICTIVE ANALYTICS
Much has been said, and
written, about the apparently competitive nature of statistics and
predictive analytics. This is unfortunate since, used appropriately, these
are highly complementary fields of application.
Statistics and predictive analytics use many of the same techniques. The
distinction between the two fields is best considered not by the techniques
utilized, but rather by the purpose for using them.
Statistics tends to focus on the description of a population. Most often,
this description is general. It describes the central tendency, and a
measure of spread. The most common metrics employed are mean and standard
deviation.
Before
any work can be effectively completed in predictive analytics, we must
understand the general description of the group we are working with. In
fact, our work in predictive analytics cannot come to a successful result
without a reasonably accurate estimate of the general behavior of our
population.
Predictive analytics is an extension of traditional statistics. Our work
lies in the belief that not all members of our group display their behavior
of interest at the same rate. Our belief is in the existence of definable
sub-groups who display the behavior of interest at a rate different from the
group as a whole. And our work is focused on identifying the description of
those sub-groups and assigning individuals appropriately.
In our
efforts to achieve a higher score in the game we are playing, we typically
look for those sub-groups who have the greatest impact on our performance.
This performance impact may be either positive, or negative. Once
successfully identified, these sub-groups will receive different treatment
than is administered to the group in general.
While
this may seem simplistic on the surface, this is the key strategy to the
successful implementation of predictive analytics for behavior modeling in a
business environment.
By
completing our statistical analysis competently, we have achieved the
ability to treat the group as a whole, in a particular way. By accurately
identifying sub-groups, deriving a way to assign at least some individuals
to these sub-groups under a particular set of circumstances, and treating
the individuals assigned to the sub-groups in a manner different from
members of the general group, we have an approach that allows us to vary our
game strategy and achieve improved performance.
The
use of mathematical techniques for this purpose is simply a way of
implementing the identification of the sub-groups. Unfortunately, far too
many practitioners approach predictive analytics with the perspective that
the techniques themselves are what is critical.
EVALUATING PERFORMANCE IN PREDICTIVE
ANALYTICS MODELS
From a practical perspective, it
is not uncommon to spend close to 50% of your calendar time on a project
developing and refining a project definition. The performance of a
predictive analytics model is evaluated on the basis of the user’s
particular needs in the decision environment in which the model will be
utilized. There is no “proper” set of performance metrics.
You,
as the decision maker -- your group, your organization -- are the only
people qualified to determine what your priorities are.
I
doubt anyone in a business environment has ever received a raise, bonus or
promotion based on $R^2$ (the statistical “coefficient of
determination” or overall model accuracy). In selecting which of your
models is most appropriate, it is critical that they be evaluated primarily
on the basis of your business objectives.
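To make the contrast between an analytic metric and a business metric concrete, here is a small sketch comparing two hypothetical mailing strategies on a population of 1,000 with a 2% response rate. The margin per sale, cost per mailing, and confusion-matrix counts are all invented for illustration.

```python
# Hypothetical economics (illustration only).
MARGIN_PER_SALE = 50.0
COST_PER_MAILING = 1.0


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Share of individuals classified correctly."""
    return (tp + tn) / (tp + fp + fn + tn)


def profit(tp: int, fp: int, fn: int, tn: int) -> float:
    """Margin earned from responders mailed, minus the cost of every mailing."""
    return tp * MARGIN_PER_SALE - (tp + fp) * COST_PER_MAILING


# Strategy 1: predict "no response" for everyone, so nobody is mailed.
mail_nobody = dict(tp=0, fp=0, fn=20, tn=980)
# Strategy 2: mail 200 people, 12 of whom respond.
targeted = dict(tp=12, fp=188, fn=8, tn=792)

for name, counts in (("mail nobody", mail_nobody), ("targeted", targeted)):
    print(name, accuracy(**counts), profit(**counts))
```

Under these assumed numbers, the strategy with the higher accuracy (98% versus 80.4%) earns nothing, while the less accurate one earns 400. Which one is "better" depends entirely on how the business keeps score.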
These
metrics are based on enhancing benefits or reducing negative aspects of your
process. Often, they are expressed as increasing profit or reducing
expenses.
Your
model, and its associated rules for usage, must also take into consideration
the assumptions and constraints of both your organization, and the
regulatory environment in which you function.
It is
critical to understand that these are the rules by which you are playing
your game, and how you keep score. Only those issues that relate to your
true objectives should be used in evaluating your models. All analytic
issues are secondary.
It is
also critical to understand that these rules and metrics must be specified
in your project definition, prior to the beginning of your model development
effort. All work subsequent to your project definition is done for the
purpose of enhancing your strategy for achieving higher scores within the
constraints of the rules laid out at the inception of the game.
Failure to do this is the most common reason predictive analytics projects
fail. With today’s advanced modeling tools, practitioners develop very good
models that either can’t be implemented in their business environment, or
don’t perform well in the real world.
Take
the time to develop your project definition in detail. It should include
your performance metrics, assumptions and constraints in which the model
will operate in its live environment, current baseline levels of
performance, and a listing of the resources and skill sets required to
build, implement and use the model.
This
is your blueprint for the work to be completed. Just as we wouldn’t
consider breaking ground on the construction of a new building without a set
of architectural plans, you shouldn’t begin your predictive analytics
project without a project definition that lays out in detail what it is you
are doing, how you will do it, and what you are attempting to achieve.
It is
just as important that everyone who will be impacted, from the project
sponsor, to the final users, to the functional area that the decision makers
work in, to IT, is in agreement with the plan before it commences. To
do otherwise virtually guarantees that you will have significant issues as
to the viability of the project at some point in the future – regardless of
the resulting model’s accuracy.
THE ROLE OF MATHEMATICAL TECHNIQUES IN PREDICTIVE ANALYTICS
There are no mathematical techniques that are better, or worse, than others
in general. Our mathematical tools are simply algorithms for determining
the variables, weights and mathematical operators that comprise our model.
That
is not to say that different model development techniques do not have their
own characteristics, their own appropriate and inappropriate uses, and their
own strengths and weaknesses in particular uses. Just as a hammer, a saw,
and a screwdriver are all useful tools with attributes and particular
applications in physical construction, our mathematical toolbox is comprised
of tools with attributes and particular uses in model construction.
Linear
regression is a commonly known tool that assumes normally distributed data,
linear relationships, stable means and standard deviations, orthogonal
inputs and is best used for forecasting problems. There is nothing
inherently wrong with linear regression, any more than there is something
inherently wrong with a hammer. As with a hammer, and contrary to common
usage, linear regression is not the best tool for every job.
Virtually none of our real-world projects fully meet the assumptions and
constraints of linear regression. In business projects, our data is almost
never normally distributed. Many of our relationships have a non-linear
component. Behaviors change over time. Therefore, our means and standard
deviations are not stable and adjust with the change in behavior. While we
can develop orthogonal input variables in our models, doing so requires that
we disregard additional variables with additional information content that
may allow us to build models that perform better on our problem. And almost
all of our problems can be constructed as classification problems rather
than forecasting problems.
This
does not make linear regression a bad tool: it’s a highly efficient method
in the right applications. It simply means that it does not match well with
most real-world problems that we encounter in the business environment. It
also means that there may be other tools and techniques that are better
suited to our needs.
Logistic regression is well suited to use in classification problems, but is
not particularly effective in forecasting. Neural networks are suitable in
both linear and non-linear solutions, and make no assumptions about the
distribution of the data -- but are difficult to use, computationally
intensive and difficult to explain.
Just
as a carpenter selects from the tools available based on the particular task
at hand, it is important for the modeler to have a variety of tools
available and to know when and how to use each.
THE ROLE OF DATA IN PREDICTIVE ANALYTICS
Data are our raw materials for the construction of our models. Let’s
assume that we have already developed a comprehensive project definition.
We then have a conceptual understanding of exactly what we are trying to
achieve. We have a well defined set of rules for playing our game. We have
a set of tools for manipulating our data to find relationships to allow us
to define sub-groups who display a behavior of interest at a rate different
from the population as a whole. Our purpose is to use our model in a
decision environment to allocate our resources more effectively, so that we
increase performance based on the way we keep score.
Typically, it is appropriate to budget 75% to 80% of your time on a modeling
project to data-related activities. This work is comprised of a number of
tasks including collection of variables with potential information content,
cleaning your data, dealing with missing variables, selecting variables, and
determining appropriate data representations and transformation schemes to
maximize the extraction of information content.
It is
beyond the scope of this article to address these areas in detail. However,
it is important to note in the conceptualization of our model development
plan, that performance enhancement comes from extracting the information
content from our data… not from using some new exotic tool or technique.
BASIC RESPONSE MODEL
As previously discussed,
most business problems can be formulated as a classification problem where
we have calculated the general propensity for a group to display a behavior
of interest. This is, far and away, the easiest practical approach for the
construction of predictive analytics projects.
Your
application area may be respondents to a marketing campaign, attrition
modeling, fraud detection, risk modeling, credit analysis, or any other
behavior. On a basic level, we are concerned with identifying sub-groups of
the population who display the behavior at a different rate than appears in
the population as a whole, and treating them in such a way to improve the
performance as measured by our business objectives.
For
our purposes, we will refer to these models as Response Models. That is,
individuals who comprise these sub-groups respond to a particular set of
circumstances by displaying a behavior of interest at a defined rate, stated
as an expected probability.
ONE-TAIL SOLUTIONS
In practice, the easiest
way to identify a sub-group of interest is to build a model that measures a
relative propensity to display the behavior of interest among the
individuals in our population. We can then determine a reliable boundary
for a sub-group that displays the behavior at a rate significantly different
from the central tendency. This allows us to classify future individuals as
a member or non-member of the sub-group.
Marketing problems are a commonly used example of this type of solution. We
have determined that our population of potential customers responds to a
direct mail campaign at a 2% response rate. That is, they display our
behavior of interest, purchasing, with an expected probability of 0.02.
For
the purpose of our example, we will assume that all purchases are of
equivalent value, and that we are concerned only with individual mailings.
Our sole metric of performance is response rate.
Our
modeling efforts are comprised of finding a mathematical model that will
allow us to more effectively target which prospective customers to mail.
Our
approach will consist of analyzing our historical data to develop a way of
scoring the individuals in our data set based on their propensity to display
the behavior of purchasing.
Our
output variable is binary in form, where a 1 represents purchasers and a 0
represents non-purchasers. This is a classification problem applied to
human behavior.
Every
combination of variables, weights and mathematical operators is a different
model. Each model represents a different way of scoring our individuals.
As such, each model derives its own level of performance based on our
defined metric of response rate.
It is
apparent that our modeling effort is geared toward finding a model that
discriminates between purchasers and non-purchasers in such a way that our
identified sub-group displays the behavior of purchasing at a rate
significantly higher than the population’s general tendency of 2%.
Our
efforts are comprised of two parts. First, building a model that
consistently and reliably ranks our individuals based on their propensity to
purchase. And second, determining the cut-off score that acts as a boundary
between the two groups.
Done
well, this simple response model identifies a sub-group that displays the
behavior of purchasing at a much higher rate than the group as a whole.
It is
worth noting, that in most practical applications, the analysis of
determining the cut-off score that acts as a boundary for membership in the
group has a much more profound impact on performance than the raw ranking
model.
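The two-part effort described above, ranking and then choosing a cut-off, can be sketched as follows. The scored history is a tiny invented data set (with an unrealistically high 20% baseline so the effect is visible in 20 rows), and `response_rate_above` is a hypothetical helper name.

```python
# Hypothetical scored history: (model score, purchased flag). Invented toy data.
scored = [
    (0.95, 1), (0.90, 0), (0.85, 1), (0.80, 0), (0.70, 0),
    (0.65, 1), (0.60, 0), (0.50, 0), (0.45, 0), (0.40, 0),
    (0.35, 0), (0.30, 1), (0.25, 0), (0.20, 0), (0.15, 0),
    (0.12, 0), (0.10, 0), (0.08, 0), (0.05, 0), (0.02, 0),
]


def response_rate_above(cutoff: float) -> float:
    """Response rate among individuals whose score is at or above the cut-off."""
    selected = [bought for score, bought in scored if score >= cutoff]
    return sum(selected) / len(selected) if selected else 0.0


# Response rate of the population as a whole.
baseline = sum(bought for _, bought in scored) / len(scored)
print(f"baseline: {baseline:.0%}")

# The same ranking yields very different sub-groups depending on the cut-off.
for cutoff in (0.8, 0.6, 0.3):
    print(f"cut-off {cutoff}: {response_rate_above(cutoff):.0%}")
```

With the ranking held fixed, moving the cut-off from 0.3 to 0.8 changes the selected group's response rate from one in three to one in two in this toy data, which is why the cut-off analysis deserves as much attention as the ranking model.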
Our
business strategy is then based on allocating resources in such a way that
we contact only those individuals who fall into the sub-group displaying the
higher expected probability of purchasing.
While
this example was based on marketing, the approach can be easily generalized
into any type of behavior. Additionally, viewed this way, a large variety
of problems can now be conceptualized as identification of the sub-group,
determination of boundaries, and appropriate allocation of resources to
enhance performance. Whether the impact of the variance in behavior is
positive or negative is accounted for in the way resources are allocated.
TWO-TAILED SOLUTIONS
While a one-tailed
solution often generates significant enhancements to performance, it is
often incomplete. Considering both tails of the distribution of propensity
to display the behavior of interest generally provides an additional
incremental improvement in performance.
From
the above One-Tail Solution, let’s assume that we identified a sub-group
that purchases at a 4% response rate. For purposes of discussion, let’s
assume that the sub-group consisted of the top two deciles of the
individuals scored.
In a
marketing example, based on new customers, we’d simply buy five times as
many names, score the entire list, select those scoring in the top two
deciles for mailing, and benefit from the enhanced selection technique.
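The arithmetic behind "five times as many names" can be checked with a short sketch. The 2% and 4% rates and the top-two-deciles share come from the example above; the mailing budget of 10,000 pieces is a hypothetical figure.

```python
mail_budget = 10_000     # hypothetical number of pieces we can afford to mail
base_rate = 0.02         # response rate of the population as a whole
subgroup_rate = 0.04     # response rate of the identified sub-group
subgroup_share = 0.20    # top two deciles of the scored list

# To fill the budget from the top 20% only, we must score 1 / 0.20 = 5x as many names.
names_to_buy = round(mail_budget / subgroup_share)

expected_untargeted = round(mail_budget * base_rate)
expected_targeted = round(mail_budget * subgroup_rate)

print(names_to_buy)          # 50000
print(expected_untargeted)   # 200
print(expected_targeted)     # 400
```

The number of pieces mailed is unchanged. The improvement comes from choosing which 10,000 individuals receive them.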
If we
modify our problem slightly, and consider a marketing campaign to existing
customers, our one-tailed example isn’t completely practical. It may not
be possible to collect five times as many individuals from a finite pool.
In
such a case, we may still use the same approach to ranking our existing
customers and setting our boundary for those individuals with a
significantly higher propensity to purchase. We want to ensure that we
allocate resources appropriately to achieve the enhanced benefits these
individuals provide us.
We
cannot afford to simply ignore the remaining individuals in our existing
customer list. Our goal becomes determining how to allocate our remaining
resources most effectively.
A
simple approach would be to move our boundary condition lower and lower on
the ranking scale, contacting additional customers, until we have completely
allocated available resources.
While
this approach is practical, and may lead to results that outperform current
methods, the practical reality of predictive analytics is that the best
results occur in the tails of the distribution. That is, the more you
approach the central tendency, the less reliable your results are likely to
be.
How
then, can we achieve the benefits of predictive analytics by focusing our
attention in the tails of the distribution? Generally, the easiest approach
is to invert our initial logic. In this case, we can identify a sub-group
that is highly unlikely to purchase, and ensure that we do not
allocate resources to those individuals.
This
is simply the development of a separate one-tailed model, where the
sub-group we are identifying as a 1 is non-purchasers. Again, we complete
our analysis by setting appropriate boundary conditions.
It is
important to note that in our one-tailed solutions, we are scoring our
individuals on a scale of zero to one, where a one is a strong likelihood of
being a member of the set of interest. The most common misunderstanding of
this scoring is considering a zero to be a non-member of the group. This is
an inappropriate conclusion.
Remember, in human behavior modeling, we make no assumptions about causality
in the variables in our model. Our variables are simply one of many ways to
describe a sub-group. What is important is that the expected probability of
the sub-group displaying the behavior is significantly different than for
the group as a whole.
Individuals with a score close to zero display a low expectation of being a
member of the sub-group, as we’ve defined it. They do not necessarily
display a low expected probability of displaying the behavior. They simply
are not included in this sub-group.
The
implication of this is that, if we want to find a sub-group that has a
significantly lower than normal propensity to display the behavior of
interest, we must develop, test and implement a model designed to capture
that behavior of interest.
This
approach to capturing both tails of a distribution, the tail that consists
of a sub-group displaying a higher than normal rate of behaving in a
particular way, and the inverse tail that displays the behavior at a
significantly lower than normal rate, generally results in an improvement in
performance greater than that achieved by considering only one tail of the
behavior distribution.
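A two-tailed treatment rule might be sketched as below. It assumes two separately built one-tailed models, one scoring membership in the likely-purchaser sub-group and one scoring membership in the unlikely-purchaser sub-group. The cut-off values, the treatment labels, and the tie-breaking order are hypothetical choices, not prescribed by the approach.

```python
def assign_treatment(
    purchaser_score: float,
    non_purchaser_score: float,
    purchaser_cutoff: float = 0.8,
    non_purchaser_cutoff: float = 0.8,
) -> str:
    """Assign a treatment from the scores of two separate one-tailed models.

    A low score on one model is NOT treated as membership in the other tail;
    each tail requires a high score from its own model.
    """
    if purchaser_score >= purchaser_cutoff:
        return "mail"        # upper tail: allocate resources
    if non_purchaser_score >= non_purchaser_cutoff:
        return "suppress"    # lower tail: withhold resources
    return "default"         # near the central tendency: existing treatment


print(assign_treatment(0.92, 0.10))  # mail
print(assign_treatment(0.15, 0.88))  # suppress
print(assign_treatment(0.15, 0.30))  # default
```

Note the third case: a low purchaser score alone does not place the individual in the suppress group, which mirrors the caution above about misreading scores near zero.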
CONCLUSION
The conceptualization of business problems as a way of playing a
game, using well defined rules and methods of keeping score, is especially
well suited to the utilization of predictive analytics.
The
approach of treating our decision processes as a sorting mechanism, creating
groups and sub-groups for a particular purpose, assigning individuals to
group membership, and allocating resources to the groups in an appropriate
manner, is highly generalizable to many business scenarios. It is also
consistent with the attributes of human behavior we are attempting to
anticipate.
The
ranking of sub-groups, based on their expected probability to display a
behavior with business impact, allows managers to allocate resources in a
way that has a controllable impact on performance and is customized to the
business decision maker’s priorities.
Predictive analytics is not magic. It is not based on rocket science, and
not necessarily based on extremely complex mathematical concepts. It is
based on a different way of thinking about problems, knowing clearly what
you want to achieve, and manipulating your data to discover a strategy for
achieving enhanced performance.
AUTHOR BIOGRAPHY
Thomas
A. “Tony” Rathburn
is a senior consultant with The Modeling Agency. Tony has worked with
commercial and government clients to develop data mining solutions to
significant business applications since the mid-1980s. Mr. Rathburn
delivers custom workshops and consults on a wide range of commercial
assignments -- many involving CRM applications. He is the primary
instructor of “Data Mining: Level I,” a
vendor-neutral and best-practices approach to data mining as outlined at
He holds extensive data mining experience in the banking, insurance, and
financial industries. Tony may be reached at
tony@the-modeling-agency.com