Data Mining in the Monetary and Financial Regulatory Space
by Steven W. Oxman
President, OXKO Corporation
INTRODUCTION
In the United States, tens of millions of monetary and financial
transactions take place every day. Many of these transactions fall under
government regulations for one or more reasons. One federal agency is
required by law to monitor certain financial and monetary transactions.
In the course of this monitoring, multiple computer system based solutions
were considered and many were tried. The monitoring seeks to identify many
different illegal activities, including fraudulent financial activities and
money laundering.

Imagine looking for one transaction, or a set of transactions, that is not
proper within the entire universe of transactions. Looking at one day’s
worth of transactions is a lot of work. Looking at one month’s worth is a
large undertaking. One year’s worth is most likely too large a data set for
many of the analysis computers that the government has at its disposal for
this work.

And what are you looking for? People who execute improper financial and
monetary transactions work hard to make these transactions look normal and
legal. The law says that you must notify the authorities when you move
$10,000.00 or more via a bank, so the people not wishing to notify the
authorities try to move money in smaller amounts and try to use means other
than banks for many of these transactions. How do we ferret out the illegal
transactions from the legal ones when they “look” so much the same? Will
simple data analysis and data mining be enough for this task? If yes, which
algorithms work? If no, then what must an analyst do to find the
transactions of interest?
What are the analysts allowed to do (by
law)? Can the FBI easily obtain financial data from other governmental
organizations and agencies like the IRS or FDIC, for example? If yes, then
what are their boundaries for doing so? If no, then what can they do?
WITHIN AND BETWEEN GOVERNMENT AGENCIES
Although this subject actually goes beyond the boundaries of this
article, and I am not an expert in this area, I can tell you what I have
observed.
Even with the Patriot Act, it is still
not easy for Government Agencies to share certain data about businesses and
individuals. The US Government takes privacy issues very seriously.
Agencies like the FBI cannot request certain information directly from other
governmental agencies. Instead, the FBI or a similar agency must go before a
Grand Jury and ask the Judge for permission to have that information
transferred. The agency must provide good reason, along with a very specific
focus and purpose. If the Grand Jury and the Judge agree, then permission
can be given to obtain specific information pertinent to a specific case.
For the most part, “John Doe requests” (those that ask for information on a
larger, non-specific group of people) are not approved.
So agents and analysts have to focus on
their cases, must focus on information they require, must make a good case
for the information needed, and must be able to present their case to the
Grand Jury effectively, before they can obtain certain data and information
from another Government Agency. It is not easy, and it is usually not
fast. If given the “go ahead” from the Judge and Jury, then the analysts
and agents can obtain that specific data and information and use it within
the boundaries of the Judge’s and Jury’s guidelines.
ANALYSTS' WORK
Usually,
the work of a financial or monetary analyst is being done for one or more
agents working on specific cases or specific financial or monetary schemes.
The agents usually work with agency lawyers. The analyst often will have to
sift through a lot of raw data looking for patterns.
Anyone who might be called as a witness
in a case cannot be “tainted” with data and information that is beyond the
boundaries of the case. Therefore, in a large case, one analyst might do the
preliminary work and risk being inadvertently tainted. One or more other
analysts then look only at the initially sifted-through data, so that they
are not tainted and remain available as witnesses as needed by the
government lawyer on the case.
The analyst’s work is to find pertinent
evidence for the case, and specifically for the lawyer on the case. This
analyst must work within the boundaries set by any Judge or Grand Jury, if
applicable.
An analyst’s workstation needs to be
connected to the data. The data base can be very large, and ready access to
it is important. Therefore, the workstation needs a large communications
pipe to the data base server or servers. The analyst’s workstation (some
like to call it a work bench) needs to be loaded with a rich set of data
analysis and data mining tools. The analyst will have basic tools like
Microsoft Excel and all its data analysis tools. SQL Server is a good data
base management system for the data servers, for it offers important
services like Analysis Services for SQL Server 2005 and OLAP (online
analytical processing) support. Microsoft was providing many of these
services for SQL Server even before SQL Server 2005. With SQL Server 2005
and Office 2007, Microsoft provides free downloads of new data mining tools
for Excel and Visio.
The rest of the data analysis tools
needed for the analyst might include ETL (extraction, transformation, and
loading – with a de-emphasis on the transformation step) tools, data mining
tools, link analysis tools, data induction tools, web analytics tools, data
visualization tools, text handling tools, text analysis tools, text mining
tools, time series analysis tools, powerful search tools (for web and for
private data collections), data merge tools, data field duplication
detection tools, statistical tools, etc. Effective tools are available from
vendors like SPSS, SAS, IBM, Microsoft, KnowledgeStudio, Isoft, i2, Visual
Analytics, Tildenwoods, LPA, NetMap Analytics, Megaputer, Eastport
Analytics, Business Objects, MicroStrategy, Paris Technologies, Thomas
Behrends, Synaptris, and others.
CHAIN OF CUSTODY AND ORIGINAL DATA FORM RETENTION
This is another area that is beyond the scope of this paper, but
allow me to provide a few observations. When working on legal projects, the
court system wants to ensure that there is a clear and documented chain of
custody for the data and its analysis. This is to ensure that the data used
for the case has not been modified. Also, the courts want documented
evidence that the data used for the analysis was in its original form, and
that it has not been transformed, modified, or otherwise altered so that it
no longer represents the original evidence and facts. It is up to the agents
to bring the analysts “clean” data, and it is up to the agents and the
analysts not to modify or corrupt the data, and to be able to provide full
documentation of all parties that have had any contact with the data.
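One common way to support a claim that data has not been modified is to
record a cryptographic fingerprint of the data at hand-off and recompute it
later. The sketch below illustrates that idea only. The article does not say
which method the agency uses, and the sample bytes and the function names
fingerprint and is_unmodified are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def is_unmodified(data: bytes, recorded_digest: str) -> bool:
    """Compare the current fingerprint with the one recorded at hand-off."""
    return fingerprint(data) == recorded_digest


# Hypothetical evidence extract as delivered by the agent.
original = b"account,date,amount\nACCT-001,2001-01-02,250.00\n"

# Digest written into the custody record when the data changes hands.
recorded = fingerprint(original)

# Later check: the untouched copy matches, an altered copy does not.
untouched_ok = is_unmodified(original, recorded)
altered_ok = is_unmodified(original.replace(b"250.00", b"25.00"), recorded)
```

A matching digest shows only that the bytes are the same as when the digest
was recorded. The documentation of who handled the data still has to be kept
separately.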
DATA ANALYSIS MEETS SOCIAL ANALYSIS
Data mining and analysis could be defined as the process of
pattern and/or knowledge discovery from large collections of data using data
analytic methods. Social analysis might be defined as the process of
pattern and/or knowledge discovery from collections of social artifacts
using social analytic methods.
Data mining and analysis amasses large
collections of data and looks for patterns of interest in the collected
data base to realize knowledge discovery from the data patterns found.
Social analysis collects social facts and looks for patterns of interest in
the collected social fact base to realize knowledge discovery from the
social patterns found.
We might find through a data mining and
analysis process that many monetary transactions of $5,000.00 each by
individuals were attempts to transfer large amounts of money without
Government Agency notice. We might find, through social analysis, that
people predominantly spend larger amounts of money on groceries when they
are near their primary residence.
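As a rough illustration of the first kind of finding, the sketch below
groups transfers by account pair and flags pairs where several transfers,
each under the $10,000.00 figure mentioned earlier, add up to that figure
or more within a short window. The seven-day window, the minimum of three
transfers, the account labels, and the sample data are all assumptions made
for the example, not values taken from any agency's practice.

```python
from collections import defaultdict
from datetime import date, timedelta

REPORTING_THRESHOLD = 10_000.00  # figure mentioned in the article
WINDOW = timedelta(days=7)       # assumed meaning of "short period"
MIN_COUNT = 3                    # assumed minimum number of transfers


def flag_possible_structuring(transfers):
    """Return (source, destination) pairs whose sub-threshold transfers
    within some WINDOW-long span total at least REPORTING_THRESHOLD."""
    by_pair = defaultdict(list)
    for source, destination, day, amount in transfers:
        if amount < REPORTING_THRESHOLD:
            by_pair[(source, destination)].append((day, amount))

    flagged = []
    for pair, items in by_pair.items():
        items.sort()  # oldest transfer first
        start, total = 0, 0.0
        for end, (day, amount) in enumerate(items):
            total += amount
            # Drop transfers that fall outside the window ending at `day`.
            while day - items[start][0] > WINDOW:
                total -= items[start][1]
                start += 1
            count = end - start + 1
            if count >= MIN_COUNT and total >= REPORTING_THRESHOLD:
                flagged.append(pair)
                break
    return flagged


# Hypothetical transfers: (source, destination, date, amount).
sample = [
    ("A-100", "B-200", date(2001, 1, 1), 5000.00),
    ("A-100", "B-200", date(2001, 1, 2), 5000.00),
    ("A-100", "B-200", date(2001, 1, 3), 5000.00),
    ("C-300", "D-400", date(2001, 1, 1), 5000.00),
    ("C-300", "D-400", date(2001, 3, 1), 5000.00),
    ("E-500", "F-600", date(2001, 1, 1), 25000.00),
]
flagged = flag_possible_structuring(sample)
```

A flag from a filter like this is only a lead. As the article notes later,
whether a transaction is actually illegal still has to be verified through
classic investigatory methods.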
When looking to use data mining and
analysis techniques to find illegal monetary and financial transactions and
identify the parties involved, sometimes social analysis is needed to assist
with the analysis work. For example, say that an individual was
transferring money from one country into and out of the US. And say that
the monetary actions and the monetary transfer technique were identified. The
analysis left us with the monetary vehicle and an identification number, but
not the name and address of the individual involved. So how do we locate
and identify the individual of interest? Can data mining find this person
for us?
We have not been able to identify a
method of locating and identifying the person of interest from a pure data
mining play. However, by utilizing both data mining and analysis with
social analysis, we were able to develop a method to locate and identify
people of interest. Our social analysis really gets us into “social
mining.”
Through the data mining and analysis
work, we can identify monetary and/or financial transactions that appear to
be illegal. We can verify if the transactions are illegal or not through
classic investigatory methods. When we have a transaction or set of
transactions that are verified as illegal, then we proceed to locate and
identify the person or persons of interest.
We start with any identifying data from
the transaction, for example, an account number. Sometimes we are lucky and
the person’s location and identification data are also available, which
makes it pretty simple to provide the location and identification
information to the agent or agents. But frequently life is not so simple. We
have an account number, but little to no other data for location and
identification purposes. Still, the rest of the information will help us. It
is data on where transactions occurred and what the transactions were for.
Whether or not the stated reason for a transaction is accurate, this
information will assist us.
So let’s take a small imaginary example
to see how social mining and analysis assists us. For one account number,
we have the following transactions:
| Date     | Place            | City, State    | Item(s)         | Amount  |
|----------|------------------|----------------|-----------------|---------|
| 01/01/01 | Grocery Store    | New York, NY   | Food items      | $50.00  |
| 01/01/01 | Liquor Store     | New York, NY   | Wine            | $100.00 |
| 01/02/01 | Gas Station      | Branford, CT   | Gas             | $50.00  |
| 01/02/01 | Grocery Store    | Natick, MA     | Food items      | $250.00 |
| 01/03/01 | Vet Office       | Natick, MA     | Cat Exam        | $80.00  |
| 01/04/01 | Tire Store       | Wellesley, MA  | Tires           | $300.00 |
| 01/04/01 | Jiffy Lube Store | Natick, MA     | Lube and Oil    | $40.00  |
| 01/05/01 | A-Plus Plumbing  | Framingham, MA | Plumbing Repair | $150.00 |
If a good analyst were to see this data alone, I believe the analyst would
be able to infer that the person of interest lives in the
Natick/Wellesley/Framingham area of Massachusetts. This might be a primary
residence or a vacation home. The Natick area in January does not seem a
likely place for a vacation home or secondary residence. Natick is not in a
warm climate in January, it is not near any significant tourist attraction,
and it is not near a winter sport area like a ski resort. Therefore, most
likely, this is a primary residence. So there are at least two important
questions to answer here:
1. When working with very large data bases, how do we get to “look” at a
   small data set like this one?
2. How do we identify the actual location and identification of the person
   of interest from this data?
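The inference drawn from the table above can be approximated mechanically
by counting where the transactions cluster. The sketch below is a minimal
illustration that uses the eight rows of the imaginary example. The function
names are hypothetical, and a real analysis would also weigh the type of
business, as the analyst does in the text.

```python
from collections import Counter

# The eight rows of the imaginary example:
# (date, place, city and state, item, amount in dollars).
transactions = [
    ("01/01/01", "Grocery Store", "New York, NY", "Food items", 50.00),
    ("01/01/01", "Liquor Store", "New York, NY", "Wine", 100.00),
    ("01/02/01", "Gas Station", "Branford, CT", "Gas", 50.00),
    ("01/02/01", "Grocery Store", "Natick, MA", "Food items", 250.00),
    ("01/03/01", "Vet Office", "Natick, MA", "Cat Exam", 80.00),
    ("01/04/01", "Tire Store", "Wellesley, MA", "Tires", 300.00),
    ("01/04/01", "Jiffy Lube Store", "Natick, MA", "Lube and Oil", 40.00),
    ("01/05/01", "A-Plus Plumbing", "Framingham, MA", "Plumbing Repair", 150.00),
]


def rank_cities(rows):
    """Count transactions per city, most frequent first."""
    return Counter(city for _, _, city, _, _ in rows).most_common()


def rank_states(rows):
    """Count transactions per state, most frequent first."""
    states = (city.split(", ")[1] for _, _, city, _, _ in rows)
    return Counter(states).most_common()


city_ranking = rank_cities(transactions)
state_ranking = rank_states(transactions)
```

On this data, Natick has the most transactions of any single city and
Massachusetts accounts for five of the eight, which is consistent with the
analyst's reading of the Natick/Wellesley/Framingham area.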
Actually it is easy to answer both of
these questions. For question one, we would query our very large data base
for one week or one month of account data for an account that we have
already identified as having illegal monetary or financial transactions
(e.g., twenty movements of approximately $5,000.00 from one account to one
other account without any notification, in a very short period of time).
Therefore, we would be now looking at a small, very workable set of data
like that shown above. For question two, we would go to the Grand Jury and
get the proper authorization to go to some of the businesses (say the Vet
and Plumber) and ask for the identification and location of the person who
used that account number. Smaller businesses near the person of interest
(like the Vet in this case) are likely to know their client and to have the
account number on file or in their accounting records. Where personal
recognition is later needed, again with the correct court permission and
procedures, employees of these businesses can be asked to identify the
person or persons.
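The answer to question one amounts to a filtered query: select a single
account and a short date range from the large data base. The sketch below
shows the shape of such a query using Python's built-in sqlite3 module and
an in-memory table. The table name, column names, account labels, ISO date
format, and the two extra rows are assumptions for the example, and a
production system such as SQL Server would use its own schema and driver.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE transactions (
           account TEXT, tx_date TEXT, place TEXT,
           city_state TEXT, item TEXT, amount REAL)"""
)

# Hypothetical rows; dates are stored as ISO text so they sort correctly.
rows = [
    ("ACCT-001", "2001-01-02", "Grocery Store", "Natick, MA", "Food items", 250.00),
    ("ACCT-001", "2001-01-03", "Vet Office", "Natick, MA", "Cat Exam", 80.00),
    ("ACCT-001", "2001-01-05", "A-Plus Plumbing", "Framingham, MA", "Plumbing Repair", 150.00),
    ("ACCT-001", "2001-02-10", "Grocery Store", "Natick, MA", "Food items", 90.00),
    ("ACCT-002", "2001-01-03", "Gas Station", "Branford, CT", "Gas", 50.00),
]
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?, ?, ?)", rows)


def account_window(conn, account, start, end):
    """Return one account's transactions between two dates, inclusive."""
    cursor = conn.execute(
        "SELECT tx_date, place, city_state, item, amount "
        "FROM transactions "
        "WHERE account = ? AND tx_date BETWEEN ? AND ? "
        "ORDER BY tx_date",
        (account, start, end),
    )
    return cursor.fetchall()


# One week of data for the account already identified as suspicious.
week = account_window(conn, "ACCT-001", "2001-01-01", "2001-01-07")
```

Passing the account and dates as query parameters, rather than building the
SQL text by hand, keeps the query safe to reuse for other accounts and
periods.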
Also notice that, through the use of
data mining, a good rule-based engine, and some social mining, an analyst
could also automate the step of going from improper monetary or financial
transactions directly to assistance with account identification and
location data. For example, one rule might be: If business used is a VET,
then primary residence is within 10 miles of the business, Confidence 80.
Another example might be: If business used is a GROCERY STORE, and amount
spent is OVER $100, then primary residence is within 5 miles of the
business, Confidence 90.
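The two example rules can be written down as data and applied to each
transaction in turn. The sketch below is one possible encoding, not a
description of any particular rule engine. The Rule class, the field names,
and the business_type labels attached to the sample transactions are
assumptions for the example. The distances and confidence values are the
ones given in the rules above.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    applies: Callable[[dict], bool]  # condition part of the rule
    radius_miles: int                # residence is within this distance
    confidence: int                  # confidence stated for the rule


RULES = [
    Rule("vet", lambda tx: tx["business_type"] == "VET", 10, 80),
    Rule(
        "grocery over $100",
        lambda tx: tx["business_type"] == "GROCERY STORE" and tx["amount"] > 100,
        5,
        90,
    ),
]


def residence_hints(transactions, rules=RULES):
    """Apply every rule to every transaction; highest confidence first."""
    hints = []
    for tx in transactions:
        for rule in rules:
            if rule.applies(tx):
                hints.append({
                    "rule": rule.name,
                    "city_state": tx["city_state"],
                    "radius_miles": rule.radius_miles,
                    "confidence": rule.confidence,
                })
    return sorted(hints, key=lambda hint: hint["confidence"], reverse=True)


# Four rows from the imaginary example, with assumed business_type labels.
sample = [
    {"business_type": "GROCERY STORE", "city_state": "New York, NY", "amount": 50.00},
    {"business_type": "GROCERY STORE", "city_state": "Natick, MA", "amount": 250.00},
    {"business_type": "VET", "city_state": "Natick, MA", "amount": 80.00},
    {"business_type": "TIRE STORE", "city_state": "Wellesley, MA", "amount": 300.00},
]
hints = residence_hints(sample)
```

On these four rows, only the $250.00 grocery purchase and the vet visit
match a rule, and both point to Natick. Keeping the rules in a list means
an analyst can add or adjust rules without changing the matching code.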
CONCLUDING REMARKS
Data analysis and
mining is a valuable tool for regulatory agencies. However, data analysis
and mining, in some cases, is not enough. Social analysis and mining can be
used to augment data analysis and mining in those cases where data analysis
and mining is not enough. Through the use of data mining and social mining
together, many illegal activities could be found including, but not limited
to, illegal monetary transactions, illegal financial transactions, money
laundering, drug related transactions, and terrorism related transactions.
ABOUT THE AUTHOR
Steven Oxman, President of the OXKO Corporation, has been in the
Information Technology industry since 1967, working across a number of
significant technologies including very large data bases, data warehouses,
data base security, software engineering environments, expert systems,
knowledge based systems, data mining, and web based application development.
Presently, Mr. Oxman functions as a part-time/surrogate CIO for a number of
organizations that need a CIO level executive for a specific amount of time
or for a specific need.
Clients of Mr. Oxman have included Charlotte Radiology,
Du Pont, Elkem Metals, Executive Residence, the IRS, US Navy, NASA, CSC,
Ingersoll-Rand, American Airlines, GE, Union-Carbide, and the State of
Illinois. Mr. Oxman has also been a Professor for George Washington
University, and an Instructor for American University and City Colleges of
Chicago. Mr. Oxman is an avid pilot, often using his personal aircraft to
commute to his clients.
Published with permission by
OXKO Corporation.
Copyright © 2008