hadoop big data - fraud detection with real-time analytics


Upload: hkbhadraa

Post on 06-Jan-2017


TRANSCRIPT

Page 1: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

Hadoop – BIG Data: Fraud Detection with real-time Analysis

Page 2: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

What is Fraud Detection?

This presentation covers fraud detection with real-time analysis, using Hadoop and Big Data technologies, for industries such as Banking, Finance, Insurance, Core Accounts Receivable, Government, HealthCare, and Retail.

Fraud is a major concern across all industries. You name the industry (Banking, Insurance, Government, Accounts Receivable, HealthCare, or Retail, for example) and you’ll find fraud.

In today’s interconnected world, the sheer volume and complexity of transactions make it harder than ever to find fraud.

Traditional approaches to fraud prevention aren’t particularly efficient. For example, improper payments are often managed by analysts who audit a very small sample of claims and request medical documentation from targeted submitters. The industry term for this model is "pay and chase": claims are accepted and paid out, and processes then look for intentional or unintentional overpayments through post-payment review of those claims.

Page 3: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

What is Fraud Detection?

Ironically, the sheer volume of transactions that makes fraud harder to spot can also help create better fraud-prediction models, an area where Hadoop and Big Data shine.

Page 4: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

How is Fraud detection done?

So how is fraud detection done now?

Because of the limitations of traditional technologies, fraud models are built by sampling data and using the sample to build a set of fraud-prediction and detection models. Contrast this with a Hadoop-anchored fraud department that uses the full data set, with no sampling, to build out its models, and the difference becomes clear.

For creating fraud-detection models, Hadoop is well suited to:

Handle Volume: That means processing the full data set - no data sampling.

Manage new varieties of data: Data coming from different sources and in different formats.

Maintain an agile environment: Enable different kinds of analysis and changes to existing models.
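
The "no sampling" point can be illustrated with a small MapReduce-style job of the kind Hadoop Streaming runs. This is a minimal sketch, not the presenter's implementation: the record layout (account_id,amount), the function names, and the sample records are assumptions made for illustration. It runs locally on a list of lines; under Hadoop Streaming the mapper and reducer would read from standard input, and Hadoop itself would do the sort between the two phases.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Emit (account_id, amount) for every well-formed record; nothing is sampled out."""
    for line in lines:
        parts = line.strip().split(",")
        if len(parts) != 2:
            continue  # skip malformed rows
        account_id, amount = parts
        try:
            yield account_id, float(amount)
        except ValueError:
            continue  # skip rows whose amount is not a number


def reducer(pairs):
    """Aggregate transaction count and total amount per account from key-sorted pairs."""
    for account_id, group in groupby(pairs, key=itemgetter(0)):
        amounts = [amount for _, amount in group]
        yield account_id, len(amounts), sum(amounts)


def run_job(lines):
    # Hadoop sorts mapper output by key between phases; sorted() stands in for that step.
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return list(reducer(shuffled))


# Hypothetical input records (account_id,amount), including one malformed row.
records = ["A1,120.50", "B7,15.00", "A1,9800.00", "bad row", "B7,22.25"]
summary = run_job(records)
for account_id, count, total in summary:
    print(f"{account_id}\t{count}\t{total:.2f}")
```

Every valid record contributes to the per-account totals, so a rare but very large transaction cannot be dropped by an unlucky sample.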

Page 5: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

How is Fraud detection done?

The limitations of sampling

Faced with expensive hardware and a pretty high commitment in terms of time and RAM, people tried to make the analytics workload a bit more reasonable by analyzing only a sample of the data.

While sampling is a good idea in theory, in practice it is often an unreliable tactic. Finding a statistically representative sample can be challenging for sparse or skewed data sets, which are quite common. Poorly chosen samples can over-represent outliers and anomalous data points, or miss rare events entirely, and can in turn bias the results of the analysis.
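
The effect of sampling on sparse data can be shown with a short simulation. This is a sketch on synthetic data, and all the numbers in it (100,000 transactions, 20 fraudulent, samples of 500) are assumptions chosen for illustration rather than figures from the presentation.

```python
import random


def fraud_rate(transactions):
    """Fraction of transactions labelled as fraud."""
    return sum(1 for t in transactions if t["is_fraud"]) / len(transactions)


# Hypothetical, synthetic population: 100,000 transactions, 20 of them fraudulent.
TOTAL, FRAUD = 100_000, 20
population = [{"id": i, "is_fraud": i < FRAUD} for i in range(TOTAL)]

rng = random.Random(42)  # fixed seed so the run is repeatable
sample_size = 500
trials = 200
empty_samples = 0
for _ in range(trials):
    sample = rng.sample(population, sample_size)
    if not any(t["is_fraud"] for t in sample):
        empty_samples += 1  # this sample contains no fraud case at all

print(f"fraud rate in full data: {fraud_rate(population):.4%}")
print(f"samples of {sample_size} with no fraud at all: {empty_samples}/{trials}")
```

With these assumed proportions, roughly nine out of ten samples are expected to contain no fraudulent transaction, so a model built from such a sample would have nothing to learn from.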

Page 6: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

BEST PRACTICES IN FRAUD MANAGEMENT

A best-practice fraud management approach is integrated from end to end.

Figure 1: Fraud management approach Integrated End-End

Page 7: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

BEST PRACTICES IN FRAUD MANAGEMENT

COMBATING FRAUD WITH THE TECHNOLOGY AVAILABLE TODAY: Big Data Hadoop

Step 1. Create an enterprise-wide view of patterns and perpetrators.

Step 2. Prevent and detect fraud in an enterprise-wide context.

Step 3. Investigate and resolve fraud in an integrated environment.

The figure below shows how Hadoop can be integrated within an enterprise and used to build fraud patterns and models and to run analytics on the full data set rather than on a sample.

Figure 2: Hadoop in Enterprise

Page 8: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

BEST PRACTICES IN FRAUD MANAGEMENT

A best-practice fraud management system is integrated from end to end, from data management to analysis (using multiple analytical techniques), alert generation and management, and case management.

Hadoop as a queryable archive in support of an enterprise data warehouse.

Hadoop as a data transformation engine.

Hadoop as a data processing engine.

Hadoop to add Discovery and Sandbox capabilities to a modern-day analytics ecosystem.

Fraud Models and Hadoop

As in most Hadoop use cases, Hadoop helps the business break through the glass ceiling on the volume and variety of data that can be incorporated into decision analytics. The more data we have, the better our models can be.

Mixing non-traditional forms of data with a set of historical transactions can make fraud models even more robust.

Organizations can move away from market-segment modelling and toward at-transaction or at-person level modelling. Quite simply, making a forecast based on a segment is helpful, but making a decision based on particular information about an individual transaction is better. To do this, we work with a larger set of data than is possible in the traditional approach.
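
The difference between segment-level and person-level modelling can be sketched as follows. This is an illustrative example only: the names, amounts, the single "retail" segment, and the two scoring functions are all assumptions, and a simple z-score stands in for whatever model a real fraud team would use.

```python
from statistics import mean, pstdev

# Hypothetical transaction history; names and amounts are invented for illustration.
history = {
    "alice": [20.0, 25.0, 22.0, 18.0, 24.0],
    "bob": [900.0, 1100.0, 950.0, 1050.0, 1000.0],
}
segment_of = {"alice": "retail", "bob": "retail"}


def segment_score(person, amount):
    """Segment-level view: ratio of the amount to the average of the whole segment."""
    peers = [
        a
        for p, amounts in history.items()
        if segment_of[p] == segment_of[person]
        for a in amounts
    ]
    return amount / mean(peers)


def person_score(person, amount):
    """Person-level view: z-score of the amount against that person's own history."""
    amounts = history[person]
    spread = pstdev(amounts) or 1.0  # avoid division by zero for constant histories
    return (amount - mean(amounts)) / spread


new_amount = 500.0
for person in ("alice", "bob"):
    print(
        person,
        round(segment_score(person, new_amount), 2),
        round(person_score(person, new_amount), 2),
    )
```

The segment view scores the same 500.0 payment identically for both customers, while the person-level view marks it as far above alice's usual spending and well below bob's. Person-level scoring needs every individual's history, which is where the larger data set comes in.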

Page 9: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

BEST PRACTICES IN FRAUD MANAGEMENT

If the data used to identify or bolster new fraud-detection models isn’t available at a moment’s notice, by the time we discover these new patterns, it could be too late to prevent damage.

Consider the benefit to the business of not only building more comprehensive models with more types of data, but also being able to refresh and enhance those models faster than ever.

Traditional technologies aren’t as agile, either. Hadoop makes it easy to introduce new variables into the model.

Page 10: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

Traditional Statistical Analysis and Hadoop

Traditional statistical analysis applications come with powerful tools for generating workflows.

These applications use intuitive graphical user interfaces that allow for better data visualization. Hadoop follows a similar pattern to these tools for generating statistical analysis workflows.

As Figure 3 shows, during the final data exploration and visualization step, users can export to human-readable formats (JSON/CSV) or take advantage of visualization tools.

Figure 3: Generalized statistical analysis workflow with Hadoop
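
The export step at the end of the workflow can be sketched with Python's standard library. The flagged records and their field names below are hypothetical; in practice they would come from whatever job produced the model output.

```python
import csv
import io
import json

# Hypothetical model output; the field names are assumptions for illustration.
flagged = [
    {"transaction_id": "T1001", "account_id": "A1", "score": 0.97},
    {"transaction_id": "T1002", "account_id": "B7", "score": 0.91},
]


def to_json(rows):
    """Serialize the rows as an indented JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows):
    """Serialize the rows as CSV with a header; assumes at least one row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


json_text = to_json(flagged)
csv_text = to_csv(flagged)
print(csv_text)
```

Either output can then be opened in a spreadsheet or loaded into a visualization tool.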

Page 11: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

CLOSING THOUGHTS

Fraud is a major concern across all industries.

Many organisations spend a lot of money and effort on preventing fraud. With the power of modern technologies such as Big Data and Hadoop, analysing, detecting and preventing fraud has moved to the next level.

Organisations can continue using their existing IT infrastructure and leverage Big Data Hadoop technologies for real-time fraud analysis.

Organisations can be truly agile while handling Data in Motion, Data at Rest and Data in Many Forms with Big Data Hadoop technologies.

Page 12: Hadoop BIG Data - Fraud Detection with Real-Time Analytics

Thank You