
Fraud Detection in

Mobile Telecommunications

Presented by

NADEEM KHAN 1MS10MCA27

SANJIB JOSHI 1MS10MCA43

MARCH 2012

M S Ramaiah Institute of Technology (Autonomous Institute Affiliated to VTU)

Bangalore – 560 054

TABLE OF CONTENTS

1.0 Abstract

2.0 Introduction to Data Mining

2.1 Data Mining Techniques

2.2 Telecommunication Marketing

2.3 Information about WEKA

2.4 Telecommunication Fraud Detection

2.5 Network Fault Isolation

3.0 Telecommunication Fraud

3.1 Subscription fraud

3.2 Bad Debt

3.3 Call Detail Record

4.0 Problem Definition

4.1 Algorithms used

4.2 Snapshots

5.0 Conclusion

6.0 References

1. Abstract

Huge amounts of data are being collected as a result of the increased use of mobile telecommunications. Insight into the information and knowledge derived from these databases can give operators a competitive edge in customer care and retention, marketing, and fraud detection.

One strategy for fraud detection checks for signs of questionable changes in user behavior. Although the intentions of mobile phone users cannot be observed directly, they are reflected in the call data, which define usage patterns. Over a period of time, an individual phone generates a large pattern of use. Call data are recorded for subscribers for billing purposes, but we make no prior assumptions about which data indicate fraudulent call patterns; that is, the call records collected for billing are unlabeled. Further analysis is therefore required to isolate fraudulent usage. An unsupervised learning algorithm can analyze and cluster the call patterns of each subscriber to facilitate the fraud detection process.

This research investigates the unsupervised learning potential of two neural networks for profiling the calls made by users over a period of time in a mobile telecommunication network. Our study provides a comparative analysis and application of Self-Organizing Map (SOM) and Long Short-Term Memory (LSTM) recurrent neural network algorithms to user call data records in order to conduct descriptive data mining on users' call patterns. Our investigation shows that both techniques can learn to discriminate user call patterns, with the LSTM recurrent neural network discriminating better than the SOM when modeling long time series. LSTM discriminates different types of temporal sequences and groups them according to a variety of features. The ordered features can later be interpreted and labeled according to the specific requirements of the mobile service provider. Thus, suspicious call behaviors are isolated within the mobile telecommunication network and can be used to identify fraudulent calls.

2. Introduction

Data mining, the extraction of hidden predictive information from large databases, is a powerful new technology with great potential to help companies focus on the most important information in their data warehouses. Data mining tools predict future trends and behaviors, allowing businesses to make proactive, knowledge-driven decisions. The automated, prospective analyses offered by data mining move beyond the analyses of past events provided by the retrospective tools typical of decision support systems. Data mining tools can answer business questions that traditionally were too time-consuming to resolve. They scour databases for hidden patterns, finding predictive information that experts may miss because it lies outside their expectations.

Most companies already collect and refine massive quantities of data. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and can be integrated with new products and systems as they are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases to deliver answers to questions such as, "Which clients are most likely to respond to my next promotional mailing, and why?"

This white paper provides an introduction to the basic technologies of data mining. Examples of profitable applications illustrate its relevance to today’s business environment, together with a basic description of how data warehouse architectures can evolve to deliver the value of data mining to end users.

2.1 Data Mining Techniques

Several major data mining techniques have been developed and used in recent data mining projects, including association, classification, clustering, prediction and sequential patterns. We briefly examine each technique with an example.

1. Association: Association is one of the best-known data mining techniques. In association, a pattern is discovered based on the relationship between a particular item and other items in the same transaction. For example, the association technique is used in market basket analysis to identify which products customers frequently purchase together. Based on this data, businesses can run corresponding marketing campaigns to sell more products and increase profit.

2. Classification: Classification is a classic data mining technique based on machine learning. Classification assigns each item in a set of data to one of a predefined set of classes or groups. It makes use of mathematical techniques such as decision trees, linear programming, neural networks and statistics. In classification, we build software that can learn how to classify data items into groups. For example, we can apply classification to the task “given all past records of employees who left the company, predict which current employees are likely to leave in the future.” In this case, we divide the employee records into two groups, “leave” and “stay”, and then ask our data mining software to classify the employees into each group.

3. Clustering: Clustering is a data mining technique that automatically groups objects with similar characteristics into meaningful or useful clusters. Unlike classification, in which objects are assigned to predefined classes, clustering also defines the classes themselves. To make the concept clearer, take a library as an example. The books in a library cover a wide range of topics. The challenge is to arrange those books so that readers can pick up several books on a specific topic without hassle. By using a clustering technique, we can keep books that are similar in some way in one cluster, or on one shelf, and label it with a meaningful name. Readers who want books on a topic then only need to go to that shelf instead of searching the whole library.

4. Prediction: As its name implies, prediction is a data mining technique that discovers relationships among independent variables and between dependent and independent variables. For instance, prediction analysis can be used in sales to predict future profit: if we treat sales as the independent variable, profit can be the dependent variable. Then, based on historical sales and profit data, we can fit a regression curve and use it to predict profit.

5. Sequential Patterns: Sequential pattern analysis is a data mining technique that seeks to discover similar patterns in transaction data over a business period. The uncovered patterns are used for further business analysis to recognize relationships among data.
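As an illustration of the association technique in item 1, the following minimal sketch computes the two usual rule measures, support and confidence, for a market basket. The transactions and the function names are hypothetical and are chosen only for illustration; they do not come from the project data.

```python
# Hypothetical market-basket transactions (illustrative data only).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) over the transactions."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)

# Rule under test: customers who buy bread also buy milk.
bread_milk_support = support({"bread", "milk"}, transactions)
bread_to_milk_confidence = confidence({"bread"}, {"milk"}, transactions)
print(bread_milk_support, bread_to_milk_confidence)
```

A rule is normally kept only when both measures exceed thresholds chosen by the analyst.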

2.2 Telecommunication Marketing

The telecommunications industry generates and stores a tremendous amount of data. These data include call detail data, which describes the calls that traverse the telecommunication networks; network data, which describes the state of the hardware and software components in the network; and customer data, which describes the telecommunication customers. The amount of data is so great that manual analysis of the data is difficult, if not impossible. The need to handle such large volumes of data led to the development of knowledge-based expert systems. These automated systems performed important functions such as identifying fraudulent phone calls and identifying network faults. The problem with this approach is that it is time-consuming to obtain the knowledge from human experts (the “knowledge acquisition bottleneck”) and, in many cases, the experts do not have the requisite knowledge. The advent of data mining technology promised solutions to these problems, and for this reason the telecommunications industry was an early adopter of data mining technology.

Telecommunication data pose several interesting issues for data mining. The first concerns scale, since telecommunication databases may contain billions of records and are amongst the largest in the world. A second issue is that the raw data is often not suitable for data mining. For example, both call detail and network data are time-series data that represent individual events. Before this data can be effectively mined, useful “summary” features must be identified and then the data must be summarized using these features. Because many data mining applications in the telecommunications industry involve predicting very rare events, such as the failure of a network element or an instance of telephone fraud, rarity is another issue that must be dealt with. The fourth and final data mining issue concerns real-time performance: many data mining applications, such as fraud detection, require that any learned model or rules be applied in real time.
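The summarization step described above can be sketched as follows. This is only an illustration under stated assumptions: the record layout (subscriber, duration, international flag) and the three summary features are hypothetical, not the features used in this project.

```python
from collections import defaultdict

# Hypothetical call events: (subscriber_id, duration_seconds, is_international).
calls = [
    ("A", 60, False),
    ("A", 120, True),
    ("B", 30, False),
    ("A", 180, False),
]

def summarise(calls):
    """Collapse per-call events into one feature row per subscriber."""
    grouped = defaultdict(list)
    for subscriber, duration, international in calls:
        grouped[subscriber].append((duration, international))

    features = {}
    for subscriber, rows in grouped.items():
        durations = [d for d, _ in rows]
        features[subscriber] = {
            "call_count": len(rows),
            "mean_duration": sum(durations) / len(durations),
            "international_share": sum(1 for _, intl in rows if intl) / len(rows),
        }
    return features

summary = summarise(calls)
print(summary["A"])
```

Each subscriber then becomes a single fixed-length row, which is the flat form that most mining tools expect.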

2.3 Information about WEKA

The Weka workbench contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to this functionality. The original non-Java version of Weka was a Tcl/Tk front-end to (mostly third-party) modeling algorithms implemented in other programming languages, plus data preprocessing utilities in C, and a Makefile-based system for running machine learning experiments. This original version was primarily designed as a tool for analyzing data from agricultural domains, but the more recent fully Java-based version (Weka 3), for which development started in 1997, is now used in many different application areas, in particular for educational purposes and research. Advantages of Weka include:

Free availability under the GNU General Public License

Portability, since it is fully implemented in the Java programming language and thus runs on almost any modern computing platform

A comprehensive collection of data preprocessing and modeling techniques

Ease of use due to its graphical user interfaces

Weka supports several standard data mining tasks, more specifically, data preprocessing, clustering, classification, regression, visualization, and feature selection. All of Weka's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal attributes, but some other attribute types are also supported). Weka provides access to SQL databases using Java Database Connectivity and can process the result returned by a database query. It is not capable of multi-relational data mining, but there is separate software for converting a collection of linked database tables into a single table that is suitable for processing using Weka. Another important area that is currently not covered by the algorithms included in the Weka distribution is sequence modeling.

The Explorer interface features several panels providing access to the main components of the workbench:

The Preprocess panel has facilities for importing data from a database, a CSV file, etc., and for preprocessing this data using a so-called filtering algorithm. These filters can be used to transform the data (e.g., turning numeric attributes into discrete ones) and make it possible to delete instances and attributes according to specific criteria.

The Classify panel enables the user to apply classification and regression algorithms (indiscriminately called classifiers in Weka) to the resulting dataset, to estimate the accuracy of the resulting predictive model, and to visualize erroneous predictions, ROC curves, etc., or the model itself (if the model is amenable to visualization like, e.g., a decision tree).

The Associate panel provides access to association rule learners that attempt to identify all important interrelationships between attributes in the data.

The Cluster panel gives access to the clustering techniques in Weka, e.g., the simple k-means algorithm. There is also an implementation of the expectation maximization algorithm for learning a mixture of normal distributions.

The Select attributes panel provides algorithms for identifying the most predictive attributes in a dataset.

The Visualize panel shows a scatter plot matrix, where individual scatter plots can be selected and enlarged, and analyzed further using various selection operators.

2.4 Telecommunication Fraud Detection

Fraud, the act of deceiving others for personal gain, is certainly as old as civilization itself. The word comes from the Latin fraudem, meaning deceit or injury, and over the years has come to represent a wide array of injustices, including forged artwork, confidence schemes, academic plagiarism, and email advance-fee frauds (such as the well-known Nigerian-email scam). Although these forms of fraud are very different in nature, they all have in common a dishonest attempt to convince an innocent party that a legitimate transaction is occurring when in fact it is not. Usually the fraud is for monetary gain, but not always, as fraud may be perpetrated for political causes (e.g., electoral fraud), personal prestige (e.g., plagiarism), or self-preservation (e.g., perjury).

In the twentieth century, fraud matured in the area of transactional businesses, most notably in the telecommunications and credit card industries. Due to the sheer volume of transactions in these businesses, fraud could go unnoticed fairly easily, because it was such a small proportion of the overall business. In the early days of the telecommunications business, “social engineering” was used to convince telephone operators to give access to lines or complete calls that were not authorized. In the 1950s, AT&T started automating direct-dial long distance calling, exposing itself to the first generation of hackers. Because fraud could now be perpetrated without speaking to a human, it could be automated.

2.5 Network Fault Isolation

Fault isolation is the process of determining the cause of a problem. Also known as "fault diagnosis," the term may refer to hardware or software, but it always deals with methods that can isolate the component, device or software module causing the error. Fault isolation may be part of hardware design, from the circuit level all the way up to the complete system. It is accomplished by building in test circuits and/or by dividing operations into multiple regions or components that can be monitored separately. After fault isolation is accomplished, parts can be replaced manually or automatically.

Although the terms "fault isolation" and "fault detection" are sometimes used synonymously, fault detection means determining that a problem has occurred, whereas fault isolation pinpoints the exact cause and location. Software can also be created and run with fault isolation in mind, and many techniques can be used. For example, program modules can be run in different address spaces to achieve separation. In addition, generating intermediate output that can be examined and recording operational steps in a log both help the troubleshooter to determine manually which routine caused the application to stop working or to stop working properly.

3. Telecommunication Fraud

Telecommunication fraud can be defined as the theft of services or the deliberate abuse of voice or data networks. It can be broken down into several generic classes. These classes describe the mode in which the operator was defrauded, for example, subscription using a false identity. Each mode can be used to defraud the network for revenue-based or non-revenue-based purposes. Most of these frauds are perpetrated either by the fraudster impersonating someone else or by technically deceiving the network systems.

3.1 Subscription Fraud: Subscription fraud is the obtaining of a postpaid telecommunication account through the normal procedure, using a false or stolen identity, with no intention of paying the bill. This is usually done by giving a wrong address so that the subscriber remains untraceable. The typical pattern of such fraudsters is to run up a high bill in a short time.

3.2 Bad Debt: Bad debt occurs when payment is not received for goods or services rendered. In a telecommunication company, for example, this arises when customers appear to have originally intended to honour their bills but have since lost the ability or desire to pay. If someone does not pay their bill, the telecom company has to establish whether the person was fraudulent or merely unable to pay.

3.3 Call Detail Record (CDR): A Call Detail Record is descriptive information about a call placed on a telecommunication network. It includes sufficient information to describe the important characteristics of each call, such as the originating point code (OPC), the destination point code (DPC), the duration of the call, and the call start and end times.

4. Problem Definition

Over a period of time, an individual handset’s Subscriber Identity Module (SIM) card generates a large pattern of use. The pattern of use may include international calls and time-varying call patterns, among others. Anomalous use can be detected within the overall pattern, for example a subscriber's abuse of free call services such as emergency services. Anomalous use can be identified as belonging to one of two types:

1. The pattern is intrinsically fraudulent; it will almost never occur in normal use. This type is relatively easy to detect.

2. The pattern is anomalous only relative to the historical pattern established for that phone.

In order to detect fraud of the second type, it is necessary to have knowledge of the history of SIM usage. Hence, a descriptive analysis of the call profile of each subscriber can be used for knowledge extraction. Interpretation by way of clustering or grouping of similar patterns can help in isolating suspicious call behaviour within the mobile telecommunication network. This can also help fraud analysts in their further investigation and call pattern analysis of subscribers. Although call data are recorded for subscribers for billing purposes, no prior assumptions are made about which data indicate fraudulent call patterns.

In other words, the call records collected for billing purposes are unlabeled. Further analysis is thus required to identify possible fraudulent usage. Because of the huge call volumes, it is virtually impossible to analyse the data without sophisticated techniques and tools.
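To make the second type of anomaly concrete, the sketch below compares a new observation against the history established for the same subscriber. It is only an illustration: the feature (daily call minutes), the z-score test and the threshold of 3.0 are assumptions made here and are not the method evaluated in this report.

```python
from statistics import mean, stdev

def is_anomalous(history, today, threshold=3.0):
    """Flag `today` if it lies more than `threshold` sample standard
    deviations from the mean of this subscriber's own history."""
    if len(history) < 2:
        return False  # too little history to establish a pattern
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        return today != mu  # perfectly constant history: any change stands out
    return abs(today - mu) / sigma > threshold

# Hypothetical daily call minutes for one SIM (illustrative data only).
daily_minutes = [20, 25, 22, 18, 24, 21, 23]
print(is_anomalous(daily_minutes, 24))   # an ordinary day
print(is_anomalous(daily_minutes, 180))  # far outside the established pattern
```

A value that is normal for one subscriber may be anomalous for another, which is why the comparison uses per-subscriber history rather than a global limit.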

4.1 Algorithms Used

Bayesian Networks

A Bayesian network, Bayes network, belief network, hierarchical Bayes(ian) model or directed acyclic graphical model is a probabilistic graphical model (a type of statistical model) that represents a set of random variables and their conditional dependencies via a directed acyclic graph (DAG). For example, a Bayesian network could represent the probabilistic relationships between diseases and symptoms. Given symptoms, the network can be used to compute the probabilities of the presence of various diseases.

A Bayesian network is a graphical model that encodes probabilistic relationships among variables of interest. When used in conjunction with statistical techniques, the graphical model has several advantages for data analysis. One, because the model encodes dependencies among all variables, it readily handles situations where some data entries are missing. Two, a Bayesian network can be used to learn causal relationships, and hence can be used to gain understanding about a problem domain and to predict the consequences of intervention. Three, because the model has both causal and probabilistic semantics, it is an ideal representation for combining prior knowledge (which often comes in causal form) and data. Four, Bayesian statistical methods in conjunction with Bayesian networks offer an efficient and principled approach for avoiding the overfitting of data. In this paper, we discuss methods for constructing Bayesian networks from prior knowledge and summarize Bayesian statistical methods for using data to improve these models. With regard to the latter task, we describe methods for learning both the parameters and structure of a Bayesian network, including techniques for learning with incomplete data. In addition, we relate Bayesian-network methods for learning to techniques for supervised and unsupervised learning.
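The disease and symptom example can be worked through for the smallest possible network, a single edge from Disease to Symptom. The sketch below applies Bayes' rule to that two-node case; the probabilities are hypothetical numbers chosen for illustration, and the function is not part of any library.

```python
def posterior(prior, p_symptom_given_disease, p_symptom_given_healthy):
    """P(disease | symptom observed) for the two-node network Disease -> Symptom."""
    joint_disease = prior * p_symptom_given_disease
    joint_healthy = (1 - prior) * p_symptom_given_healthy
    return joint_disease / (joint_disease + joint_healthy)

# Hypothetical numbers: 1% prevalence, symptom seen in 90% of the sick
# and in 5% of the healthy.
p = posterior(prior=0.01, p_symptom_given_disease=0.9, p_symptom_given_healthy=0.05)
print(round(p, 4))
```

Because the disease is rare, the posterior stays well below the 90% symptom rate. The same effect matters in fraud detection, where fraudulent accounts are a small proportion of all accounts.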

Neural Network Method

An artificial neural network (ANN), usually called a neural network (NN), is a mathematical or computational model that is inspired by the structure and/or functional aspects of biological neural networks. A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation. In most cases an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase. Modern neural networks are non-linear statistical data modeling tools. They are usually used to model complex relationships between inputs and outputs or to find patterns in data. The word "network" in the term "artificial neural network" refers to the interconnections between the neurons in the different layers of each system. An example system has three layers. The first layer has input neurons, which send data via synapses to the second layer of neurons, and then via more synapses to the third layer of output neurons. More complex systems will have more layers of neurons, with some having increased layers of input neurons and output neurons. The synapses store parameters called "weights" that manipulate the data in the calculations.

An ANN is typically defined by three types of parameters:

The interconnection pattern between different layers of neurons

The learning process for updating the weights of the interconnections

The activation function that converts a neuron's weighted input to its output activation.
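The three-layer example system and the three defining choices above can be sketched as a forward pass. This is a minimal illustration under stated assumptions: the layers are fully connected, the activation is a sigmoid, and all weights are hypothetical. The learning process that would update the weights is omitted.

```python
import math

def sigmoid(x):
    """Activation function: squashes a weighted input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [
        sigmoid(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

# Hypothetical weights: 2 inputs -> 2 hidden neurons -> 1 output neuron.
hidden_w = [[0.5, -0.5], [1.0, 1.0]]
hidden_b = [0.0, 0.0]
output_w = [[1.0, -1.0]]
output_b = [0.0]

zero_output = forward([0.0, 0.0], hidden_w, hidden_b, output_w, output_b)
other_output = forward([1.0, 2.0], hidden_w, hidden_b, output_w, output_b)
print(zero_output, other_output)
```

Training would adjust `hidden_w`, `hidden_b`, `output_w` and `output_b` so that the outputs match known examples.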

Rule Based Method

Rule-based methods, also called rule discovery or rule extraction from data, are data mining techniques aimed at understanding data structures and providing a comprehensible description instead of only black-box prediction. Rule-based systems should expose knowledge hidden in data in a comprehensible way, providing logical justification for conclusions, showing possible inconsistencies, and avoiding the unpredictable conclusions that black-box predictors may generate in atypical situations. Sets of rules are useful if the rules are not too numerous, are comprehensible, and have sufficiently high accuracy. Rules are used to support decision making in classification, regression and association tasks. Various forms of rules that allow expression of different types of knowledge are used: classical propositional logic (C-rules), association rules (A-rules), fuzzy logic (F-rules), M-of-N or threshold rules (T-rules), and similarity or prototype-based rules (P-rules). Algorithms for extracting rules from data have been developed in the statistics, machine learning, computational intelligence and artificial intelligence fields.
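As an illustration of an M-of-N (threshold) rule, the sketch below flags an account when at least two of three conditions hold. The conditions loosely follow the subscription fraud pattern of a high bill in a short time, but the field names and cut-off values are hypothetical; they are not rules learned in this project.

```python
def m_of_n(conditions, m):
    """Fire when at least m of the n boolean conditions hold."""
    return sum(bool(c) for c in conditions) >= m

def suspicious(account):
    """Hypothetical T-rule: flag when at least 2 of 3 conditions hold."""
    conditions = [
        account["days_active"] < 30,           # account is new
        account["bill_amount"] > 500,          # bill is unusually high
        account["international_share"] > 0.5,  # mostly international calls
    ]
    return m_of_n(conditions, m=2)

new_heavy_user = {"days_active": 10, "bill_amount": 900, "international_share": 0.1}
long_term_user = {"days_active": 400, "bill_amount": 900, "international_share": 0.1}
print(suspicious(new_heavy_user), suspicious(long_term_user))
```

Unlike a black-box score, each fired condition can be shown to a fraud analyst as the justification for the alert.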

4.2 Snapshots

5. Conclusion

In this project report, the detection of subscription fraud and bad debt in telecommunication using pattern learning with the BayesNet and JRip algorithms has been described. Theoretical and experimental results have been presented which show that pattern learning techniques can be useful in detecting subscription fraud and bad debt in telecommunication.

6. References

Cortes, C., Pregibon, D. Signature-based methods for data streams. Data Mining and Knowledge Discovery 2001; 5(3):167-182.

Cortes, C., Pregibon, D. Giga-mining. Proceedings of the Fourth International Conference on Knowledge Discovery and Data Mining; 174-178, 1998 August 27-31; New York, NY: AAAI Press, 1998.

Ezawa, K., Norton, S. Knowledge discovery in telecommunication services data using Bayesian network models. Proceedings of the First International Conference on Knowledge Discovery and Data Mining; 1995 August 20-21; Montreal, Canada. AAAI Press: Menlo Park, CA, 1995.

Fawcett, T., Provost, F. Adaptive fraud detection. Data Mining and Knowledge Discovery 1997; 1(3):291-316.

Fawcett, T., Provost, F. Activity monitoring: Noticing interesting changes in behavior.