1.1 BACKGROUND TO THE STUDY
The telecommunications industry generates and stores a tremendous amount of data
(Han et al, 2002). These data include call detail data, which describe the
calls that traverse the telecommunication networks; network data, which describe
the state of the hardware and software components in the network; and customer
data, which describe the telecommunication customers (Roset et al, 1999). The
amount of data is so great that manual analysis is difficult, if not
impossible. The need to handle such large volumes of data led to the
development of knowledge-based expert systems. These automated systems
performed important functions such as identifying fraudulent phone calls and
identifying network faults. The problem with this approach is that it is
time-consuming to obtain the knowledge from human experts (the “knowledge
acquisition bottleneck”) and, in many cases, the experts do not have the
requisite knowledge. The advent of data mining technology promised solutions
to these problems, and for this reason the telecommunications industry was an
early adopter of data mining technology (Roset et al, 1999).
Telecommunication data pose several interesting issues for data mining. The
first concerns scale, since telecommunication databases may contain billions of
records and are amongst the largest in the world. A second issue is that the
raw data is often not suitable for data mining. For example, both call detail
and network data are time-series data that represent individual events. Before
this data can be effectively mined, useful “summary” features must be
identified and then the data must be summarized using these features. Because
many data mining applications in the telecommunications industry involve
predicting very rare events, such as the failure of a network element or an
instance of telephone fraud, rarity is a third issue that must be dealt with.
The fourth and final data mining issue concerns real-time performance, because
many data mining applications, such as fraud detection, require that any
learned models or rules be applied in real time (Ezawa & Norton, 1995). Several
techniques have been applied to tackle these issues in telecommunication
companies.
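The summarization step described above can be illustrated with a minimal Python sketch. It assumes a simplified call detail record; the field names (`customer_id`, `duration_sec`, `hour_of_day`, `international`) and the four summary features are hypothetical choices made for illustration, not the schema or feature set of any particular operator.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CallRecord:
    """One call detail record; all field names are illustrative assumptions."""
    customer_id: str
    duration_sec: int
    hour_of_day: int      # 0-23, hour at which the call started
    international: bool


def summarize_calls(records):
    """Collapse per-call events into one row of summary features per customer."""
    grouped = defaultdict(list)
    for rec in records:
        grouped[rec.customer_id].append(rec)

    features = {}
    for customer_id, calls in grouped.items():
        total = len(calls)
        # Calls starting at or after 22:00, or before 06:00, count as night calls.
        night = sum(1 for c in calls if c.hour_of_day >= 22 or c.hour_of_day < 6)
        features[customer_id] = {
            "num_calls": total,
            "avg_duration_sec": sum(c.duration_sec for c in calls) / total,
            "pct_night_calls": night / total,
            "pct_international": sum(1 for c in calls if c.international) / total,
        }
    return features


# Three hypothetical calls made by two customers.
records = [
    CallRecord("A", 60, 9, False),
    CallRecord("A", 180, 23, True),
    CallRecord("B", 30, 14, False),
]
summary = summarize_calls(records)
```

Each customer is reduced to a single row of features, which is the form most data mining algorithms expect, whereas the raw records are individual time-stamped events.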
Telecommunication networks are extremely complex configurations of equipment,
comprising thousands of interconnected components. Each network element is
capable of generating error and status messages, which leads to a tremendous
amount of network data. This data must be stored and analyzed in order to
support network management functions, such as fault isolation. Each message
will minimally include a timestamp, a string that uniquely identifies the
hardware or software component generating the message, and a code that explains
why the message is being generated. For example, such a message might indicate
that “controller 7 experienced a loss of power for 30 seconds starting at
10:03 pm on Monday, May ...”
Due to the enormous number of network messages generated, technicians cannot
possibly handle every message. For this reason expert systems have been
developed to automatically analyze these messages and take appropriate action,
only involving a technician when a problem cannot be automatically resolved
(Weiss, Ros & Singhal, 1998). This study is focused on MTN Nigeria.
MTN Nigeria is part of the MTN Group, Africa's leading cellular
telecommunications company. On May 16, 2001, MTN became the first GSM network
to make a call following the globally lauded Nigerian GSM auction conducted by
the Nigerian Communications Commission earlier in the year. Thereafter the
company launched full commercial operations, beginning with Lagos, Abuja and
Port Harcourt. MTN paid $285m for one of four GSM licenses in Nigeria in
January 2001. To date, in excess of US$1.8 billion has been invested in
building mobile telecommunications infrastructure.
Since its launch in August 2001, MTN has steadily deployed its services across
Nigeria. It now provides services in 223 cities and towns, more than 10,000
villages and communities, and a growing number of highways across the country,
spanning the 36 states of Nigeria and the Federal Capital Territory, Abuja.
Many of these villages and communities are being connected to the world of
telecommunications for the first time.
1.2 STATEMENT OF THE PROBLEM
Fraud is a serious problem for telecommunication companies, leading to billions
of dollars in lost revenue each year. Fraud can be divided into two categories:
subscription fraud and superimposition fraud. Subscription fraud occurs when a
customer opens an account with the intention of never paying the account
charges. Superimposition fraud involves a legitimate account with some
legitimate activity, but also includes some “superimposed” illegitimate
activity by a person other than the account holder. Superimposition fraud poses
the bigger problem for the telecommunications industry, and for this reason
data mining techniques are used to identify this type of fraud. These
applications should ideally operate in real time using the call detail records
and, once fraud is detected or suspected, should trigger some action. This
action may be to immediately block the call and/or deactivate the account, or
may involve opening an investigation, which will result in a call to the
customer to verify the legitimacy of the account activity. This study will
therefore examine the various data mining techniques used by telecommunication
companies in Nigeria.
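The detect-then-act flow described above can be sketched as follows. This is a minimal illustration, not a production fraud model or any operator's actual method: the `AccountProfile` fields, the scoring weights and the two thresholds are all hypothetical values chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class AccountProfile:
    """Summary of an account's normal behaviour (hypothetical fields)."""
    avg_duration_sec: float
    pct_international: float


def score_call(profile, duration_sec, international):
    """Return a crude suspicion score for one call; the weights are assumptions."""
    score = 0.0
    # A call far longer than the account's average is mildly suspicious.
    if duration_sec > 5 * profile.avg_duration_sec:
        score += 0.4
    # An international call on an account that rarely makes them is more so.
    if international and profile.pct_international < 0.05:
        score += 0.5
    return score


def choose_action(score, block_at=0.8, investigate_at=0.4):
    """Map a suspicion score to one of the actions named in the text."""
    if score >= block_at:
        return "block"          # block the call and/or deactivate the account
    if score >= investigate_at:
        return "investigate"    # open an investigation and call the customer
    return "allow"


# A hypothetical account that normally makes short, local calls.
profile = AccountProfile(avg_duration_sec=120.0, pct_international=0.01)
long_international = choose_action(score_call(profile, 900, True))
short_international = choose_action(score_call(profile, 100, True))
short_local = choose_action(score_call(profile, 100, False))
```

Comparing each call against the account's own profile, rather than against a global rule, reflects the nature of superimposition fraud, where illegitimate activity is hidden among the account holder's legitimate calls.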
1.3 OBJECTIVES OF THE STUDY
The following are the objectives of this study:
1. To provide an overview of data mining.
2. To examine the various data mining techniques of telecommunication
companies in Nigeria.
3. To identify the challenges of data mining faced by telecommunication
companies in Nigeria.
1.4 RESEARCH QUESTIONS
1. What is data mining?
2. What are the various data mining techniques of telecommunication
companies in Nigeria?
3. What are the challenges of data mining faced by telecommunication
companies in Nigeria?
1.6 SIGNIFICANCE OF THE STUDY
The significance of this study is as follows:
1. The outcome of this study will educate readers on the data mining
techniques of telecommunication companies in Nigeria, the data mining
applications and how they can be used in fraud detection.
2. This research will contribute to the body of literature on data mining in
telecommunication companies, thereby constituting empirical literature for
future research in the subject area.
1.7 SCOPE/LIMITATIONS OF THE STUDY
This study will cover various data mining
techniques used by telecommunication companies in Nigeria.
LIMITATION OF STUDY
Financial constraint: Insufficient funds tend to impede the efficiency of the
researcher in sourcing relevant materials, literature or information and in
the process of data collection (internet, questionnaire, etc.).
Time constraint: The researcher will engage in this study simultaneously with
other academic work. This will consequently cut down the time devoted to the
research work.
REFERENCES
Weiss, G. M., Ros, J., Singhal, A. ANSWER: Network monitoring using
object-oriented rules. Proceedings of the Tenth Conference on Innovative
Applications of Artificial Intelligence; 1087-1093. AAAI Press: Menlo Park,
CA, 1998.
Ezawa, K., Norton, S. Knowledge discovery in telecommunication services data
using Bayesian network models. Proceedings of the First International
Conference on Knowledge Discovery and Data Mining; 1995 August 20-21; Montreal,
Canada. AAAI Press: Menlo Park, CA, 1995.
Han, J., Altman, R. B., Kumar, V., Mannila, H., Pregibon, D. Emerging
scientific applications in data mining. Communications of the ACM 2002; 45(8):
54-58.
Roset, S., Murad, U., Neumann, E., Idan, Y., Pinkas, G. Discovery of fraud
rules for telecommunications: challenges and solutions. Proceedings of the
Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data
Mining; 409-413; San Diego, CA. ACM Press: New York, 1999.