ieee final year projects 2011-2012 :: elysium technologies pvt ltd::data mining
Post on 18-Jan-2015
Elysium Technologies Private Limited ISO 9001:2008 A leading Research and Development Division Madurai | Chennai | Trichy | Coimbatore | Kollam| Singapore Website: elysiumtechnologies.com, elysiumtechnologies.info Email: info@elysiumtechnologies.com
IEEE Final Year Project List 2011-2012
Madurai Elysium Technologies Private Limited
230, Church Road, Annanagar,
Madurai , Tamilnadu – 625 020.
Contact : 91452 4390702, 4392702, 4394702.
eMail: info@elysiumtechnologies.com
Trichy Elysium Technologies Private Limited
3rd Floor, SI Towers,
15 ,Melapudur , Trichy,
Tamilnadu – 620 001.
Contact : 91431 - 4002234.
eMail: elysium.trichy@gmail.com
Kollam Elysium Technologies Private Limited
Surya Complex,Vendor junction,
kollam,Kerala – 691 010.
Contact : 91474 2723622.
eMail: elysium.kollam@gmail.com
Abstract DATA ENGINEERING 2011 - 2012
01 Dual Framework and Algorithms for Targeted Online Data Delivery
A variety of emerging online data delivery applications challenge existing techniques for data delivery to human users,
applications, or middleware that are accessing data from multiple autonomous servers. In this paper, we develop a
framework for formalizing and comparing pull-based solutions and present dual optimization approaches. The first
approach, most commonly used nowadays, maximizes user utility under the strict setting of meeting a priori constraints
on the usage of system resources. We present an alternative and more flexible approach that maximizes user utility by
satisfying all users. It does this while minimizing the usage of system resources. We discuss the benefits of this latter
approach and develop an adaptive monitoring solution, Satisfy User Profiles (SUP). Through formal analysis, we identify
sufficient optimality conditions for SUP. Using real (RSS feeds) and synthetic traces, we empirically analyze the behavior
of SUP under varying conditions. Our experiments show that SUP achieves a high degree of user utility
when its estimations closely match the real event stream, and that it has the potential to save a significant amount of
system resources. We further show that SUP can exploit feedback to improve user utility with only a moderate increase in
resource utilization.
02 A Flexible Data and Sensor Planning Service for Virtual Sensors
How to achieve a flexible data and sensor planning service to schedule, plan, and empower diverse sensors and
heterogeneous data ordering systems is a big challenge. In this paper, a service-oriented framework of data and sensor
planning service for virtual sensors is proposed. The framework includes an Open Geospatial Consortium (OGC)-compliant
Sensor Planning Service (SPS), a Web Notification Service (WNS), a Sensor Observation Service (SOS), and virtual sensors.
There are two key technologies in this framework, namely a flexible SPS middleware and an asynchronous
message notification mechanism. The flexible SPS middleware, based on a configuration file and standard interfaces, is
adopted to integrate virtual sensors into a sensor Web. A WNS-based asynchronous notification middleware is used to
inform the user of the status of a task that may need midterm or long-term actions. The framework has been successfully
demonstrated in application scenarios for Simplified General Perturbations Satellite Orbit Model 4 (SGP4) and Earth
Observation System Clearing HOuse (ECHO). The results show that the proposed method has the following improvements
over the existing SPS implementation: a uniform planning service for more satellites, a seamless connection with data
order systems, and a flexible service-oriented framework for virtual sensors.
03 A Fuzzy Self-Constructing Feature Clustering Algorithm for Text Classification
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this
paper, we propose a fuzzy similarity-based self-constructing algorithm for feature clustering. The words in the
feature vector of a document set are grouped into clusters, based on a similarity test. Words that are similar to each
other are grouped into the same cluster. Each cluster is characterized by a membership function with statistical
mean and deviation. When all the words have been fed in, a desired number of clusters are formed automatically. We
then have one extracted feature for each cluster. The extracted feature, corresponding to a cluster, is a weighted
combination of the words contained in the cluster. By this algorithm, the derived membership functions match
closely with and describe properly the real distribution of the training data. Besides, the user need not specify the
number of extracted features in advance, and trial-and-error for determining the appropriate number of extracted
features can then be avoided. Experimental results show that our method can run faster and obtain better extracted
features than other methods.
04 A Generic Multilevel Architecture for Time Series Prediction
Rapidly evolving businesses generate massive amounts of time-stamped data sequences and cause a demand for both
univariate and multivariate time series forecasting. For such data, traditional predictive models based on autoregression
are often not sufficient to capture complex nonlinear relationships between multidimensional features and the time series
outputs. In order to exploit these relationships for improved time series forecasting while also better dealing with a wider
variety of prediction scenarios, a forecasting system requires a flexible and generic architecture to accommodate and tune
various individual predictors as well as combination methods. In reply to this challenge, an architecture for combined,
multilevel time series prediction is proposed, which is suitable for many different universal regressors and combination
methods. The key strength of this architecture is its ability to build a diversified ensemble of individual predictors that form
an input to a multilevel selection and fusion process before the final optimized output is obtained. Excellent generalization
ability is achieved due to the highly boosted complementarity of individual models further enforced through cross-
validation-linked training on exclusive data subsets and ensemble output postprocessing. In a sample configuration with
basic neural network predictors and a mean combiner, the proposed system has been evaluated in different scenarios and
showed a clear prediction performance gain.
05 A Link Analysis Extension of Correspondence Analysis for Mining Relational Databases
This work introduces a link analysis procedure for discovering relationships in a relational database or a graph,
generalizing both simple and multiple correspondence analysis. It is based on a random walk model through the
database defining a Markov chain having as many states as elements in the database. Suppose we are interested in
analyzing the relationships between some elements (or records) contained in two different tables of the relational
database. To this end, in a first step, a reduced, much smaller, Markov chain containing only the elements of interest
and preserving the main characteristics of the initial chain, is extracted by stochastic complementation [41]. This
reduced chain is then analyzed by projecting jointly the elements of interest in the diffusion map subspace [42] and
visualizing the results. This two-step procedure reduces to simple correspondence analysis when only two tables are
defined, and to multiple correspondence analysis when the database takes the form of a simple star-schema. On the
other hand, a kernel version of the diffusion map distance, generalizing the basic diffusion map distance to directed
graphs, is also introduced and the links with spectral clustering are discussed. Several data sets are analyzed by
using the proposed methodology, showing the usefulness of the technique for extracting relationships in relational
databases or graphs.
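For orientation, the stochastic complementation step can be written out as follows. This is the standard definition from the Markov chain literature, not a formula quoted from the abstract; the block partition and the symbols P_11, P_12, P_21, P_22 and S are notation assumed here. Order the states so that the elements of interest come first, and partition the transition matrix accordingly:

```latex
P =
\begin{pmatrix}
P_{11} & P_{12} \\
P_{21} & P_{22}
\end{pmatrix},
\qquad
S = P_{11} + P_{12}\,(I - P_{22})^{-1}\,P_{21}
```

Here S is the transition matrix of the reduced chain on the elements of interest. The term P_12 (I - P_22)^{-1} P_21 accounts for walks that leave the set of interest, wander among the remaining states, and eventually return. The inverse exists when the original chain is irreducible.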
06 A Personalized Ontology Model for Web Information Gathering
As a model for knowledge description and formalization, ontologies are widely used to represent user profiles in
personalized web information gathering. However, when representing user profiles, many models have utilized only
knowledge from either a global knowledge base or a user's local information. In this paper, a personalized ontology model is
proposed for knowledge representation and reasoning over user profiles. This model learns ontological user profiles from
both a world knowledge base and user local instance repositories. The ontology model is evaluated by comparing it against
benchmark models in web information gathering. The results show that this ontology model is successful.
07 Adaptive Cluster Distance Bounding for High-Dimensional Indexing
We consider approaches for similarity search in correlated, high-dimensional data sets, which are derived within a
clustering framework. We note that indexing by “vector approximation” (VA-File), which was proposed as a
technique to combat the “Curse of Dimensionality,” employs scalar quantization, and hence necessarily ignores
dependencies across dimensions, which represents a source of suboptimality. Clustering, on the other hand,
exploits interdimensional correlations and is thus a more compact representation of the data set. However, existing
methods to prune irrelevant clusters are based on bounding hyperspheres and/or bounding rectangles, whose lack
of tightness compromises their efficiency in exact nearest neighbor search. We propose a new cluster-adaptive
distance bound based on separating hyperplane boundaries of Voronoi clusters to complement our cluster based
index. This bound enables efficient spatial filtering, with a relatively small preprocessing storage overhead and is
applicable to euclidean and Mahalanobis similarity measures. Experiments in exact nearest-neighbor set retrieval,
conducted on real data sets, show that our indexing method is scalable with data set size and data dimensionality
and outperforms several recently proposed indexes. Relative to the VA-File, over a wide range of quantization
resolutions, it is able to reduce random IO accesses, given (roughly) the same amount of sequential IO operations,
by factors reaching 100X and more.
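A minimal sketch of the idea behind a hyperplane-based cluster distance bound, assuming Euclidean distance and clusters defined as Voronoi cells of their centroids. The function names and sample centroids are hypothetical, and this is a simplified illustration rather than the paper's exact bound. Every point of a Voronoi cell lies on its own side of each bisecting hyperplane, so the largest distance from the query to any of those hyperplanes is a lower bound on the distance to the whole cell:

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_lower_bound(q, target, centroids):
    """Lower bound on the distance from q to any point in the Voronoi cell
    of centroids[target], using the bisecting hyperplanes between the
    target centroid and every other centroid."""
    c_t = centroids[target]
    best = 0.0
    for i, c_i in enumerate(centroids):
        if i == target:
            continue
        gap = math.sqrt(sq_dist(c_i, c_t))
        if gap == 0.0:
            continue  # duplicate centroid, no separating hyperplane
        # Signed distance from q to the bisector; positive when q is
        # closer to c_i, i.e. on the far side from the target cell.
        bound = (sq_dist(q, c_t) - sq_dist(q, c_i)) / (2.0 * gap)
        best = max(best, bound)
    return best


centroids = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
query = (1.0, 0.0)
bounds = [cluster_lower_bound(query, j, centroids) for j in range(len(centroids))]
```

During a nearest-neighbor search, a cluster whose lower bound already exceeds the distance to the best candidate found so far can be skipped without reading its contents.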
08 Anonymous Publication of Sensitive Transactional Data
Existing research on privacy-preserving data publishing focuses on relational data: in this context, the objective is to
enforce privacy-preserving paradigms, such as k-anonymity and ℓ-diversity, while minimizing the information loss incurred
in the anonymizing process (i.e., maximize data utility). Existing techniques work well for fixed-schema data, with low
dimensionality. Nevertheless, certain applications require privacy-preserving publishing of transactional data (or basket
data), which involve hundreds or even thousands of dimensions, rendering existing methods unusable. We propose two
categories of novel anonymization methods for sparse high-dimensional data. The first category is based on approximate
nearest-neighbor (NN) search in high-dimensional spaces, which is efficiently performed through locality-sensitive hashing
(LSH). In the second category, we propose two data transformations that capture the correlation in the underlying data: 1)
reduction to a band matrix and 2) Gray encoding-based sorting. These representations facilitate the formation of
anonymized groups with low information loss, through an efficient linear-time heuristic. We show experimentally, using
real-life data sets, that all our methods clearly outperform the existing state of the art. Among the proposed techniques, NN-
search yields superior data utility compared to the band matrix transformation, but incurs higher computational overhead.
The data transformation based on Gray code sorting performs best in terms of both data utility and execution time.
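The Gray encoding-based sorting can be illustrated with a small sketch. It assumes each transaction is a fixed-length bit vector (1 = item present) and that groups are formed by cutting the sorted order into runs of at least k transactions. The abstract does not specify the grouping heuristic, so `group_by_gray_order` is a hypothetical stand-in. Neighbors in reflected Gray code order differ in few bits, which is why consecutive transactions tend to be similar:

```python
def gray_rank(bits):
    """Position of a bit vector in the reflected Gray code sequence.

    Decodes Gray to binary: each binary bit is the XOR of all Gray
    bits seen so far (most significant bit first)."""
    rank = 0
    acc = 0
    for b in bits:
        acc ^= b
        rank = (rank << 1) | acc
    return rank


def group_by_gray_order(transactions, k):
    """Sort transactions by Gray rank and cut into groups of at least k."""
    ordered = sorted(transactions, key=gray_rank)
    groups = [ordered[i:i + k] for i in range(0, len(ordered), k)]
    # Merge a short trailing group into its predecessor so that every
    # group keeps at least k members.
    if len(groups) > 1 and len(groups[-1]) < k:
        tail = groups.pop()
        groups[-1].extend(tail)
    return groups


transactions = [(1, 0), (0, 0), (1, 1), (0, 1), (1, 1)]
groups = group_by_gray_order(transactions, k=2)
```

Sorting costs O(n log n) and the grouping pass is linear, which fits the abstract's emphasis on low execution time.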
09 Answering Frequent Probabilistic Inference Queries in Databases
Existing solutions for probabilistic inference queries mainly focus on answering a single inference query, but seldom
address the issues of efficiently returning results for a sequence of frequent queries, which is more popular and
practical in many real applications. In this paper, we mainly study the computation caching and sharing among a
sequence of inference queries in databases. The clique tree propagation (CTP) algorithm is first introduced in
databases for probabilistic inference queries. We use the materialized views to cache the intermediate results of the
previous inference queries, which might be shared with the following queries, and consequently reduce the time
cost. Moreover, we take the query workload into account to identify the frequently queried variables. To optimize
probabilistic inference queries with CTP, we cache these frequent query variables into the materialized views to
maximize the reuse. Due to the existence of different query plans, we present heuristics to estimate costs and select
the optimal query plan. Finally, we present the experimental evaluation in relational databases to illustrate the validity
and superiority of our approaches in answering frequent probabilistic inference queries.
10 Authenticated Multistep Nearest Neighbor Search
Multistep processing is commonly used for nearest neighbor (NN) and similarity search in applications involving
high-dimensional data and/or costly distance computations. Today, many such applications require a proof of result
correctness. In this setting, clients issue NN queries to a server that maintains a database signed by a trusted authority.
The server returns the NN set along with supplementary information that permits result verification using the data set
signature. An adaptation of the multistep NN algorithm incurs prohibitive network overhead due to the transmission of false
hits, i.e., records that are not in the NN set, but are nevertheless necessary for its verification. In order to alleviate this
problem, we present a novel technique that reduces the size of each false hit. Moreover, we generalize our solution for a
distributed setting, where the database is horizontally partitioned over several servers. Finally, we demonstrate the
effectiveness of the proposed solutions with real data sets of various dimensionalities.
11 Automatic Discovery of Personal Name Aliases from the Web
An individual is typically referred to by numerous name aliases on the web. Accurate identification of aliases of a given
person name is useful in various web related tasks such as information retrieval, sentiment analysis, personal name
disambiguation, and relation extraction. We propose a method to extract aliases of a given personal name from the
web. Given a personal name, the proposed method first extracts a set of candidate aliases. Second, we rank the
extracted candidates according to the likelihood of a candidate being a correct alias of the given name. We propose a
novel, automatically extracted lexical pattern-based approach to efficiently extract a large set of candidate aliases
from snippets retrieved from a web search engine. We define numerous ranking scores to evaluate candidate aliases
using three approaches: lexical pattern frequency, word co-occurrences in an anchor text graph, and page counts on
the web. To construct a robust alias detection system, we integrate the different ranking scores into a single ranking
function using ranking support vector machines. We evaluate the proposed method on three data sets: an English
personal names data set, an English place names data set, and a Japanese personal names data set. The proposed
method outperforms numerous baselines and previously proposed name alias extraction methods, achieving a
statistically significant mean reciprocal rank (MRR) of 0.67. Experiments carried out using location names and
Japanese personal names suggest the possibility of extending the proposed method to extract aliases for different
types of named entities, and for different languages. Moreover, the aliases extracted using the proposed method are
successfully utilized in an information retrieval task and improve recall by 20 percent in a relation-detection task.
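Mean reciprocal rank, the metric quoted above, is easy to compute by hand. The sketch below uses the standard definition: for each query name, take the reciprocal of the position of the first correct alias in the ranked candidate list (zero if none appears), then average over all names. The candidate strings are placeholders, not data from the paper:

```python
def mean_reciprocal_rank(ranked_lists, correct_sets):
    """Average of 1/position of the first correct candidate per query."""
    total = 0.0
    for ranked, correct in zip(ranked_lists, correct_sets):
        for position, candidate in enumerate(ranked, start=1):
            if candidate in correct:
                total += 1.0 / position
                break  # only the first correct hit counts
    return total / len(ranked_lists)


# Two hypothetical queries: the correct alias is ranked 1st and 3rd.
ranked_lists = [
    ["alias_a", "alias_b"],
    ["alias_x", "alias_y", "alias_z"],
]
correct_sets = [{"alias_a"}, {"alias_z"}]
mrr = mean_reciprocal_rank(ranked_lists, correct_sets)
```

An MRR of 0.67 therefore means that, on average, the first correct alias sits between the first and second positions of the ranking.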
12 Automatic Enrichment of Semantic Relation Network and Its Application to Word Sense Disambiguation
The most fundamental step in semantic information processing (SIP) is to construct a knowledge base (KB) at the human
level; that is, one matching the general understanding and conception of human knowledge. WordNet has been built to be the most
systematic KB and the closest to the human level, and is being applied actively in various works. In our previous research,
we found that a semantic gap exists between concept pairs of WordNet and those of real world. This paper contains a study
on the enrichment method to build a KB. We describe the methods and the results for the automatic enrichment of the
semantic relation network. A rule based method using WordNet’s glossaries and an inference method using axioms for
WordNet relations are applied for the enrichment and an enriched WordNet (E-WordNet) is built as the result. Our
experimental results substantiate the usefulness of E-WordNet. An evaluation by comparison with the human level is
attempted. Moreover, WSD-SemNet, a new word sense disambiguation (WSD) method in which E-WordNet is applied, is
proposed and evaluated by comparing it with the state-of-the-art algorithm.
13 Branch-and-Bound for Model Selection and Its Computational Complexity
Branch-and-bound methods are used in various data analysis problems, such as clustering, seriation and feature
selection. Classical approaches of branch-and-bound based clustering search through combinations of various
partitioning possibilities to optimize a clustering cost. However, these approaches are not practically useful for
clustering of image data where the size of data is large. Additionally, the number of clusters is unknown in most of
the image data analysis problems. By taking advantage of the spatial coherency of clusters, we formulate an
innovative branch-and-bound approach, which solves the clustering problem as a model-selection problem. In this
generalized approach, cluster parameter candidates are first generated by spatially coherent sampling. A
branch-and-bound search is carried out through the candidates to select an optimal subset. This paper formulates this
approach and investigates its average computational complexity. Improved clustering quality and robustness to
outliers compared to conventional iterative approaches are demonstrated with experiments.
14 Measuring Client-Perceived Pageview Response Time of Internet Services
As e-commerce services are exponentially growing, businesses need quantitative estimates of client-perceived response
times to continuously improve the quality of their services. Current server-side nonintrusive measurement techniques are
limited to nonsecured HTTP traffic. In this paper, we present the design and evaluation of a monitor, namely sMonitor, which
is able to measure client-perceived response times for both HTTP and HTTPS traffic. At the heart of sMonitor is a novel
size-based analysis method that parses live packets to delimit different webpages and to infer their response times. The
method is based on the observation that most HTTP(S)-compatible browsers send significantly larger requests for
container objects than those for embedded objects. sMonitor is designed to operate accurately in the presence of
complicated browser behaviors, such as parallel downloading of multiple webpages and HTTP pipelining, as well as packet
losses and delays. It only needs to passively collect network traffic in and out of the monitored secured services. We
conduct comprehensive experiments across a wide range of operating conditions using live secured Internet services, on
the PlanetLab, and on controlled networks. The experimental results demonstrate that sMonitor is able to control the
estimation error within 6.7 percent, in comparison with the actual measured time at the client side.
15 Classification and Novel Class Detection in Concept-Drifting Data Streams under Time Constraints
Most existing data stream classification techniques ignore one important aspect of stream data: arrival of a novel
class. We address this issue and propose a data stream classification technique that integrates a novel class
detection mechanism into traditional classifiers, enabling automatic detection of novel classes before the true labels
of the novel class instances arrive. The novel class detection problem becomes more challenging in the presence of
concept-drift, when the underlying data distributions evolve in streams. In order to determine whether an instance
belongs to a novel class, the classification model sometimes needs to wait for more test instances to discover
similarities among those instances. A maximum allowable wait time Tc is imposed as a time constraint to classify a
test instance. Furthermore, most existing stream classification approaches assume that the true label of a data point
can be accessed immediately after the data point is classified. In reality, a time delay Tl is involved in obtaining the
true label of a data point since manual labeling is time consuming. We show how to make fast and correct
classification decisions under these constraints and apply them to real benchmark data. Comparison with state-of-
the-art stream classification techniques proves the superiority of our approach.
16 Classification Using Streaming Random Forests
We consider the problem of data stream classification, where the data arrive in a conceptually infinite stream, and the
opportunity to examine each record is brief. We introduce a stream classification algorithm that is online, running in
amortized O(1) time, able to handle intermittent arrival of labeled records, and able to adjust its parameters to respond to
changing class boundaries (“concept drift”) in the data stream. In addition, when blocks of labeled data are short, the
algorithm is able to judge internally whether the quality of models updated from them is good enough for deployment on
unlabeled records, or whether further labeled records are required. Unlike most proposed stream-classification algorithms,
multiple target classes can be handled. Experimental results on real and synthetic data show that accuracy is comparable
to a conventional classification algorithm that sees all of the data at once and is able to make multiple passes over it.
17 CoFiDS: A Belief-Theoretic Approach for Automated Collaborative Filtering
Automated Collaborative Filtering (ACF) refers to a group of algorithms used in recommender systems, a research topic
that has received considerable attention due to its e-commerce applications. However, existing techniques are rarely
capable of dealing with imperfections in user-supplied ratings. When such imperfections (e.g., ambiguities) cannot be
avoided, designers resort to simplifying assumptions that impair the system’s performance and utility. We have developed
a novel technique referred to as CoFiDS—Collaborative Filtering based on Dempster-Shafer belief-theoretic framework—
that can represent a wide variety of data imperfections, propagate them throughout the decision-making process without
the need to make simplifying assumptions, and exploit contextual information. With its DS-theoretic predictions, the
domain expert can either obtain a “hard” decision or can narrow the set of possible predictions to a smaller set. With its
capability to handle data imperfections, CoFiDS widens the applicability of ACF to such critical and sensitive domains as
medical decision support systems and defense-related applications. We describe the theoretical foundation of the system
and report experiments with a benchmark movie data set. We explore some essential aspects of CoFiDS’ behavior and
show that its performance compares favorably with other ACF systems.
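The Dempster-Shafer framework that CoFiDS builds on fuses evidence with Dempster's rule of combination. The sketch below shows only that standard rule, not the CoFiDS prediction pipeline. The 1-to-5 rating scale and the two mass functions are hypothetical. A mass function assigns belief to sets of ratings, so an ambiguous rating such as "4 or 5" can be represented directly:

```python
from itertools import product


def combine(m1, m2):
    """Dempster's rule: multiply masses of intersecting focal sets,
    discard the conflicting mass, and renormalize."""
    combined = {}
    conflict = 0.0
    for (a, x), (b, y) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + x * y
        else:
            conflict += x * y  # evidence that cannot both be true
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}


ALL = frozenset({1, 2, 3, 4, 5})  # assumed rating scale
# Source 1: mostly "4", partly "4 or 5", a little total ignorance.
m1 = {frozenset({4}): 0.6, frozenset({4, 5}): 0.3, ALL: 0.1}
# Source 2: split between "5" and "4 or 5".
m2 = {frozenset({5}): 0.5, frozenset({4, 5}): 0.5}
fused = combine(m1, m2)
```

In this example the fused masses are 3/7 on {4}, 2/7 on {5} and 2/7 on {4, 5}. A "hard" decision would pick rating 4, while a cautious one would report the narrowed set {4, 5}, mirroring the two options the abstract mentions.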
18 Collaborative Filtering with Personalized Skylines
Collaborative filtering (CF) systems exploit previous ratings and similarity in user behavior to recommend the top-k objects/
records which are potentially most interesting to the user assuming a single score per object. However, in various
applications, a record (e.g., hotel) may be rated on several attributes (value, service, etc.), in which case simply returning the
ones with the highest overall scores fails to capture the individual attribute characteristics and to accommodate different
selection criteria. In order to enhance the flexibility of CF, we propose Collaborative Filtering Skyline (CFS), a general
framework that combines the advantages of CF with those of the skyline operator. CFS generates a personalized skyline for
each user based on scores of other users with similar behavior. The personalized skyline includes objects that are good on
certain aspects, and eliminates the ones that are not interesting on any attribute combination. Although the integration of
skylines and CF has several attractive properties, it also involves rather expensive computations. We face this challenge
through a comprehensive set of algorithms and optimizations that reduce the cost of generating personalized skylines. In
addition to exact skyline processing, we develop an approximate method that provides error guarantees. Finally, we
propose the top-k personalized skyline, where the user specifies the required output cardinality.
19 Comprehensive Citation Index for Research Networks
The existing Science Citation Index only counts direct citations, whereas PageRank disregards the number of direct
citations. We propose a new Comprehensive Citation Index (CCI) that evaluates both direct and indirect intellectual
influence of research papers, and show that CCI is more reliable in discovering research papers with far-reaching
influence.
20 Constrained Skyline Query Processing against Distributed Data
The skyline of a multidimensional point set is a subset of interesting points that are not dominated by others. In this paper,
we investigate constrained skyline queries in a large-scale unstructured distributed environment, where relevant data are
distributed among geographically scattered sites. We first propose a partition algorithm that divides all data sites into
incomparable groups such that the skyline computations in all groups can be parallelized without changing the final result.
We then develop a novel algorithm framework called PaDSkyline for parallel skyline query processing among partitioned
site groups. We also employ intragroup optimization and multifiltering technique to improve the skyline query processes
within each group. In particular, multiple (local) skyline points are sent together with the query as filtering points, which
help identify unqualified local skyline points early on a data site. In this way, the amount of data to be transmitted via
network connections is reduced, and thus, the overall query response time is shortened further. Cost models and heuristics
are proposed to guide the selection of a given number of filtering points from a superset. A cost-efficient model is
developed to determine how many filtering points to use for a particular data site.
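The dominance test and the filtering-point idea described above can be sketched for a single site as follows. This is a minimal illustration, not PaDSkyline itself: it assumes smaller values are preferred in every dimension and that the constraint is an axis-aligned box, and the function names are hypothetical. Site partitioning and the cost-based choice of filtering points are not shown.

```python
def dominates(p, q):
    """p dominates q if p is no worse in every dimension and strictly better in one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def constrained_skyline(points, lower, upper):
    """Skyline of the points that fall inside the box [lower, upper]."""
    candidates = [
        p for p in points
        if all(lo <= x <= hi for x, lo, hi in zip(p, lower, upper))
    ]
    return [p for p in candidates if not any(dominates(q, p) for q in candidates)]


def filter_local(points, filtering_points):
    """Drop local points dominated by any filtering point shipped with the query."""
    return [p for p in points if not any(dominates(f, p) for f in filtering_points)]


site_a = [(1, 5), (2, 2), (3, 3), (5, 1), (0, 9)]
local_skyline = constrained_skyline(site_a, lower=(0, 0), upper=(5, 5))
survivors = filter_local([(3, 3), (1, 1), (4, 4)], filtering_points=[(2, 2)])
```

Here `filter_local` stands in for the early elimination step: points a remote site can already rule out never need to be sent over the network.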
21 Continuous Monitoring of Distance-Based Range Queries
Given a positive value r, a distance-based range query returns the objects that lie within the distance r of the query
location. In this paper, we focus on the distance-based range queries that continuously change their locations in a
euclidean space. We present an efficient and effective monitoring technique based on the concept of a safe zone.
The safe zone of a query is the area with a property that while the query remains inside it, the results of the query
remain unchanged. Hence, the query does not need to be reevaluated unless it leaves the safe zone. Our
contributions are as follows: 1) We propose a technique based on powerful pruning rules and a unique access order
which efficiently computes the safe zone and minimizes the I/O cost. 2) We theoretically determine and
experimentally verify the expected distance a query moves before leaving the safe zone and, for the majority of queries,
the expected number of guard objects. 3) Our experiments demonstrate that the proposed approach is close to
optimal and is an order of magnitude faster than a naïve algorithm. 4) We also extend our technique to monitor the
queries in a road network. Our algorithm is up to two orders of magnitude faster than a naïve algorithm.
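The safe-zone idea can be illustrated with a deliberately conservative variant. The sketch below assumes static objects in the plane and uses a circular zone around the last evaluated query location, which is not the paper's zone construction; the function names are hypothetical. By the triangle inequality, no object can cross the range boundary while the query has moved less than the returned radius.

```python
import math


def range_query(q, objects, r):
    """Names of all objects within distance r of the query location q."""
    return {name for name, pos in objects.items() if math.dist(q, pos) <= r}


def safe_radius(q, objects, r):
    """Distance the query may move (strictly less than) without changing the result."""
    return min((abs(math.dist(q, pos) - r) for pos in objects.values()),
               default=math.inf)


objects = {"a": (0.0, 0.0), "b": (3.0, 0.0), "c": (10.0, 0.0)}
q, r = (1.0, 0.0), 3.0
result = range_query(q, objects, r)
radius = safe_radius(q, objects, r)

# Moving 0.9 units stays inside the safe circle, so no reevaluation is needed.
moved = (1.9, 0.0)
unchanged = range_query(moved, objects, r) == result
```

A circle is usually much smaller than the true safe zone, so this variant triggers more reevaluations than necessary; it only shows why a query inside its zone can skip recomputation.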
22 Cosdes: A Collaborative Spam Detection System with a Novel E-Mail Abstraction Scheme
E-mail communication is indispensable nowadays, but the e-mail spam problem continues to grow drastically. In
recent years, the notion of collaborative spam filtering with near-duplicate similarity matching scheme has been widely
discussed. The primary idea of the similarity matching scheme for spam detection is to maintain a known spam database,
formed by user feedback, to block subsequent near-duplicate spams. To achieve efficient similarity matching
and reduce storage utilization, prior works mainly represent each e-mail by a succinct abstraction derived from e-mail
content text. However, these abstractions of e-mails cannot fully catch the evolving nature of spams, and are thus not
effective enough in near-duplicate detection. In this paper, we propose a novel e-mail abstraction scheme, which considers
e-mail layout structure to represent e-mails. We present a procedure to generate the e-mail abstraction using HTML content
in e-mail, and this newly devised abstraction can more effectively capture the near-duplicate phenomenon of spams.
Moreover, we design a complete spam detection system Cosdes (standing for COllaborative Spam DEtection System),
which possesses an efficient near-duplicate matching scheme and a progressive update scheme. The progressive update
scheme enables system Cosdes to keep the most up-to-date information for near-duplicate detection. We evaluate Cosdes
on a live data set collected from a real e-mail server and show that our system outperforms the prior approaches in
detection results and is applicable to the real world.
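The idea of abstracting an e-mail by its layout rather than its text can be sketched with Python's standard `html.parser` module. This is a simplified stand-in, assuming the abstraction is just the ordered sequence of HTML tags; the actual Cosdes abstraction and its matching and update schemes are more elaborate, and the names below are hypothetical.

```python
from html.parser import HTMLParser


class TagSequence(HTMLParser):
    """Collects opening and closing tags in document order, ignoring text."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(f"<{tag}>")

    def handle_endtag(self, tag):
        self.tags.append(f"</{tag}>")


def layout_abstraction(html):
    """Return the tag sequence of an HTML e-mail body as a hashable tuple."""
    parser = TagSequence()
    parser.feed(html)
    parser.close()
    return tuple(parser.tags)


spam_1 = "<html><body><p>Cheap watches today</p><a href='http://x.example'>buy</a></body></html>"
spam_2 = "<html><body><p>Ch3ap w@tches 2day!!</p><a href='http://y.example'>order</a></body></html>"
ham = "<html><body><h1>Meeting notes</h1><ul><li>Agenda</li></ul></body></html>"

known_spam = {layout_abstraction(spam_1)}
is_near_duplicate = layout_abstraction(spam_2) in known_spam
```

Because the text is ignored, obfuscating the wording of a spam message does not change its abstraction, which is the property the abstract attributes to layout-based schemes.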
23 Coupling Logical Analysis of Data and Shadow Clustering for Partially Defined Positive Boolean Function Reconstruction
The problem of reconstructing the AND-OR expression of a partially defined positive Boolean function (pdpBf) is
solved by adopting a novel algorithm, denoted by LSC, which combines the advantages of two efficient techniques,
Logical Analysis of Data (LAD) and Shadow Clustering (SC). The kernel of the approach followed by LAD consists in
a breadth-first enumeration of all the prime implicants whose degree is not greater than a fixed maximum d. In
contrast, SC adopts an effective heuristic procedure for retrieving the most promising logical products to be
included in the resulting AND-OR expression. Since the computational cost required by LAD prevents its application
even for relatively small dimensions of the input domain, LSC employs a depth-first approach, with asymptotically
linear memory occupation, to analyze the prime implicants having degree not greater than d. In addition, the
theoretical analysis proves that LSC presents almost the same asymptotic time complexity as LAD. Extensive
simulations on artificial benchmarks validate the good behavior of the computational cost exhibited by LSC, in
agreement with the theoretical analysis. Furthermore, the pdpBf retrieved by LSC always shows a better
performance, in terms of complexity and accuracy, with respect to those obtained by LAD.
24 Data Leakage Detection
We study the following problem: A data distributor has given sensitive data to a set of supposedly trusted agents (third
parties). Some of the data are leaked and found in an unauthorized place (e.g., on the web or somebody’s laptop). The
distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been
independently gathered by other means. We propose data allocation strategies (across the agents) that improve the
probability of identifying leakages. These methods do not rely on alterations of the released data (e.g., watermarks). In
some cases, we can also inject “realistic but fake” data records to further improve our chances of detecting leakage and
identifying the guilty party.
25 Decision Trees for Uncertain Data
Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers
to handle data with uncertain information. Value uncertainty arises in many applications during the data collection
process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple
repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but
by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives
(such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the
“complete information” of a data item (taking into account the probability density function (pdf)) is utilized. We
extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments
have been conducted which show that the resulting classifiers are more accurate than those using value averages.
Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree
construction on uncertain data is more CPU demanding than that for certain data. To tackle this problem, we propose
a series of pruning techniques that can greatly improve construction efficiency.
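One way to picture using the "complete information" of an uncertain value is to let each tuple contribute fractionally to both sides of a split. The sketch below assumes each pdf is given as a discrete list of `(value, probability)` pairs and scores a candidate split point by weighted entropy; the representation and function names are illustrative assumptions, and none of the pruning techniques are shown.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a dictionary of fractional class counts."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    return -sum(w / total * math.log2(w / total)
                for w in class_weights.values() if w > 0)


def split_entropy(tuples, z):
    """Weighted entropy after splitting at x <= z.

    Each tuple is (pdf, label); the probability mass at or below z goes to the
    left branch and the remaining mass goes to the right branch.
    """
    left, right = defaultdict(float), defaultdict(float)
    for pdf, label in tuples:
        p_left = sum(p for x, p in pdf if x <= z)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n = len(tuples)
    return (sum(left.values()) / n) * entropy(left) \
        + (sum(right.values()) / n) * entropy(right)


certain = [([(1.0, 1.0)], "A"), ([(3.0, 1.0)], "B")]
uncertain = [([(1.0, 0.5), (3.0, 0.5)], "A"), ([(3.0, 1.0)], "B")]

clean_split = split_entropy(certain, z=2.0)
fuzzy_split = split_entropy(uncertain, z=2.0)
```

With point-valued data the split at 2.0 is perfectly pure, while the tuple whose mass straddles the split point leaves residual entropy, which averaging the pdf to a single value would hide.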
26 Design and Implementation of an Intrusion Response System for Relational Databases
The intrusion response component of an overall intrusion detection system is responsible for issuing a suitable response
to an anomalous request. We propose the notion of database response policies to support our intrusion response system
tailored for a DBMS. Our interactive response policy language makes it very easy for the database administrators to specify
appropriate response actions for different circumstances depending upon the nature of the anomalous request. The two
main issues that we address in the context of such response policies are policy matching and policy administration. For
the policy matching problem, we propose two algorithms that efficiently search the policy database for policies that match
an anomalous request. We also extend the PostgreSQL DBMS with our policy matching mechanism, and report
experimental results. The experimental evaluation shows that our techniques are very efficient. The other issue that we
address is that of administration of response policies to prevent malicious modifications to policy objects from legitimate
users. We propose a novel Joint Threshold Administration Model (JTAM) that is based on the principle of separation of
duty. The key idea in JTAM is that a policy object is jointly administered by at least k database administrators (DBAs), that is,
any modification made to a policy object will be invalid unless it has been authorized by at least k DBAs. We present design
details of JTAM which is based on a cryptographic threshold signature scheme, and show how JTAM prevents malicious
modifications to policy objects from authorized users. We also implement JTAM in the PostgreSQL DBMS, and report
experimental results on the efficiency of our techniques.
27 Differential Privacy via Wavelet Transforms
Privacy-preserving data publishing has attracted considerable research interest in recent years. Among the existing
solutions, ε-differential privacy provides the strongest privacy guarantee. Existing data publishing methods that
achieve ε-differential privacy, however, offer little data utility. In particular, if the output data set is used to answer
count queries, the noise in the query answers can be proportional to the number of tuples in the data, which renders
the results useless. In this paper, we develop a data publishing technique that ensures ε-differential privacy while
providing accurate answers for range-count queries, i.e., count queries where the predicate on each attribute is a
range. The core of our solution is a framework that applies wavelet transforms on the data before adding noise to it.
We present instantiations of the proposed framework for both ordinal and nominal data, and we provide a theoretical
analysis on their privacy and utility guarantees. In an extensive experimental study on both real and synthetic data,
we show the effectiveness and efficiency of our solution.
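The "transform, then add noise" pipeline can be sketched on a one-dimensional histogram of ordinal data. The example assumes a Haar wavelet and a histogram whose length is a power of two, and it applies Laplace noise with one uniform `scale` to every coefficient. That uniform scale is a placeholder: the sketch makes no calibrated ε-differential privacy claim, and the per-coefficient noise weighting a real instantiation would need is not reproduced. All names are hypothetical.

```python
import random


def haar_forward(values):
    """Return [overall average] followed by detail coefficients, coarsest first."""
    current = list(values)
    details = []
    while len(current) > 1:
        averages = [(current[i] + current[i + 1]) / 2 for i in range(0, len(current), 2)]
        diffs = [(current[i] - current[i + 1]) / 2 for i in range(0, len(current), 2)]
        details = diffs + details  # coarser levels end up in front
        current = averages
    return current + details


def haar_inverse(coeffs):
    """Rebuild the histogram from the output of haar_forward."""
    current = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        diffs = coeffs[pos:pos + len(current)]
        pos += len(current)
        current = [x for a, d in zip(current, diffs) for x in (a + d, a - d)]
    return current


def laplace(rng, scale):
    """Laplace(0, scale) sample as the difference of two exponential samples."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def publish(histogram, scale, seed=0):
    """Perturb the wavelet coefficients, then invert to get a noisy histogram."""
    rng = random.Random(seed)
    noisy = [c + laplace(rng, scale) for c in haar_forward(histogram)]
    return haar_inverse(noisy)


def range_count(histogram, lo, hi):
    """Answer a range-count query over bins lo..hi inclusive."""
    return sum(histogram[lo:hi + 1])


counts = [4, 2, 6, 8]
noisy_counts = publish(counts, scale=1.0)
noisy_answer = range_count(noisy_counts, 1, 2)
```

A range-count query touches only a few wavelet coefficients per level, which is the intuition for why noise added in the wavelet domain accumulates more slowly than noise added to every bin.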
28 Discovering Activities to Recognize and Track in a Smart Environment
The machine learning and pervasive sensing technologies found in smart homes offer unprecedented opportunities for
providing health monitoring and assistance to individuals experiencing difficulties living independently at home. In order to
monitor the functional health of smart home residents, we need to design technologies that recognize and track activities
that people normally perform as part of their daily routines. Although approaches do exist for recognizing activities, the
approaches are applied to activities that have been preselected and for which labeled training data are available. In
contrast, we introduce an automated approach to activity tracking that identifies frequent activities that naturally occur in
an individual’s routine. With this capability, we can then track the occurrence of regular activities to monitor functional
health and to detect changes in an individual’s patterns and lifestyle. In this paper, we describe our activity mining and
tracking approach, and validate our algorithms on data collected in physical smart environments.
29 Discovering Conditional Functional Dependencies
This paper investigates the discovery of conditional functional dependencies (CFDs). CFDs are a recent extension
of functional dependencies (FDs) by supporting patterns of semantically related constants, and can be used as rules
for cleaning relational data. However, finding quality CFDs is an expensive process that involves intensive manual
effort. To effectively identify data cleaning rules, we develop techniques for discovering CFDs from relations. Already
hard for traditional FDs, the discovery problem is more difficult for CFDs. Indeed, mining patterns in CFDs introduces
new challenges. We provide three methods for CFD discovery. The first, referred to as CFDMiner, is based on
techniques for mining closed item sets, and is used to discover constant CFDs, namely, CFDs with constant patterns
only. Constant CFDs are particularly important for object identification, which is essential to data cleaning and data
integration. The other two algorithms are developed for discovering general CFDs. One algorithm, referred to as
CTANE, is a levelwise algorithm that extends TANE, a well-known algorithm for mining FDs. The other, referred to as
FastCFD, is based on the depth-first approach used in FastFD, a method for discovering FDs. It leverages closed-
item-set mining to reduce the search space. As verified by our experimental study, CFDMiner can be multiple orders
of magnitude faster than CTANE and FastCFD for constant CFD discovery. CTANE works well when a given relation
is large, but it does not scale well with the arity of the relation. FastCFD is far more efficient than CTANE when the
arity of the relation is large; better still, leveraging optimization based on closed-item-set mining, FastCFD also
scales well with the size of the relation. These algorithms provide a set of cleaning-rule discovery tools for users to
choose for different applications.
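To make the notion of a constant CFD concrete, the sketch below checks one such rule against a small relation. The rule, the attribute names, and the helper functions are illustrative assumptions; the sketch only validates a given rule and does not implement CFDMiner, CTANE, or FastCFD.

```python
def support(rows, lhs):
    """Rows that match every constant on the left-hand side of the rule."""
    return [row for row in rows if all(row.get(a) == v for a, v in lhs.items())]


def violations(rows, lhs, rhs):
    """Rows that match the left-hand side but disagree with the right-hand constant."""
    attr, value = rhs
    return [row for row in support(rows, lhs) if row.get(attr) != value]


# Hypothetical constant CFD: (country_code = 44, area_code = 131) -> city = "Edinburgh"
rule_lhs = {"country_code": 44, "area_code": 131}
rule_rhs = ("city", "Edinburgh")

customers = [
    {"country_code": 44, "area_code": 131, "city": "Edinburgh"},
    {"country_code": 44, "area_code": 131, "city": "London"},
    {"country_code": 1, "area_code": 131, "city": "Springfield"},
]

dirty_rows = violations(customers, rule_lhs, rule_rhs)
```

The second row is flagged as a cleaning candidate, while the third is untouched because the rule's pattern does not apply to it; this conditional scope is what distinguishes a CFD from a plain FD.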
30 Effective Navigation of Query Results Based on Concept Hierarchies
Search queries on biomedical databases, such as PubMed, often return a large number of results, only a small
subset of which is relevant to the user. Ranking and categorization, which can also be combined, have been
proposed to alleviate this information overload problem. Results categorization for biomedical databases is the focus
of this work. A natural way to organize biomedical citations is according to their MeSH annotations. MeSH is a
comprehensive concept hierarchy used by PubMed. In this paper, we present the BioNav system, a novel search
interface that enables the user to navigate a large number of query results by organizing them using the MeSH concept
hierarchy. First, the query results are organized into a navigation tree. At each node expansion step, BioNav reveals
only a small subset of the concept nodes, selected such that the expected user navigation cost is minimized. In
contrast, previous works expand the hierarchy in a predefined static manner, without navigation cost modeling. We
show that the problem of selecting the best concepts to reveal at each node expansion is NP-complete and propose
an efficient heuristic as well as a feasible optimal algorithm for relatively small trees. We show experimentally that
BioNav outperforms state-of-the-art categorization systems by up to an order of magnitude, with respect to the user
navigation cost.
31 Efficient Periodicity Mining in Time Series Databases Using Suffix Trees
Periodic pattern mining or periodicity detection has a number of applications, such as prediction, forecasting,
detection of unusual activities, etc. The problem is not trivial because the data to be analyzed are mostly noisy and
different periodicity types (namely symbol, sequence, and segment) are to be investigated. Accordingly, we argue
that there is a need for a comprehensive approach capable of analyzing the whole time series or a subsection of it
to effectively handle different types of noise (to a certain degree) and at the same time is able to detect different
types of periodic patterns; combining these under one umbrella is by itself a challenge. In this paper, we present an
algorithm which can detect symbol, sequence (partial), and segment (full cycle) periodicity in time series. The
algorithm uses a suffix tree as the underlying data structure; this allows us to design the algorithm such that its
worst-case complexity is O(k·n^2), where k is the maximum length of periodic pattern and n is the length of the
analyzed portion (whole or subsection) of the time series. The algorithm is noise resilient; it has been successfully
demonstrated to work with replacement, insertion, deletion, or a mixture of these types of noise. We have tested the
proposed algorithm on both synthetic and real data from different domains, including protein sequences. The
conducted comparative study demonstrates the applicability and effectiveness of the proposed algorithm; it is
generally more time-efficient and noise-resilient than existing algorithms.
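Symbol periodicity, the simplest of the three periodicity types, can be illustrated with a brute-force check. The sketch below assumes the series is already discretized into a string of symbols and scores a (symbol, period, start) candidate by the fraction of expected positions where the symbol actually occurs, so noisy series score below 1.0. It does not use a suffix tree, and the function names and confidence definition are illustrative assumptions.

```python
def symbol_periodicity(series, symbol, period, start):
    """Fraction of positions start, start+period, ... that hold the given symbol."""
    positions = range(start, len(series), period)
    if len(positions) == 0:
        return 0.0
    hits = sum(1 for i in positions if series[i] == symbol)
    return hits / len(positions)


def find_periods(series, min_confidence):
    """Enumerate (period, start, symbol) candidates meeting the confidence threshold."""
    found = []
    for period in range(1, len(series) // 2 + 1):
        for start in range(period):
            symbol = series[start]
            if symbol_periodicity(series, symbol, period, start) >= min_confidence:
                found.append((period, start, symbol))
    return found


clean = "abcabcabc"
noisy = "abcabdabc"  # one replacement: the second 'c' became 'd'

exact = find_periods(clean, min_confidence=1.0)
tolerant = symbol_periodicity(noisy, "c", period=3, start=2)
```

Lowering `min_confidence` below 1.0 is what lets a check like this tolerate replacement noise; handling insertion and deletion noise requires shifting the expected positions, which this sketch does not attempt.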
32 Efficient Relevance Feedback for Content-Based Image Retrieval by Mining User Navigation Patterns
Nowadays, content-based image retrieval (CBIR) is the mainstay of image retrieval systems. To be more profitable,
relevance feedback techniques were incorporated into CBIR such that more precise results can be obtained by taking
user’s feedbacks into account. However, existing relevance feedback-based CBIR methods usually request a number of
iterative feedbacks to produce refined search results, especially in a large-scale image database. This is impractical and
inefficient in real applications. In this paper, we propose a novel method, Navigation-Pattern-based Relevance Feedback
(NPRF), to achieve the high efficiency and effectiveness of CBIR in coping with the large-scale image data. In terms of
efficiency, the iterations of feedback are reduced substantially by using the navigation patterns discovered from the user
query log. In terms of effectiveness, our proposed search algorithm NPRFSearch makes use of the discovered navigation
patterns and three kinds of query refinement strategies, Query Point Movement (QPM), Query Reweighting (QR), and Query
Expansion (QEX), to converge the search space toward the user’s intention effectively. By using NPRF method, high quality
of image retrieval on RF can be achieved in a small number of feedbacks. The experimental results reveal that NPRF
outperforms other existing methods significantly in terms of precision, coverage, and number of feedbacks.
33 Efficient Techniques for Online Record Linkage
The need to consolidate the information contained in heterogeneous data sources has been widely documented in
recent years. In order to accomplish this goal, an organization must resolve several types of heterogeneity problems,
especially the entity heterogeneity problem that arises when the same real-world entity type is represented using
different identifiers in different data sources. Statistical record linkage techniques could be used for resolving this
problem. However, the use of such techniques for online record linkage could pose a tremendous communication
bottleneck in a distributed environment (where entity heterogeneity problems are often encountered). In order to
resolve this issue, we develop a matching tree, similar to a decision tree, and use it to propose techniques that
reduce the communication overhead significantly, while providing matching decisions that are guaranteed to be the
same as those obtained using the conventional linkage technique. These techniques have been implemented, and
experiments with real-world and synthetic databases show significant reduction in communication overhead.
34 Efficient Top-k Approximate Subtree Matching in Small Memory
We consider the Top-k Approximate Subtree Matching (TASM) problem: finding the k best matches of a small query tree
within a large document tree using the canonical tree edit distance as a similarity measure between subtrees. Evaluating
the tree edit distance for large XML trees is difficult: the best known algorithms have cubic runtime and quadratic space
complexity, and, thus, do not scale. Our solution is TASM-postorder, a memory-efficient and scalable TASM algorithm. We
prove an upper bound for the maximum subtree size for which the tree edit distance needs to be evaluated. The upper
bound depends on the query and is independent of the document size and structure. A core problem is to efficiently prune
subtrees that are above this size threshold. We develop an algorithm based on the prefix ring buffer that allows us to prune
all subtrees above the threshold in a single postorder scan of the document. The size of the prefix ring buffer is linear in the
threshold. As a result, the space complexity of TASM-postorder depends only on k and the query size, and the runtime of
TASM-postorder is linear in the size of the document. Our experimental evaluation on large synthetic and real XML
documents confirms our analytic results.
35 Energy Time Series Forecasting Based on Pattern Sequence Similarity
This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences.
First, clustering techniques are used with the aim of grouping and labeling the samples from a data set. Thus, the
prediction of a data point is provided as follows: first, the pattern sequence prior to the day to be predicted is
extracted. Then, this sequence is searched in the historical data and the prediction is calculated by averaging all the
samples immediately after the matched sequence. The main novelty is that only the labels associated with each
pattern are considered to forecast the future behavior of the time series, avoiding the use of real values of the time
series until the last step of the prediction process. Results from several energy time series are reported and the
performance of the proposed method is compared to that of recently published techniques showing a remarkable
improvement in the prediction.
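The prediction step described above can be sketched as follows. The example assumes the clustering stage has already assigned one label per day and, for brevity, that each day is summarized by a single value rather than a full daily profile; the function name and the behavior when no match exists (returning `None`) are illustrative assumptions.

```python
def pattern_sequence_forecast(labels, values, window):
    """Forecast the next value from days that followed the same label sequence.

    labels: cluster label of each past day (oldest first)
    values: observed value of each past day, aligned with labels
    window: number of most recent labels that form the search pattern
    """
    pattern = labels[-window:]
    followers = [
        values[i + window]
        for i in range(len(labels) - window)  # excludes the final pattern itself
        if labels[i:i + window] == pattern
    ]
    if not followers:
        return None
    return sum(followers) / len(followers)


labels = [0, 1, 2, 0, 1, 2, 0, 1]
values = [10.0, 20.0, 30.0, 12.0, 22.0, 34.0, 11.0, 21.0]

forecast = pattern_sequence_forecast(labels, values, window=2)
```

Only the labels are compared during the search; the real values enter at the final averaging step, matching the abstract's point about deferring their use until the end of the process.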
36 Energy Time Series Forecasting Based on Pattern Sequence Similarity
This paper presents a new approach to forecast the behavior of time series based on similarity of pattern sequences. First,
clustering techniques are used with the aim of grouping and labeling the samples from a data set. Thus, the prediction of a
data point is provided as follows: first, the pattern sequence prior to the day to be predicted is extracted. Then, this
sequence is searched in the historical data and the prediction is calculated by averaging all the samples immediately after
the matched sequence. The main novelty is that only the labels associated with each pattern are considered to forecast the
future behavior of the time series, avoiding the use of real values of the time series until the last step of the prediction
process. Results from several energy time series are reported and the performance of the proposed method is compared to
that of recently published techniques showing a remarkable improvement in the prediction.
37 Estimating and Enhancing Real-Time Data Service Delays: Control-Theoretic Approaches
It is essential to process real-time data service requests such as stock quotes and trade transactions in a timely
manner using fresh data, which represent the current real-world phenomena such as the stock market status. Users
may simply leave when the database service delay is excessive. Also, temporally inconsistent data may give an
outdated view of the real-world status. However, supporting the desired timeliness and freshness is challenging due
to dynamic workloads. To address the problem, we present new approaches for 1) database backlog estimation, 2)
fine-grained closed-loop admission control based on the backlog model, and 3) incoming load smoothing. Our
backlog estimation and control-theoretic approaches aim to support the desired service delay bound without
degrading the data freshness, critical for real-time data services. Specifically, we design, implement, and evaluate
two feedback controllers based on linear control theory and fuzzy logic control theory, to meet the desired service
delay. Workload smoothing, under overload, helps the database admit and process more transactions in a timely
fashion by probabilistically reducing the burstiness of incoming data service requests. In terms of the data service
delay and throughput, our closed-loop admission control and probabilistic load smoothing schemes considerably
outperform several baselines in the experiments undertaken in a stock trading database testbed.
38 Experience Transfer for the Configuration Tuning in Large-Scale Computing Systems
This paper proposes a new strategy, the experience transfer, to facilitate the management of large-scale computing
systems. It deals with the utilization of management experiences in one system (or previous systems) to benefit the same
management task in other systems (or current systems). We use the system configuration tuning as a case application to
demonstrate all procedures involved in the experience transfer including the experience representation, experience
extraction, and experience embedding. The dependencies between system configuration parameters are treated as
transferable experiences in the configuration tuning for two reasons: 1) because such knowledge is helpful to the efficiency
of the optimal configuration search, and 2) because the parameter dependencies are typically unchanged between two
similar systems. We use the Bayesian network to model configuration dependencies and present a configuration tuning
algorithm based on the Bayesian network construction and sampling. As a result, after the configuration tuning is
completed in the original system, we can obtain a Bayesian network as the by-product which records the dependencies
between system configuration parameters. Such a network is then embedded into the tuning process in other similar
systems as transferred experiences to improve the configuration search efficiency. Experimental results in a web-based
system show that with the help of transferred experiences, the configuration tuning process can be significantly
accelerated.
39 Exploring Application-Level Semantics for Data Compression
Natural phenomena show that many creatures form large social groups and move in regular patterns. However,
previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first
propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their
movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D,
which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression
algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a
Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction
phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal
solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental
results show that the proposed compression algorithm leverages the group movement patterns to reduce the
amount of delivered data effectively and efficiently.
40 Exploring Application-Level Semantics for Data Compression
Natural phenomena show that many creatures form large social groups and move in regular patterns. However, previous
works focus on finding the movement patterns of each single object or all objects. In this paper, we first propose an
efficient distributed mining algorithm to jointly identify a group of moving objects and discover their movement patterns in
wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D, which exploits the obtained group
movement patterns to reduce the amount of delivered data. The compression algorithm includes a sequence merge phase and an
entropy reduction phase. In the sequence merge phase, we propose a Merge algorithm to merge and compress the
location data of a group of moving objects. In the entropy reduction phase, we formulate a Hit Item Replacement (HIR)
problem and propose a Replace algorithm that obtains the optimal solution. Moreover, we devise three replacement rules
and derive the maximum compression ratio. The experimental results show that the proposed compression algorithm
leverages the group movement patterns to reduce the amount of delivered data effectively and efficiently.
41 Finding Correlated Biclusters from Gene Expression Data
Extracting biologically relevant information from DNA microarrays is a very important task for drug development and
test, function annotation, and cancer diagnosis. Various clustering methods have been proposed for the analysis of
gene expression data, but when analyzing the large and heterogeneous collections of gene expression data,
conventional clustering algorithms often cannot produce a satisfactory solution. Biclustering algorithms have been
presented as an alternative approach to standard clustering techniques to identify local structures from gene
expression data set. These patterns may provide clues about the main biological processes associated with different
physiological states. In this paper, different from existing bicluster patterns, we first introduce a more general
pattern: correlated bicluster, which has intuitive biological interpretation. Then, we propose a novel transform
technique based on singular value decomposition so that the problem of identifying correlated biclusters from a gene
expression matrix is transformed into two global clustering problems. The Mixed-Clustering algorithm and the Lift
algorithm are devised to efficiently produce ˇ-corBiclusters. The biclusters obtained using our method from gene
expression data sets of multiple human organs and the yeast Saccharomyces cerevisiae demonstrate clear
biological meanings.
42 Frequent Item Computation on a Chip
Computing frequent items is an important problem by itself and as a subroutine in several data mining algorithms. In
this paper, we explore how to accelerate the computation of frequent items using field-programmable gate arrays (FPGAs)
with a threefold goal: increase performance over existing solutions, reduce energy consumption over CPU-based systems,
and explore the design space in detail as the constraints on FPGAs are very different from those of traditional software-
based systems. We discuss three design alternatives, each one of them exploiting different FPGA features and each one
providing different performance/scalability trade-offs. An important result of the paper is to demonstrate how the inherent
massive parallelism of FPGAs can improve performance of existing algorithms but only after a fundamental redesign of the
algorithms. Our experimental results show that, e.g., the pipelined solution we introduce can reach more than 100 million
tuples per second of sustained throughput (four times the best available results to date) by making use of techniques that
are not available to CPU-based solutions. Moreover, and unlike in software approaches, the high throughput is independent
of the skew of the Zipf distribution of the input and comes at a far lower energy cost.
This paper presents a new approach to forecast
the behavior of time series based on similarity of pattern sequences. First, clustering techniques are used with the aim of
grouping and labeling the samples from a data set. Thus, the prediction of a data point is provided as follows: first, the
pattern sequence prior to the day to be predicted is extracted. Then, this sequence is searched in the historical data and the
prediction is calculated by averaging all the samples immediately after the matched sequence. The main novelty is that only
the labels associated with each pattern are considered to forecast the future behavior of the time series, avoiding the use of
real values of the time series until the last step of the prediction process. Results from several energy time series are
reported and the performance of the proposed method is compared to that of recently published techniques showing a
remarkable improvement in the prediction.
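The label-based prediction step described above can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: the function name `psf_forecast`, the `window` parameter, and the choice to return `None` when no match is found are assumptions, and the cluster labels are assumed to come from an earlier clustering step that is not shown.

```python
from statistics import mean


def psf_forecast(values, labels, window):
    """Forecast the next value by matching the most recent label sequence.

    values[i] is the real observation for sample i and labels[i] is its
    cluster label (assumed to be produced by a separate clustering step).
    """
    if window <= 0 or window >= len(labels):
        raise ValueError("window must be between 1 and len(labels) - 1")

    # Label sequence immediately before the sample to be predicted.
    pattern = labels[-window:]

    followers = []
    # Search the history for earlier occurrences of the same label sequence.
    for start in range(len(labels) - window):
        if labels[start:start + window] == pattern:
            # Real values are used only here, in the last step.
            followers.append(values[start + window])

    if not followers:
        return None  # hypothetical fallback: the caller may retry with a shorter window
    return mean(followers)
```

For example, with labels `a b c a b c a b` and a window of 2, the sequence `a b` occurs twice earlier in the history, so the forecast is the average of the two real values that followed those occurrences.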
43 Inconsistency-Tolerant Integrity Checking
All methods for efficient integrity checking require all integrity constraints to be totally satisfied, before any update is
executed. However, a certain amount of inconsistency is the rule, rather than the exception in databases. In this
paper, we close the gap between theory and practice of integrity checking, i.e., between the unrealistic theoretical
requirement of total integrity and the practical need for inconsistency tolerance, which we define for integrity
checking methods. We show that most of them can still be used to check whether updates preserve integrity, even if
the current state is inconsistent. Inconsistency-tolerant integrity checking proves beneficial both for integrity
preservation and query answering. Also, we show that it is useful for view updating, repairs, schema evolution, and
other applications.
44 Initialization and Restart in Stochastic Local Search: Computing a Most Probable Explanation in Bayesian Networks
For hard computational problems, stochastic local search has proven to be a competitive approach to finding optimal or
approximately optimal problem solutions. Two key research questions for stochastic local search algorithms are: Which
algorithms are effective for initialization? When should the search process be restarted? In the present work, we investigate
these research questions in the context of approximate computation of most probable explanations (MPEs) in Bayesian
networks (BNs). We introduce a novel approach, based on the Viterbi algorithm, to explanation initialization in BNs. While
the Viterbi algorithm works on sequences and trees, our approach works on BNs with arbitrary topologies. We also give a
novel formalization of stochastic local search, with focus on initialization and restart, using probability theory and mixture
models. Experimentally, we apply our methods to the problem of MPE computation, using a stochastic local search
algorithm known as Stochastic Greedy Search. By carefully optimizing both initialization and restart, we reduce the MPE
search time for application BNs by several orders of magnitude compared to using uniform at random initialization without
restart. On several BNs from applications, the performance of Stochastic Greedy Search is competitive with clique tree
clustering, a state-of-the-art exact algorithm used for MPE computation in BNs.
45 Integration of the HL7 Standard in a Multiagent System to Support Personalized Access to e-Health Services
Natural phenomena show that many creatures form large social groups and move in regular patterns. However,
previous works focus on finding the movement patterns of each single object or all objects. In this paper, we first
propose an efficient distributed mining algorithm to jointly identify a group of moving objects and discover their
movement patterns in wireless sensor networks. Afterward, we propose a compression algorithm, called 2P2D,
which exploits the obtained group movement patterns to reduce the amount of delivered data. The compression
algorithm includes a sequence merge phase and an entropy reduction phase. In the sequence merge phase, we propose a
Merge algorithm to merge and compress the location data of a group of moving objects. In the entropy reduction
phase, we formulate a Hit Item Replacement (HIR) problem and propose a Replace algorithm that obtains the optimal
solution. Moreover, we devise three replacement rules and derive the maximum compression ratio. The experimental
results show that the proposed compression algorithm leverages the group movement patterns to reduce the
amount of delivered data effectively and efficiently.
47 Intertemporal Discount Factors as a Measure of Trustworthiness in Electronic Commerce
In multiagent interactions, such as e-commerce and file sharing, being able to accurately assess the trustworthiness
of others is important for agents to protect themselves from losing utility. Focusing on rational agents in e-
commerce, we prove that an agent’s discount factor (time preference of utility) is a direct measure of the agent’s
trustworthiness for a set of reasonably general assumptions and definitions. We propose a general list of desiderata
for trust systems and discuss how discount factors as trustworthiness meet these desiderata. We discuss how
discount factors are a robust measure when entering commitments that exhibit moral hazards. Using an online
market as a motivating example, we derive some analytical methods both for measuring discount factors and for
aggregating the measurements.
48 IR-Tree: An Efficient Index for Geographic Document Search
Given a geographic query that is composed of query keywords and a location, a geographic search engine retrieves
documents that are the most textually and spatially relevant to the query keywords and the location, respectively, and ranks
the retrieved documents according to their joint textual and spatial relevance to the query. The lack of an efficient index
that can simultaneously handle both the textual and spatial aspects of the documents makes existing geographic search
engines inefficient in answering geographic queries. In this paper, we propose an efficient index, called IR-tree, that
together with a top-k document search algorithm facilitates four major tasks in document searches, namely, 1) spatial
filtering, 2) textual filtering, 3) relevance computation, and 4) document ranking in a fully integrated manner. In addition, IR-
tree allows searches to adopt different weights on textual and spatial relevance of documents at the runtime and thus
caters for a wide variety of applications. A set of comprehensive experiments over a wide range of scenarios has been
conducted and the experimental results demonstrate that IR-tree outperforms the state-of-the-art approaches for geographic
document searches.
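The four tasks named above (spatial filtering, textual filtering, relevance computation, and ranking) can be illustrated with a brute-force scan. This sketch does not build an IR-tree; it only shows what such an index is meant to speed up. The linear weighting through `alpha`, the keyword-overlap text score, the distance cutoff `max_dist`, and all function names are assumptions made for illustration.

```python
import math


def textual_relevance(query_terms, doc_terms):
    # Fraction of query keywords found in the document (a stand-in for a real IR score).
    wanted = set(query_terms)
    if not wanted:
        return 0.0
    return len(wanted & set(doc_terms)) / len(wanted)


def spatial_relevance(query_loc, doc_loc, max_dist):
    # 1.0 at the query location, falling linearly to 0.0 at max_dist and beyond.
    return max(0.0, 1.0 - math.dist(query_loc, doc_loc) / max_dist)


def top_k_documents(docs, query_terms, query_loc, k, alpha=0.5, max_dist=100.0):
    """docs is a list of (doc_id, terms, location) tuples."""
    scored = []
    for doc_id, terms, loc in docs:
        text = textual_relevance(query_terms, terms)
        if text == 0.0:
            continue  # textual filtering
        spatial = spatial_relevance(query_loc, loc, max_dist)
        if spatial == 0.0:
            continue  # spatial filtering
        # alpha weights textual against spatial relevance and can change per query.
        scored.append((alpha * text + (1.0 - alpha) * spatial, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]
```

Changing `alpha` at query time shifts the ranking between documents that match more keywords and documents that are closer to the query location.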
49 Knowledge Discovery in Services (KDS): Aggregating Software Services to Discover Enterprise Mashups
Service mashup is the act of integrating the resulting data of two complementary software services into a common
picture. Such an approach is promising with respect to the discovery of new types of knowledge. However, before
service mashup routines can be executed, it is necessary to predict which services (of an open repository) are viable
candidates. Similar to Knowledge Discovery in Databases (KDD), we introduce the Knowledge Discovery in Services
(KDS) process that identifies mashup candidates. In this work, the KDS process is specialized to address a
repository of open services that do not contain semantic annotations. In these situations, specialized techniques are
required to determine equivalences among open services with reasonable precision. This paper introduces a bottom-
up process for KDS that adapts to the environment of services for which it operates. Detailed experiments are
discussed that evaluate KDS techniques on an open repository of services from the Internet and on a repository of
services created in a controlled environment.
50 Learning Semi-Riemannian Metrics for Semisupervised Feature Extraction
Discriminant feature extraction plays a central role in pattern recognition and classification. Linear Discriminant Analysis
(LDA) is a traditional algorithm for supervised feature extraction. Recently, unlabeled data have been utilized to improve
LDA. However, the intrinsic problems of LDA still exist and only the similarity among the unlabeled data is utilized. In this
paper, we propose a novel algorithm, called Semisupervised Semi-Riemannian Metric Map (S3RMM), following the
geometric framework of semi-Riemannian manifolds. S3RMM maximizes the discrepancy of the separability and similarity
measures of scatters formulated by using semi-Riemannian metric tensors. The metric tensor of each sample is learned via
semisupervised regression. Our method can also be a general framework for proposing new semisupervised algorithms,
utilizing the existing discrepancy-criterion-based algorithms. The experiments demonstrated on faces and handwritten
digits show that S3RMM is promising for semisupervised feature extraction.
51 Load Shedding in Mobile Systems with MobiQual
In location-based, mobile continual query (CQ) systems, two key measures of quality-of-service (QoS) are: freshness
and accuracy. To achieve freshness, the CQ server must perform frequent query reevaluations. To attain accuracy,
the CQ server must receive and process frequent position updates from the mobile nodes. However, it is often
difficult to obtain fresh and accurate CQ results simultaneously, due to 1) limited resources in computing and
communication and 2) fast-changing load conditions caused by continuous mobile node movement. Hence, a key
challenge for a mobile CQ system is: How do we achieve the highest possible quality of the CQ results, in both
freshness and accuracy, with currently available resources? In this paper, we formulate this problem as a load
shedding one, and develop MobiQual—a QoS-aware approach to performing both update load shedding and query
load shedding. The design of MobiQual highlights three important features. 1) Differentiated load shedding: We apply
different amounts of query load shedding and update load shedding to different groups of queries and mobile nodes,
respectively. 2) Per-query QoS specification: Individualized QoS specifications are used to maximize the overall
freshness and accuracy of the query results. 3) Low-cost adaptation: MobiQual dynamically adapts, with a minimal
overhead, to changing load conditions and available resources. We conduct a set of comprehensive experiments to
evaluate the effectiveness of MobiQual. The results show that, through a careful combination of update and query
load shedding, the MobiQual approach leads to much higher freshness and accuracy in the query results in all cases,
compared to existing approaches that lack the QoS-awareness properties of MobiQual, as well as the solutions that
perform query-only or update-only load shedding.
52 Locally Consistent Concept Factorization for Document Clustering
Previous studies have demonstrated that document clustering performance can be improved significantly in lower
dimensional linear subspaces. Recently, matrix factorization-based techniques, such as Nonnegative Matrix Factorization
(NMF) and Concept Factorization (CF), have yielded impressive results. However, both of them effectively see only the
global Euclidean geometry, whereas the local manifold geometry is not fully considered. In this paper, we propose a new
approach to extract the document concepts which are consistent with the manifold geometry such that each concept
corresponds to a connected component. Central to our approach is a graph model which captures the local geometry of the
document submanifold. Thus, we call it Locally Consistent Concept Factorization (LCCF). By using the graph Laplacian to
smooth the document-to-concept mapping, LCCF can extract concepts with respect to the intrinsic manifold structure and
thus documents associated with the same concept can be well clustered. The experimental results on TDT2 and Reuters-
21578 have shown that the proposed approach provides a better representation and achieves better clustering results in
terms of accuracy and mutual information.
53 Mining Cluster-Based Temporal Mobile Sequential Patterns in Location-Based Service Environments
Research on Location-Based Services (LBS) has been emerging in recent years due to a wide range of potential
applications. One of the active topics is the mining and prediction of mobile movements and associated transactions.
Most existing studies focus on discovering mobile patterns from the whole logs. However, patterns of this kind
may not be precise enough for predictions since the differentiated mobile behaviors among users and temporal
periods are not considered. In this paper, we propose a novel algorithm, namely, Cluster-based Temporal Mobile
Sequential Pattern Mine (CTMSP-Mine), to discover the Cluster-based Temporal Mobile Sequential Patterns
(CTMSPs). Moreover, a prediction strategy is proposed to predict the subsequent mobile behaviors. In CTMSP-Mine,
user clusters are constructed by a novel algorithm named Cluster-Object-based Smart Cluster Affinity Search
Technique (CO-Smart-CAST) and similarities between users are evaluated by the proposed measure, Location-Based
Service Alignment (LBS-Alignment). Meanwhile, a time segmentation approach is presented to find segmenting time
intervals where similar mobile characteristics exist. To the best of our knowledge, this is the first work on mining and
prediction of mobile behaviors that considers user relations and temporal properties simultaneously. Through
experimental evaluation under various simulated conditions, the proposed methods are shown to deliver excellent
performance.
54 Mining Discriminative Patterns for Classifying Trajectories on Road Networks
Classification has been used for modeling many kinds of data sets, including sets of items, text documents, graphs, and
networks. However, there is a lack of study on a new kind of data, trajectories on road networks. Modeling such data is
useful with the emerging GPS and RFID technologies and is important for effective transportation and traffic planning. In
this work, we study methods for classifying trajectories on road networks. By analyzing the behavior of trajectories on road
networks, we observe that, in addition to the locations where vehicles have visited, the order of these visited locations is
crucial for improving classification accuracy. Based on our analysis, we contend that (frequent) sequential patterns are
good feature candidates since they preserve this order information. Furthermore, when mining sequential patterns, we
propose to confine the length of sequential patterns to ensure high efficiency. Compared with closed sequential patterns,
these partial (i.e., length-confined) sequential patterns allow us to significantly improve efficiency almost without losing
accuracy. In this paper, we present a framework for frequent pattern-based classification for trajectories on road networks.
Our comparative study over a broad range of classification approaches demonstrates that our method significantly
improves accuracy over other methods in some synthetic and real trajectory data.
47 Mining Group Movement Patterns for Tracking Moving Objects Efficiently
Existing object tracking applications focus on finding the moving patterns of a single object or all objects. In
contrast, we propose a distributed mining algorithm that identifies a group of objects with similar movement
patterns. This information is important in some biological research domains, such as the study of animals’ social
behavior and wildlife migration. The proposed algorithm comprises a local mining phase and a cluster ensembling
phase. In the local mining phase, the algorithm finds movement patterns based on local trajectories. Then, based on
the derived patterns, we propose a new similarity measure to compute the similarity of moving objects and identify
the local group relationships. To address the energy conservation issue in resource-constrained environments, the
algorithm only transmits the local grouping results to the sink node for further ensembling. In the cluster ensembling
phase, our algorithm combines the local grouping results to derive the group relationships from a global view. We
further leverage the mining results to track moving objects efficiently. The results of experiments show that the
proposed mining algorithm achieves good grouping quality, and the mining technique helps reduce the energy
consumption by reducing the amount of data to be transmitted.
48 Mining Iterative Generators and Representative Rules for Software Specification Discovery
Billions of dollars are spent annually on software-related cost. It is estimated that up to 45 percent of software cost is due to
the difficulty in understanding existing systems when performing maintenance tasks (i.e., adding features, removing bugs,
etc.). One of the root causes is that software products often come with poor or incomplete documented specifications, or
with none at all. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining, which
outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative
patterns reflect frequent program behaviors that likely correspond to software specifications. To reduce the number of
patterns and improve the efficiency of the algorithm, Lo et al. have also introduced mining closed iterative patterns, i.e.,
maximal patterns without any superpattern having the same support. In this paper, to technically deepen research on
iterative pattern mining, we introduce mining iterative generators, i.e., minimal patterns without any subpattern having the
same support. Iterative generators can be paired with closed patterns to produce a set of rules expressing forward,
backward, and in-between temporal constraints among events in one general representation. We refer to these rules as
representative rules. A comprehensive performance study shows the efficiency of our approach. A case study on traces of
an industrial system shows how iterative generators and closed iterative patterns can be merged to form useful rules
shedding light on software design.
49 Missing Value Estimation for Mixed-Attribute Data Sets
Missing data imputation is a key issue in learning from incomplete data. Various techniques have been
developed with great success for dealing with missing values in data sets with homogeneous attributes (their
independent attributes are all either continuous or discrete). This paper studies a new setting of missing data
imputation, i.e., imputing missing data in data sets with heterogeneous attributes (their independent attributes are of
different types), referred to as imputing mixed-attribute data sets. Although many real applications are in this setting,
there is no estimator designed for imputing mixed-attribute data sets. This paper first proposes two consistent
estimators for discrete and continuous missing target values, respectively. Then, a mixture-kernel-based iterative
estimator is advocated to impute mixed-attribute data sets. The proposed method is evaluated with extensive
experiments compared with some typical algorithms, and the results demonstrate that the proposed approach is
better than these existing imputation methods in terms of classification accuracy and root mean square error (RMSE)
at different missing ratios.
50 Monochromatic and Bichromatic Reverse Top-k Queries
Nowadays, most applications return to the user a limited set of ranked results based on the individual user’s
preferences, which are commonly expressed through top-k queries. From the perspective of a manufacturer, it is imperative
that her products appear in the highest ranked positions for many different user preferences, otherwise the product is not
visible to potential customers. In this paper, we define a novel query type, namely the reverse top-k query, that covers this
requirement: “Given a potential product, which are the user preferences that make this product belong to the top-k query
result set?” Reverse top-k queries are essential for manufacturers to assess the impact of their products in the market
based on the competition. We formally define reverse top-k queries and introduce two versions of the query,
monochromatic and bichromatic. First, we provide a geometric interpretation of the monochromatic reverse top-k query to
acquire an intuition of the solution space. Then, we study in detail the case of bichromatic reverse top-k query, and we
propose two techniques for query processing, namely an efficient threshold-based algorithm and an algorithm based on
materialized reverse top-k views. Our experimental evaluation demonstrates the efficiency of our techniques.
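To make the query definition concrete, here is a brute-force sketch of a bichromatic reverse top-k query. It is not the threshold-based or view-based algorithm from the paper. It assumes that a preference is a weight vector, that a product's score is the weighted sum of its attributes, and that lower scores rank higher; the names `score` and `reverse_top_k` are illustrative.

```python
def score(weights, point):
    # Weighted sum of attribute values; lower is assumed to be better.
    return sum(w * x for w, x in zip(weights, point))


def reverse_top_k(products, preferences, query, k):
    """Return the preferences for which `query` would rank among the top k.

    products:    list of attribute tuples for the competing products
    preferences: list of weight tuples, one per user
    query:       attribute tuple of the product under consideration
    """
    result = []
    for weights in preferences:
        q_score = score(weights, query)
        # Count the competitors that strictly beat the query product.
        better = sum(1 for p in products if score(weights, p) < q_score)
        if better < k:
            result.append(weights)
    return result
```

The scan costs one scoring pass over all products per preference, which is the cost that the techniques in the abstract aim to avoid.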
51 On Computing Farthest Dominated Locations
In reality, spatial objects (e.g., hotels) not only have spatial locations but also have quality attributes (e.g., price,
star). An object p is said to dominate another one p′, if p is no worse than p′ with respect to every quality attribute
and p is better on at least one quality attribute. Traditional spatial queries (e.g., nearest neighbor, closest pair) ignore
quality attributes, whereas conventional dominance-based queries (e.g., skyline) neglect spatial locations. Motivated
by these observations, we propose a novel query by combining spatial and quality attributes together meaningfully.
Given a set of (competitors’) spatial objects P, a set of (candidate) locations L, and a quality vector as design
competence (for L), the farthest dominated location (FDL) query retrieves the location s ∈ L such that the distance to
its nearest dominating object in P is maximized. FDL queries are suitable for various spatial decision support
applications such as business planning, wild animal protection, and digital battlefield systems. As FDL queries
cannot be readily solved by existing techniques, we develop several efficient R-tree-based algorithms for processing
FDL queries, which offer users a range of selections in terms of different indexes available on the data. We also
generalize our methods to support the generic distance metric and other interesting query types. The experimental
results on both real and synthetic data sets disclose the performance of those algorithms, and reveal the most
efficient and scalable one among them.
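The dominance test and the FDL query defined above can be written as a brute-force sketch. This is not one of the R-tree-based algorithms from the paper. It assumes Euclidean distance, assumes that every quality attribute is normalized so that smaller values are better, and uses illustrative names (`dominates`, `farthest_dominated_location`).

```python
import math


def dominates(a, b):
    # a dominates b if it is no worse on every attribute and better on at least one.
    # Assumption: smaller attribute values are better.
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def farthest_dominated_location(objects, candidates, design_quality):
    """objects: list of (location, quality) pairs for the competitors in P.
    candidates: list of candidate locations L.
    design_quality: quality vector of the planned object.
    """
    # Only competitors that dominate the planned quality vector matter.
    dominators = [loc for loc, quality in objects if dominates(quality, design_quality)]

    def nearest_dominator_distance(site):
        # Infinite when no competitor dominates the design at all.
        return min((math.dist(site, loc) for loc in dominators), default=math.inf)

    # Pick the candidate whose nearest dominating competitor is farthest away.
    return max(candidates, key=nearest_dominator_distance)
```

The sketch compares every candidate with every dominating object, so it is only practical for small inputs.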
52 Optimizing Resource Conflicts in Workflow Management Systems
Resource allocation and scheduling are fundamental issues in a Workflow Management System (WfMS). Effective
resource management in WfMS should examine resource allocation together with task scheduling since these
problems impose mutual constraints. Optimizing one factor is subject to the constraints of the other, and vice
versa. Thus, an ideal algorithm should take into account not only performance metrics of the infrastructure, such as
the number of resources and their utilization, but also quality criteria such as the percentage of tasks whose
temporal restrictions are violated. In this paper, we propose an innovative algorithm which jointly optimizes the
two aforementioned contradictory criteria. The algorithm, called Resource Conflicts Joint Optimization
(Re.Co.Jo.Op.), minimizes resource conflicts subject to temporal constraints and simultaneously optimizes
throughput or utilization subject to resource constraints. To achieve the optimization, the two factors are formulated
in a matrix form and the optimal solution is found by applying concepts of generalized eigenvalue analysis. A
rough outline of an agent-based architecture is proposed to achieve runtime integration of our algorithm into a
functional WfMS, while experimental results under different load environments and task assumptions reveal the
superiority of the proposed strategy over other conventional approaches.
53 Pareto-Based Dominant Graph: An Efficient Indexing Structure to Answer Top-K Queries
Given a record set D and a query score function F, a top-k query returns k records from D, whose values of function F on
their attributes are the highest. In this paper, we investigate the intrinsic connection between top-k queries and dominant
relationships between records and, based on this connection, propose an efficient layer-based indexing structure, the Pareto-Based
Dominant Graph (DG), to improve query efficiency. Specifically, DG is built offline to express the dominant relationships
between records, and a top-k query is implemented as a graph traversal problem, solved by the Traveler algorithm. We prove
theoretically that the size of search space (that is the number of retrieved records from the record set to answer top-k
query) in our algorithm is directly related to the cardinality of skyline points in the record set (see Theorem 3). Considering
I/O cost, we propose a cluster-based storage schema to reduce I/O cost in the Traveler algorithm. We also propose cost
estimation methods in this paper. Based on cost analysis, we propose an optimization technique, pseudorecord, to further
improve the search efficiency. In order to handle top-k queries on high-dimensional record sets, we also propose the N-Way
Traveler algorithm. In order to handle DG maintenance efficiently, we propose “Insertion” and “Deletion” algorithms for DG.
Finally, extensive experiments demonstrate that our proposed methods significantly improve on their counterparts,
including both classical and state-of-the-art top-k algorithms.
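The link between top-k queries and dominance can be sketched in a few lines. The code below is not the DG index or the Traveler algorithm; it is a naive illustration that peels a record set into dominance layers (the first layer is the skyline) and answers a top-k query by a full scan. All function names are hypothetical, and larger attribute values are assumed to be better.

```python
import heapq

def dominates(a, b):
    """True if record a is at least as good as b on every attribute and
    strictly better on one. Assumption: larger values are better."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def skyline_layers(records):
    """Peel the record set into layers: layer 1 is the skyline, layer 2 is
    the skyline of what remains, and so on (naive quadratic version)."""
    remaining = list(records)
    layers = []
    while remaining:
        layer = [r for r in remaining
                 if not any(dominates(o, r) for o in remaining)]
        layers.append(layer)
        remaining = [r for r in remaining if r not in layer]
    return layers

def top_k(records, score, k):
    """Reference answer by scanning every record."""
    return heapq.nlargest(k, records, key=score)
```

For any score function that is monotone in every attribute, a record can only be outscored by records that are not dominated by it, so the best-scoring record always has a representative in the first layer. That observation is what makes a layer-by-layer search plausible.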
54 Privacy-Preserving OLAP: An Information-Theoretic Approach
We address issues related to the protection of private information in Online Analytical Processing (OLAP) systems,
where a major privacy concern is the adversarial inference of private information from OLAP query answers. Most
previous work on privacy-preserving OLAP focuses on a single aggregate function and/or addresses only exact
disclosure, which eliminates from consideration an important class of privacy breaches where partial information,
but not exact values, of private data is disclosed (i.e., partial disclosure). We address privacy protection against both
exact and partial disclosure in OLAP systems with mixed aggregate functions. In particular, we propose an
information-theoretic inference control approach that supports a combination of common aggregate functions (e.g.,
COUNT, SUM, MIN, MAX, and MEDIAN) and guarantees the level of privacy disclosure not to exceed thresholds
predetermined by the data owners. We demonstrate that our approach is efficient and can be implemented in existing
OLAP systems with little modification. It also satisfies the simulatable auditing model and leaks no private
information through query rejections. Through performance analysis, we show that compared with previous
approaches, our approach provides more effective privacy protection while maintaining a higher level of query-
answer availability.
55 Ranking Spatial Data by Quality Preferences
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example,
using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the
appropriateness of their location, defined after aggregating the qualities of other features (e.g., restaurants, cafes, hospitals,
markets, etc.) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different
functions. It can be an explicit circular region within a given distance from the flat. Another intuitive definition is to assign
higher weights to the features based on their proximity to the flat. In this paper, we formally define spatial preference
queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods
on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to
different parameters.
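A minimal sketch of the circular-region variant of a spatial preference query follows. It assumes, for illustration only, that an object's score is the sum over feature types of the best quality found within the given radius; the abstract does not fix the aggregate. The function names and data layout are hypothetical, and the scan is brute force rather than the indexed branch-and-bound search the abstract mentions.

```python
from math import dist

def range_score(obj, feature_sets, radius):
    """Score one object (e.g., a flat) by its neighborhood.

    feature_sets maps a feature type to a list of ((x, y), quality).
    Assumption: per type, take the best quality within `radius`
    (0.0 if none is in range), then sum over the types.
    """
    total = 0.0
    for features in feature_sets.values():
        in_range = [q for pos, q in features if dist(obj, pos) <= radius]
        total += max(in_range, default=0.0)
    return total

def rank_by_preference(objects, feature_sets, radius):
    """Return the objects ordered from highest to lowest score."""
    return sorted(objects,
                  key=lambda o: range_score(o, feature_sets, radius),
                  reverse=True)
```

The proximity-weighted definition mentioned above would replace the hard radius test with a weight that decays with distance.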
56 RFID Data Processing in Supply Chain Management Using a Path Encoding Scheme
RFID technology can be applied to a broad range of areas. In particular, RFID is very useful in the area of business,
such as supply chain management. However, the amount of RFID data in such an environment is huge. Therefore,
much time is needed to extract valuable information from RFID data for supply chain management. In this paper, we
present an efficient method to process a massive amount of RFID data for supply chain management. We first define
query templates to analyze the supply chain. We then propose an effective path encoding scheme that encodes the
flows of products. However, if the flows are long, the numbers in the path encoding scheme that correspond to the
flows will be very large. We solve this by providing a method that divides flows. To retrieve the time information for
products efficiently, we utilize a numbering scheme for the XML area. Based on the path encoding scheme and the
numbering scheme, we devise a storage scheme that can process tracking queries and path-oriented queries
efficiently on an RDBMS. Finally, we propose a method that translates the queries to SQL queries. Experimental
results show that our approach can process the queries efficiently.
61 Seeking Quality of Web Service Composition in a Semantic Dimension
We address issues related to the protection of private information in Online Analytical Processing (OLAP) systems,
where a major privacy concern is the adversarial inference of private information from OLAP query answers. Most
previous work on privacy-preserving OLAP focuses on a single aggregate function and/or addresses only exact
disclosure, which eliminates from consideration an important class of privacy breaches where partial information,
but not exact values, of private data is disclosed (i.e., partial disclosure). We address privacy protection against both
exact and partial disclosure in OLAP systems with mixed aggregate functions. In particular, we propose an
information-theoretic inference control approach that supports a combination of common aggregate functions (e.g.,
COUNT, SUM, MIN, MAX, and MEDIAN) and guarantees the level of privacy disclosure not to exceed thresholds
predetermined by the data owners. We demonstrate that our approach is efficient and can be implemented in existing
OLAP systems with little modification. It also satisfies the simulatable auditing model and leaks no private
information through query rejections. Through performance analysis, we show that compared with previous
approaches, our approach provides more effective privacy protection while maintaining a higher level of query-
answer availability.
62 Semantic Knowledge-Based Framework to Improve the Situation Awareness of Autonomous Underwater Vehicles
This paper proposes a semantic world model framework for hierarchical distributed representation of knowledge in
autonomous underwater systems. This framework aims to provide a more capable and holistic system, involving semantic
interoperability among all involved information sources. This will enhance interoperability, independence of operation, and
situation awareness of the embedded service-oriented agents for autonomous platforms. The results obtained specifically
affect the mission flexibility, robustness, and autonomy. The presented framework makes use of the idea that
heterogeneous real-world data of very different type must be processed by (and run through) several different layers, to be
finally available in a suited format and at the right place to be accessible by high-level decision-making agents. In this
sense, the presented approach shows how to abstract away from the raw real-world data step by step by means of
semantic technologies. The paper concludes by demonstrating the benefits of the framework in a real scenario. A hardware
fault is simulated in a REMUS 100 AUV while performing a mission. This triggers a knowledge exchange between the status
monitoring agent and the adaptive mission planner embedded agent. By using the proposed framework, both services can
interchange information while remaining domain independent during their interaction with the platform. The results of this
paper are readily applicable to land and air robotics.
63 Straggler Identification in Round-Trip Data Streams via Newton’s Identities and Invertible Bloom Filters
In this paper, we study the straggler identification problem, in which an algorithm must determine the identities of the
remaining members of a set after it has had a large number of insertion and deletion operations performed on it, and
now has relatively few remaining members. The goal is to do this in o(n) space, where n is the total number of
identities. Straggler identification has applications, for example, in determining the unacknowledged packets in a
high-bandwidth multicast data stream. We provide a deterministic solution to the straggler identification problem that
uses only O(d log n) bits, based on a novel application of Newton’s identities for symmetric polynomials. This
solution can identify any subset of d stragglers from a set of n O(log n)-bit identifiers, assuming that there are no
false deletions of identities not already in the set. Indeed, we give a lower bound argument that shows that any small-
space deterministic solution to the straggler identification problem cannot be guaranteed to handle false deletions.
Nevertheless, we provide a simple randomized solution, using O(d log n log(1/ε)) bits that can maintain a multiset
and solve the straggler identification problem, tolerating false deletions, where ε > 0 is a user-defined parameter
bounding the probability of an incorrect response. This randomized solution is based on a new type of Bloom filter,
which we call the invertible Bloom filter.
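The deterministic idea can be sketched as follows, under simplifying assumptions that are not taken from the paper: identities are distinct integers in 1..n, there are no false deletions, and exact Python integers are used instead of a space-bounded representation. The sketch keeps only the power sums of the current members, converts them to elementary symmetric polynomials with Newton's identities, and recovers the stragglers as the roots of the resulting polynomial. The class and method names are hypothetical.

```python
class StragglerSketch:
    """Track a set under inserts and deletes using d + 1 power sums."""

    def __init__(self, d):
        self.d = d
        # power_sums[k] holds the sum of x**k over the current members;
        # power_sums[0] is therefore the number of members.
        self.power_sums = [0] * (d + 1)

    def insert(self, x):
        self._update(x, +1)

    def delete(self, x):
        self._update(x, -1)

    def _update(self, x, sign):
        term = 1
        for k in range(self.d + 1):
            self.power_sums[k] += sign * term
            term *= x

    def stragglers(self, n):
        """Return the remaining members, assuming at most d remain."""
        p = self.power_sums
        m = p[0]
        if not 0 <= m <= self.d:
            raise ValueError("number of remaining members is not in 0..d")
        # Newton's identities: k * e_k = sum_{i=1..k} (-1)^(i-1) e_{k-i} p_i
        e = [1]
        for k in range(1, m + 1):
            s = sum((-1) ** (i - 1) * e[k - i] * p[i] for i in range(1, k + 1))
            e.append(s // k)  # exact division for a genuine set
        # prod_j (t - x_j) = sum_k (-1)^k e_k t^(m-k); stragglers are its roots.
        coeffs = [(-1) ** k * e[k] for k in range(m + 1)]

        def value(t):
            acc = 0
            for c in coeffs:  # Horner evaluation
                acc = acc * t + c
            return acc

        return [t for t in range(1, n + 1) if value(t) == 0]
```

Finding the roots by testing every candidate in 1..n takes time linear in n but no extra space beyond the d + 1 counters.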
65 SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis
In this article, we provide a new technique for temporal data mining which is based on classification rules that can easily be
understood by human domain experts. Basically, time series are decomposed into short segments, and short-term trends
of the time series within the segments (e.g., average, slope, and curvature) are described by means of polynomial models.
Then, the classifiers assess short sequences of trends in subsequent segments with their rule premises. The conclusions
gradually assign an input to a class. As the classifier is a generative model of the processes from which the time series are
assumed to originate, anomalies can be detected, too. Segmentation and piecewise polynomial modeling are done
extremely fast in only one pass over the time series. Thus, the approach is applicable to problems with harsh timing
constraints. We lay the theoretical foundations for this classifier, including a new distance measure for time series and a
new technique to construct a dynamic classifier from a static one, and demonstrate its properties by means of various
benchmark time series, for example, Lorenz attractor time series, energy consumption in a building, or ECG data.
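The segment-and-describe step can be illustrated with a simplified sketch. It assumes fixed-width, non-overlapping segments and a degree-1 least-squares fit, which yields an average and a slope per segment; the abstract's curvature term and its own segmentation method are not reproduced. The function names and the tolerance used for the symbolic labels are hypothetical.

```python
def segment_trends(series, width):
    """Split `series` into consecutive segments of `width` points
    (width >= 2) and return (average, slope) for each segment, using an
    ordinary least-squares line fit over the positions 0..width-1."""
    trends = []
    mean_t = (width - 1) / 2
    sxx = sum((t - mean_t) ** 2 for t in range(width))
    for start in range(0, len(series) - width + 1, width):
        seg = series[start:start + width]
        mean_y = sum(seg) / width
        sxy = sum((t - mean_t) * (y - mean_y) for t, y in enumerate(seg))
        trends.append((mean_y, sxy / sxx))
    return trends

def trend_label(slope, tol=0.1):
    """Map a slope to a symbol that a rule premise could test."""
    if slope > tol:
        return "rising"
    if slope < -tol:
        return "falling"
    return "flat"
```

A rule in this spirit could then read "rising, then flat, then falling implies class A", evaluated over the sequence of labels produced for consecutive segments.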
66 Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations
Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing
information over temporal data. In this paper, we present a temporal data clustering framework via a weighted
clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data
representations. In our approach, we propose a novel weighted consensus function guided by clustering validation
criteria to reconcile initial partitions to candidate consensus partitions from different perspectives, and then,
introduce an agreement function to further reconcile those candidate consensus partitions to a final partition. As a
result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint
use of different representations, which cuts the information loss in a single representation and exploits various
information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a
data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion
trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields
favorable results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can
combine any input partitions to generate a clustering ensemble, we also investigate its limitation by formal analysis
and empirical studies.
69 Text Clustering with Seeds Affinity Propagation
Based on an effective clustering algorithm—Affinity Propagation (AP)—we present in this paper a novel
semisupervised text clustering algorithm, called Seeds Affinity Propagation (SAP). There are two main contributions
in our approach: 1) a new similarity metric that captures the structural information of texts, and 2) a novel seed
construction method to improve the semisupervised clustering process. To study the performance of the new
algorithm, we applied it to the benchmark data set Reuters-21578 and compared it to two state-of-the-art clustering
algorithms, namely, k-means algorithm and the original AP algorithm. Furthermore, we have analyzed the individual
impact of the two proposed contributions. Results show that the proposed similarity metric is more effective in text
clustering (F-measures ca. 21 percent higher than in the AP algorithm) and the proposed semisupervised strategy
achieves both better clustering results and faster convergence (using only 76 percent of the iterations of the original AP).
The complete SAP algorithm obtains a higher F-measure (ca. 40 percent improvement over k-means and AP) and lower
entropy (ca. 28 percent decrease over k-means and AP), significantly improves clustering execution time (20 times
faster) with respect to k-means, and provides enhanced robustness compared with all other methods.
70 Temporal Data Clustering via Weighted Clustering Ensemble with Different Representations
Temporal data clustering provides underpinning techniques for discovering the intrinsic structure and condensing
information over temporal data. In this paper, we present a temporal data clustering framework via a weighted
clustering ensemble of multiple partitions produced by initial clustering analysis on different temporal data
representations. In our approach, we propose a novel weighted consensus function guided by clustering validation
criteria to reconcile initial partitions to candidate consensus partitions from different perspectives, and then,
introduce an agreement function to further reconcile those candidate consensus partitions to a final partition. As a
result, the proposed weighted clustering ensemble algorithm provides an effective enabling technique for the joint
use of different representations, which cuts the information loss in a single representation and exploits various
information sources underlying temporal data. In addition, our approach tends to capture the intrinsic structure of a
data set, e.g., the number of clusters. Our approach has been evaluated with benchmark time series, motion
trajectory, and time-series data stream clustering tasks. Simulation results demonstrate that our approach yields
favorite results for a variety of temporal data clustering tasks. As our weighted cluster ensemble algorithm can
25
Elysium Technologies Private Limited ISO 9001:2008 A leading Research and Development Division Madurai | Chennai | Trichy | Coimbatore | Kollam| Singapore Website: elysiumtechnologies.com, elysiumtechnologies.info Email: info@elysiumtechnologies.com
IEEE Final Year Project List 2011-2012
Madurai Elysium Technologies Private Limited
230, Church Road, Annanagar,
Madurai , Tamilnadu – 625 020.
Contact : 91452 4390702, 4392702, 4394702.
eMail: info@elysiumtechnologies.com
Trichy Elysium Technologies Private Limited
3rd
Floor,SI Towers,
15 ,Melapudur , Trichy,
Tamilnadu – 620 001.
Contact : 91431 - 4002234.
eMail: elysium.trichy@gmail.com
Kollam Elysium Technologies Private Limited
Surya Complex,Vendor junction,
kollam,Kerala – 691 010.
Contact : 91474 2723622.
eMail: elysium.kollam@gmail.com
combine any input partitions to generate a clustering ensemble, we also investigate its limitation by formal analysis
and empirical studies.
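The idea of weighting input partitions before merging them can be illustrated with a weighted co-association matrix. This is a minimal sketch only: the abstract does not spell out the consensus function, so a co-association matrix (a common clustering-ensemble device) is swapped in here, and the function name and the weights are hypothetical. In the paper the weights would come from clustering validation criteria; here they are simply passed in.

```python
from typing import List, Sequence


def weighted_coassociation(partitions: Sequence[Sequence[int]],
                           weights: Sequence[float]) -> List[List[float]]:
    """Return, for every pair (i, j), the weighted share of partitions
    that place items i and j in the same cluster."""
    if not partitions or len(partitions) != len(weights):
        raise ValueError("need at least one partition and one weight per partition")
    n = len(partitions[0])
    if any(len(labels) != n for labels in partitions):
        raise ValueError("all partitions must label the same items")
    total = float(sum(weights))
    if total <= 0:
        raise ValueError("weights must sum to a positive value")

    matrix = [[0.0] * n for _ in range(n)]
    for labels, weight in zip(partitions, weights):
        share = weight / total  # normalise so entries stay in [0, 1]
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    matrix[i][j] += share
    return matrix


# Three hypothetical partitions of four items, with assumed weights.
partitions = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
weights = [0.5, 0.3, 0.2]
coassoc = weighted_coassociation(partitions, weights)
```

A final partition could then be read off by clustering this matrix, for example by linking pairs whose entry exceeds 0.5.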
71 TEXT: Automatic Template Extraction from Heterogeneous Web Pages
The World Wide Web is the most useful source of information. In order to achieve high productivity of publishing, the
webpages in many websites are automatically populated by using the common templates with contents. The
templates provide readers easy access to the contents guided by consistent structures. However, for machines, the
templates are considered harmful since they degrade the accuracy and performance of web applications due to
irrelevant terms in templates. Thus, template detection techniques have received a lot of attention recently to
improve the performance of search engines, clustering, and classification of web documents. In this paper, we
present novel algorithms for extracting templates from a large number of web documents which are generated from
heterogeneous templates. We cluster the web documents based on the similarity of underlying template structures in
the documents so that the template for each cluster is extracted simultaneously. We develop a novel goodness
measure with its fast approximation for clustering and provide comprehensive analysis of our algorithm. Our
experimental results with real-life data sets confirm the effectiveness and robustness of our algorithm compared to
the state of the art for template detection algorithms.
72 SwiftRule: Mining Comprehensible Classification Rules for Time Series Analysis
In this article, we provide a new technique for temporal data mining which is based on classification rules that can easily be
understood by human domain experts. Basically, time series are decomposed into short segments, and short-term trends
of the time series within the segments (e.g., average, slope, and curvature) are described by means of polynomial models.
Then, the classifiers assess short sequences of trends in subsequent segments with their rule premises. The conclusions
gradually assign an input to a class. As the classifier is a generative model of the processes from which the time series are
assumed to originate, anomalies can be detected, too. Segmentation and piecewise polynomial modeling are done
extremely fast in only one pass over the time series. Thus, the approach is applicable to problems with harsh timing
constraints. We lay the theoretical foundations for this classifier, including a new distance measure for time series and a
new technique to construct a dynamic classifier from a static one, and demonstrate its properties by means of various
benchmark time series, for example, Lorenz attractor time series, energy consumption in a building, or ECG data.
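The segment-wise trend description (average, slope, and curvature) can be sketched as follows. This is an illustrative sketch, not the SwiftRule implementation: it assumes fixed-width, non-overlapping segments and equally spaced samples, and the function name segment_trends is hypothetical. Each segment is fitted by least squares with a quadratic polynomial on a centered time axis, where the three basis functions are orthogonal and each coefficient is a simple ratio of sums.

```python
from typing import List, Sequence, Tuple


def segment_trends(series: Sequence[float],
                   width: int) -> List[Tuple[float, float, float]]:
    """Return (average, slope, curvature) for each full segment of `width` samples.

    `curvature` is the coefficient of the quadratic term, i.e. half the
    second derivative of the fitted polynomial.
    """
    if width < 3:
        raise ValueError("width must be at least 3 to fit a quadratic")

    # Centered time axis: with t symmetric around zero, the basis
    # 1, t, t^2 - mean(t^2) is orthogonal over the segment.
    t = [i - (width - 1) / 2 for i in range(width)]
    t2_mean = sum(x * x for x in t) / width
    quad = [x * x - t2_mean for x in t]
    t_norm = sum(x * x for x in t)
    quad_norm = sum(q * q for q in quad)

    trends = []
    # Trailing samples that do not fill a whole segment are ignored.
    for start in range(0, len(series) - width + 1, width):
        y = series[start:start + width]
        average = sum(y) / width
        slope = sum(a * b for a, b in zip(t, y)) / t_norm
        curvature = sum(a * b for a, b in zip(quad, y)) / quad_norm
        trends.append((average, slope, curvature))
    return trends


# A parabola segment followed by a flat segment.
trends = segment_trends([0, 1, 4, 9, 16, 5, 5, 5, 5, 5], width=5)
```

A rule premise could then test short sequences of such triples, for instance "rising slope followed by negative curvature".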
73 The CoQUOS Approach to Continuous Queries in Unstructured Overlays
The current peer-to-peer (P2P) content distribution systems are constricted by their simple on-demand content
discovery mechanism. The utility of these systems can be greatly enhanced by incorporating two capabilities,
namely a mechanism through which peers can register their long term interests with the network so that they can be
continuously notified of new data items, and a means for the peers to advertise their contents. Although researchers
have proposed a few unstructured overlay-based publish-subscribe systems that provide the above capabilities, most
of these systems require intricate indexing and routing schemes, which not only make them highly complex but also
render the overlay network less flexible toward transient peers. This paper argues that for many P2P applications,
implementing full-fledged publish-subscribe systems is overkill. For these applications, we study the alternative
continuous query paradigm, which is a best-effort service providing the above two capabilities. We present a
scalable and effective middleware, called CoQUOS, for supporting continuous queries in unstructured overlay
networks. Besides being independent of the overlay topology, CoQUOS preserves the simplicity and flexibility of the
unstructured P2P network. Our design of the CoQUOS system is characterized by two novel techniques, namely a
cluster-resilient random walk algorithm for propagating the queries to various regions of the network and a dynamic
probability-based query registration scheme to ensure that the registrations are well distributed in the overlay.
Further, we also develop effective and efficient schemes for providing resilience to the churn of the P2P network and
for ensuring a fair distribution of the notification load among the peers.
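The combination of a random walk with probabilistic query registration can be illustrated roughly as below. This is a sketch under strong assumptions: it uses a plain random walk rather than the cluster-resilient variant, and the registration rule (probability grows by a fixed step with every hop since the last registration) is a hypothetical stand-in, since the abstract does not give the actual formula. The names walk_and_register and step are likewise assumptions.

```python
import random
from typing import Dict, List


def walk_and_register(graph: Dict[str, List[str]], start: str, ttl: int,
                      rng: random.Random, step: float = 0.25) -> List[str]:
    """Forward a query for `ttl` hops and return the peers that registered it."""
    registered: List[str] = []
    current = start
    hops_since_registration = 0
    for _ in range(ttl):
        neighbors = graph[current]
        if not neighbors:
            break  # isolated peer: the walk cannot continue
        current = rng.choice(neighbors)
        hops_since_registration += 1
        # Assumed rule: the longer the query has gone unregistered,
        # the more likely the next peer is to register it.
        probability = min(1.0, step * hops_since_registration)
        if rng.random() < probability:
            registered.append(current)
            hops_since_registration = 0
    return registered


# A hypothetical six-peer ring overlay.
ring = {
    "p0": ["p1", "p5"], "p1": ["p0", "p2"], "p2": ["p1", "p3"],
    "p3": ["p2", "p4"], "p4": ["p3", "p5"], "p5": ["p4", "p0"],
}
registrations = walk_and_register(ring, "p0", ttl=20, rng=random.Random(7))
```

With step set to 0.25, a registration is forced at the latest on every fourth hop, which spreads registrations along the walk instead of clustering them near the source.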
74 The World in a Nutshell: Concise Range Queries
With the advance of wireless communication technology, it is quite common for people to view maps or get related services
from handheld devices, such as mobile phones and PDAs. Range queries, as one of the most commonly used tools, are
often posed by the users to retrieve needed information from a spatial database. However, due to the limits of
communication bandwidth and hardware power of handheld devices, displaying all the results of a range query on a
handheld device is neither communication-efficient nor informative to the users. This is simply because there are often
too many results returned from a range query. In view of this problem, we present a novel idea that a concise
representation of a specified size for the range query results, while incurring minimal information loss, shall be computed
and returned to the user. Such a concise range query not only reduces communication costs, but also offers better
usability to the users, providing an opportunity for interactive exploration. The usefulness of the concise range queries is
confirmed by comparing it with other possible alternatives, such as sampling and clustering. Unfortunately, we prove that
finding the optimal representation with minimum information loss is an NP-hard problem. Therefore, we propose several
effective and nontrivial algorithms to find a good approximate result. Extensive experiments on real-world data have
demonstrated the effectiveness and efficiency of the proposed techniques.
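To make the notion of a size-bounded summary concrete, the sketch below answers a range query and then compresses the result into at most k bounding boxes with counts. It is a naive heuristic for illustration only, not one of the paper's algorithms: it sorts the results by x-coordinate and cuts them into k contiguous groups, with no attempt to minimize information loss. The names range_query and concise_summary are hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x_min, y_min, x_max, y_max


def range_query(points: Sequence[Point], window: Box) -> List[Point]:
    """Return the points that fall inside the query window (borders included)."""
    x_min, y_min, x_max, y_max = window
    return [(x, y) for x, y in points
            if x_min <= x <= x_max and y_min <= y <= y_max]


def concise_summary(results: Sequence[Point], k: int) -> List[Tuple[Box, int]]:
    """Summarize `results` as at most `k` (bounding box, count) pairs."""
    if k < 1:
        raise ValueError("k must be at least 1")
    ordered = sorted(results)          # sort by x, then y
    n = len(ordered)
    groups = min(k, n)                 # never create empty groups
    summary = []
    for g in range(groups):
        chunk = ordered[g * n // groups:(g + 1) * n // groups]
        xs = [p[0] for p in chunk]
        ys = [p[1] for p in chunk]
        summary.append(((min(xs), min(ys), max(xs), max(ys)), len(chunk)))
    return summary


points = [(1, 1), (2, 2), (3, 1), (8, 8), (9, 9), (20, 20)]
results = range_query(points, (0, 0, 10, 10))
summary = concise_summary(results, k=2)
```

Only the k boxes and their counts would be sent to the handheld device; the user could then issue a narrower query on a box of interest.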
75 Usher: Improving Data Quality with Dynamic Forms
Data quality is a critical problem in modern databases. Data-entry forms present the first and arguably best
opportunity for detecting and mitigating errors, but there has been little research into automatic methods for
improving data quality at entry time. In this paper, we propose USHER, an end-to-end system for form design, entry,
and data quality assurance. Using previous form submissions, USHER learns a probabilistic model over the
questions of the form. USHER then applies this model at every step of the data-entry process to improve data quality.
Before entry, it induces a form layout that captures the most important data values of a form instance as quickly as
possible and reduces the complexity of error-prone questions. During entry, it dynamically adapts the form to the
values being entered by providing real-time interface feedback, reasking questions with dubious responses, and
simplifying questions by reformulating them. After entry, it revisits question responses that it deems likely to have
been entered incorrectly by reasking the question or a reformulation thereof. We evaluate these components of
USHER using two real-world data sets. Our results demonstrate that USHER can improve data quality considerably at
a reduced cost when compared to current practice.
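The step of learning from previous submissions and reasking dubious responses can be sketched in a heavily simplified form. The assumptions are significant: this sketch models each question independently with smoothed value frequencies, whereas the abstract describes a probabilistic model over the questions of the form as a whole. The function names and the threshold of 0.1 are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def learn_field_model(submissions: List[Dict[str, str]]) -> Dict[str, Counter]:
    """Count how often each value was entered for each form field."""
    model: Dict[str, Counter] = {}
    for form in submissions:
        for field, value in form.items():
            model.setdefault(field, Counter())[value] += 1
    return model


def dubious_fields(model: Dict[str, Counter], entry: Dict[str, str],
                   threshold: float = 0.1) -> List[str]:
    """Return the fields whose entered value looks unlikely under the model."""
    flagged = []
    for field, value in entry.items():
        counts = model.get(field)
        if counts is None:
            continue  # no history for this field, nothing to compare against
        total = sum(counts.values())
        # Laplace smoothing; the extra +1 reserves mass for unseen values.
        probability = (counts[value] + 1) / (total + len(counts) + 1)
        if probability < threshold:
            flagged.append(field)
    return flagged


# Hypothetical history: 18 submissions in kg, 2 in lb.
history = [{"unit": "kg"}] * 18 + [{"unit": "lb"}] * 2
model = learn_field_model(history)
to_reask = dubious_fields(model, {"unit": "oz"})
```

Fields returned by dubious_fields are candidates for reasking during or after entry.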