Textual Data Partitioning with Relationship and Discriminative Analysis

Publication Date: 31/01/2015

Ms. P. Tamilarasi, Research Scholar, Mrs. T. R. Vithya, Mr. R. Subramanian.

Volume 2
Issue 1
(01 - 2015)

Data partitioning methods group data values by similarity, using similarity measures to estimate the relationships between transactions. Hierarchical clustering produces tree-structured results, whereas partitional clustering produces flat, non-nested groups. Text documents are unstructured data with high-dimensional attributes, and document clustering groups unlabeled text documents into meaningful clusters. Traditional clustering methods require the cluster count (K) as input, and clustering accuracy degrades sharply when an unsuitable cluster count is chosen.

The words in textual data fall into two types: discriminative words and nondiscriminative words. Only discriminative words are useful for grouping documents. Including nondiscriminative words confuses the clustering process and yields a poor clustering solution. A variational inference algorithm is used to infer the structure of the document collection and the partition of document words at the same time. The Dirichlet Process Mixture (DPM) model is used to partition documents; it combines the data likelihood with the clustering property of the Dirichlet Process (DP). The Dirichlet Process Mixture Model for Feature Partition (DPMFP) builds on the DPM model to discover the latent cluster structure, and DPMFP clustering is performed without requiring the number of clusters as input.

Document labels are used to guide the identification of discriminative words. Concept relationships are analyzed with ontology support, and a semantic weight model is used for document similarity analysis. Using labels and concept relations for dimensionality reduction improves the scalability of the system.
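How a DPM can cluster documents without a preset K can be illustrated with a small sketch. The paper uses variational inference; the code below swaps in a different and simpler technique, collapsed Gibbs sampling for a Dirichlet process mixture of multinomials. It is therefore not the authors' DPMFP method, and it omits the discriminative/nondiscriminative word partition. The names `dpmm_gibbs` and `log_predictive` and the hyperparameters `alpha` (DP concentration) and `beta` (symmetric Dirichlet prior over words) are illustrative assumptions.

```python
import math
import random
from collections import Counter, defaultdict

# Sketch (assumption): collapsed Gibbs sampling for a DP mixture of
# multinomials, NOT the paper's variational DPMFP inference.


def log_predictive(doc_counts, doc_len, cluster_words, cluster_total, vocab_size, beta):
    """Log-probability of a document's tokens under a cluster's
    Dirichlet-multinomial posterior predictive (symmetric prior beta)."""
    s = 0.0
    for w, cnt in doc_counts.items():
        base = cluster_words.get(w, 0) + beta
        for j in range(cnt):
            s += math.log(base + j)
    for j in range(doc_len):
        s -= math.log(cluster_total + vocab_size * beta + j)
    return s


def dpmm_gibbs(docs, alpha=1.0, beta=0.1, iters=30, seed=0):
    """Cluster tokenized documents without a preset cluster count K.
    Returns one cluster id per document, renumbered 0, 1, 2, ..."""
    rng = random.Random(seed)
    vocab_size = len({w for d in docs for w in d})
    counts = [Counter(d) for d in docs]
    lengths = [len(d) for d in docs]

    z = [0] * len(docs)        # every document starts in cluster 0
    m = defaultdict(int)       # documents per cluster
    n = defaultdict(int)       # tokens per cluster
    nw = defaultdict(Counter)  # word counts per cluster
    for i, c in enumerate(counts):
        m[0] += 1
        n[0] += lengths[i]
        nw[0].update(c)
    next_id = 1

    for _ in range(iters):
        for i, c in enumerate(counts):
            # Remove document i from its current cluster; drop empty clusters.
            k = z[i]
            m[k] -= 1
            n[k] -= lengths[i]
            nw[k].subtract(c)
            if m[k] == 0:
                del m[k], n[k], nw[k]

            # Existing clusters are weighted by their size (Chinese
            # restaurant process prior); a new cluster is weighted by alpha.
            keys = list(m.keys())
            logw = [math.log(m[key]) + log_predictive(c, lengths[i], nw[key], n[key], vocab_size, beta)
                    for key in keys]
            keys.append(None)  # None stands for "open a new cluster"
            logw.append(math.log(alpha) + log_predictive(c, lengths[i], {}, 0, vocab_size, beta))

            # Sample a cluster from the normalized weights (log-sum-exp trick).
            top = max(logw)
            weights = [math.exp(x - top) for x in logw]
            choice = rng.choices(keys, weights=weights)[0]
            if choice is None:
                choice = next_id
                next_id += 1

            z[i] = choice
            m[choice] += 1
            n[choice] += lengths[i]
            nw[choice].update(c)

    remap = {}
    return [remap.setdefault(k, len(remap)) for k in z]


docs = [["apple", "banana", "fruit"] * 2 for _ in range(5)] + \
       [["python", "code", "compiler"] * 2 for _ in range(5)]
print(dpmm_gibbs(docs, seed=1))
```

Each document is reassigned either to an existing cluster, weighted by that cluster's size, or to a new cluster, weighted by `alpha`. Clusters that lose all their documents disappear, so the number of clusters is inferred from the data rather than fixed in advance.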
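The abstract does not specify how document labels are used to identify discriminative words. As one possible stand-in (an assumption, not the paper's procedure), the sketch below scores each word with the chi-square statistic between word presence and class label, keeping the maximum over classes. Words scoring at least 3.841, the 5% critical value for one degree of freedom, are treated as discriminative. `chi_square_scores` and `split_vocabulary` are hypothetical names.

```python
from collections import Counter

# Sketch (assumption): chi-square feature scoring as a stand-in for the
# paper's label-guided discriminative word identification.


def chi_square_scores(docs, labels):
    """Max over classes of the 2x2 chi-square statistic between the
    presence of a word in a document and the document's label."""
    N = len(docs)
    class_sizes = Counter(labels)
    df = Counter()        # word -> number of documents containing it
    df_class = Counter()  # (word, label) -> documents with that label containing it
    for doc, y in zip(docs, labels):
        for w in set(doc):
            df[w] += 1
            df_class[(w, y)] += 1

    scores = {}
    for w in df:
        best = 0.0
        for y, size in class_sizes.items():
            a = df_class[(w, y)]  # in class y, contains w
            b = df[w] - a         # other classes, contains w
            c = size - a          # in class y, lacks w
            d = N - size - b      # other classes, lacks w
            denom = (a + c) * (b + d) * (a + b) * (c + d)
            if denom:
                best = max(best, N * (a * d - b * c) ** 2 / denom)
        scores[w] = best
    return scores


def split_vocabulary(docs, labels, threshold=3.841):
    """Partition the vocabulary into (discriminative, nondiscriminative)."""
    scores = chi_square_scores(docs, labels)
    disc = {w for w, s in scores.items() if s >= threshold}
    return disc, set(scores) - disc


docs = [["apple", "fruit", "the"], ["banana", "fruit", "the"],
        ["python", "code", "the"], ["compiler", "code", "the"]]
labels = ["food", "food", "tech", "tech"]
disc, nondisc = split_vocabulary(docs, labels)
print(sorted(disc))  # ['code', 'fruit']
```

On these four toy documents, `fruit` and `code` separate the two labels perfectly and score 4.0, while `the`, which appears in every document, scores 0. Only the discriminative set would then be passed on to clustering.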

January 22, 2015