Document Clustering with Feature Behavior based Distance Analysis

Publication Date : 30/09/2015

Author(s) :

A. Kanimozhi, M. Subha

Volume/Issue :
Volume 2, Issue 9 (09 - 2015)

Abstract :

Machine learning and data mining methods are used to analyze large data sets, and clustering methods group related data values. Clustering operations are handled by partitional and hierarchical methods: partitional models process data in tabular form, while hierarchical models organize the data into a tree. Clustering techniques are also applied to group text documents. Distance measures estimate the relationships between documents during clustering, and the cosine and Euclidean measures are widely used for this purpose. Dimensionality is the key factor in document clustering. Document contents are parsed and represented in a vector model, in which each feature is assigned a weight value. The feature behavior distance model faces high dimensionality and sparsity issues. Feature based similarity estimation is carried out using the Similarity Measurement for Text Process (SMTP), and clustering and classification operations are performed with the SMTP distance measure. Text document clustering is performed using the Hybrid Similarity Measure for Text Process (HSMTP), which integrates feature appearance and weight factors. The HSMTP scheme is combined with the Spherical K-Means clustering algorithm to partition the documents. A feature reduction process minimizes the dimensionality of the document vector. An ontology is used to fetch concept relationship values, and the HSMTP scheme also supports a concept relationship based distance model.
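The pipeline outlined in the abstract (document vectors, a similarity measure, Spherical K-Means) can be illustrated with a minimal sketch. It makes several assumptions that are not from the paper: plain term-frequency weights, ordinary cosine similarity standing in for the SMTP and HSMTP measures (whose formulas the abstract does not give), fixed seed documents as initial centroids, and hypothetical helper names and sample documents chosen only for illustration.

```python
import math
from collections import Counter


def tf_vector(text, vocabulary):
    """Term-frequency vector of `text` over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [float(counts[term]) for term in vocabulary]


def normalize(vec):
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(u, v):
    """Cosine similarity of two unit-length vectors (a plain dot product)."""
    return sum(a * b for a, b in zip(u, v))


def spherical_kmeans(vectors, seeds, iterations=10):
    """Cluster unit vectors by cosine similarity.

    `seeds` lists the indices of the documents used as initial centroids
    (an assumption made here to keep the sketch deterministic).
    """
    centroids = [vectors[i] for i in seeds]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        labels = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        # Update step: sum the members, then project back onto the unit sphere.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = normalize([sum(col) for col in zip(*members)])
    return labels


# Hypothetical toy corpus: two documents about clustering, two about ontologies.
docs = [
    "cluster document vector distance",
    "document cluster distance measure",
    "ontology concept relationship",
    "concept ontology relationship values",
]
vocabulary = sorted({word for doc in docs for word in doc.split()})
vectors = [normalize(tf_vector(doc, vocabulary)) for doc in docs]
labels = spherical_kmeans(vectors, seeds=[0, 2])
print(labels)
```

Because every vector is normalized to unit length, the cosine similarity reduces to a dot product, which is what distinguishes Spherical K-Means from the Euclidean variant. A different similarity function could be passed in place of `cosine` to experiment with other measures.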


Document Clustering with Feature Behavior based Distance Analysis

September 22, 2015