A Survey on Frequent Web Page Mining with Improving Data Quality of Log Cleaner
Abstract
Data cleaning is the process of detecting irrelevant or incomplete data in datasets and log files and then correcting, replacing, or removing this dirty data. It is one of the major techniques used in data preprocessing and web usage mining, and it is useful in fields such as banking, insurance, and retailing. Although there has been a lot of work on cleaning web server logs, irrelevant items and useless data cannot be removed completely, and overlapping data causes difficulty during page ranking. Previous papers have studied many web log mining techniques, among them a two-level clustering method and an effective and scalable technique. This paper presents an overview of web usage mining and its techniques. It also summarizes LogCleaner, which can filter out a large amount of irrelevant and inconsistent data based on what their URLs have in common, improving the data quality and efficiency of web log mining.
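
To make the idea of URL-based log cleaning concrete, the following is a minimal Python sketch. It is not LogCleaner's actual algorithm, which this abstract does not specify. It assumes log lines in the Common Log Format, and the names `IRRELEVANT_SUFFIXES`, `IRRELEVANT_PREFIXES`, `is_relevant`, and `clean_log`, as well as the specific filtering rules, are hypothetical choices made only for illustration.

```python
import re
from urllib.parse import urlsplit

# Assumed input: Common Log Format lines, e.g.
# host ident user [time] "METHOD url PROTOCOL" status size
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) [^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical rules: requests whose URL paths share one of these
# endings or beginnings are treated as irrelevant to page mining.
IRRELEVANT_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
IRRELEVANT_PREFIXES = ("/static/", "/ads/")


def is_relevant(entry):
    """Return True if a parsed log entry looks like a real page view."""
    path = urlsplit(entry["url"]).path.lower()
    if entry["method"] != "GET":
        return False
    if not entry["status"].startswith("2"):  # keep successful requests only
        return False
    if path.endswith(IRRELEVANT_SUFFIXES):
        return False
    if path.startswith(IRRELEVANT_PREFIXES):
        return False
    return True


def clean_log(lines):
    """Parse raw log lines and keep only the relevant entries."""
    cleaned = []
    for line in lines:
        match = LOG_PATTERN.match(line)
        if match is None:  # malformed or incomplete record: drop it
            continue
        entry = match.groupdict()
        if is_relevant(entry):
            cleaned.append(entry)
    return cleaned
```

Calling `clean_log` on an iterable of raw log lines returns a list of dictionaries with the keys `host`, `time`, `method`, `url`, and `status`. In practice the suffix and prefix rules would be derived from the site being analyzed rather than fixed in advance.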