Accomplishments
An Improvised Approach for Website Domain Classification
- Abstract
Search engines commonly offer suggestions to help users find websites of a particular genre. To make those suggestions relevant, a search engine needs to know the genre of each website. According to a survey, about 1.86 billion websites were available on the Web as of June 2021. At this scale, manual classification is impractical and tedious. The authors therefore use machine learning to categorize websites into a set of preset genres defined in a curated dataset prepared for this work. The websites are classified into eleven domains chosen according to popularity and user convenience. The classifier is made accessible through a dynamic web application that takes the user's input and uses the machine learning model to predict the type of website. The authors explored various classification models for this task and implemented the model with the highest accuracy as the final website domain classifier. The proposed model is also verified on the open “web genre corpus 2004 (Genre-KI-04)” dataset, which was created for website genre classification.
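The model-selection step described in the abstract (train several candidate classifiers, then keep the one with the highest accuracy) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the abstract does not name the models that were compared, so a small naive Bayes text classifier and a majority-class baseline stand in for them, and the three toy genres and their sample texts are hypothetical rather than taken from the curated dataset or from Genre-KI-04.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data: (page text, genre label). Not the paper's dataset.
TRAIN = [
    ("buy shoes cart checkout discount", "shopping"),
    ("cart order shipping sale price", "shopping"),
    ("breaking news election report today", "news"),
    ("news headline report politics world", "news"),
    ("football score match league team", "sports"),
    ("team player match goal tournament", "sports"),
]
TEST = [
    ("discount price cart", "shopping"),
    ("world election headline", "news"),
    ("league goal player", "sports"),
]


class NaiveBayes:
    """Multinomial naive Bayes over whitespace tokens, with add-one smoothing."""

    def fit(self, data):
        self.word_counts = defaultdict(Counter)
        self.class_counts = Counter()
        for text, label in data:
            self.class_counts[label] += 1
            self.word_counts[label].update(text.split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total = sum(self.class_counts.values())

        def score(label):
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            s = math.log(self.class_counts[label] / total)
            for w in text.split():
                s += math.log((counts[w] + 1) / denom)
            return s

        return max(self.class_counts, key=score)


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, data):
        self.label = Counter(y for _, y in data).most_common(1)[0][0]
        return self

    def predict(self, text):
        return self.label


def accuracy(model, data):
    # Fraction of examples whose predicted label matches the true label.
    return sum(model.predict(t) == y for t, y in data) / len(data)


def select_best(candidates, train, test):
    # Fit every candidate on the training set and keep the most accurate one.
    scores = {name: accuracy(m.fit(train), test) for name, m in candidates.items()}
    best = max(scores, key=scores.get)
    return best, scores


best, scores = select_best(
    {"naive_bayes": NaiveBayes(), "majority": MajorityBaseline()}, TRAIN, TEST
)
print(best, scores)
```

In a real setting the candidates would be evaluated on a much larger held-out set, or with cross-validation, since accuracy measured on a handful of examples is not a reliable basis for choosing a model.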