An Analysis of Feature Engineering Approaches for Unlabeled Dark Web

Category

Conference

Authors

Ashwini Dalvi

Conference Name

International Conference on Worldwide Computing and Its Applications 2023

Conference From

03-Jul-2023

Conference To

06-Jul-2023

Conference Venue

Online

Abstract

Feature engineering is critical to building machine learning models for unlabeled text classification. It extracts and constructs relevant features from unlabeled text so that a model can learn from them, and several feature engineering approaches can serve this purpose. The proposed work used crawled dark web data, in raw HTML form, as the unlabeled text. Bag of Words, TF-IDF, and BM25 are commonly used feature engineering techniques in natural language processing. The proposed work transformed the text data with each of these three techniques and used the resulting features to train a Latent Dirichlet Allocation (LDA) model that identifies latent topics in the unlabeled text. The log perplexity score of the LDA model served as the performance measure for comparing the Bag of Words, TF-IDF, and BM25 feature engineering techniques.
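To make the three weighting schemes named in the abstract concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the toy documents, the function names, the whitespace tokenizer, the log(N/df) form of IDF, the non-negative BM25 IDF variant, and the parameters k1 = 1.5 and b = 0.75 are all assumptions chosen for illustration. The LDA training and log perplexity steps are left out; the sketch only shows how the same tokenized documents yield three different term-weight representations.

```python
import math
from collections import Counter

# Toy stand-ins for cleaned dark web pages (hypothetical data, not from the paper).
DOCS = [
    "market vendor escrow bitcoin market",
    "forum thread vendor review",
    "bitcoin wallet mixer bitcoin escrow",
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def document_frequency(token_docs):
    """Count in how many documents each term occurs."""
    return Counter(term for doc in token_docs for term in set(doc))


def bag_of_words(token_docs):
    """Raw term counts per document."""
    return [dict(Counter(doc)) for doc in token_docs]


def tf_idf(token_docs):
    """Term count scaled by log(N / df); this IDF form is one common choice."""
    n = len(token_docs)
    df = document_frequency(token_docs)
    return [
        {term: tf * math.log(n / df[term]) for term, tf in Counter(doc).items()}
        for doc in token_docs
    ]


def bm25(token_docs, k1=1.5, b=0.75):
    """BM25 term weights with a non-negative IDF variant (k1 and b are assumed values)."""
    n = len(token_docs)
    df = document_frequency(token_docs)
    avg_len = sum(len(doc) for doc in token_docs) / n
    weighted = []
    for doc in token_docs:
        # Length normalization: longer-than-average documents are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        weighted.append({
            term: math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            * tf * (k1 + 1) / (tf + norm)
            for term, tf in Counter(doc).items()
        })
    return weighted


tokens = [tokenize(doc) for doc in DOCS]
bow = bag_of_words(tokens)
tfidf = tf_idf(tokens)
bm25_weights = bm25(tokens)
```

Each of `bow`, `tfidf`, and `bm25_weights` is a list with one term-to-weight mapping per document. In a pipeline like the one the abstract describes, each representation would be passed to a topic model in turn and the resulting scores compared.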

Related Items

ASHWINI DALVI.

PhishNet: a comprehensive system for detecting phishing websites using machine learning. International Conference on Advancing Technology in Engineering and Science, October 04-05, 2025

ASHWINI DALVI.

Improving image visual quality of petroglyph images using ESRGAN (enhanced super-resolution generative adversarial networks). International Conference on Advancing Technology in Engineering and Science, October 04-05, 2025