Hierarchical website classification system (by category)
- Category: NLP
- Client: Commercial Client
- Project date: 2022
The system assigns each website to one or more of over 500 hierarchically organized categories. Its output is used for marketing and advertising campaigns.
Approaches
- TF-IDF + NodesLocalClassifier (NLC)
- ByteLevelBPE + HMCN
- WordPiece + RuBert
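To illustrate the idea behind a node-local approach, here is a minimal sketch of top-down prediction over a category tree. It assumes that NodesLocalClassifier means one classifier per node, each producing a score for its own category; the source does not confirm this. The taxonomy, the scores, the threshold, and the function name `predict_top_down` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical toy taxonomy: parent -> children ("ROOT" is a virtual root).
CHILDREN: Dict[str, List[str]] = {
    "ROOT": ["sports", "finance"],
    "sports": ["football", "tennis"],
    "finance": ["banking"],
}


def predict_top_down(node_scores: Dict[str, float], threshold: float = 0.5) -> List[str]:
    """Walk the taxonomy from the root, descending only into accepted nodes."""
    accepted: List[str] = []
    frontier = ["ROOT"]
    while frontier:
        parent = frontier.pop()
        for child in CHILDREN.get(parent, []):
            # A child is accepted only if its own (assumed) per-node score passes.
            if node_scores.get(child, 0.0) >= threshold:
                accepted.append(child)
                frontier.append(child)
    return sorted(accepted)


# Assumed per-node scores for one website (illustrative values only).
scores = {"sports": 0.9, "football": 0.7, "tennis": 0.2, "finance": 0.3, "banking": 0.8}
labels = predict_top_down(scores)
print(labels)
```

In this sketch "banking" is never returned even though its own score is high, because its parent "finance" was rejected. This keeps predictions consistent with the hierarchy, at the cost of propagating errors made near the root.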
Data specifics
- Inaccessible web pages
- Incorrect labels
- Class imbalance
Model evaluation
Metrics used to evaluate model quality:
- h-fbeta (MSE)
- h-precision
- h-recall
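The hierarchical metrics listed above can be sketched as follows, assuming the common set-based definition: each true and predicted label set is extended with all ancestors of its labels, and precision and recall are computed on the extended sets, summed over all websites. The project's exact definition is not given in the source, so this is an assumption, and the taxonomy and function names are hypothetical. The F-scores reported for the three models are consistent with beta = 1 applied to their precision and recall.

```python
from typing import Dict, Iterable, List, Optional, Set

# Hypothetical toy taxonomy: child -> parent (None marks a top-level category).
PARENT: Dict[str, Optional[str]] = {
    "sports": None,
    "football": "sports",
    "tennis": "sports",
    "finance": None,
    "banking": "finance",
}


def with_ancestors(labels: Iterable[str]) -> Set[str]:
    """Extend a label set with every ancestor of each label."""
    extended: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        while node is not None:
            extended.add(node)
            node = PARENT[node]
    return extended


def hierarchical_scores(
    y_true: List[List[str]], y_pred: List[List[str]], beta: float = 1.0
) -> Dict[str, float]:
    """Micro-averaged h-precision, h-recall and h-fbeta over all samples."""
    overlap = n_pred = n_true = 0
    for true_labels, pred_labels in zip(y_true, y_pred):
        t_ext = with_ancestors(true_labels)
        p_ext = with_ancestors(pred_labels)
        overlap += len(t_ext & p_ext)
        n_pred += len(p_ext)
        n_true += len(t_ext)
    h_p = overlap / n_pred if n_pred else 0.0
    h_r = overlap / n_true if n_true else 0.0
    denom = beta**2 * h_p + h_r
    h_f = (1 + beta**2) * h_p * h_r / denom if denom else 0.0
    return {"h_precision": h_p, "h_recall": h_r, "h_fbeta": h_f}


# Two illustrative websites: the first prediction is wrong at the leaf
# but correct at the parent level, so it still earns partial credit.
y_true = [["football"], ["banking", "tennis"]]
y_pred = [["tennis"], ["banking"]]
result = hierarchical_scores(y_true, y_pred)
print(result)
```

Unlike flat precision and recall, these scores reward a prediction that lands in the right branch of the taxonomy even when the exact leaf category is wrong.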
Results
Metric | HMCN | NLC | RuBert |
---|---|---|---|
H-fbeta | 0.53 | 0.39 | 0.2 |
H-precision | 0.87 | 0.48 | 0.93 |
H-recall | 0.38 | 0.33 | 0.11 |