Improving Your ML Datasets, Part 2: NER

In our first post, we dug into 20 Newsgroups, a standard dataset for text classification. We uncovered numerous errors and garbage samples, cleaned about 6.5% of the dataset, and improved the validation F1-score by 7.24 points. In this post, we look at a new task: Named...