
Actually, it is easy to come up with reasonably decent heuristics that can auto-tag a corpus. From that you can look for anomalies and adjust your tagging system.
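To make the tag-then-inspect loop concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the rule list, the tag names, and the helpers `tag_token`, `tag_corpus` and `find_anomalies` are hypothetical, and "anomaly" is read narrowly as "frequent token that no rule covered". A real tagging system would need far richer rules.

```python
import re
from collections import Counter

# Hypothetical shape/suffix rules; the first matching rule wins.
RULES = [
    (re.compile(r"^\d+([.,]\d+)?$"), "NUM"),
    (re.compile(r"^(the|a|an)$", re.I), "DET"),
    (re.compile(r".+ly$", re.I), "ADV"),
    (re.compile(r".+(ing|ed)$", re.I), "VERB"),
    (re.compile(r".+(tion|ness|ment)$", re.I), "NOUN"),
]


def tag_token(token):
    """Return the tag of the first rule that matches, or 'UNK' if none does."""
    for pattern, tag in RULES:
        if pattern.match(token):
            return tag
    return "UNK"


def tag_corpus(sentences):
    """Tag every whitespace-separated token in every sentence."""
    return [[(tok, tag_token(tok)) for tok in s.split()] for s in sentences]


def find_anomalies(tagged, top_n=5):
    """Return the most frequent (lowercased) tokens that no rule covered."""
    unknown = Counter(
        tok.lower() for sent in tagged for tok, tag in sent if tag == "UNK"
    )
    return unknown.most_common(top_n)


if __name__ == "__main__":
    corpus = ["the dog barked loudly", "a dog slept", "the meeting ended at 10"]
    tagged = tag_corpus(corpus)
    # Frequent uncovered tokens suggest which rule to add or adjust next.
    print(find_anomalies(tagged))
```

The idea is to rerun this after each rule change: the list of uncovered tokens shrinks, and whatever is left points at the next adjustment.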

Getting a representative corpus is (surprisingly) much harder than the annotation. I know: I spent quite some time on this years ago.


