As the need for ML increases rapidly across industry sectors, there is significant interest in integrating model inference into critical decision-making workflows. Debugging ML-enabled workflows is challenging because an unexpected workflow result may be caused by errors in the training data (e.g., wrong labels or corrupted features). In response, we envision a complaint-driven data debugging system that allows users to specify complaints over the workflow’s output. As a stepping stone toward such a general system, we build and demonstrate Rain, a complaint-driven data debugging system specialized for workflows that integrate ML inference into SQL queries. Our approach combines tools from influence analysis and database provenance to solve the problem holistically. Experimental results show that specifying complaints over query outputs can be as effective at detecting training data corruptions as manually correcting hundreds of model mispredictions.
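To make the setting concrete, the following is a minimal, hypothetical sketch of the two objects the abstract refers to: an ML-enabled SQL query (model inference embedded as a UDF, here a toy classifier registered under the assumed name PREDICT) and a complaint over one of its output values. All names and the complaint format are illustrative, not Rain’s actual interface; the sketch stops where Rain begins, i.e., it does not trace the complaint back to training data via provenance and influence analysis.

```python
import sqlite3

# Toy stand-in for a trained classifier, so the sketch is self-contained.
# In the setting described above, this would be a real model's inference call.
def predict_sentiment(text: str) -> str:
    return "positive" if "good" in text.lower() else "negative"

conn = sqlite3.connect(":memory:")
# Embed model inference into SQL as a scalar UDF (name PREDICT is assumed).
conn.create_function("PREDICT", 1, predict_sentiment)
conn.executescript("""
    CREATE TABLE reviews (id INTEGER, body TEXT);
    INSERT INTO reviews VALUES
        (1, 'good product'), (2, 'bad service'), (3, 'good value');
""")

# The ML-enabled SQL query: an aggregate over model predictions.
rows = conn.execute("""
    SELECT PREDICT(body) AS label, COUNT(*) AS n
    FROM reviews
    GROUP BY label
""").fetchall()
result = dict(rows)  # e.g., {'negative': 1, 'positive': 2}

# A "complaint": the user asserts that one query output is wrong
# (here, that the positive count should be 1, not the observed 2).
# Rain's job would be to explain this discrepancy via training data errors.
complaint = {"group": "positive",
             "expected": 1,
             "observed": result.get("positive")}
print(complaint)
```

The point of the sketch is the granularity of user feedback: the user flags a single aggregate output value rather than correcting individual model predictions, which is what the experimental comparison in the abstract is about.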