Our ProvCite paper was accepted at VLDB 2019!

Our paper, ProvCite: Provenance-based Data Citation, was accepted for publication at the 45th International Conference on Very Large Data Bases (VLDB 2019). Here’s the abstract.

A computational challenge associated with data citation is how to automatically generate citations to arbitrary queries against a structured dataset. Previous work has explored this problem in the context of conjunctive queries and views using a Rewriting-Based Model (RBM). However, an increasing number of scientific queries are aggregate, e.g., showing statistical summaries of the underlying data, for which the RBM cannot be easily extended. In this paper, we show how a Provenance-Based Model (PBM) can be leveraged to 1) generate citations to conjunctive as well as aggregate queries and views; 2) associate citations with individual result tuples to enable arbitrary subsets of the result set to be cited (fine-grained citations); and 3) be optimized to return citations in acceptable time. Our implementation of PBM in ProvCite shows that it not only handles a larger class of queries and views than RBM, but can outperform it when restricted to conjunctive views.
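
To give a feel for the fine-grained citation idea in the abstract, here is a toy Python sketch. It is not ProvCite's actual algorithm or API: the `ROWS` data, the curator-based citation rule, and the functions `citation_for`, `sum_by_family`, and `cite` are all made up for illustration. It only shows the general intuition that if each aggregate result tuple remembers which source tuples contributed to it, a citation can be assembled for any subset of the result.

```python
from collections import defaultdict

# Hypothetical base relation: (tuple_id, family, ligand_count, curator).
ROWS = [
    ("t1", "GPCR", 3, "Alice"),
    ("t2", "GPCR", 5, "Bob"),
    ("t3", "Kinase", 2, "Alice"),
]


def citation_for(row):
    """Assumed citation rule: credit the curator of the source tuple."""
    return row[3]


def sum_by_family(rows):
    """Aggregate query (SUM grouped by family) that also records provenance.

    Returns the totals and, per result tuple, the ids of the source tuples
    that contributed to it.
    """
    totals = defaultdict(int)
    provenance = defaultdict(set)
    for tuple_id, family, count, _curator in rows:
        totals[family] += count
        provenance[family].add(tuple_id)
    return dict(totals), dict(provenance)


def cite(result_keys, provenance, rows):
    """Build a citation for any subset of result tuples from their provenance."""
    by_id = {row[0]: row for row in rows}
    cited = set()
    for key in result_keys:
        for tuple_id in provenance[key]:
            cited.add(citation_for(by_id[tuple_id]))
    return sorted(cited)


totals, provenance = sum_by_family(ROWS)
print(totals)                                # {'GPCR': 8, 'Kinase': 2}
print(cite(["GPCR"], provenance, ROWS))      # ['Alice', 'Bob']
print(cite(["Kinase"], provenance, ROWS))    # ['Alice']
```

Citing only the `Kinase` row credits only the curator whose tuple fed that row, which is the sense in which per-tuple citations are finer-grained than a single citation for the whole query result.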
