We clearly do not have consensus on the use of the term curation. There have been good references for its use in an end-to-end data management context, as well as for its more common use in the archival context.
While I personally like the term curation, I think we would reach consensus more readily by taking Rick’s initial suggestion (and we appreciate his persistence in making his point) and changing to two terms:
Let’s use Preparation for the process that turns raw data into information (cleansing, outlier removal, imputation, regularization, transformation, etc.) for the follow-on analytics or visualization processes. (So we have collect-prepare-analyze-act.)
Let’s use Curation to refer to the activities that are not on the critical path to the analytics objective, but that ensure all policy requirements are met and the data is always available, both now (with fault tolerance) and in the future (backup, archive, distribution across cloud regions,…).