|
When several datasets cover a similar domain (airports, for example), it's not obvious which one to prefer. It depends on your requirements, but one problem is that much of the information the decision might rest on is unavailable or hard to find, such as: currency (how up to date the data is). I guess an overarching criterion is veracity over the expected lifespan of the project, for which some of the above serve as surrogates. What do folks really use, and how do they assess these dataset properties? |
|
How about "convenience", at least in the sense that: 1) the dataset covers the literals or ranges you need; 2) the dataset is available in a format that you can easily work with (for a novice such as myself, Unicode encodings give me all sorts of problems!); 3) the dataset appears to be relatively friction-free in terms of licensing and/or payment for use; 4) the provenance seems okay and, where appropriate, the data appears to be maintained? I guess directories such as CKAN could help in this area, for example by supporting trust/reputation metrics? |
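On the encoding pain in point 2, here is a minimal Python sketch of one way to cope when a dataset doesn't declare its encoding: try a short list of candidate encodings in order and report which one worked. The function names (`decode_with_fallback`, `read_rows`), the candidate list, and the sample airport row are all made up for illustration. Note that a successful decode is only a guess, not proof: `latin-1` accepts any byte sequence, so it is a last resort and the result still needs eyeballing.

```python
import csv
import io


def decode_with_fallback(raw, encodings=("utf-8-sig", "cp1252", "latin-1")):
    """Try each candidate encoding in order; return (text, encoding_used).

    The candidate list is an assumption, not a standard. "utf-8-sig" also
    strips a leading byte-order mark if one is present.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


def read_rows(raw):
    """Decode raw CSV bytes and return (rows as dicts, encoding that worked)."""
    text, enc = decode_with_fallback(raw)
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return list(reader), enc


if __name__ == "__main__":
    # Hypothetical sample: one airport row saved in a legacy Windows encoding.
    sample = "code,name\nZRH,Zürich\n".encode("cp1252")
    rows, used = read_rows(sample)
    print(used, rows)
```

If the publisher does state the encoding, passing just that one name in `encodings` is the safer choice, since the function then fails loudly instead of silently guessing.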
|
I've just come across this blog item by Stefan Urbanek on Data Quality, which presents a useful discussion of criteria, although not without some terminological difficulties.
Get the Data