Trust as a Proxy Measure for the Quality of VGI in the Case of OSM

Paper presented at AGILE 2013 in Leuven, Belgium. The paper is available from http://carsten.io/kessler-de_groot-agile-2013.pdf

Carsten Keßler

May 20, 2013

Transcript

  1. Trust as a Proxy Measure for the Quality of VGI in the Case of OSM
     Carsten Keßler (a, b) and René de Groot (a)
     (a) Institute for Geoinformatics, University of Münster | (b) soon: Hunter College, CUNY
     http://carsten.io | @carstenkessler
  4. The Idea
     ‣ Develop a measure to assess the degree to which a data consumer can trust the quality of a feature
     ‣ Trust measure is based on a feature’s editing history
     ‣ Benefits
       ‣ Works at feature level
       ‣ Filter features by quality
       ‣ Spot problematic features
  8. Does this work? Can we reliably assess the quality of a feature in OpenStreetMap based on its editing history?
     v1: amenity = university, name = Institute for Geoinformatics
     v2: amenity = university, building = yes, name = Institute for Geoinformatics
     v3: addr:city = Münster, addr:country = DE, addr:housenumber = 253, addr:street = Weseler Straße, building = yes, wheelchair = limited
     …
  9. OSM Provenance Ontology http://carsten.io/osm/osm-provenance.rdf
     [Ontology diagram]
     Classes: Changeset, Edit, User, FeatureState, NodeState, WayState, prv:Tag, prv:CreationGuideline, prv:DataCreation, prv:DataItem, prv:HumanActor, rdfs:Literal
     Properties: includesEdit, changesGeometry, addsTag, removesTag, changesValueOfKey, hasTag, subClassOf, prv:createdBy, prv:precededBy, prv:usedData, prv:performedBy
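
The edit types named in the ontology (addsTag, removesTag, changesValueOfKey) can be illustrated by comparing the tag sets of two consecutive versions. The following is a minimal sketch, not the authors' implementation: the function name `classify_edit` and the representation of a version as a plain dictionary are assumptions made for illustration, and changesGeometry is left out because the slides show no geometry data.

```python
def classify_edit(old_tags, new_tags):
    """Return (edit_type, key) pairs that turn old_tags into new_tags.

    Hypothetical helper: edit type names are borrowed from the ontology
    slide, but the logic here is only an illustration.
    """
    edits = []
    # Keys present only in the new version were added.
    for key in sorted(new_tags.keys() - old_tags.keys()):
        edits.append(("addsTag", key))
    # Keys present only in the old version were removed.
    for key in sorted(old_tags.keys() - new_tags.keys()):
        edits.append(("removesTag", key))
    # Keys present in both versions with a different value were changed.
    for key in sorted(old_tags.keys() & new_tags.keys()):
        if old_tags[key] != new_tags[key]:
            edits.append(("changesValueOfKey", key))
    return edits


# v1 and v2 of the example feature shown in the slides.
v1 = {"amenity": "university", "name": "Institute for Geoinformatics"}
v2 = {
    "amenity": "university",
    "building": "yes",
    "name": "Institute for Geoinformatics",
}

edits_v1_v2 = classify_edit(v1, v2)
print(edits_v1_v2)  # [('addsTag', 'building')]
```

Applied to a feature's full history, such a comparison yields one list of edits per version step.
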
  10. Does this work?
      ‣ Get a first idea whether this is a viable approach
      ‣ Compare results of
        ‣ a simple trust measure and
        ‣ observed feature quality
      ‣ Is there a correlation between the two?
  12. Feature Selection
      ‣ Re-mapping the whole district was not feasible
      ‣ Up to 100 features were manageable
      ‣ Selection based on minimum number of versions
      ‣ 74 features with 6+ versions
  14. Trust measure
      ‣ Positive factors:
        ‣ Versions
        ‣ Users
        ‣ Indirect confirmations = edits in the direct vicinity (50 m)
      ‣ Negative factors:
        ‣ Tag corrections
        ‣ Rollbacks
  15. Trust measure (contd.)
      ‣ Classification for each factor: 5 equal classes
      ‣ Combined into one classification
      ‣ Equal weights
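
One way to read the classification described on the trust measure slides is sketched below. Several points are assumptions, since the slides do not spell them out: "5 equal classes" is taken to mean five equal-width intervals between the smallest and largest observed value of a factor, classes of negative factors are inverted so that a higher class always means more trust, and "equal weights" is taken to mean the plain average of the five class values. All function and field names are hypothetical, and the two example features use invented counts.

```python
POSITIVE = ("versions", "users", "confirmations")
NEGATIVE = ("tag_corrections", "rollbacks")
N_CLASSES = 5


def equal_interval_class(value, lo, hi, n_classes=N_CLASSES):
    """Map a value in [lo, hi] to a class 1..n_classes of equal width."""
    if hi == lo:
        return 1  # assumption: no spread in the data, use the lowest class
    width = (hi - lo) / n_classes
    # The maximum value would land in class n_classes + 1, so clamp it.
    return min(n_classes, int((value - lo) / width) + 1)


def trust_scores(features):
    """features: id -> {factor: count}. Returns id -> mean class (1.0 to 5.0)."""
    factors = POSITIVE + NEGATIVE
    ranges = {
        f: (
            min(counts[f] for counts in features.values()),
            max(counts[f] for counts in features.values()),
        )
        for f in factors
    }
    result = {}
    for feature_id, counts in features.items():
        classes = []
        for f in factors:
            c = equal_interval_class(counts[f], *ranges[f])
            if f in NEGATIVE:
                c = N_CLASSES + 1 - c  # many corrections or rollbacks lower trust
            classes.append(c)
        result[feature_id] = sum(classes) / len(classes)  # equal weights
    return result


# Invented counts for two hypothetical features.
features = {
    "a": {"versions": 6, "users": 2, "confirmations": 0,
          "tag_corrections": 4, "rollbacks": 2},
    "b": {"versions": 16, "users": 7, "confirmations": 10,
          "tag_corrections": 0, "rollbacks": 0},
}
scores = trust_scores(features)
print(scores)  # {'a': 1.0, 'b': 5.0}
```

Feature "a" scores lowest on every factor and feature "b" highest, so they end up at the two ends of the scale. Quantile classes or a rounded combined class would be equally compatible with the wording on the slides.
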
  17. Field Survey
      ‣ Thematic accuracy, 4 classes, with results:
        1. Main tag wrong: 6 features (~8%)
        2. Other tags wrong: 2 features (~3%)
        3. Thematic ambiguities: 9 features (~12%)
        4. Thematically correct: 57 features (~77%)
  21. Field Survey (contd.)
      ‣ Topological consistency
        ‣ Is the feature correctly positioned relative to the surrounding features?
        ‣ Results: 73 out of 74 features (~99%)
      ‣ Information completeness
        ‣ TF-IDF measure to identify relevant tags per main tag
        ‣ ~37% tags missing (avg.)
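
The slides do not say how TF-IDF was applied to tags, so the following is only a sketch of one plausible setup. It assumes that all features sharing a main tag form one "document" whose "terms" are the tag keys used on those features, that a key counts as relevant for a main tag when its TF-IDF score is above zero, and that completeness is the share of relevant keys a feature lacks. The function names, the threshold, and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_per_main_tag(groups):
    """groups: main tag -> list of features, each given as a set of tag keys.

    Returns main tag -> {tag key: TF-IDF score}.
    """
    n_groups = len(groups)
    # Term frequency: how often each key occurs within one main-tag group.
    key_counts = {
        main_tag: Counter(key for feature in features for key in feature)
        for main_tag, features in groups.items()
    }
    # Document frequency: in how many main-tag groups a key occurs at all.
    doc_freq = Counter(key for counts in key_counts.values() for key in counts)
    scores = {}
    for main_tag, counts in key_counts.items():
        total = sum(counts.values())
        scores[main_tag] = {
            key: (count / total) * math.log(n_groups / doc_freq[key])
            for key, count in counts.items()
        }
    return scores


def missing_share(feature_keys, tag_scores):
    """Share of relevant keys (score > 0, an assumed threshold) that are absent."""
    relevant = {key for key, score in tag_scores.items() if score > 0}
    if not relevant:
        return 0.0
    return len(relevant - set(feature_keys)) / len(relevant)


# Toy data, invented for illustration.
groups = {
    "amenity=university": [
        {"name", "wheelchair"},
        {"name", "wheelchair", "opening_hours"},
    ],
    "highway=residential": [
        {"name", "maxspeed"},
        {"name", "maxspeed", "surface"},
    ],
}
scores = tfidf_per_main_tag(groups)
share = missing_share({"name", "wheelchair"}, scores["amenity=university"])
print(share)  # 0.5
```

In this toy data, `name` occurs under both main tags and therefore scores zero, while `wheelchair` and `opening_hours` are specific to universities. A university feature without `opening_hours` misses one of two relevant keys.
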
  22. Do we get the trend right?
      ‣ Removed outliers
      ‣ Kendall’s τ: 0.52
      ‣ Moderate, but significant positive correlation
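
Kendall's τ compares two rankings by counting pairs of features that both rankings order the same way (concordant) or the opposite way (discordant). The sketch below implements the τ-b variant, which corrects for ties. The slides do not state which variant was used, so that choice is an assumption, as are the function name and the invented example values.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long sequences."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both rankings: contributes to neither term
        if dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    denom = math.sqrt(
        (concordant + discordant + ties_x) * (concordant + discordant + ties_y)
    )
    return (concordant - discordant) / denom if denom else float("nan")


# Invented class values for five hypothetical features.
trust = [1, 2, 3, 4, 5]
quality = [1, 3, 2, 4, 5]
tau = kendall_tau_b(trust, quality)
print(tau)  # 0.8
```

Of the ten feature pairs in this example, nine are concordant and one is discordant, which gives (9 - 1) / 10 = 0.8. Classified data such as trust classes typically contains many ties, which is why a tie-corrected variant is used here.
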
  25. Conclusions
      ‣ Initial study
      ‣ A feature’s history can determine its trustworthiness
      ‣ Trust values correlate with observed quality
      ‣ Even with a very simple model
      ‣ Outliers cannot be explained yet
  30. Tons of Future Work
      ‣ Extend and refine the trust model: Classification, weighting, positive vs negative aspects, …
      ‣ Social aspects: Who has edited a feature?
      ‣ Repeat study without spatial focus
      ‣ How to scale the data collection?
      ‣ Learn the trust model from the data
  31. Thank you!
      All data used in this research © OpenStreetMap contributors.
      http://carsten.io | @carstenkessler
      Carsten Keßler | René de Groot