Trust as a Proxy Measure for the Quality of VGI in the Case of OSM

Paper presented at AGILE 2013 in Leuven, Belgium. The paper is available from http://carsten.io/kessler-de_groot-agile-2013.pdf

Carsten Keßler

May 20, 2013

Transcript

  1. Trust as a Proxy Measure for the Quality of VGI in the Case of OSM ‣ Carsten Keßler (a, b) and René de Groot (a) ‣ (a) Institute for Geoinformatics, University of Münster ‣ (b) soon: Hunter College, CUNY ‣ http://carsten.io | @carstenkessler
  4. The Idea ‣ Develop a measure to assess the degree to which a data consumer can trust the quality of a feature ‣ Trust measure is based on a feature’s editing history ‣ Benefits ‣ Works at feature level ‣ Filter features by quality ‣ Spot problematic features
  8. Does this work? Can we reliably assess the quality of a feature in OpenStreetMap based on its editing history? ‣ v1: amenity = university, name = Institute for Geoinformatics ‣ v2: amenity = university, building = yes, name = Institute for Geoinformatics ‣ v3: addr:city = Münster, addr:country = DE, addr:housenumber = 253, addr:street = Weseler Straße, building = yes, wheelchair = limited ‣ …
  9. OSM Heatmap Kudos: Johannes Trame

  10. OSM Provenance Ontology http://carsten.io/osm/osm-provenance.rdf [Diagram; labels: prv:Tag, includesEdit, Changeset, prv:CreationGuideline, Edit, prv:createdBy, prv:precededBy, prv:usedData, NodeState, WayState, prv:DataCreation, User, prv:performedBy, changesGeometry, addsTag, removesTag, changesValueOfKey, rdfs:Literal, prv:DataItem, prv:HumanActor, subClassOf, hasTag, FeatureState]
  11. Does this work? ‣ Get a first idea whether this is a viable approach ‣ Compare results of ‣ a simple trust measure and ‣ observed feature quality ‣ Is there a correlation between the two?
  12. Study area: Münster’s old town

  17. Feature Selection ‣ Re-mapping the whole district was not feasible ‣ Up to 100 features were manageable ‣ Selection based on minimum number of versions ‣ 74 features with 6+ versions
  18. 74 features selected

  21. Trust measure ‣ Positive factors: ‣ Versions ‣ Users ‣ Indirect confirmations = edits in the direct vicinity (50 m) ‣ Negative factors: ‣ Tag corrections ‣ Rollbacks
  22. Trust measure (contd.) ‣ Classification for each factor: 5 equal classes ‣ Combined into one classification ‣ Equal weights
  23. Trust measure
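
The trust measure on slides 21 and 22 could be sketched roughly as follows. The slides do not say whether "5 equal classes" means equal-width or equal-count bins, how negative factors enter the combination, or how the combined class is rounded. This sketch assumes equal-width bins over the observed range, inverts the class for negative factors, and takes the rounded mean as the equal-weight combination. All function and field names are hypothetical.

```python
from statistics import mean

# Factor names are illustrative; the slides only name the factors in prose.
POSITIVE = ("versions", "users", "confirmations")
NEGATIVE = ("tag_corrections", "rollbacks")


def equal_interval_class(value, lo, hi, n_classes=5):
    """Map a value in [lo, hi] to a class 1..n_classes using equal-width bins."""
    if hi == lo:
        return 1  # degenerate range: every feature falls into the same class
    width = (hi - lo) / n_classes
    # min() keeps the maximum value in the top class instead of overflowing
    return min(int((value - lo) / width) + 1, n_classes)


def trust_classes(features, n_classes=5):
    """features: {feature_id: {factor_name: count}} -> {feature_id: trust class}."""
    factors = POSITIVE + NEGATIVE
    # Observed range of each factor across all features
    ranges = {
        f: (min(c[f] for c in features.values()), max(c[f] for c in features.values()))
        for f in factors
    }
    result = {}
    for fid, counts in features.items():
        classes = []
        for f in factors:
            cls = equal_interval_class(counts[f], *ranges[f], n_classes)
            if f in NEGATIVE:
                cls = n_classes + 1 - cls  # assumption: many corrections lower trust
            classes.append(cls)
        # Equal weights: plain mean of the per-factor classes.
        # Note that round() rounds exact .5 values to the nearest even integer.
        result[fid] = round(mean(classes))
    return result


features = {
    "a": {"versions": 6, "users": 1, "confirmations": 0, "tag_corrections": 4, "rollbacks": 2},
    "b": {"versions": 16, "users": 6, "confirmations": 10, "tag_corrections": 0, "rollbacks": 0},
    "c": {"versions": 11, "users": 3, "confirmations": 5, "tag_corrections": 2, "rollbacks": 1},
}
print(trust_classes(features))  # {'a': 1, 'b': 5, 'c': 3}
```

The counts above are made up for illustration and are not taken from the study.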

  25. Field Survey ‣ Thematic accuracy, 4 classes with results: 1. Main tag wrong: 6 features (~8%) 2. Other tags wrong: 2 features (~3%) 3. Thematic ambiguities: 9 features (~12%) 4. Thematically correct: 57 features (~77%)
  30. Field Survey (contd.) ‣ Topological consistency ‣ Is the feature correctly positioned relative to the surrounding features? ‣ Results: ‣ 73 out of 74 features (~99%) ‣ Information completeness ‣ TF-IDF measure to identify relevant tags per main tag ‣ ~37% of tags missing (avg.)
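
One possible reading of the TF-IDF step on slide 30 is sketched below. The slides do not give the exact formulation, so this is an assumption: each main tag (for example `amenity=restaurant`) is treated as a "document", the other tag keys on its features are the "terms", and keys scoring above a threshold count as relevant for that main tag. The names `relevant_keys` and `missing_share`, the threshold, and the sample data are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def relevant_keys(features, threshold=0.1):
    """features: list of (main_tag, set_of_other_keys) -> {main_tag: relevant keys}."""
    key_counts = defaultdict(Counter)  # main tag -> how often each key occurs
    group_sizes = Counter()            # main tag -> number of features
    for main_tag, keys in features:
        group_sizes[main_tag] += 1
        key_counts[main_tag].update(keys)

    # Document frequency: in how many main-tag groups does a key occur at all?
    doc_freq = Counter()
    for counts in key_counts.values():
        doc_freq.update(counts.keys())

    n_groups = len(group_sizes)
    relevant = {}
    for main_tag, counts in key_counts.items():
        scores = {
            # tf: share of features in this group carrying the key
            # idf: log of (number of groups / groups in which the key occurs)
            key: (count / group_sizes[main_tag]) * math.log(n_groups / doc_freq[key])
            for key, count in counts.items()
        }
        relevant[main_tag] = {k for k, s in scores.items() if s >= threshold}
    return relevant


def missing_share(keys, relevant):
    """Share of the relevant keys that a single feature lacks."""
    if not relevant:
        return 0.0
    return len(relevant - set(keys)) / len(relevant)


sample = [
    ("amenity=restaurant", {"name", "cuisine", "opening_hours"}),
    ("amenity=restaurant", {"name", "cuisine"}),
    ("shop=bakery", {"name", "opening_hours"}),
    ("amenity=bank", {"name", "atm"}),
]
relevant = relevant_keys(sample)
print(relevant["amenity=restaurant"])
print(missing_share({"name", "cuisine"}, relevant["amenity=restaurant"]))  # 0.5
```

In this formulation a key that occurs in every group, such as `name` in the sample, gets an IDF of zero and is never flagged as relevant. That is a property of plain TF-IDF and may or may not match what the study did.
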
  31. Observed quality: combined results

  32. Trust measure

  34. mean quality class: ~4.2 | mean trust class: ~2.8

  36. Do we get the trend right? ‣ Removed outliers ‣ Kendall’s τ: 0.52 ‣ Moderate, but significant positive correlation
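
Slide 36 reports Kendall’s τ between trust class and observed quality class. As a rough sketch of how such a rank correlation can be computed, the function below implements the tau-b variant, which corrects for ties; ties are likely when both variables take only a few class values. The slides do not state which variant was used, and the function name and sample values are illustrative assumptions, not the study's data.

```python
from itertools import combinations
from math import sqrt


def kendall_tau_b(xs, ys):
    """Kendall's tau-b for two equally long sequences of ranks or classes."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue  # tied in both variables: contributes to no term
        elif dx == 0:
            ties_x += 1  # tied only in x
        elif dy == 0:
            ties_y += 1  # tied only in y
        elif dx * dy > 0:
            concordant += 1  # both variables move in the same direction
        else:
            discordant += 1  # the variables move in opposite directions
    denom = sqrt((concordant + discordant + ties_x) * (concordant + discordant + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when one variable is constant")
    return (concordant - discordant) / denom


# Made-up trust and quality classes for four features
print(kendall_tau_b([1, 2, 2, 3], [1, 2, 3, 3]))  # 0.8
```

A value of 1 means the two rankings agree on every untied pair, and -1 means they are fully reversed.
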
  42. Conclusions ‣ Initial study ‣ A feature’s history can determine its trustworthiness ‣ Trust values correlate with observed quality ‣ Even with a very simple model ‣ Outliers cannot be explained yet
  48. Tons of Future Work ‣ Extend and refine the trust model: Classification, weighting, positive vs negative aspects, … ‣ Social aspects: Who has edited a feature? ‣ Repeat study without spatial focus ‣ How to scale the data collection? ‣ Learn the trust model from the data
  49. Thank you! ‣ All data used in this research © OpenStreetMap contributors. ‣ carsten.kessler@uni-muenster.de | http://carsten.io | @carstenkessler ‣ Carsten Keßler | René de Groot