Data Literacy

On 7 June 2022, Rostislav Yavorski, Head of Research at Exactpro, gave a lecture on data literacy.

Data literacy is the ability to read, understand, create, and communicate data as information. Data literacy skills include knowing what data is appropriate to use, interpreting data visualisations, understanding data analytics tools and methods, data storytelling, etc.

---

Follow us on
LinkedIn https://www.linkedin.com/company/exactpro-systems-llc
Twitter https://twitter.com/exactpro

Exactpro

June 07, 2022

Transcript

  1. BUILD SOFTWARE TO TEST SOFTWARE

    exactpro.com
    Data Literacy
    Rostislav Yavorski, Head of Research, Exactpro
  2. What is data literacy

    The ability to read, understand, create, and communicate data as information. Data literacy skills include the following abilities:
    • Knowing what data is appropriate to use for a particular purpose
    • Interpreting data visualizations
    • Understanding data analytics tools and methods
    • Data storytelling: communicating information about data to other people
  3. Data collection

    The process of gathering and measuring information on targeted variables in an established system, which then enables one to answer relevant questions and evaluate outcomes.
  4. Data formats: CSV, JSON

    • Comma-Separated Values (CSV)
    • JavaScript Object Notation (JSON)
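
To make the difference between the two formats concrete, here is a minimal sketch that stores the same two records as CSV and as JSON and reads both with Python's standard library. The records, field names, and helper functions are hypothetical and were invented for this illustration. The point it shows is that CSV delivers every value as a string, while JSON keeps numbers and booleans typed.

```python
import csv
import io
import json

# Hypothetical sample records, invented for illustration only.
CSV_TEXT = "module,loc,defects\nparser,120,true\nlogger,45,false\n"

JSON_TEXT = """
[
  {"module": "parser", "loc": 120, "defects": true},
  {"module": "logger", "loc": 45, "defects": false}
]
"""


def read_csv_records(text):
    """Parse CSV text into dicts; every value arrives as a string."""
    return list(csv.DictReader(io.StringIO(text)))


def read_json_records(text):
    """Parse JSON text; numbers and booleans keep their types."""
    return json.loads(text)


def normalise_csv_record(row):
    """Convert the string fields of one CSV row to typed values."""
    return {
        "module": row["module"],
        "loc": int(row["loc"]),
        "defects": row["defects"] == "true",
    }


csv_records = [normalise_csv_record(r) for r in read_csv_records(CSV_TEXT)]
json_records = read_json_records(JSON_TEXT)

# After normalisation both formats describe the same data.
print(csv_records == json_records)
```

The explicit conversion step is the price of CSV's simplicity: the file carries no type information, so the reader has to supply it.
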
  5. Data quality

    • Accuracy: whether or not given values are correct and consistent
    • Completeness: there should be no gaps or missing information
    • Reliability: how well a method measures something
    • Relevance: consistency between the content of data and the area of interest
    • Timeliness: the time between when information is expected and when it is readily available for use
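
Of the dimensions above, completeness is the easiest to check mechanically. The sketch below counts missing cells per column in a small CSV sample. The sample rows, the column names, and the set of strings treated as "missing" are all assumptions made for this illustration; real datasets mark gaps in their own ways.

```python
import csv
import io

# Hypothetical sample with gaps, invented for illustration only.
SAMPLE = """module,loc,cyclomatic
parser,120,14
logger,,3
router,80,
cache,NA,5
"""

# Assumption: these strings mark a missing value in this sample.
MISSING_MARKERS = {"", "NA", "N/A", "null"}


def missing_counts(text):
    """Return {column: number of missing cells} for CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # A short row yields None for the columns it lacks.
            if value is None or value.strip() in MISSING_MARKERS:
                counts[name] += 1
    return counts


print(missing_counts(SAMPLE))
```
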
  6. Data visualization

    The graphic representation of data, a particularly efficient way of communicating.

    Data visualization principles:
    • show the data
    • present many numbers in a small space
    • encourage the eye to compare different pieces of data
    • reveal the data at several levels of detail
    • serve a clear purpose: description, exploration, or decoration
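
A text histogram is one minimal way to try some of these principles without a charting tool. It shows the data, packs ten numbers into a few lines, and lets the eye compare bins. The values, the function name `text_histogram`, and the bin width are hypothetical choices made for this sketch. Empty bins are printed on purpose so that the gap before an isolated value stays visible.

```python
from collections import Counter

# Hypothetical measurements, invented for illustration only.
values = [3, 5, 4, 6, 5, 7, 4, 5, 6, 30]


def text_histogram(data, bin_width):
    """Return one text line per bin, from the lowest to the highest bin.

    Expects integer data and an integer bin width.
    """
    # Map each value to the lower edge of its bin and count per bin.
    bins = Counter((x // bin_width) * bin_width for x in data)
    lines = []
    for start in range(min(bins), max(bins) + bin_width, bin_width):
        end = start + bin_width - 1
        # Counter returns 0 for bins that received no values.
        lines.append(f"{start:>3}-{end:<3} | {'#' * bins[start]}")
    return lines


for line in text_histogram(values, bin_width=5):
    print(line)
```
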
  7. Exploratory Data Analysis (EDA)
  8. Exploratory data analysis

    • Size, format, source
    • Data quality: missing values
    • Minimum, maximum, median, quartiles for each parameter
    • Outliers
    • Simple visualization: boxplot, histogram, scatter plot
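
The summary statistics and the outlier check from the list above can be sketched with Python's `statistics` module. The values are hypothetical. The slide does not say how outliers should be identified, so the sketch assumes the common 1.5 × IQR rule, which is also the rule a boxplot usually uses to draw its whiskers. Quartiles are computed with the "inclusive" method; other methods give slightly different numbers.

```python
import statistics

# Hypothetical measurements, invented for illustration only.
values = [3, 5, 4, 6, 5, 7, 4, 5, 6, 30]


def five_number_summary(data):
    """Return (minimum, first quartile, median, third quartile, maximum)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return min(data), q1, median, q3, max(data)


def iqr_outliers(data, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: the 1.5 * IQR rule; the lecture does not prescribe one.
    """
    _, q1, _, q3, _ = five_number_summary(data)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [x for x in data if x < low or x > high]


print(five_number_summary(values))
print(iqr_outliers(values))
```
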
  9. Exploratory data analysis

    I would recommend:
    • “What is Exploratory Data Analysis” by Mel Restori https://chartio.com/learn/data-analytics/what-is-exploratory-data-analysis/
    • “What is Exploratory Data Analysis” by Prasad Patil https://towardsdatascience.com/exploratory-data-analysis-8fc1cb20fd15
  10. Google Sheets Tutorials

    • Railsware Product Academy, YouTube
    • Google Cloud Skills Boost, Video+Docs+Quiz
    • W3Schools, Lessons with practical examples
    • Goodwill Community Foundation, 19 lessons
  11. Todo list

    • Make a list of several datasets you like
    • Perform EDA on one of your choice (a histogram and a scatter plot are vital)
    • Make up a story to present your insights from the data (3-5 slides)
    • Send to [email protected]
    • Be ready to defend it with a 5-minute talk
  12. Example 1. Software Defect Prediction

    Size: 10 885 modules, 22 attributes
    • 5 different lines-of-code measures
    • 3 McCabe metrics (cyclomatic, essential, design complexity)
    • 4 base Halstead measures (volume, length, difficulty, intelligence)
    • 8 derived Halstead measures, a branch-count
    • 1 goal field (module has/has not one or more reported defects)

    Hypotheses:
    • code with complicated pathways is more error-prone
    • code that is hard to read is more likely to be fault-prone
    • static measures can never be a certain indicator of the presence of a fault

    https://www.kaggle.com/datasets/semustafacevik/software-defect-prediction
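
One simple way to start probing the first hypothesis is to split modules by cyclomatic complexity and compare the share of defective modules in each group. The sketch below does this on a handful of made-up rows, so its output says nothing about the real dataset. The column names `v(g)` and `defects` are an assumption about how the Kaggle CSV labels cyclomatic complexity and the goal field, and the threshold of 10 is an illustrative cut-off, not a value taken from the lecture.

```python
import csv
import io

# Hypothetical rows, invented for illustration; NOT taken from the dataset.
# Assumption: the column names "v(g)" and "defects" mirror the Kaggle CSV.
SAMPLE = """v(g),defects
2,false
4,false
3,true
7,false
12,true
15,true
11,false
25,true
"""

# Assumption: an illustrative cut-off between "simple" and "complex" modules.
COMPLEXITY_THRESHOLD = 10


def defect_rate_by_group(text, threshold=COMPLEXITY_THRESHOLD):
    """Return the share of defective modules below and above the threshold."""
    groups = {"simple": [0, 0], "complex": [0, 0]}  # [defective, total]
    for row in csv.DictReader(io.StringIO(text)):
        key = "complex" if float(row["v(g)"]) > threshold else "simple"
        groups[key][1] += 1
        if row["defects"].strip().lower() == "true":
            groups[key][0] += 1
    # An empty group has no defined rate, so report None for it.
    return {
        name: (defective / total if total else None)
        for name, (defective, total) in groups.items()
    }


print(defect_rate_by_group(SAMPLE))
```

A difference between the two rates on the real data would support the hypothesis but not prove it, which is the point of the third hypothesis on the slide: static measures are indicators, not certainties.
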
  13. Example 2. Operational Data from Enterprise Application

    https://www.kaggle.com/datasets/anomalydetectionml/rawdata

    Goal: effectively detect run-time anomalies using machine learning on operation metrics.

    The dataset consists of metrics measured from the operating system and from WebLogic Server monitoring beans.