
Data Analytics: Best techniques, tools and different usage

Teaching demo material presented via an online platform (Google Meet) at Cavite State University – Silang Campus on Sept 30, 2021.

Jaychrist Teves

September 30, 2021



  1. DATA ANALYTICS: Best techniques, tools and different usage

    Mr. Jaychrist Teves, Cavite State University - Silang Campus, September 30, 2021
  2. What to expect

    Here’s what you’ll learn from this session:
    1. What is Data Analytics
    2. Various types of Data Analytics
    3. Role of Data Analytics in solving real-world problems
    4. The importance of Data Analytics
    5. What are the analytical tools used in Data Analytics
    6. What are the best techniques and different usage of Data Analytics
    7. What is the career growth in Data Analytics
  3. About the speaker

    I'm a Computer Science graduate of CvSU Silang Campus (batch 2015), a user experience designer by profession and an educator at heart, with skills built over years of experience in innovation, entrepreneurship and design. For more than 11 years, I have been helping local entrepreneurs and micro, small and medium brands and businesses in the Philippines transform, design and build functional, meaningful services and digital products.
  4. Notable Feature

    • Google I/O Manila by GDG Philippines
    • Center for Technopreneurship and Innovations – Batangas State University
    • Hack Manila 2018
    • WeRemote Philippines
    • Ambidextr Media
    • Creative Manila – Portfolio of the week
    • Fomolist – Filipino Tech & Business
    • Innovation+ Creative Edge
    • Hustle to Freedom Podcast (US)
    • And other local educational podcasts, design and business related digital media initiatives

  5. Data Analytics at a glance

    Phases: Preprocessing, Analytics, Post-processing
    01 PROBLEM & SOLUTION: Identify the problem and sketch possible solutions
    02 PROCESS DATA: • Identify data sources • Select the data • Clean the data • Transform the data
    03 • Draw meaningful conclusions with an actionable approach • Execution, usability testing and iteration
  6. Understanding the problem

    Historical Data: Generally, this process begins with descriptive analytics. Descriptive analytics aims to answer the question “what happened?”
    Advanced Analytics: An industry practice within data science that uses advanced tools to extract data, make predictions and discover trends. This process addresses “what if?”
  7. Designing solutions

    Ideation: Involves sketching hundreds or thousands of possible solutions on a sketchpad or with similar digital tools.
    Expected Outcome: The process begins by drawing meaningful conclusions from complex and varied data sources and presenting the return on investment (ROI).
    Who Will Benefit: It’s important to understand why your idea(s) would be worth pursuing and who will benefit most when they are executed.
    Cost Estimate: Understand and give a near-accurate financial evaluation of what your solution is worth, including the timeline, skill sets and expertise involved.
  8. Data Processing

    Identify Data Sources: In business, data can come from direct competitors, teammates, other similar organizations and notable materials such as academic or scientifically proven research.
    Select Important Data: This process involves identifying what’s really useful for your project or suggested solution. You have to be very careful and selective to be able to extract meaningful conclusions.
    Clean the Data: Involves careful and selective data cleaning. It needs a very detail-oriented individual aided by data analytics tools.
    Transform the Data: Involves transforming and customizing the cleaned data to meet the desired goals.
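
The select, clean and transform steps above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the records, the field names (`customer`, `city`, `amount`) and the cleaning rules are assumptions made up for the example, not part of the original slides.

```python
# Hypothetical raw records; field names and values are illustrative only.
raw_records = [
    {"customer": " Ana ", "city": "silang", "amount": "1200.50", "note": "x"},
    {"customer": "Ben", "city": "Silang", "amount": "", "note": ""},            # missing amount
    {"customer": " Ana ", "city": "silang", "amount": "1200.50", "note": "y"},  # duplicate
    {"customer": "Cara", "city": "DASMARINAS", "amount": "800", "note": ""},
]


def select_fields(record):
    """Select: keep only the fields the analysis needs."""
    return {key: record[key] for key in ("customer", "city", "amount")}


def clean(records):
    """Clean: trim whitespace, drop rows without an amount, remove duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        customer = record["customer"].strip()
        city = record["city"].strip().title()
        amount = record["amount"].strip()
        if not amount:
            continue  # incomplete row
        key = (customer, city, amount)
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        cleaned.append({"customer": customer, "city": city, "amount": amount})
    return cleaned


def transform_records(records):
    """Transform: convert amounts to numbers and total them per city."""
    totals = {}
    for record in records:
        totals[record["city"]] = totals.get(record["city"], 0.0) + float(record["amount"])
    return totals


selected = [select_fields(record) for record in raw_records]
cleaned_records = clean(selected)
totals_by_city = transform_records(cleaned_records)
```

In a real project the cleaning rules would depend on the data source; the point here is only that each step is a small, separate function that can be checked on its own.
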
  9. Analyse, interpret, and deploy solutions

    The process begins with analyzing what works and what sticks over the long term, then deploying solutions (in the form of software, a new process or an innovative team) to meet the expected outcomes and goals.
  10. 4 Primary Types of Data Analytics

    Descriptive: Helps answer questions about what happened
    Diagnostic: Helps answer questions about why things happened
    Predictive: Helps answer questions about what will happen in the future
    Prescriptive: Helps answer questions about what should be done
  11. Descriptive Analytics

    • These techniques summarize large datasets to describe outcomes to stakeholders. By developing key performance indicators (KPIs), these strategies can help track successes or failures. Metrics such as return on investment (ROI) are used in many industries.
    • Specialized metrics are developed to track performance in specific industries. This process requires the collection of relevant data, processing of the data, data analysis and data visualization. This process provides essential insight into past performance.
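
As a small illustration of a descriptive KPI, the sketch below computes ROI per campaign and summarizes it. The campaign names and figures are invented for the example, and ROI is taken here as (revenue - cost) / cost, which is one common definition.

```python
from statistics import mean

# Hypothetical campaign results; names and figures are made up for illustration.
campaigns = [
    {"name": "email", "cost": 1000.0, "revenue": 1500.0},
    {"name": "social", "cost": 2000.0, "revenue": 1800.0},
    {"name": "search", "cost": 500.0, "revenue": 1250.0},
]


def roi(cost, revenue):
    """Return on investment as a fraction: (revenue - cost) / cost."""
    return (revenue - cost) / cost


# Descriptive summary: what happened in each campaign, and overall.
roi_by_campaign = {c["name"]: roi(c["cost"], c["revenue"]) for c in campaigns}
average_roi = mean(roi_by_campaign.values())
best_campaign = max(roi_by_campaign, key=roi_by_campaign.get)
```
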
  12. Diagnostic Analytics

    • These techniques supplement more basic descriptive analytics. They take the findings from descriptive analytics and dig deeper to find the cause.
    • The performance indicators are further investigated to discover why they got better or worse. This generally occurs in three steps:
    1. Identify anomalies in the data. These may be unexpected changes in a metric or a particular market.
    2. Data that is related to these anomalies is collected.
    3. Statistical techniques are used to find relationships and trends that explain these anomalies.
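
The first two diagnostic steps can be sketched as follows. This is a minimal example under stated assumptions: the sales figures and the event log are hypothetical, and a simple z-score rule (more than two standard deviations from the mean) stands in for whatever anomaly test an analyst would actually choose. Step 3, the statistical explanation, is left out.

```python
from statistics import mean, pstdev

# Hypothetical daily sales figures; the sixth value is an unexpected drop.
daily_sales = [100, 102, 98, 101, 99, 40, 100, 103]


def find_anomalies(values, threshold=2.0):
    """Step 1: return indexes of values more than `threshold` standard
    deviations away from the mean (a simple z-score rule)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > threshold]


anomalies = find_anomalies(daily_sales)

# Step 2: collect data related to the anomalous days (hypothetical event log,
# keyed by the same zero-based day index).
events_by_day = {5: "payment gateway outage"}
related_events = {day: events_by_day.get(day, "no recorded event") for day in anomalies}
```
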
  13. Predictive Analytics

    • These techniques use historical data to identify trends and determine if they are likely to recur.
    • Predictive analytical tools provide valuable insight into what may happen in the future. Their techniques include a variety of statistical and machine learning methods, such as neural networks, decision trees, and regression.
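
Of the techniques listed above, regression is the easiest to show in a few lines. The sketch below fits a straight line by ordinary least squares and extrapolates one step ahead. The monthly figures are hypothetical and deliberately perfectly linear so the result is easy to verify by hand.

```python
# Hypothetical monthly sales history: month number and units sold.
months = [1, 2, 3, 4, 5]
units = [10.0, 12.0, 14.0, 16.0, 18.0]


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(months, units)

# Use the fitted trend to predict the next, not yet observed, month.
forecast_month_6 = slope * 6 + intercept
```

Real historical data is noisy, so a forecast like this should be read together with some measure of error rather than as a certain outcome.
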
  14. Prescriptive Analytics

    • By using insights from predictive analytics, data-driven decisions can be made. This allows businesses to make informed decisions in the face of uncertainty.
    • Prescriptive analytics techniques rely on machine learning strategies that can find patterns in large datasets. By analyzing past decisions and events, the likelihood of different outcomes can be estimated.
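
The idea of estimating the likelihood of outcomes from past decisions can be illustrated with simple frequency counting. Note that this is a stand-in for the machine learning strategies the slide mentions, not an example of them, and the decision log below is entirely hypothetical.

```python
from collections import defaultdict

# Hypothetical log of past decisions and whether each one led to a sale.
past_decisions = [
    ("discount", True), ("discount", False), ("discount", True), ("discount", True),
    ("free_shipping", True), ("free_shipping", False),
    ("no_offer", False), ("no_offer", False), ("no_offer", True),
]


def success_rates(history):
    """Estimate the likelihood of success for each action from past outcomes."""
    counts = defaultdict(lambda: [0, 0])  # action -> [successes, trials]
    for action, succeeded in history:
        counts[action][1] += 1
        if succeeded:
            counts[action][0] += 1
    return {action: wins / trials for action, (wins, trials) in counts.items()}


rates = success_rates(past_decisions)

# Prescriptive step: recommend the action with the highest estimated success rate.
recommended_action = max(rates, key=rates.get)
```
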
  15. Data Analyst

    • Data analysts exist at the intersection of information technology, statistics and business. They combine these fields in order to help businesses and organizations succeed. The primary goal of a data analyst is to increase efficiency and improve performance by discovering patterns in data.
    • The work of a data analyst involves working with data throughout the data analysis pipeline. The primary steps in the data analytics process are data mining, data management, statistical analysis, and data presentation. The importance and balance of these steps depend on the data being used and the goal of the analysis.
    • Additionally, they discover how data can be used to answer questions and solve problems. With the development of computers and an ever-increasing move toward technological intertwinement, data analysis has evolved. The development of the relational database breathed new life into data analysis by allowing analysts to use SQL (pronounced “sequel” or “s-q-l”) to retrieve data from databases.
  16. Data Mining

    Data mining is an essential process for many data analytics tasks. This involves extracting data from unstructured data sources. These may include written text, large complex databases, or raw sensor data. The key steps in this process are to extract, transform, and load data (often called ETL). These steps convert raw data into a useful and manageable format. This prepares data for storage and analysis. Data mining is generally the most time-intensive step in the data analysis pipeline.
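
The extract, transform and load steps described above might look like this in miniature. It is a sketch only: the CSV text, the column names and the `readings` table are assumptions for the example, and an in-memory SQLite database stands in for real storage.

```python
import csv
import io
import sqlite3

# Hypothetical raw sensor export; in practice this would come from a file or an API.
RAW_CSV = """sensor_id,reading,unit
a1, 21.5 ,C
a2,,C
a1, 22.0 ,C
"""


def extract(text):
    """Extract: parse raw CSV text into dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform_rows(rows):
    """Transform: drop rows without a reading and convert readings to floats."""
    return [
        (row["sensor_id"].strip(), float(row["reading"]))
        for row in rows
        if row["reading"].strip()
    ]


def load(rows, connection):
    """Load: store the cleaned rows in a SQL table ready for analysis."""
    connection.execute("CREATE TABLE readings (sensor_id TEXT, reading REAL)")
    connection.executemany("INSERT INTO readings VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform_rows(extract(RAW_CSV)), connection)
row_count = connection.execute("SELECT COUNT(*) FROM readings").fetchone()[0]
```
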
  17. Data Management

    Data management or data warehousing is another key aspect of a data analyst’s job. Data warehousing involves designing and implementing databases that allow easy access to the results of data mining. This step generally involves creating and managing SQL databases. Non-relational and NoSQL databases are becoming more common as well.
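
To make "creating and managing SQL databases" concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns and its rows are hypothetical, and a production warehouse would use a server-based database rather than an in-memory one.

```python
import sqlite3

# An in-memory database stands in for a real data warehouse.
warehouse = sqlite3.connect(":memory:")

# Design: a small table with explicit column types and constraints.
warehouse.execute(
    "CREATE TABLE sales ("
    "region TEXT NOT NULL, product TEXT NOT NULL, amount REAL NOT NULL)"
)
warehouse.executemany(
    "INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
    [("North", "A", 100.0), ("North", "B", 50.0), ("South", "A", 70.0)],
)

# Manage: an index on the column analysts filter and group by most often.
warehouse.execute("CREATE INDEX idx_sales_region ON sales (region)")

# Access: analysts retrieve aggregated results with SQL.
totals = warehouse.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```
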
  18. Data Presentation

    The final step in most data analytics processes is data presentation. This step allows insights to be shared with stakeholders. Data visualization is often the most important tool in data presentation. Compelling visualizations can help tell the story in the data, which may help executives and managers understand the importance of these insights.
  19. “'Data is the new oil' is a popular quote pinpointing

    the increasing value of data and — to our liking — accurately characterizes data as raw material. Data are to be seen as an input or basic resource needing further processing before actually being of use.”
  20. Data Analytics Model

    Steps in the development, implementation, and operation of analytics within an organization.
    STEP 01 DEFINE: A thorough definition of the business problem to be addressed is needed. Some examples are: customer segmentation of a mortgage portfolio, retention modeling for a postpaid Telco subscription, or fraud detection for credit cards.
    STEP 02 SOURCE: The golden rule here is: the more data, the better! The analytical model itself will later decide which data are relevant and which are not for the task at hand. All data will then be gathered and consolidated in a staging area which could be, for example, a data warehouse, data mart, or even a simple spreadsheet file.
    STEP 03 PROCESS + TRANSFORM: The analytical model will be estimated on the preprocessed and transformed data. Depending on the business objective and the exact task at hand, a particular analytical technique will be selected and implemented by the data scientist.
    STEP 04 INTERPRET + EVALUATE: The key issue is to find the unknown yet interesting and actionable patterns (sometimes also referred to as knowledge diamonds) that can provide new insights into your data that can then be translated into new profit opportunities!
    STEP 05 VALIDATED + APPROVED: Once validated and approved, the model can be put into production as an analytics application (e.g., decision support system, scoring engine). Important considerations here are how to represent the model output in a user-friendly way.
  21. Google Analytics

    • Google Analytics is used to track business website performance and collect visitor insights. It can help organizations determine top sources of user traffic, gauge the success of their marketing activities and campaigns, track goal completions (such as purchases, adding products to carts), discover patterns and trends in user engagement and obtain other visitor information such as demographics.
    • Small and medium-sized retail websites often use Google Analytics to obtain and analyze various customer behavior analytics, which can be used to improve marketing campaigns, drive website traffic and better retain visitors.
    • Google Analytics acquires user data from each website visitor through the use of page tags. A JavaScript page tag is inserted into the code of each page. This tag runs in the web browser of each visitor, collecting data and sending it to one of Google's data collection servers.
    • Google Analytics can then generate customizable reports to track and visualize data such as the number of users, bounce rates, average session durations, sessions by channel, page views, goal completions and more. The page tag functions as a web bug or web beacon to gather visitor information. However, because it relies on cookies, the system can't collect data for users who have disabled them.
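
To show what metrics such as bounce rate, average session duration and sessions by channel mean, the sketch below computes them from a handful of session records. It does not call any Google Analytics API: the records and field names are hypothetical, and a bounce is simplified here to a session with a single page view.

```python
# Hypothetical session records, as a reporting tool might aggregate them.
sessions = [
    {"channel": "organic", "pageviews": 1, "duration_s": 0},
    {"channel": "organic", "pageviews": 4, "duration_s": 180},
    {"channel": "email", "pageviews": 2, "duration_s": 60},
    {"channel": "social", "pageviews": 1, "duration_s": 0},
]


def bounce_rate(rows):
    """Share of sessions that viewed only one page (simplified definition)."""
    bounces = sum(1 for row in rows if row["pageviews"] == 1)
    return bounces / len(rows)


def average_session_duration(rows):
    """Mean session length in seconds."""
    return sum(row["duration_s"] for row in rows) / len(rows)


def sessions_by_channel(rows):
    """Number of sessions per traffic channel."""
    counts = {}
    for row in rows:
        counts[row["channel"]] = counts.get(row["channel"], 0) + 1
    return counts


site_bounce_rate = bounce_rate(sessions)
site_avg_duration = average_session_duration(sessions)
channel_counts = sessions_by_channel(sessions)
```
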
  22. Python

    • Python was first designed as an object-oriented programming language for software and web development and was later extended for data science.
    • It is one of the fastest-growing programming languages today. Python is a powerful data analytics tool with a rich set of user-friendly libraries for almost every part of scientific computing.
    • With Python, you can do advanced data manipulation and numeric analysis using data frames. Pandas is an integral tool for data masking, indexing and grouping data, data visualization, data cleaning, and much more.
  23. Apache Spark

    • Apache Spark is a widely used open-source big data analytics tool. It offers more than 80 high-level operators that make it simple to build parallel applications.
    • It is used by a wide range of companies to handle huge datasets.
    • It helps run an application in a Hadoop cluster many times faster in memory, and several times faster on disk. It provides built-in APIs in Java, Scala, and Python.
  24. Apache Spark Architecture

    Apache Spark works in a master-slave architecture where the master is called “Driver” and slaves are called “Workers”. When you run a Spark application, the Spark Driver creates a context that is an entry point to your application, and all operations (transformations and actions) are executed on worker nodes, and the resources are managed by the Cluster Manager.
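
The driver and worker roles can be mimicked in plain Python to make the division of labor easier to picture. This is only an analogy, not Spark code: it uses a thread pool from the standard library in place of a cluster, and every function name here is hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_partitions(data, num_partitions):
    """The 'driver' splits the dataset into partitions, one per worker."""
    return [data[i::num_partitions] for i in range(num_partitions)]


def worker_task(partition):
    """Each 'worker' applies a transformation (squaring) to its own partition
    and returns a partial result (the sum of the squares)."""
    return sum(x * x for x in partition)


def driver(data, num_workers=3):
    """The 'driver' schedules the tasks, then combines the partial results.
    The thread pool plays the role of the cluster manager's resources."""
    partitions = split_into_partitions(data, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partial_results = list(pool.map(worker_task, partitions))
    return sum(partial_results)


total = driver(list(range(1, 11)))
```

In Spark itself the partitions live on separate machines and the driver talks to them through the context it creates; the sketch only keeps the shape of that flow.
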
  25. 4,498,300,000

    Imagine serving billions of users or customers without Data Analytics. What would you do to cope with fast-changing needs?
  26. Data Analytics in Healthcare

    Predicting patient outcomes, efficiently allocating funding and improving diagnostic techniques are just a few examples of how data analytics is revolutionizing healthcare.
  27. Data Analytics in Healthcare

    The pharmaceutical industry is also being revolutionized by machine learning. Drug discovery is a complex task with many variables. Machine learning can greatly improve drug discovery. Pharmaceutical companies also use data analytics to understand the market for drugs and predict their sales.
  28. Data Analytics in Business

    • Being utilized for creating new products and services
    • Being utilized for competitor research
    • Being utilized for predicting trends and business value
    • Being utilized for marketing and sales reports
    • Being utilized for analyzing and predicting future consumer behaviour
  29. Industry Insight

    According to O*NET, the projected growth for data analysts is 8% between 2019 and 2029. On average, data analysts earned $94,280 in 2019. However, salary compensation for data analysts varies depending on where they work and what industry they work in.