
Algorithms to live by

Practical applications of algorithms to real-life problems.

Aletheia

May 22, 2022

Transcript

  1. 21/05/22 Who am I? Luca Bianchi, PhD, Chief

    Technology Officer @ Neosperience. AWS Hero, passionate about serverless and machine learning. github.com/aletheia https://it.linkedin.com/in/lucabianchipavia https://speakerdeck.com/aletheia www.ai4devs.io @bianchiluca
  2. X Big-O notation — intuition As engineers we are interested

    in computational costs in different scenarios. A common measure of an algorithm is its cost, in terms of time or computation, in the worst-case scenario. Computer science has developed a shorthand specifically for measuring algorithmic worst-case scenarios: it’s called “Big-O” notation. Big-O notation has a particular quirk, which is that it’s inexact by design. That is, rather than expressing an algorithm’s performance in minutes and seconds, Big-O notation provides a way to talk about the kind of relationship that holds between the size of the problem and the program’s running time.
  3. X Big-O notation — real life example Imagine you’re hosting

    a dinner party with n guests. The time required to clean the house for their arrival doesn’t depend on the number of guests at all: this is O(1), or “constant time”. The time required to pass the roast around the table grows in proportion to the number of guests: this is “Big-O of n,” written O(n), also known as “linear time”. What if, as the guests arrived, each one hugged the others in greeting? Your first guest hugs you; your second guest has two hugs to give; your third guest, three. How many hugs will there be in total? The sum 1 + 2 + … + n equals n(n+1)/2, which grows like n squared, so this is “Big-O of n-squared,” written O(n^2), also known as “quadratic time”.
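
The hug count can be checked with a few lines of Python. This is only an illustrative sketch: the function name `count_hugs` is made up for this example, and it assumes the rule stated on the slide (the k-th guest to arrive gives k hugs).

```python
def count_hugs(n: int) -> int:
    """Count hugs when the k-th guest to arrive gives k hugs (host included)."""
    hugs = 0
    for guest in range(1, n + 1):   # guest number k arrives...
        for _ in range(guest):      # ...and hugs the k people already present
            hugs += 1
    return hugs


# Doubling the number of guests roughly quadruples the number of hugs.
for n in (10, 20, 40):
    print(n, count_hugs(n), n * (n + 1) // 2)
```

The nested loop is what makes the work quadratic; the closed form n(n+1)/2 gives the same totals without looping.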
  4. X Our goal for today is to revise algorithms and

    data structures, providing examples of how to use them in real-life scenarios.
  5. X Linked Lists A linked list is a linear data

    structure in which the elements are not stored at contiguous memory locations. Instead, the elements are linked to one another using pointers.
  6. X Linked Lists - Applications Image viewer – Previous and

    next images are linked, so they can be reached with the next and previous buttons. Previous and next page in a web browser – The previously and subsequently visited URLs can be reached with the back and forward buttons, since they are stored as a linked list. Music player – Each song is linked to the previous and next song, so the playlist can be played from either the start or the end of the list.
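
As a rough illustration of the browser example, here is a minimal doubly linked list in Python. The class names `Page` and `History` and the URLs are hypothetical, chosen only for this sketch; real browsers are more elaborate.

```python
from typing import Optional


class Page:
    """One node of a doubly linked list: a URL plus links to its neighbours."""

    def __init__(self, url: str, prev: Optional["Page"] = None) -> None:
        self.url = url
        self.prev = prev
        self.next: Optional["Page"] = None


class History:
    """Back/forward navigation over linked pages (hypothetical sketch)."""

    def __init__(self, start_url: str) -> None:
        self.current = Page(start_url)

    def visit(self, url: str) -> None:
        page = Page(url, prev=self.current)
        self.current.next = page   # any old "forward" pages are unlinked
        self.current = page

    def back(self) -> str:
        if self.current.prev is not None:
            self.current = self.current.prev
        return self.current.url

    def forward(self) -> str:
        if self.current.next is not None:
            self.current = self.current.next
        return self.current.url


history = History("a.example")
history.visit("b.example")
history.visit("c.example")
print(history.back(), history.back(), history.forward())
```

Moving back or forward only follows one pointer, so each step takes constant time regardless of how long the history is.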
  7. X Linked Lists - Applications Redo and undo functionality –

    keep a list of operations that can be traversed in either direction (forward or backward). Most recently used – Linked lists can be used to store the most recently used functions or items within a website. Job or task scheduling – First-Come-First-Served, Round Robin, or other CPU job scheduling policies.
  8. X Arrays An array is a collection of items stored

    at contiguous memory locations. The idea is to store multiple items of the same type together. This makes it easier to calculate the position of each element by simply adding an offset to a base value, i.e., the memory location of the first element of the array (generally denoted by the name of the array).
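
The offset calculation can be made concrete with a small sketch. It assumes CPython, where `array.buffer_info()` exposes the base address of the underlying buffer; the helper names `element_address` and `read_at` are invented for this example.

```python
import ctypes
from array import array


def element_address(base_address: int, index: int, item_size: int) -> int:
    """Address of element `index`: base value plus an offset."""
    return base_address + index * item_size


values = array("i", [10, 20, 30, 40])        # contiguous block of C ints
base_address, _length = values.buffer_info()  # (address, number of items)


def read_at(index: int) -> int:
    """Read an element straight from memory using the computed address."""
    address = element_address(base_address, index, values.itemsize)
    return ctypes.c_int.from_address(address).value


print(read_at(2), values[2])  # both read the same memory cell
```

Because the address is computed with one multiplication and one addition, accessing any element costs the same, which is why array indexing is O(1).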
  9. X Find the duplicate elements in a limited range array

    Given a limited range array of size n containing elements between 1 and n-1 with one element repeating, find the duplicate number in it without using any extra space.
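
One possible way to solve this without extra space is to compare the sum of the array with the sum 1 + 2 + … + (n-1). The sketch below is one option among several, not necessarily the approach the deck had in mind, and it assumes that every value from 1 to n-1 is present and exactly one of them appears twice.

```python
def find_duplicate(nums: list[int]) -> int:
    """Return the repeated value in an array of size n holding 1..n-1.

    Assumption: each value in 1..n-1 appears once, except one that appears twice.
    """
    n = len(nums)
    expected = (n - 1) * n // 2      # 1 + 2 + ... + (n-1)
    return sum(nums) - expected      # the surplus is the duplicate


print(find_duplicate([1, 2, 3, 4, 4]))
```

This runs in O(n) time and uses only a couple of integer variables.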
  10. X Find the duplicate elements in a limited range array

    A related variant asks whether any element repeats within a window of k positions. A naive solution would be to consider every subarray of size k and check for duplicates in it. The time complexity of this solution is O(n·k^2), since there can be up to n subarrays of size k, and each subarray might take O(k^2) time to check for duplicates.
  11. X Approach: Hashing The problem can be efficiently solved using

    hashing in O(n) time and O(n) extra space. The idea is to traverse the array and store each element and its index in a map, i.e., (element, index) as (key, value) pairs. If an element is already present in the map, check whether it repeats within a range of k positions using the index of its previous occurrence stored in the map.
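
The hashing idea above can be sketched as follows. The function name `has_duplicate_within_k` is hypothetical, and "within the range of k" is interpreted here as an index distance of at most k.

```python
def has_duplicate_within_k(nums: list[int], k: int) -> bool:
    """True if some value occurs twice at indices at most k apart."""
    last_seen: dict[int, int] = {}            # element -> index of last occurrence
    for index, value in enumerate(nums):
        if value in last_seen and index - last_seen[value] <= k:
            return True
        last_seen[value] = index              # remember the most recent position
    return False


print(has_duplicate_within_k([5, 6, 8, 2, 4, 6, 9], 4))
```

Each element is looked up and stored once, so the pass is O(n) on average, at the price of O(n) extra space for the map.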
  12. X Sorting Sorting is arranging items in a particular order

    so that they can be accessed easily later. We apply sorting in our day-to-day life, knowingly or unknowingly, in many situations. Telephone directories and English dictionaries are examples in which names or words are arranged in alphabetical order. Rankings based on scores are another common example. What advantage do we gain from sorting things? The biggest one is that search and retrieval become much faster. Sorting is a fundamental operation in computer databases.
  13. X The “socks” problem You as a student have to

    do the laundry, then recover a pair of socks. The naive approach is to pull a sock out of the clean laundry hamper, then a second sock, compare to find a match, and if it isn’t the right one, throw it back into the laundry and pull a new sock. With just 10 different pairs of socks, following this method will take on average 19 pulls merely to complete the first pair, and 17 more pulls to complete the second. In total, you can expect to go fishing in the hamper 110 times just to pair 20 socks.
  14. X The “socks” problem — solution After socks are washed,

    to pair them, you first need to sort them.
    • Pairing socks without sorting: N*N/2 comparisons
    • Pairing socks after sorting: 0 comparisons, just take every i*2 and i*2+1 sock together
    Do a radix sort with these as digits:
    • color
    • pattern of colors, if any
    • texture (cotton, silk, etc.)
    • numbers
    • size
    Also eliminate any pairs found during the sorting phase.
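
A small sketch of "sort first, then pair neighbours". Note that it uses Python's built-in `sorted` (a comparison sort) on a tuple of attributes rather than a true radix sort, and the `Sock` fields are only an assumed subset of the digits listed above.

```python
from typing import NamedTuple


class Sock(NamedTuple):
    color: str
    pattern: str
    size: int


def pair_socks(socks: list[Sock]) -> list[tuple[Sock, Sock]]:
    """Sort socks by their attributes, then pair identical neighbours."""
    ordered = sorted(socks)          # tuples compare field by field, like digits
    pairs = []
    i = 0
    while i + 1 < len(ordered):
        if ordered[i] == ordered[i + 1]:
            pairs.append((ordered[i], ordered[i + 1]))
            i += 2                   # both socks are used up
        else:
            i += 1                   # unmatched sock, skip it
    return pairs


laundry = [
    Sock("black", "plain", 42),
    Sock("red", "striped", 40),
    Sock("black", "plain", 42),
    Sock("red", "striped", 40),
    Sock("blue", "plain", 42),
]
print(len(pair_socks(laundry)))
```

After sorting, matching socks sit next to each other, so pairing is a single linear pass instead of repeated fishing in the hamper.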
  15. X The importance of sorting The first code ever written

    for a “stored program” computer was a program for efficient sorting. By the 1960s, one study estimated that more than a quarter of the computing resources of the world were being spent on sorting. With sorting, size does matter: complexity and computational costs grow with list size.
  16. X Dynamic Programming Dynamic Programming in a nutshell is super

    easy: 1. Break down the problem into smaller pieces 2. Solve a smaller problem 3. Remember the result 4. Use these results to continue
  17. X Memoization Memoization is an optimization technique that makes applications

    more efficient and hence faster. It does this by storing computation results in a cache and retrieving them from the cache the next time they are needed instead of computing them again. Memoization is the top-down form of dynamic programming: results are not recomputed every time, but stored for subsequent use.
  18. X Memoization — example Let’s define a recursive function that

    we can use to display the factorials up to n. The factorial is defined for a positive integer n as the product of that integer and all positive integers below it • 1! = 1 • 2! = 2*1 = 2 • 3! = 3*2*1 = 6 Computing 10! is pretty neat. What about computing 5000!?
  19. X Dynamic Programming — Applications • In Google Maps to

    find the shortest path between a source and a series of destinations (one by one) out of the various available paths. • In networking to transfer data from a sender to various receivers in a sequential manner. • Document distance algorithms, which measure the similarity between two text documents and are used by search engines and websites such as Google, Wikipedia, and Quora. • The edit distance algorithm used in spell checkers.
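
To make one of these applications concrete, here is a sketch of the classic edit distance (Levenshtein distance) computed with dynamic programming. It is a textbook formulation, not the implementation of any particular spell checker.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]                                  # distance from a[:i] to ""
        for j, char_b in enumerate(b, start=1):
            substitution = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + substitution,        # keep or substitute
            ))
        previous = current                             # remember the solved row
    return previous[-1]


print(edit_distance("kitten", "sitting"))
```

Each cell reuses three smaller, already solved subproblems, which is exactly the "remember the result" step of dynamic programming.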
  20. X Dynamic Programming — Applications • Databases caching common queries

    in memory: dedicated cache tiers store data to avoid DB access, and web servers keep common data, such as configuration, that can be reused across requests. Within a single request, multiple levels of caching in code abstractions prevent fetching the same data multiple times and save CPU cycles by avoiding recomputation. Finally, caches within your browser or mobile phone keep data that doesn't need to be fetched from the server every time. • Git merge: document diffing is one of the most prominent uses of the longest common subsequence (LCS) problem. • Dynamic programming is used in TeX's system for calculating the right amount of hyphenation and justification. • Genetic algorithms.
  21. X Genetic Algorithms A genetic algorithm is a search heuristic

    that is inspired by Charles Darwin’s theory of natural evolution. This algorithm reflects the process of natural selection where the fittest individuals are selected for reproduction in order to produce offspring of the next generation.
  22. X Genetic Algorithms The process begins with a set of

    individuals called a Population. Each individual is a solution to the problem you want to solve. An individual is characterized by a set of parameters (variables) known as Genes. Genes are joined into a string to form a Chromosome (solution). In a genetic algorithm, the set of genes of an individual is represented using a string, in terms of an alphabet. Usually, binary values are used (a string of 1s and 0s). We say that we encode the genes in a chromosome. The fitness function determines how fit an individual is (the ability of an individual to compete with other individuals). It gives a fitness score to each individual. The probability that an individual will be selected for reproduction is based on its fitness score. The idea of the selection phase is to select the fittest individuals and let them pass their genes to the next generation.
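
The vocabulary above (population, chromosome, fitness, selection) can be tried out on a toy problem. The sketch below maximises the number of 1s in a binary chromosome; the problem, the parameter values, the single-point crossover, the mutation step and the elitism rule are all assumptions made for this illustration.

```python
import random


def fitness(chromosome: list[int]) -> int:
    """Toy fitness score: the number of 1 genes in the chromosome."""
    return sum(chromosome)


def evolve(pop_size=20, genes=16, generations=40, mutation_rate=0.02, seed=0):
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(genes)] for _ in range(pop_size)]
    for _ in range(generations):
        # Selection weights: fitter individuals are more likely to reproduce.
        weights = [fitness(c) + 1 for c in population]   # +1 keeps weights positive
        best = max(population, key=fitness)
        next_generation = [best[:]]                      # elitism: keep the best as is
        while len(next_generation) < pop_size:
            mother, father = rng.choices(population, weights=weights, k=2)
            cut = rng.randrange(1, genes)                # single-point crossover
            child = mother[:cut] + father[cut:]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            next_generation.append(child)
        population = next_generation
    return max(population, key=fitness)


best = evolve()
print(best, fitness(best))
```

Because the best individual is always copied unchanged into the next generation, the best fitness in the population can never decrease from one generation to the next.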
  23. X Genetic Algorithms — applications • Automated design of sophisticated

    trading systems in the financial sector
    • Portfolio optimization (what-if scenarios)
    • Design of anti-terrorism systems (https://www.researchgate.net/publication/23657202_Reducing_Risk_Through_Real_Options_in_Systems_Design_The_Case_of_Architecting_a_Maritime_Domain_Protection_System)
    • Marketing mix analysis
    • Design of particle accelerator beamlines (https://www.sciencedirect.com/science/article/pii/S0168900218302158)
  24. X Queues A queue is an ordered collection of items

    where new items are added at one end and existing items are removed from the other (first in, first out). Queues are fundamental structures in computer science because they enable a number of applications:
    • Producer / Consumer: producer workers push data into a queue, and consumers pull it out for processing. Producers and consumers are decoupled.
    • Fan-in / Fan-out strategies: queues buffer work between stages, so a system's throughput can be adjusted and bursts of requests smoothed out.
    • Line management: the most basic usage of a queue is to manage a line for a capped or shared resource (e.g. CPU, GPU, tickets, toilet access, …)
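
The producer / consumer pattern can be sketched with Python's standard `queue.Queue` and two threads. The squaring "work", the `None` sentinel and the queue size of 2 are arbitrary choices for this example.

```python
import queue
import threading


def producer(jobs: queue.Queue, items: list[int]) -> None:
    for item in items:
        jobs.put(item)          # blocks while the queue is full
    jobs.put(None)              # sentinel: tells the consumer to stop


def consumer(jobs: queue.Queue, results: list[int]) -> None:
    while True:
        item = jobs.get()       # blocks until an item is available
        if item is None:
            break
        results.append(item * item)


def run(items: list[int]) -> list[int]:
    jobs: queue.Queue = queue.Queue(maxsize=2)   # small buffer between the workers
    results: list[int] = []
    workers = [
        threading.Thread(target=producer, args=(jobs, items)),
        threading.Thread(target=consumer, args=(jobs, results)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return results


print(run([1, 2, 3, 4]))
```

The producer never calls the consumer directly: the two sides only share the queue, which is what decouples them. With a single consumer, the first-in, first-out order of the queue is preserved in the results.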
  25. X Issues with trees In most of the other self-balancing

    search trees (like AVL and Red-Black Trees), it is assumed that everything is in main memory. What can be done when we have a huge amount of data that cannot fit in main memory? Disk access time is very high compared to the main memory access time.
  26. X B Tree & B+ Tree Generalizes the binary search

    tree, allowing for nodes with more than two children. Well suited for storage systems that read and write relatively large blocks of data, such as disks. Commonly used in databases and file systems to store and retrieve data in an efficient manner. Complexity • Search: O(log n) • Insert: O(log n) • Delete: O(log n)
  27. X R Tree Indexing multi-dimensional information such as geographical coordinates,

    rectangles or polygons. Store spatial objects such as restaurant locations or the polygons that typical maps are made of: streets, buildings, outlines of lakes, coastlines, etc. and then find answers quickly to queries Complexity • Search O(log_M n) Average / O(n) Worst-Case
  28. X QuadTree Quadtrees are the two-dimensional analog of octrees and are most

    often used to partition a two-dimensional space by recursively subdividing it into four quadrants or regions. • Image processing • Mesh generation • Spatial indexing, point location queries, and range queries • Efficient collision detection in two dimensions • View frustum culling of terrain data • Storing sparse data • Conway’s Game of Life simulation programs • State estimation • Computational fluid dynamics (CFD)
  29. X Interval Tree Allows one to efficiently find all intervals

    that overlap with any given interval or point. Can be used for windowing queries, for instance, to find all roads on a computerized map inside a rectangular viewport, or to find all visible elements inside a three-dimensional scene. Complexity • Creation: O(n log n) • Query for all intervals that overlap with a given interval or point: O(log n + m), where m is the number of intervals reported
  30. X Graphs A graph is a structure made of nodes

    and edges (often called arcs). Graphs can be: • cyclic or acyclic • directed or undirected • weighted or unweighted
  31. X Cyclic and Acyclic Graphs • A cyclic graph is

    a graph that has cycles: moving from a node along the edges, you can find yourself back at the starting node. • A directed graph is made up of a set of vertices connected by directed edges, often called arcs. • A directed acyclic graph (DAG) is a directed graph with no directed cycles. That is, it consists of vertices and edges (also called arcs), with each edge directed from one vertex to another, such that following those directions will never form a closed loop. • A directed graph is a DAG if and only if it can be topologically ordered, by arranging the vertices in a linear ordering that is consistent with all edge directions.
  32. X Detect Cycle in an Undirected Graph Depth First Search

    can be used to detect a cycle in a graph. DFS on a connected graph produces a tree. There is a cycle in the graph if and only if it contains a back edge. A back edge is an edge that joins a node to itself (a self-loop) or to one of its ancestors in the tree produced by DFS. To find back edges, keep a visited array: if the search reaches an already visited node that is not the parent of the current node, there is a cycle, so return true.
  33. X Detect cycles using DFS 1. Create the graph using

    the given number of edges and vertices. 2. Create a recursive function that takes the current vertex, the visited array and the parent node. 3. Mark the current node as visited. 4. For every vertex that is adjacent to the current node and not yet visited, recursively call the function; if the recursive call returns true, return true. 5. If an adjacent node is already visited and is not the parent, return true. 6. Create a wrapper function that calls the recursive function for every unvisited vertex; if any call returns true, return true. 7. Otherwise, if the function returns false for all vertices, return false.
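
The steps above can be sketched in Python as follows. The sketch assumes vertices numbered from 0 and a simple graph (no parallel edges between the same two vertices); the function names are illustrative.

```python
def has_cycle(vertex_count: int, edges: list[tuple[int, int]]) -> bool:
    """Detect a cycle in an undirected graph with depth-first search."""
    # Step 1: build the adjacency lists.
    adjacency = [[] for _ in range(vertex_count)]
    for u, v in edges:
        adjacency[u].append(v)
        adjacency[v].append(u)
    visited = [False] * vertex_count

    def dfs(node: int, parent: int) -> bool:
        visited[node] = True                      # step 3
        for neighbour in adjacency[node]:
            if not visited[neighbour]:
                if dfs(neighbour, node):          # step 4
                    return True
            elif neighbour != parent:             # step 5: back edge found
                return True
        return False

    # Steps 6 and 7: start a search from every vertex not reached so far.
    return any(not visited[start] and dfs(start, -1) for start in range(vertex_count))


print(has_cycle(3, [(0, 1), (1, 2), (2, 0)]))
print(has_cycle(4, [(0, 1), (1, 2), (2, 3)]))
```

The recursion depth grows with the length of the longest path, so very large graphs may call for an explicit stack instead of recursive calls.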
  35. X Directed Acyclic Graphs (DAG) An “acyclic graph” is a

    graph that contains no cycles, i.e. in which it is not possible to find any cyclic path. A DAG is a graph in which all the edges are directed, such that it is impossible to start at a node and follow a sequence of edges that eventually loops back to the same node.
  36. X Directed Acyclic Graphs (DAG) A key property of DAGs

    is that they have what is known as a “topological ordering”, which means that the nodes of a DAG can be put into a linear sequence in which every edge points from an earlier node to a later one: nodes at the beginning of the sequence have a “lower value” than nodes at the end of the sequence. This topological ordering, together with other key properties, makes DAGs very efficient at a number of tasks (such as finding the shortest path from one node to another) and is the reason DAGs have a wide range of use cases.
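
One standard way to compute a topological ordering is Kahn's algorithm, sketched below. The dictionary-based graph representation and the task names are assumptions made for the example.

```python
from collections import deque


def topological_order(graph: dict[str, list[str]]) -> list[str]:
    """Kahn's algorithm. `graph` maps each node to the nodes it points to."""
    indegree = {node: 0 for node in graph}
    for targets in graph.values():
        for target in targets:
            indegree[target] = indegree.get(target, 0) + 1

    ready = deque(node for node, degree in indegree.items() if degree == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for target in graph.get(node, ()):
            indegree[target] -= 1          # one incoming edge is now satisfied
            if indegree[target] == 0:
                ready.append(target)

    if len(order) != len(indegree):        # some nodes never became ready
        raise ValueError("graph has a cycle, so no topological order exists")
    return order


tasks = {"extract": ["transform"], "transform": ["load"], "load": []}
print(topological_order(tasks))
```

If the graph contains a cycle, the nodes on the cycle never reach an in-degree of zero, which is how the function notices that the input is not a DAG.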
  37. X Use case: PM DAGs are used in project management

    to plan, design, and implement complex projects or tasks. For example, DAGs are used in popular projects such as Apache Airflow (a workflow management system originally developed by Airbnb) and in Apache Spark. For instance, in Spark, DAGs are used to represent a chain of Resilient Distributed Dataset (RDD) dependencies
  38. X Use case: Blockchain A promising application of DAGs is

    in the development of faster and cheaper distributed ledgers. Despite the hype, distributed ledgers such as “blockchain” have failed to be widely adopted, due in large part to their poor scalability, low speed and high transaction costs. For example, the Bitcoin blockchain (which uses a linear sequence of blocks) only manages to process 4 to 7 transactions per second, which is simply not viable for wide scale adoption.
  39. X Use case: Blockchain A DAG based distributed ledger (due

    to its graph structure) is claimed to process far more transactions per second (up to hundreds of thousands), and to do so with far lower transaction costs. This could enable use cases such as P2P energy trading to become entirely viable, a feat that has so far eluded current blockchains.
  40. X Use case: sources of bias DAGs can be used

    to identify confounding and sources of bias, which is particularly important in medical and clinical studies. For example, consider a clinical study with the objective of identifying the relationships between “screen time” and “childhood obesity”. It might seem reasonable to hypothesize that more screen time may lead to an increased risk of childhood obesity
  42. X Use case: sources of bias However, higher screen time

    probably doesn’t cause obesity directly – it is more likely that an intermediate process (a reduction in physical activity) is responsible for weight gain.
  43. X Use case: sources of bias However, there are factors

    (called confounders) that influence both the amount of screen time and the risk of obesity. Indeed the authors of this study identified low parental education to be such an influence.
  44. X Dijkstra’s Algorithm Dijkstra’s algorithm is an iterative algorithm that

    provides us with the shortest path from one particular starting node to all other nodes in the graph, provided that no edge has a negative weight.
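
A compact sketch of Dijkstra's algorithm using a binary heap from the standard library. The nested-dictionary graph format and the node names are assumptions for this example, and edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph: dict[str, dict[str, int]], source: str) -> dict[str, int]:
    """Shortest distance from `source` to every reachable node."""
    distances = {source: 0}
    heap = [(0, source)]                       # (distance so far, node)
    while heap:
        dist, node = heapq.heappop(heap)
        if dist > distances.get(node, float("inf")):
            continue                           # stale entry, a shorter path is known
        for neighbour, weight in graph.get(node, {}).items():
            candidate = dist + weight
            if candidate < distances.get(neighbour, float("inf")):
                distances[neighbour] = candidate
                heapq.heappush(heap, (candidate, neighbour))
    return distances


roads = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "D": 5},
    "C": {"D": 1},
    "D": {},
}
print(dijkstra(roads, "A"))
```

In this small graph the direct road from A to C (cost 4) loses to the detour through B (cost 3), which in turn shortens the route to D.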
  45. X A* algorithm A* is an informed search algorithm, or

    a best-first search, meaning that it is formulated in terms of weighted graphs: starting from a specific starting node of a graph, it aims to find a path to the given goal node having the smallest cost (least distance travelled, shortest time, etc.). It does this by maintaining a tree of paths originating at the start node and extending those paths one edge at a time until its termination criterion is satisfied. Compared to Dijkstra's algorithm, the A* algorithm only finds the shortest path from a specified source to a specified goal, and not the shortest-path tree from a specified source to all possible goals
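
For comparison with Dijkstra's algorithm, here is a sketch of A* on a small grid. The grid encoding ('#' marks a wall), the 4-directional moves with unit cost, and the Manhattan-distance heuristic are all assumptions chosen for this example.

```python
import heapq


def a_star(grid: list[str], start: tuple[int, int], goal: tuple[int, int]):
    """Length of the shortest path from start to goal, or None if unreachable."""

    def heuristic(cell: tuple[int, int]) -> int:
        # Manhattan distance never overestimates the cost on this kind of grid.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    best_cost = {start: 0}
    frontier = [(heuristic(start), 0, start)]   # (estimated total, cost so far, cell)
    while frontier:
        _, cost, cell = heapq.heappop(frontier)
        if cell == goal:
            return cost
        if cost > best_cost.get(cell, float("inf")):
            continue                            # stale entry
        row, col = cell
        for nr, nc in ((row + 1, col), (row - 1, col), (row, col + 1), (row, col - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[nr])
            if inside and grid[nr][nc] != "#":
                new_cost = cost + 1
                if new_cost < best_cost.get((nr, nc), float("inf")):
                    best_cost[(nr, nc)] = new_cost
                    estimate = new_cost + heuristic((nr, nc))
                    heapq.heappush(frontier, (estimate, new_cost, (nr, nc)))
    return None


maze = ["..#.",
        "..#.",
        "...."]
print(a_star(maze, (0, 0), (0, 3)))
```

The heuristic steers the search towards the goal, so A* usually expands fewer cells than an uninformed search while still returning a shortest path.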
  46. X Applications • Google Maps • Logistics: finding Shortest Path.

    • The distance between locations is represented by the weight of the edges. • IP routing: the Open Shortest Path First (OSPF) protocol. • The telephone network • Search algorithms in AI
  47. X Binary search Many people use binary searches from childhood

    without being aware of it. For example, when you search for a word in a dictionary, you don’t review all the words; you check one word in the middle and thus halve the set of remaining words to check. A self-balancing binary search tree can be used to maintain a sorted stream of data. For example, suppose online orders are being placed and we want to keep the live data (in RAM) sorted by price. At any moment, we may wish to know the number of items purchased below a given cost, or the number of items purchased above a given cost.
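
The order-price example can be approximated with a sorted Python list and the standard `bisect` module. Note the substitution: this is a sorted array with binary search rather than a self-balancing binary search tree, so queries take O(log n) but each insertion costs O(n) because elements must be shifted. The class name `PriceBook` is hypothetical.

```python
import bisect


class PriceBook:
    """Keeps order prices sorted and answers counting queries by binary search."""

    def __init__(self) -> None:
        self._prices: list[float] = []

    def add(self, price: float) -> None:
        bisect.insort(self._prices, price)        # insert while keeping the order

    def count_below(self, price: float) -> int:
        return bisect.bisect_left(self._prices, price)

    def count_above(self, price: float) -> int:
        return len(self._prices) - bisect.bisect_right(self._prices, price)


book = PriceBook()
for price in (30, 10, 20, 20, 50):
    book.add(price)
print(book.count_below(20), book.count_above(20))
```

A self-balancing tree that stores subtree sizes would answer the same queries while also keeping insertions at O(log n).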
  48. X Neosperience People Analytics A solution to detect unique people

    within an image, track them across many different frames, and segment pixels into meaningful areas. Leverages the following algorithms: • unique detection —> hashing • tracking —> dynamic programming • segmentation —> trees
  49. X People Analytics Being able to recognise people and track

    their movements in front of a camera leads to interesting results that go beyond people counting. Store managers can obtain a clear view of the preferred areas inside a store, and even of the overall number of people who do not enter the store. Store Analytics over-delivered on store understanding, providing a different but more meaningful metric.