
Rewriting Recipe Search: A dash of sugar, just a smidgen of graph database

I’m starvin’. I throw open my laptop and search by what I've got in the house to minimize my runs to the store. I'm craving a flavor, not necessarily a cuisine. Some southern food has its origins in West African food. How do recipe searches address this? How do we combine curation of good recipe content with better search to avoid having to build a recommendation engine? In this talk, we’ll take a look at the current state of web recipe discovery and how I naively tried to come up with an alternative with brute JavaScript force and graph-databasing (Neo4j); like baking a souffle for a dinner party and doing a no-fall dance before it comes out of the oven. Plan for the worst and be delighted when it doesn’t happen (or chuckle when it does)!

Tracy Hinds

June 01, 2015

Transcript

  1. Rewriting RECIPE SEARCH
    A dash of sugar, just a smidgen of graph database


  2. Tracy Hinds
    Tracy, Web Engineer, conference organizer, sugar fiend.
    I’ve worn many hats. I have been a conference organizer, a baker, an addictions program coordinator, a restaurant manager. I’m currently a web engineer (this is not the exhaustive list).

    Because I love making, whether it is via code or cookies, I’d long considered culinary school as a means to fulfillment. I nerd out on cooking, baking, and recipes as much as I make long lists of all the new technologies I’m excited to keep up with in JavaScript.

    In one of my younger fits of courage, I was determined to run my own bakery. The idea of being able to take recipes I’ve crafted to perfection and build a successful business was intoxicating. In hindsight, we maybe could have chalked this up to me being on a pretty serious sugar mania.


  3. On many a night, I finish working and I’m starvin’. I throw open my laptop and search by what I've got in the house to minimize my runs to the store. I'm craving a flavor, not necessarily a cuisine. How do recipe searches address this? How do we combine curation of good recipe content with better search to avoid having to build a recommendation engine?

    In one of my prior lives, I was a baker. I love cooking. I especially love baking. I love researching recipes. I delight in the ‘why’ as much as the ‘how’: following the methods, repeating experiments, seeing what I’ve done wrong.

    My methodology for making something new begins with recipe compilation in paper and digital formats. I read the comments (lots and lots of them). I find where cultural habits have crept in, try to home in on the version of the recipe I believe to be the most honest-to-history, and then run with it.


  4. So to do this, where do you go?
    Books and magazine articles I’ve hoarded over the years. (This very quickly became nearly obsolete for me as I’ve downsized and moved around.) You can only schlep a box full of professional culinary recipe books up a few flights of stairs so many times before you start wishing it were digitized.

    The Internet!
    Recipe search. Turns out Google has this thing they do pretty well. I have my favorite sources, often food bloggers who can be trusted to test their recipes and accept feedback for improvement: the ones that tend to give me the best recipes time and time again. How do I find them? google.com. This wins for me right now. Curated web vs. social web (something like AllRecipes) is a super interesting rabbit hole I was distracted by but will spare you until another talk.


  5. So then I watched Alton Brown say: “...and there is no way to find good recipes on the Internet, I don’t think. Not through a general recipe search.” (Google Presents Alton Brown: Good Eats 3, the Later Years. 34:42) and was pretty flabbergasted, yet nodding my head in agreement. (Play clip in slide showing on the screen here)


  6. So many choices.
    BigOven
    Yummly
    Food2Fork
    Edamam
    From a content perspective, what’s the current state of recipe discovery? There are recipe APIs! Yummly, BigOven, WebKnox Recipe, and Food2Fork promise your website X00,000 recipes to include, but what does this mean if the data you have is limited? I want to cross-reference reviews, ingredient lists, types of food, or time of preparation/cooking. It’s not that I think all of these resources, in existence for years, are broken. I want to play with the idea of ‘another way’. I wanted more: to explore.


  7. The bright side
    AllRecipes is the most robust site for collection and discovery I have yet to come across. It labels itself as ‘a food-focused social media website’. I would say that’s fair. What’s been so interesting about AllRecipes over the years is that even when adding features and cleaning up the UI, the user-added comments have remained AS valuable as the recipes. It’s like I’ve got my mom standing in the kitchen next to me telling me not to use so much baking soda or to add a little more salt. Blogs generally have lower traffic. Most of the commenters are appreciating the art of the food photography and descriptive writing, but haven’t actually tested the recipe (and they’ll say so).


  8. Might seem like a silly question, but what are recipes? They can be a fairly standard formula of how to cook or bake something you know you want. They can be an experiment to try flavors you’re curious about. They can be a family legacy passed down and recreated with each generation: the nights I chased my mom around the kitchen asking questions to understand what’s going where and why.

    There’s a series I watched, Mind of a Chef, that features a few chefs celebrating cooking, travel, science, and history, all wrapped up into this rich study of past heritage and the line it carries to inform the delicious food they prepare now. Chef Sean Brock, of Charleston’s Husk fame, was known for his passion in sharing Southern preservation techniques and elevating the generations-old grains and heirloom seeds he uses in his own kitchen. He’s converted many a skeptic into appreciating Southern food for more than its sweet tea and grits. In tracing the roots of the flavors he was so familiar with, he traveled to Senegal to explore the West African influences on Southern cuisine. There’s a point where he starts to notice the remarkable familiarity in flavors and techniques such as gumbo-like broths and rice dishes despite the availability of similar raw ingredients.


  9. Why graphdb?
    Why not?
    I saw John Resig speak at Brooklyn.js about his more recent work, Building an Art History Database Using Computer Vision, with the Frick Library in NYC. He shared how clear and concise the queries could be using Neo4j and how it allowed him to connect unattributed Italian artwork. It was incredible.

    Common uses for graph databases include geospatial problems, recommendation engines, network analysis, and bioinformatics: anywhere that the relationship between the data is just as important as the data itself. What if you could surface interesting histories or similarities by doing large-scale comparisons of ingredients and cooking preparation across millions of recipes?
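
    One way to picture the "large-scale comparison" idea is a plain ingredient-overlap score. This is only a sketch: Jaccard similarity is my stand-in here, not something the talk says the project used, and the recipe names and ingredient lists are made up for illustration.

```python
from itertools import combinations


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of ingredients two recipes have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_pairs(recipes: dict[str, set[str]], top: int = 3):
    """Score every pair of recipes and return the closest pairs first."""
    scored = [
        (jaccard(recipes[x], recipes[y]), x, y)
        for x, y in combinations(sorted(recipes), 2)
    ]
    return sorted(scored, reverse=True)[:top]


# Made-up sample data, for illustration only.
recipes = {
    "gumbo": {"okra", "rice", "onion", "stock"},
    "okra stew": {"okra", "rice", "onion", "palm oil"},
    "pound cake": {"flour", "butter", "sugar", "egg"},
}
```

    Calling `most_similar_pairs(recipes, top=1)` puts the two okra dishes first. Comparing every pair like this gets slow quickly as the collection grows, which is part of the appeal of letting a graph database follow shared-ingredient relationships instead.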


  10. GRAPH theory.
    [Diagram: recipe nodes (R) linked to ingredient nodes (i)]
    Let’s dive into explaining a little bit of graph theory and understanding why it might be useful here.

    Graph theory, at its core, is the study of graphs: in this case, mathematical structures used to model relations between objects. These differ slightly from the graphs we might be familiar with, but they are a powerful way to visually represent relationships.

    So here we have a graph for the initial concept of my recipe exploration.
    The fundamental units that form a graph are nodes and relationships.
    A property graph is made up of nodes, relationships (edges), and properties. Nodes contain properties. Some great insight provided by the online docs is to think of nodes as documents that store properties in the form of arbitrary key-value pairs. This is the most common graph model and is referred to as the Property Graph Model.
    The keys are strings and the values are arbitrary data types.
    In Neo4j, both nodes and relationships can contain properties.
    Relationships connect and structure nodes. A relationship always has a direction, a label, and a start node and an end node; there are no dangling relationships.
    Together, a relationship’s direction and label add semantic clarity to the structuring of nodes. The ability to add properties to relationships is particularly useful for providing additional metadata for graph
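
    The property graph model described in these notes can be sketched in a few lines of plain Python. This is an in-memory illustration only, not how Neo4j stores anything; the class names, the `kind`, `title`, `name`, and `quantity` properties, and the sample data are all assumptions made up for the example.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Node:
    # Properties are arbitrary key-value pairs with string keys.
    properties: dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    # Every relationship has a start node, an end node, and a label,
    # so it can never dangle. Direction runs from start to end.
    start: Node
    end: Node
    label: str
    properties: dict[str, Any] = field(default_factory=dict)


# Illustrative data only: one recipe node pointing at two ingredient nodes.
pita = Node({"kind": "recipe", "title": "Pita"})
flour = Node({"kind": "ingredient", "name": "flour"})
yeast = Node({"kind": "ingredient", "name": "yeast"})

graph = [
    Relationship(pita, flour, "CONTAINS", {"quantity": "3 cups"}),
    Relationship(pita, yeast, "CONTAINS", {"quantity": "2 tsp"}),
]


def ingredients_of(recipe: Node, relationships: list[Relationship]) -> list[str]:
    """Follow outgoing CONTAINS relationships from a recipe node."""
    return [
        rel.end.properties["name"]
        for rel in relationships
        if rel.start is recipe and rel.label == "CONTAINS"
    ]
```

    Note that the quantity lives on the relationship rather than on either node: it describes this recipe's use of flour, not flour in general.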


  11. CYPHER is for ASCII lovers.
    Node syntax: (ingredient)
    Relationship syntax: -->, -[recipe]->, -[:CONTAINS]->, -[recipe:CONTAINS]->
    Pattern syntax: (recipe) -[:CONTAINS]-> (ingredient)
    These primitives are all we need to create sophisticated and semantically rich models. So far, all our models have been in the form of diagrams. Diagrams are great for describing graphs outside of any technology context, but when it comes to using a database, we need some other mechanism for creating, manipulating, and querying data.

    Cypher is the query language that Neo4j uses to do just that. I immediately appreciated how intuitive it seemed when I realized it took advantage of ASCII art to represent graph patterns. It still uses SQL-like clauses and keywords (e.g., MATCH, WHERE, DELETE) to combine these patterns and specify desired actions.

    Cypher uses a pair of parentheses (usually containing a name) to represent a node, e.g.: (), (ingredient)

    A pair of dashes (--) is used to represent an undirected relationship. Directed relationships have an arrowhead at one end (e.g., <--, -->). Bracketed expressions (e.g., [...]) can be used to add details.

    Combining the syntax for nodes and relationships, we can express patterns.
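
    To show how a pattern combines with those SQL-like clauses, here is a small sketch that pairs a Cypher query with its parameters. The labels (`Recipe`, `Ingredient`), the property names (`title`, `name`), and the helper `build_query` are assumptions for illustration, not the project's actual schema. The `$names` parameter form is the one used by newer Neo4j releases; older versions wrote parameters as `{names}`.

```python
# Labels (Recipe, Ingredient) and property names (title, name) are
# assumptions for this sketch, not the project's actual schema.
FIND_RECIPES = (
    "MATCH (recipe:Recipe)-[:CONTAINS]->(ingredient:Ingredient) "
    "WHERE ingredient.name IN $names "
    "RETURN recipe.title AS title, count(ingredient) AS matched "
    "ORDER BY matched DESC"
)


def build_query(pantry: list[str]) -> tuple[str, dict[str, list[str]]]:
    """Pair the query text with its parameters, normalising ingredient names."""
    # Trim, lowercase, drop blanks and duplicates so "Flour " matches "flour".
    names = sorted({item.strip().lower() for item in pantry if item.strip()})
    return FIND_RECIPES, {"names": names}
```

    The returned query text and parameter dictionary can then be handed to whichever Neo4j client you use. Passing ingredient names as parameters, rather than pasting them into the query string, keeps user input out of the Cypher itself.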


  12. Complete graph cycles
    Treating relationships, the edges of a graph, as first-class objects is the fundamental innovation of graph databases. The database doesn’t just store information about individual things; it also stores the relationships between those things.

    This capability makes it much easier to express sophisticated questions and, for heavily connected data, to get answers in a fraction of the time a traditional relational database would take. The relationships in the database can express the nature of each connection (parent, child, owns, friend) and capture any number of qualitative or quantitative facts about that relationship (weighting, start and end date, etc.).

    Because of this you can write queries that express constraints. I consider this exploratory modeling. It’s all in what you’re interested in. I can hypothesize a connection and very quickly be delighted or disappointed when my results return.


  13. DOCUMENTATION IS FOR SUCKERS?
    Or perhaps clear & helpful docs can encourage adoption.
    The small lessons we learn.
    I’d had a couple of recommendations and seen some incredible talks influencing which database would be a good try. Turns out I don’t have a lot of friends attempting to mess around with graph databases, so the recommendations weren’t as strong as I’d initially hoped. I really wanted to use LevelGraph, built on top of LevelDB, as it has a great community of developers contributing to it. I read what docs I could find and looked at the other most popular option, Neo4j. The battle was a very quick one. Neo4j not only had a wide breadth of developers who had written about using it, it had a whole manual of documentation that included examples, tutorials, and links to other community projects.


  14. The choices we make.
    (Mistakes were made.)
    The next choice I had to make in exploring this work was how the heck I would find a hearty collection of digitized recipes. There’s no Library of Congress for recipes. No central repository for baking. Yet.

    We all make mistakes. It’s a bit silly how quickly they can pile up.

    Mine was choosing to maximize daily requests on developer-friendly recipe APIs over the down-and-dirty job of web scraping the glorious amount of cooking that is spread across the far reaches of the Internet.

    Let’s look at some of the API response structures. (BigOven, Food2Fork, Edamam, and then my normalized API structure)


  15. APIs party
    keep movin’


  16. The kitchen sink of APIs
    All of these had a decent start for documentation.

    BigOven (the kitchen sink, with nutritional info for free)
    Edamam (useful breakdown and much more digestible)
    Food2Fork (very friendly API but so sparse)
    My normalized API dream structure.

    I deal with very deeply nested payloads in my day work with push notifications, so I try to be extra conscientious when designing this type of thing for my own projects. I also found out later on that the nesting makes Neo4j very cranky.

    My normalized structure accounted for a small wishlist of items I hope to tackle in the future but that were not present in more than one API.
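
    Neo4j property values are simple values or arrays of them, not nested maps, which is likely why deep nesting caused trouble. One possible way to tame a nested payload before storing it is to flatten it. The sketch below is an assumption about how that could be done, not the normalized structure from the talk, and the `raw` payload is a hypothetical shape rather than any real API's response.

```python
from typing import Any


def flatten(payload: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dicts into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in payload.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so that {"source": {"url": ...}} becomes "source_url".
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical response shape, not any real API's schema.
raw = {
    "title": "Pita",
    "source": {"name": "Example Blog", "url": "https://example.com/pita"},
    "times": {"prep_minutes": 20, "cook_minutes": 8},
    "ingredients": ["flour", "yeast", "salt"],
}
```

    After `flatten(raw)`, every value is either a scalar or a flat list, so each key can become a node property. In a graph model the ingredient list would more naturally turn into separate ingredient nodes joined by CONTAINS relationships.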


  17. So dreamy.


  18. Maximize my laziness
    by minimizing my grocery store runs.
    [Diagram: recipe nodes (R) linked to ingredient nodes (i)]
    So the ultimate goal of stage one of this project was to lock down the ingredient-to-recipe relationships. If I have a couple of items in my pantry, optimize for the recipes that will require the minimum amount of purchasing when I stop by the grocery store after work.

    For this, one can give a weight to the recipes that are a perfect match, but that might not be a very exciting recipe if I’ve put in 5 ingredients. My exact result comes up as pita. Being able to expand this to a few more ingredients and add in matches for category (dinner) or cuisine (Southern) makes for far more delicious results!
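
    The "minimum amount of purchasing" goal can be sketched without a database at all. This is an illustration under assumptions, not the project's implementation: the function name `rank_by_shopping`, the tie-breaking rule, and the sample pantry and recipes are all made up.

```python
def rank_by_shopping(
    pantry: set[str], recipes: dict[str, set[str]]
) -> list[tuple[str, list[str]]]:
    """Order recipes by how many ingredients still need buying (fewest first)."""
    ranked = []
    for title, needed in recipes.items():
        missing = sorted(needed - pantry)
        ranked.append((title, missing))
    # Ties go to recipes that use more of the pantry, then alphabetically.
    return sorted(
        ranked,
        key=lambda item: (len(item[1]), -len(recipes[item[0]] & pantry), item[0]),
    )


# Made-up sample data for illustration.
pantry = {"flour", "yeast", "salt", "olive oil", "water"}
recipes = {
    "pita": {"flour", "yeast", "salt", "olive oil", "water"},
    "focaccia": {"flour", "yeast", "salt", "olive oil", "water", "rosemary"},
    "gumbo": {"okra", "rice", "onion", "stock", "salt"},
}
```

    With this data the exact match (pita) comes first, followed by focaccia with one item to buy. Returning the missing ingredients alongside each title doubles as a shopping list, and allowing a few missing items is what opens the results up beyond the perfect match.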


  19. WHAT IT IS
    what it is isn’t!
    Stage 2 of this project is still being implemented, unfortunately. The limitations of the APIs I chose over web scraping left me with too small a data set around cuisine to be able to play with it. This is the problem I’m most excited to solve. Again, this notion of being able to discover culinary connections between cultures could mean I get to explore the world through eating instead of sitting in a library poring over culinary history. Both are really interesting, but I prefer a food experience that fires on more sense cylinders.

    Currently, this project is all command line and the Neo4j interface (which is gorgeous and a bunch of fun to work with).

    What this project isn’t:
    a business. It’s an exploration. I’m not particularly interested in starting my own company to compete against the likes of the behemoths I’ve mentioned prior who have contributed so much towards being able to cook a good meal. I am interested in encouraging these projects to rethink the way they have approached the rich data they have at their fingertips.


  20. What this project WILL be (to be continued)
    Open sourcing, converting to scraping, & a web interface. Oh my!
    https://github.com/hackygolucky/recipe-modeling

    Open sourcing: I need to clean it up, but it will live at https://github.com/hackygolucky/recipe-modeling.

    Build a web interface so you can do fun things like throw a few ingredients in a search box and it’ll just work!

    Convert to web scraping, as my previous regrets have clarified.


  21. Let’s review-
    So we’ve learned about graph theory, Neo4j, gaps in current recipe discovery, how helpful documentation can be, kinder data structures, and all of the mistakes that can go into building an idea that many people have already attempted.

    This is how I get to be a better programmer, and you do too. There’s a pervasive notion in many programming communities that trying these sorts of things out can be a waste of effort: that they’ve already been done well or can be done better in another language. While this might be true, and discouraging whining tends to ensue, each person attempting these projects a bit out of their reach (and potentially ours) is forging a path to contribute in a much larger way in the future.

    Practicing failure by taking bigger and bigger risks prepares you for your eventual luck at even more awesome endeavors. I’d first considered many of my choices in this project as failures. Experiencing failure is universal. I tried to start a bakery when I was younger and had a long, tangled adventure in having to let go. We can start a bakery or fail a little more, there you go.


  22. Fin
    Why do you care?
    Sharing projects that help us explore and learn to be better programmers through the intersection of passions is a powerful means of including people from many facets of life and getting others to try it. Like graph theory? I don’t know. How about being able to explore unique connections between ingredients or recipe cuisine? YES! Well, that’s how I got here. :)

    I don’t have a formal CS background, so I’m always reaching for projects that will push me to fill in these gaps of knowledge. I’d seen some really cool graph-database-related projects but hadn’t considered myself that interested until I found a connection that was meaningful to me. So please chat me up after the talk with resources you’ve been excited about to help communicate data structures, algorithms, and graph theory.


  23. -Resources-
    http://info.neotechnology.com/rs/neotechnology/images/GraphDatabases.pdf
    Neo4j manual
    Mind of a Chef, tv series
    Google Presents Alton Brown: Good Eats 3, the Later Years.
    Graph Databases by Ian Robinson, Jim Webber, & Emil Eifrem
    http://neo4j.com/docs/stable/graphdb-concepts.html
    Neo4j REST API wrapper
    https://github.com/philippkueng/node-neo4j


  24. The biggest of
    thanks!
    These folks deserve some cookies.
