
Rewriting Recipe Search: A dash of sugar, just a smidgen of graph database


I’m starvin’. I throw open my laptop and search by what I’ve got in the house to minimize my runs to the store. I’m craving a flavor, not necessarily a cuisine. Some Southern food has its origins in West African food. How do recipe searches address this? How do we combine curation of good recipe content with better search to avoid having to build a recommendation engine? In this talk, we’ll take a look at the current state of web recipe discovery and how I naively tried to come up with an alternative through brute JavaScript force and graph-databasing (Neo4j): like baking a soufflé for a dinner party and doing a no-fall dance before it comes out of the oven. Plan for the worst and be delighted when it doesn’t happen (or chuckle when it does)!

Tracy Hinds

June 01, 2015

Transcript

  1. Tracy Hinds. Web engineer, conference organizer, sugar fiend.

    I’ve worn many hats. I have been a conference organizer, a baker, an addictions program coordinator, and a restaurant manager. I’m currently a web engineer (and that’s not the exhaustive list).

    Because I love making, whether via code or cookies, I’d long considered culinary school as a path to fulfillment. I nerd out on cooking, baking, and recipes as much as I make long lists of all the new JavaScript technologies I’m excited to keep up with.

    In one of my younger fits of courage, I was determined to run my own bakery. The idea of taking recipes I’d crafted to perfection and building a successful business was intoxicating. In hindsight, we could probably chalk this up to a pretty serious sugar mania.
  2. On many a night, I finish working and I’m starvin’.

    I throw open my laptop and search by what I’ve got in the house to minimize my runs to the store. I’m craving a flavor, not necessarily a cuisine. How do recipe searches address this? How do we combine curation of good recipe content with better search to avoid having to build a recommendation engine?

    In one of my prior lives, I was a baker. I love cooking. I especially love baking. I love researching recipes. I delight in the ‘why’ as much as the ‘how’: following the methods, repeating experiments, seeing what I’ve done wrong.

    My methodology for making something new begins with compiling recipes in paper and digital formats. I read the comments (lots and lots of them). I find where cultural habits have crept in, home in on the version I believe is most honest to its history, and then run with it.
  3. So to do this, where do you go? Books and magazine articles I’ve hoarded over the years.

    (This very quickly became nearly obsolete for me as I’ve downsized and moved around.) You can only schlep a box full of professional culinary recipe books up a few flights of stairs so many times before you start wishing they were digitized.

    The Internet! Recipe search. Turns out Google has this thing they do pretty well. I have my favorite sources, often food bloggers who can be trusted to test their recipes and accept feedback for improvement: the ones that give me the best recipes time and time again. How do I find them? google.com. This wins for me right now. The curated web vs. the social web (something like AllRecipes) is a super interesting rabbit hole I was distracted by, but I’ll spare you until another talk.
  4. So then I watched Alton Brown say: “...and there is no way to find good recipes on the Internet, I don’t think. Not through a general recipe search.” (Google Presents Alton Brown: Good Eats 3, the Later Years, 34:42)

    I was pretty flabbergasted, yet nodding my head in agreement. (Play the clip from the slide on screen here.)
  5. BigOven, Yummly, Food2Fork, Edamam... So many choices.

    From a content perspective, what’s the current state of recipe discovery? There are recipe APIs! Yummly, BigOven, WebKnox Recipe, and Food2Fork promise your website hundreds of thousands of recipes to include, but what does that mean if the data you get for each recipe is limited? I want to cross-reference reviews, ingredient lists, types of food, or preparation and cooking time. It’s not that I think all of these resources, in existence for years, are broken. I want to play with the idea of ‘another way’. I wanted more: to explore.
  6. The bright side

    AllRecipes is the most robust site for collection and discovery I have yet to come across. It labels itself as ‘a food-focused social media website’, and I would say that’s fair. What’s been so interesting about AllRecipes over the years is that even as features were added and the UI was cleaned up, the user comments have remained as valuable as the recipes. It’s like having my mom standing in the kitchen next to me, telling me not to use so much baking soda or to add a little more salt. Blogs generally have lower traffic, and most of their commenters are appreciating the food photography and descriptive writing but haven’t actually tested the recipe (and they’ll say so).
  7. Might seem like a silly question, but what are recipes?

    They can be a fairly standard formula for cooking or baking something you know you want. They can be an experiment to try flavors you’re curious about. They can be a family legacy passed down and recreated with each generation: I remember the nights I chased my mom around the kitchen asking questions to understand what was going where and why.

    There’s a series I watched, Mind of a Chef, that features a few chefs celebrating cooking, travel, science, and history, all wrapped up in a rich study of heritage and how that lineage informs the delicious food they prepare now. Chef Sean Brock, of Charleston’s Husk fame, was known for his passion for sharing Southern preservation techniques and for elevating the generations-old grains and heirloom seeds he uses in his own kitchen. He’s converted many a skeptic into appreciating Southern food for more than its sweet tea and grits. Tracing the roots of the flavors he was so familiar with, he traveled to Senegal to explore the West African influences on Southern cuisine. There’s a point where he starts to notice the remarkable familiarity in flavors and techniques, such as gumbo-like broths and rice dishes, despite the availability of similar raw ingredients.
  8. Why graphdb? Why not?

    I saw John Resig speak at Brooklyn.js about his recent work, Building an Art History Database Using Computer Vision, with the Frick Library in NYC. He shared how clear and concise the queries could be using Neo4j and how it allowed him to connect unattributed Italian artwork. It was incredible.

    Common uses for graph databases include geospatial problems, recommendation engines, network analysis, and bioinformatics: anywhere the relationships between the data are just as important as the data itself. What if you could surface interesting histories or similarities by doing large-scale comparisons of ingredients and cooking preparation across millions of recipes?
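One way to make that ‘large-scale comparison’ question concrete is a set-overlap score between recipes. The sketch below is an illustration only, assuming each recipe has already been reduced to a list of normalized ingredient names. It uses Jaccard similarity (shared ingredients divided by all distinct ingredients), a standard measure swapped in here because the talk doesn’t name one, and the recipes and ingredient lists are made up.

```javascript
// Jaccard similarity between two ingredient lists:
// |A ∩ B| / |A ∪ B|, from 0 (nothing shared) to 1 (identical sets).
function jaccard(a, b) {
  const setA = new Set(a);
  const setB = new Set(b);
  let shared = 0;
  for (const item of setA) {
    if (setB.has(item)) shared += 1;
  }
  const union = setA.size + setB.size - shared;
  return union === 0 ? 0 : shared / union;
}

// Made-up, simplified ingredient lists (not data from the project).
const recipes = {
  gumbo: ["okra", "onion", "bell pepper", "celery", "rice", "shrimp"],
  okraStew: ["okra", "onion", "palm oil", "fish", "rice"],
  pita: ["flour", "water", "yeast", "salt"],
};

// Score every pair of recipes, most similar first.
function rankPairs(recipeMap) {
  const names = Object.keys(recipeMap);
  const pairs = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      pairs.push({
        a: names[i],
        b: names[j],
        score: jaccard(recipeMap[names[i]], recipeMap[names[j]]),
      });
    }
  }
  return pairs.sort((x, y) => y.score - x.score);
}

console.log(rankPairs(recipes));
```

Comparing every pair is quadratic in the number of recipes, so at millions of recipes this would need to run much closer to the data; the sketch only shows the measure itself.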
  9. Graph theory.

    [Diagram: recipe nodes (R) connected to their ingredient nodes (i)]

    Let’s dive into a little bit of graph theory and why it might be useful here.

    Graph theory, at its core, is the study of graphs: in this case, mathematical structures used to model relations between objects. They’re different from the charts we usually mean by the word ‘graphs’, but they are a powerful way to visually represent relationships.

    So here we have a graph for the initial concept of my recipe exploration. The fundamental units that form a graph are nodes and relationships. A property graph is made up of nodes, relationships (edges), and properties. Nodes contain properties. A helpful framing from the online docs is to think of nodes as documents that store properties in the form of arbitrary key-value pairs. This is the most common graph model and is referred to as the Property Graph Model. The keys are strings and the values are arbitrary data types. In Neo4j, both nodes and relationships can contain properties.

    Relationships connect and structure nodes. A relationship always has a direction, a label, a start node, and an end node; there are no dangling relationships. Together, a relationship’s direction and label add semantic clarity to the structuring of nodes. The ability to add properties to relationships is particularly useful for providing additional metadata for graph
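Here’s the Property Graph Model from this slide as a toy in plain JavaScript: a sketch for illustration, not anything resembling how Neo4j actually stores data. The class and method names (PropertyGraph, addNode, addRelationship, outgoing) and the quantity property are hypothetical. Nodes carry labels and arbitrary key-value properties; relationships carry a type, a direction (start to end), and their own properties. The one hard rule mirrors ‘no dangling relationships’: both endpoints must exist before a relationship can connect them.

```javascript
// A toy, in-memory property graph (hypothetical names; illustration only).
class PropertyGraph {
  constructor() {
    this.nodes = new Map(); // id -> { id, labels, properties }
    this.relationships = []; // { type, start, end, properties }
    this.nextId = 0;
  }

  // Nodes are like small documents: labels plus arbitrary key-value pairs.
  addNode(labels, properties = {}) {
    const id = this.nextId++;
    this.nodes.set(id, { id, labels: new Set(labels), properties: { ...properties } });
    return id;
  }

  // Relationships are directed (start -> end), typed, and may carry properties.
  addRelationship(start, type, end, properties = {}) {
    if (!this.nodes.has(start) || !this.nodes.has(end)) {
      // No dangling relationships: both endpoints must already exist.
      throw new Error(`Cannot create ${type}: missing start or end node`);
    }
    const rel = { type, start, end, properties: { ...properties } };
    this.relationships.push(rel);
    return rel;
  }

  // Follow outgoing relationships of one type from a node.
  outgoing(nodeId, type) {
    return this.relationships
      .filter((r) => r.start === nodeId && r.type === type)
      .map((r) => this.nodes.get(r.end));
  }
}

// Usage: one recipe node that CONTAINS two ingredient nodes.
const g = new PropertyGraph();
const pita = g.addNode(["Recipe"], { title: "Pita" });
const flour = g.addNode(["Ingredient"], { name: "flour" });
const yeast = g.addNode(["Ingredient"], { name: "yeast" });
g.addRelationship(pita, "CONTAINS", flour, { quantity: "3 cups" }); // hypothetical property
g.addRelationship(pita, "CONTAINS", yeast);

console.log(g.outgoing(pita, "CONTAINS").map((n) => n.properties.name));
```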
  10. CYPHER is for ASCII lovers.

    Node syntax: (ingredient)
    Relationship syntax: -->, -[recipe]->, -[:CONTAINS]->, -[recipe:CONTAINS]->
    Pattern syntax: (recipe)-[:CONTAINS]->(ingredient)

    These primitives are all we need to create sophisticated and semantically rich models. So far, all our models have been in the form of diagrams. Diagrams are great for describing graphs outside of any technology context, but when it comes to using a database, we need some other mechanism for creating, manipulating, and querying data.

    Cypher is the query language Neo4j uses to do just that. I immediately appreciated how intuitive it seemed once I realized it uses ASCII art to represent graph patterns. It still uses SQL-like clauses and keywords (e.g., MATCH, WHERE, DELETE) to combine these patterns and specify desired actions.

    Cypher uses a pair of parentheses (usually containing a name) to represent a node, e.g., () or (ingredient).

    A pair of dashes (--) represents an undirected relationship. Directed relationships have an arrowhead at one end (e.g., <--, -->). Bracketed expressions (e.g., [...]) can be used to add details.

    Combining the syntax for nodes and relationships, we can express patterns.
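To see what a pattern like (recipe)-[:CONTAINS]->(ingredient) is asking for, here’s a sketch that answers the same one-hop question over plain arrays. The Recipe and Ingredient labels, the title and name properties, and the matchOneHop helper are hypothetical. The Cypher in the comment uses standard MATCH/WHERE/RETURN clauses; the JavaScript is a naive scan meant only to show the semantics of the pattern (including that direction matters), not how Neo4j executes it.

```javascript
// Roughly the JavaScript equivalent of this Cypher query (labels/properties hypothetical):
//   MATCH (recipe:Recipe)-[:CONTAINS]->(ingredient:Ingredient)
//   WHERE ingredient.name = 'okra'
//   RETURN recipe.title
const nodes = [
  { id: 1, label: "Recipe", props: { title: "Gumbo" } },
  { id: 2, label: "Recipe", props: { title: "Pita" } },
  { id: 3, label: "Ingredient", props: { name: "okra" } },
  { id: 4, label: "Ingredient", props: { name: "flour" } },
];
const rels = [
  { start: 1, type: "CONTAINS", end: 3 },
  { start: 2, type: "CONTAINS", end: 4 },
];

// Naive one-hop matcher for (a:startLabel)-[:relType]->(b:endLabel), filtered by `where`.
function matchOneHop(startLabel, relType, endLabel, where) {
  const byId = new Map(nodes.map((n) => [n.id, n]));
  const results = [];
  for (const rel of rels) {
    if (rel.type !== relType) continue;
    const a = byId.get(rel.start);
    const b = byId.get(rel.end);
    // Direction matters: a --> arrow only matches start -> end.
    if (a.label === startLabel && b.label === endLabel && where(a, b)) {
      results.push([a, b]);
    }
  }
  return results;
}

const titles = matchOneHop("Recipe", "CONTAINS", "Ingredient",
  (recipe, ingredient) => ingredient.props.name === "okra")
  .map(([recipe]) => recipe.props.title);

console.log(titles);
```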
  11. Complete graph cycles

    Treating relationships, the edges of a graph, as first-class objects is the fundamental innovation of graph databases. The database stores not just information about individual things but also the relationships between those things.

    This capability makes it much easier to express sophisticated questions and, for relationship-heavy queries, to get answers in a fraction of the time a traditional relational database would take. The relationships in the database can express the nature of each connection (parent, child, owns, friend) and capture any number of qualitative or quantitative facts about that relationship (weighting, start and end date, etc.).

    Because of this, you can write queries that express constraints. I consider this exploratory modeling; it’s all in what you’re interested in. I can hypothesize a connection and very quickly be delighted or disappointed when my results return.
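As a sketch of a constraint on relationship properties, assume a hypothetical weighted PAIRS_WITH relationship between ingredients, where the weight counts how many recipes use both. The recipe data, the relationship name, and the helpers below are made up for illustration; the Cypher comment shows how the same hypothesis (‘what goes with okra in at least two recipes?’) might read as a query.

```javascript
// Hypothetical data: each recipe reduced to its ingredient names.
const recipes = [
  ["okra", "onion", "rice", "shrimp"],
  ["okra", "onion", "tomato", "rice"],
  ["okra", "palm oil", "fish"],
  ["flour", "water", "yeast", "salt"],
];

// Build weighted, undirected PAIRS_WITH edges:
// weight = number of recipes in which both ingredients appear.
function buildPairWeights(recipeList) {
  const weights = new Map(); // "a|b" with a < b -> weight
  for (const ingredients of recipeList) {
    const unique = [...new Set(ingredients)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}|${unique[j]}`;
        weights.set(key, (weights.get(key) || 0) + 1);
      }
    }
  }
  return weights;
}

// Constraint on the relationship property, roughly:
//   MATCH (a:Ingredient {name: 'okra'})-[p:PAIRS_WITH]-(b:Ingredient)
//   WHERE p.weight >= 2
//   RETURN b.name, p.weight
//   ORDER BY p.weight DESC
function strongPairings(weights, ingredient, minWeight) {
  const results = [];
  for (const [key, weight] of weights) {
    if (weight < minWeight) continue;
    const [a, b] = key.split("|");
    if (a === ingredient) results.push({ partner: b, weight });
    else if (b === ingredient) results.push({ partner: a, weight });
  }
  // Heaviest pairings first, ties broken alphabetically.
  return results.sort((x, y) => y.weight - x.weight || x.partner.localeCompare(y.partner));
}

const weights = buildPairWeights(recipes);
console.log(strongPairings(weights, "okra", 2));
```

Storing the count on the relationship, rather than recomputing it per query, is what lets the hypothesis be a one-line WHERE clause.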
  12. DOCUMENTATION IS FOR SUCKERS? Or perhaps clear & helpful docs can encourage adoption.

    The small lessons we learn. I’d had a couple of recommendations and seen some incredible talks influencing which database would be worth a try. Turns out I don’t have a lot of friends messing around with graph databases, so the recommendations weren’t as strong as I’d initially hoped. I really wanted to use LevelGraph, built on LevelDB, as it has a great community of developers contributing to it. I read what docs I could find and looked at the other most popular option, Neo4j. The battle was a very quick one. Not only had a wide breadth of developers written about using Neo4j, it had a whole manual of documentation that included examples, tutorials, and links to other community projects.
  13. The choices we make. (Mistakes were made.)

    The next choice I had to make in exploring this work was how the heck I would find a hearty collection of digitized recipes. There’s no Library of Congress for recipes. No central repository for baking. Yet.

    We all make mistakes. It’s a bit silly how quickly they can pile up.

    Mine was choosing to maximize my daily requests against developer-friendly recipe APIs over the down-and-dirty job of web scraping the glorious amount of cooking spread across the far reaches of the Internet.

    Let’s look at some of the API response structures (BigOven, Food2Fork, Edamam, and then my normalized API structure).
  14. The kitchen sink of APIs

    All of these had a decent start on documentation: BigOven (the kitchen sink, with nutritional info for free), Edamam (a useful breakdown and much more digestible), Food2Fork (a very friendly API, but so sparse), and my normalized API dream structure.

    I deal with very deeply nested payloads in my day job with push notifications, so I try to be extra conscientious when designing this type of thing for my own projects. I also found out later on that the nesting makes Neo4j very cranky.

    My normalized structure accounted for a small wishlist of items I hope to tackle in the future that weren’t present in more than one API.
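Here’s a minimal sketch of that normalization step, assuming a hypothetical flat target shape (id, title, source, ingredients, category, cuisine) and an invented upstream payload; it does not reproduce the real BigOven, Edamam, or Food2Fork response formats. The flattening helper reflects one concrete reason nesting causes friction: a Neo4j property value can be a primitive or an array of primitives, but not a nested object, so this sketch turns nested fields into dotted keys and serializes arrays of objects to JSON strings.

```javascript
// Hypothetical normalized shape (not the author's actual structure):
// { id, title, source, ingredients: [names], category, cuisine }
function normalizeRecipe(raw, source) {
  return {
    id: `${source}:${raw.id}`,
    title: raw.title || raw.name || "",
    source,
    // Accept either plain strings or { name } objects; normalize case and whitespace.
    ingredients: (raw.ingredients || [])
      .map((item) => (typeof item === "string" ? item : item.name || ""))
      .map((name) => name.trim().toLowerCase())
      .filter(Boolean),
    category: raw.category || null,
    cuisine: raw.cuisine || null,
  };
}

// Flatten nested objects into dotted keys so every value is a primitive
// or an array of primitives (the kinds of values a Neo4j property can hold).
function flattenProperties(obj, prefix = "", out = {}) {
  for (const [key, value] of Object.entries(obj)) {
    const path = prefix ? `${prefix}.${key}` : key;
    if (value === null || value === undefined) continue; // drop empty values
    if (Array.isArray(value)) {
      // Keep arrays of primitives; serialize arrays containing objects (or null).
      out[path] = value.every((v) => typeof v !== "object") ? value : JSON.stringify(value);
    } else if (typeof value === "object") {
      flattenProperties(value, path, out);
    } else {
      out[path] = value;
    }
  }
  return out;
}

// Usage with an invented, nested payload.
const raw = {
  id: 42,
  title: "Skillet Cornbread",
  ingredients: [{ name: "Cornmeal" }, { name: " Buttermilk " }, "eggs"],
  cuisine: "Southern",
  nutrition: { calories: 210, fat: { total: 9 } },
};
const normalized = normalizeRecipe(raw, "exampleApi");
const props = flattenProperties({ ...normalized, nutrition: raw.nutrition });
console.log(props);
```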
  15. Maximize my laziness by minimizing my grocery store runs.

    [Diagram: ingredient nodes (i) connected to recipe nodes (R)]

    So the ultimate goal of stage one of this project was to lock down the ingredient-to-recipe relationships. If I have a couple of items in my pantry, optimize for the recipes that will require the least purchasing when I stop by the grocery store after work.

    For this, one can give a weight to recipes that are a perfect match, but a perfect match might not be a very exciting recipe if I’ve put in 5 ingredients: my exact result comes up as pita. Being able to expand this to a few more ingredients and add in matches for category (dinner) or cuisine (Southern) makes for far more delicious results!
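A minimal sketch of the stage-one ranking idea, assuming each recipe is a list of normalized ingredient names plus hypothetical category and cuisine tags: count what you’d still need to buy, keep recipes within a ‘few more ingredients’ budget (maxMissing), and rank tag matches first. The data, the rankByPantry name, and the scoring order are assumptions for illustration, not the project’s actual weighting. The Cypher comment shows a roughly equivalent missing-ingredient count using current Cypher parameter syntax ($pantry).

```javascript
// Hypothetical recipes: normalized ingredient names plus category/cuisine tags.
const recipes = [
  { title: "Pita", ingredients: ["flour", "water", "yeast", "salt"], category: "bread", cuisine: "Middle Eastern" },
  { title: "Gumbo", ingredients: ["okra", "onion", "celery", "bell pepper", "rice", "shrimp"], category: "dinner", cuisine: "Southern" },
  { title: "Red Beans and Rice", ingredients: ["kidney beans", "onion", "celery", "bell pepper", "rice"], category: "dinner", cuisine: "Southern" },
];

// Roughly the missing-ingredient count in Cypher (labels hypothetical):
//   MATCH (r:Recipe)-[:CONTAINS]->(i:Ingredient)
//   WITH r, collect(i.name) AS needed
//   RETURN r.title AS title, size([x IN needed WHERE NOT x IN $pantry]) AS missing
//   ORDER BY missing ASC
function rankByPantry(recipeList, pantry, wanted = {}, maxMissing = 0) {
  const have = new Set(pantry.map((p) => p.toLowerCase()));
  return recipeList
    .map((recipe) => {
      // What would I still have to buy at the store?
      const missing = recipe.ingredients.filter((i) => !have.has(i));
      // One point each for matching the wanted category and cuisine.
      const tagMatches =
        (wanted.category && recipe.category === wanted.category ? 1 : 0) +
        (wanted.cuisine && recipe.cuisine === wanted.cuisine ? 1 : 0);
      return { title: recipe.title, missing, tagMatches };
    })
    .filter((r) => r.missing.length <= maxMissing)
    .sort(
      (a, b) =>
        b.tagMatches - a.tagMatches ||
        a.missing.length - b.missing.length ||
        a.title.localeCompare(b.title)
    );
}

const pantry = ["flour", "water", "yeast", "salt", "onion", "celery", "rice"];
console.log(rankByPantry(recipes, pantry)); // exact matches only
console.log(rankByPantry(recipes, pantry, { category: "dinner", cuisine: "Southern" }, 3));
```

With maxMissing at 0 only exact matches survive (here, just the pita); raising it to 3 with dinner/Southern tags moves the Southern dinners ahead of it.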
  16. What it is. What it isn’t!

    Stage 2 of this project is, unfortunately, still being implemented. The limitations of the APIs I chose over web scraping left me with too small a data set around cuisine to be able to play with it. This is the problem I’m most excited to solve. Again, this notion of discovering culinary connections between cultures could mean I get to explore the world through eating instead of sitting in a library poring over culinary history. Both are really interesting, but I prefer food experiences that fire on more sensory cylinders.

    Currently, this project is all command line and the Neo4j interface (which is gorgeous and a bunch of fun to work with).

    What this project isn’t: a business. It’s an exploration. I’m not particularly interested in starting my own company to compete against the behemoths I’ve mentioned, who have contributed so much toward being able to cook a good meal. I am interested in encouraging these projects to rethink the way they approach the rich data they have at their fingertips.
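Stage 2 isn’t built yet, but here’s a rough sketch of how ‘culinary connections between cultures’ could start: collapse a hypothetical (cuisine)<-[:FROM]-(recipe)-[:CONTAINS]->(ingredient) traversal into one ingredient set per cuisine, then list the ingredients each pair of cuisines shares. The FROM relationship, the labels, the helper names, and the tiny data set are invented for illustration and make no claim about any real cuisine.

```javascript
// Invented recipes tagged with a cuisine (illustrative only).
const recipes = [
  { cuisine: "Southern", ingredients: ["okra", "rice", "onion", "shrimp"] },
  { cuisine: "Southern", ingredients: ["black-eyed peas", "rice", "onion"] },
  { cuisine: "Senegalese", ingredients: ["okra", "palm oil", "rice", "fish"] },
  { cuisine: "Senegalese", ingredients: ["black-eyed peas", "onion", "tomato"] },
  { cuisine: "Italian", ingredients: ["flour", "tomato", "basil"] },
];

// Roughly the two-hop question in Cypher (labels/relationships hypothetical):
//   MATCH (c1:Cuisine)<-[:FROM]-(:Recipe)-[:CONTAINS]->(i:Ingredient)
//         <-[:CONTAINS]-(:Recipe)-[:FROM]->(c2:Cuisine)
//   WHERE c1.name < c2.name
//   RETURN c1.name, c2.name, collect(DISTINCT i.name) AS shared

// Collapse recipes into cuisine -> set of ingredients.
function ingredientsByCuisine(recipeList) {
  const byCuisine = new Map();
  for (const { cuisine, ingredients } of recipeList) {
    if (!byCuisine.has(cuisine)) byCuisine.set(cuisine, new Set());
    for (const i of ingredients) byCuisine.get(cuisine).add(i);
  }
  return byCuisine;
}

// For each pair of cuisines, list shared ingredients; biggest overlaps first.
function sharedIngredients(byCuisine) {
  const names = [...byCuisine.keys()].sort();
  const links = [];
  for (let i = 0; i < names.length; i++) {
    for (let j = i + 1; j < names.length; j++) {
      const shared = [...byCuisine.get(names[i])]
        .filter((x) => byCuisine.get(names[j]).has(x))
        .sort();
      if (shared.length > 0) links.push({ a: names[i], b: names[j], shared });
    }
  }
  return links.sort((x, y) => y.shared.length - x.shared.length);
}

const links = sharedIngredients(ingredientsByCuisine(recipes));
console.log(links);
```

Ingredients alone are a crude signal; the talk’s idea also covers cooking preparation, which would need its own nodes or relationship properties.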
  17. Open sourcing & a web interface & converting to scraping! Oh my!

    What this project WILL be (to be continued):

    Open sourced. I need to clean it up, but it will live at https://github.com/hackygolucky/recipe-modeling.

    A web interface, so you can do fun things like throw a few ingredients into a search box and it’ll just work!

    Converted to web scraping, as my previous regrets have made clear.
  18. Let’s review.

    So we’ve learned about graph theory, Neo4j, gaps in current recipe discovery, how helpful documentation can be, kinder data structures, and all of the mistakes that can go into building an idea many people have already attempted.

    This is how I get to be a better programmer, and you do too. There’s a pervasive notion in many programming communities that trying these sorts of things out is a waste of effort: that they’ve already been done well, or could be done better in another language. While that might be true, and the discouraging whining will ensue either way, each person attempting projects a bit out of their reach (and potentially ours) is forging a path to contribute in a much larger way in the future.

    Practicing failure by taking bigger and bigger risks prepares you for your eventual luck at even more awesome endeavors. I first considered many of my choices in this project failures. Experiencing failure is universal. I tried to start a bakery when I was younger and had a long, tangled adventure in having to let go. We can start a bakery or fail a little more. There you go.
  19. Fin. Why do you care?

    Sharing projects that help us explore and learn to be better programmers through the intersection of passions is a powerful means of including people from many facets of life and getting others to try it. Like graph theory? I don’t know. How about being able to explore unique connections between ingredients or recipe cuisines? YES! Well, that’s how I got here. :)

    I don’t have a formal CS background, so I’m always reaching for projects that will push me to fill in these gaps of knowledge. I’d seen some really cool graph database projects but hadn’t considered myself interested until I found a connection that was meaningful to me. So please chat me up after the talk about resources you’ve been excited about for communicating data structures, algorithms, and graph theory.
  20. Resources

    Graph Databases by Ian Robinson, Jim Webber, & Emil Eifrem: http://info.neotechnology.com/rs/neotechnology/images/GraphDatabases.pdf
    Neo4j manual: http://neo4j.com/docs/stable/graphdb-concepts.html
    Neo4j REST API wrapper (node-neo4j): https://github.com/philippkueng/node-neo4j
    Mind of a Chef, TV series
    Google Presents Alton Brown: Good Eats 3, the Later Years