A dash of sugar, just a smidgen of graph database
Tracy, Web Engineer, conference organizer, sugar fiend.
I’ve worn many hats. I have been a conference organizer, a baker, an addictions program coordinator, and a restaurant manager. I’m currently a web engineer (this is not the exhaustive list).
Because I love making, whether via code or cookies, I’d long considered culinary school as a path to fulfillment. I nerd out on cooking, baking, and recipes as much as I make long lists
In one of my younger fits of courage, I was determined to run my own bakery. The idea of taking recipes I’d crafted to perfection and building a successful business was intoxicating. In hindsight, we could probably chalk this up to me being on a pretty serious sugar mania.
On many a night, I finish working and I’m starvin’. I throw open my laptop and search by what I’ve got in the house, to minimize my runs to the store. I’m craving a flavor, not necessarily a cuisine. How do recipe searches address this? How do we combine curation of good recipe content with better search to avoid having to build a recommendation engine?
In one of my prior lives, I was a baker. I love cooking. I especially love baking. I love researching recipes. I delight in the ‘why’ as much as the ‘how’. Following the methods, repeating
experiments, seeing what I’ve done wrong.
My methodology for making something new begins with compiling recipes in paper and digital formats. I read the comments (lots and lots of them). I find where cultural habits have crept in, try to home in on the version I believe is most honest to history, and then run with it.
So to do this, where do you go?
Books and magazine articles I’ve hoarded over the years. (This very quickly became nearly obsolete for me as I’ve downsized and moved around.) You can only schlep a box full of professional culinary books up a few flights of stairs so many times before you start wishing they were digitized.
Recipe search. Turns out Google does this pretty well. I have my favorite sources, often food bloggers who can be trusted to test their recipes and accept feedback for improvement: the ones that give me the best recipes time and time again. How do I find them? google.com. This wins for me right now. Curated web vs. social web (something like AllRecipes) is a super interesting rabbit hole I got distracted by, but I’ll spare you until another talk.
So then I watched Alton Brown say, “...and there is no way to find good recipes on the Internet, I don’t think. Not through a general recipe search.” (Google Presents Alton Brown: Good Eats 3, the Later Years, 34:42) I was pretty flabbergasted, yet nodding my head in agreement. (Play the clip on the slide here.)
From a content perspective, what’s the current state of recipe discovery? There are recipe APIs! Yummly, BigOven, WebKnox Recipe, and Food2Fork promise your website X00,000 recipes to include, but what does that mean if the data you get for each one is limited? I want to cross-reference reviews, ingredient lists, types of food, or preparation and cooking time. It’s not that I think all of these resources, in existence for years, are broken. I want to play with the idea of ‘another way’. I wanted more: to explore.
The bright side
AllRecipes is the most robust site for recipe collection and discovery I have come across. It labels itself as ‘a food-focused social media website’, which I would say is fair. What’s been so interesting about AllRecipes over the years is that even as it has added features and cleaned up the UI, the user-added comments have remained just as valuable as the recipes. It’s like having my mom standing in the kitchen next to me, telling me not to use so much baking soda or to add a little more salt. Blogs generally have lower traffic. Most of the commenters are appreciating the art of the food photography and descriptive writing but haven’t actually tested the recipe (and they’ll
Might seem like a silly question, but what are recipes? They can be a fairly standard formula of how to cook or bake something you know you want. They can be an experiment to try flavors you’re curious
about. They can be a family legacy passed down and recreated with each generation. The nights I chased my mom around the kitchen asking questions to understand what’s going where and why.
There’s a series I watched, Mind of a Chef, that features a few chefs celebrating cooking, travel, science, and history, all wrapped up in a rich study of how heritage, and the lineage it carries, informs the delicious food they prepare now. Chef Sean Brock, of Charleston’s Husk fame, was known for his passion for sharing Southern preservation techniques and elevating the generations-old grains and heirloom seeds he uses in his own kitchen. He’s converted many a skeptic into appreciating Southern food for more than its sweet tea and grits. In tracing the roots of the flavors he was so familiar with, he traveled to Senegal to explore the West African influences on Southern cuisine. There’s a point where he starts to notice the remarkable familiarity in flavors and techniques, such as gumbo-like broths and rice dishes, despite the availability of similar raw ingredients.
I saw John Resig speak at Brooklyn.js about his recent work, Building an Art History Database Using Computer Vision, with the Frick Library in NYC. He shared how clear and concise the queries could be with Neo4j and how it allowed him to connect unattributed Italian artwork. It was incredible.
Common uses for graph databases include geospatial problems, recommendation engines, network analysis, and bioinformatics -- anywhere that the relationship between the
data is just as important as the data itself. What if you could see if there are any interesting histories or similarities you could surface by doing large-scale comparisons of
ingredients and cooking preparation across millions of recipes?
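One simple way to sketch such a comparison at toy scale: treat each cuisine as the set of ingredients seen across its recipes, then score every pair of cuisines by Jaccard similarity (shared ingredients divided by all distinct ingredients). Jaccard is picked here only for illustration, not something this project settled on, and the cuisine-to-ingredient data is invented, so any overlap it reports is baked into the sample rather than being a finding.

```python
from itertools import combinations

# Invented sample data (not from the project): cuisine -> set of ingredients
# seen across its recipes. Real data would be pulled from the recipe graph.
cuisine_ingredients = {
    "Southern": {"rice", "okra", "black-eyed peas", "tomato", "onion"},
    "Senegalese": {"rice", "okra", "peanut", "tomato", "onion", "fish"},
    "Italian": {"pasta", "tomato", "basil", "olive oil", "onion"},
}


def jaccard(a: set, b: set) -> float:
    """Shared ingredients divided by all distinct ingredients in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def most_similar_pairs(data: dict) -> list:
    """Score every pair of cuisines, most similar first."""
    scores = [
        (x, y, jaccard(data[x], data[y]))
        for x, y in combinations(sorted(data), 2)
    ]
    return sorted(scores, key=lambda t: t[2], reverse=True)


for left, right, score in most_similar_pairs(cuisine_ingredients):
    print(f"{left} / {right}: {score:.2f}")
```

Across millions of recipes you would want the graph database to do the counting, but the question being asked is the same: how much do two groups of recipes lean on the same ingredients?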
Let’s dive into explaining a little bit of graph theory and understanding why it might be useful here.
Graph theory, at its core, is the study of graphs: in this case, mathematical structures used to model relations between objects. These aren’t quite the charts and plots we usually call graphs, but they are a powerful way to visually represent relationships.
So here we have a graph for the initial concept of my recipe exploration.
The fundamental units that form a graph are nodes and relationships.
A property graph is made up of nodes, relationships (edges), and properties. Nodes contain properties. One great insight from the online docs is to think of nodes as documents that store properties in the form of arbitrary key-value pairs. This is the most common graph model and is referred to as the Property Graph Model.
The keys are strings, and the values can be data types such as strings, numbers, booleans, or lists of those.
In Neo4j, both nodes and relationships can contain properties.
Relationships connect and structure nodes. A relationship always has a direction, a type (the label in our diagrams), a start node, and an end node: there are no dangling relationships.
Together, a relationship’s direction and label add semantic clarity to the structuring of nodes. The ability to add properties to relationships is particularly useful for providing additional metadata for graph
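To make the model concrete, here is a minimal in-memory sketch of a property graph in plain Python. It is not Neo4j or its driver API; the class names, the recipe data, and the quantities on the `CONTAINS` relationships are all invented for illustration. It shows labelled nodes holding key-value properties, relationships that carry a type, a direction, both endpoints, and properties of their own, and a guard against dangling relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    labels: set                                     # e.g. {"Recipe"}
    properties: dict = field(default_factory=dict)  # arbitrary key-value pairs


@dataclass
class Relationship:
    type: str      # e.g. "CONTAINS"
    start: Node    # direction runs start -> end
    end: Node
    properties: dict = field(default_factory=dict)  # metadata on the edge itself


class PropertyGraph:
    """Hypothetical toy graph store, for illustration only."""

    def __init__(self):
        self.nodes = {}
        self.relationships = []

    def add_node(self, *labels, **properties):
        node = Node(len(self.nodes), set(labels), dict(properties))
        self.nodes[node.id] = node
        return node

    def connect(self, start, rel_type, end, **properties):
        # No dangling relationships: both endpoints must already be in this graph.
        for n in (start, end):
            if self.nodes.get(n.id) is not n:
                raise ValueError("both endpoints must be nodes in this graph")
        rel = Relationship(rel_type, start, end, dict(properties))
        self.relationships.append(rel)
        return rel


# (recipe) -[:CONTAINS]-> (ingredient), with invented quantities on the edges
graph = PropertyGraph()
pita = graph.add_node("Recipe", title="Pita", category="bread")
flour = graph.add_node("Ingredient", name="flour")
yeast = graph.add_node("Ingredient", name="yeast")
graph.connect(pita, "CONTAINS", flour, quantity=3, unit="cup")
graph.connect(pita, "CONTAINS", yeast, quantity=2.25, unit="tsp")
```

Putting the quantity on the relationship rather than the ingredient node is the design point: flour is one node shared by every recipe, and how much of it each recipe uses belongs to the connection.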
CYPHER is for ASCII lovers.
(recipe) -[:CONTAINS]-> (ingredient)
Node syntax: (recipe), (ingredient). Relationship syntax: -[:CONTAINS]->
These primitives are all we need to create sophisticated and semantically rich models. So far, all our models have been in the form of diagrams. Diagrams are great for describing graphs outside of any technology context, but when it
comes to using a database, we need some other mechanism for creating, manipulating, and querying data.
Cypher is the query language Neo4j uses to do just that. I immediately appreciated how intuitive it seemed once I realized it uses ASCII art to represent graph patterns. It still uses SQL-like clauses and keywords (e.g., MATCH, WHERE, DELETE) to combine these patterns and specify desired actions.
Cypher uses a pair of parentheses (usually containing an identifier) to represent a node, e.g., (), (ingredient).
A pair of dashes (--) represents an undirected relationship. Directed relationships have an arrowhead at one end (e.g., <--, -->). Bracketed expressions (e.g., [...]) can be used to add details, such as the relationship type in -[:CONTAINS]->.
Combining the syntax for nodes and relationships, we can express patterns.
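To show what a pattern like `(r:Recipe)-[:CONTAINS]->(i:Ingredient)` asks for, here is a plain-Python sketch that matches a single-hop pattern against a tiny hard-coded graph. The labels, property names, and recipes are hypothetical, and this is only an illustration of the idea, not how Neo4j executes queries.

```python
# Equivalent Cypher, for comparison (labels and property names are hypothetical):
#   MATCH (r:Recipe)-[:CONTAINS]->(i:Ingredient {name: 'butter'})
#   RETURN r.title

# Tiny hypothetical graph: node id -> (label, properties)
nodes = {
    1: ("Recipe", {"title": "Shortbread"}),
    2: ("Recipe", {"title": "Pita"}),
    3: ("Recipe", {"title": "Biscuits"}),
    10: ("Ingredient", {"name": "butter"}),
    11: ("Ingredient", {"name": "flour"}),
}
# Directed relationships as (start id, type, end id)
relationships = [
    (1, "CONTAINS", 10), (1, "CONTAINS", 11),
    (2, "CONTAINS", 11),
    (3, "CONTAINS", 10), (3, "CONTAINS", 11),
]


def node_matches(node_id, label, props):
    """True if the node has this label and every requested property value."""
    node_label, node_props = nodes[node_id]
    return node_label == label and all(node_props.get(k) == v for k, v in props.items())


def match_one_hop(start_label, rel_type, end_label, end_props=None):
    """(start ids, end ids) for (a:start_label)-[:rel_type]->(b:end_label {end_props})."""
    end_props = end_props or {}
    return [
        (s, e)
        for s, t, e in relationships
        if t == rel_type
        and node_matches(s, start_label, {})
        and node_matches(e, end_label, end_props)
    ]


matches = match_one_hop("Recipe", "CONTAINS", "Ingredient", {"name": "butter"})
titles = [nodes[s][1]["title"] for s, _ in matches]
print(titles)  # ['Shortbread', 'Biscuits']
```

The arrow direction matters: the pattern only follows relationships from a recipe to an ingredient, so swapping the labels would match nothing here.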
Complete graph cycles
Treating relationships, the edges of a graph, as first-class objects is the fundamental innovation of graph databases. The database doesn’t just store information about individual things; it also stores the relationships between those things.
This capability makes it much easier to express sophisticated questions about connected data, and often to get answers in a fraction of the time a traditional relational database would take. The relationships in the database can express the nature of each connection (parent, child, owns, friend) and capture any number of qualitative or quantitative facts about that relationship (weighting, start and end date,
Because of this, you can write queries that express constraints. I consider this exploratory modeling: it’s all in what you’re interested in. I can hypothesize a connection and very quickly be delighted or disappointed when my results return.
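As an example of hypothesizing a connection, a two-hop question such as "what tends to show up alongside cardamom?" walks from an ingredient to its recipes and back out to their other ingredients. The sketch below answers it in plain Python over an invented recipe index; the Cypher in the comment is a hedged equivalent that assumes `Recipe` and `Ingredient` labels and a `name` property.

```python
from collections import Counter

# Roughly the Cypher pattern (labels and property names are hypothetical):
#   MATCH (a:Ingredient {name: 'cardamom'})<-[:CONTAINS]-(r:Recipe)-[:CONTAINS]->(b:Ingredient)
#   RETURN b.name, count(r) AS recipes ORDER BY recipes DESC

# Invented data: recipe title -> set of ingredient names (the CONTAINS edges).
recipes = {
    "Chai": {"cardamom", "cinnamon", "ginger", "black tea", "milk"},
    "Cardamom buns": {"cardamom", "flour", "butter", "sugar", "milk"},
    "Kheer": {"cardamom", "rice", "milk", "sugar"},
    "Pita": {"flour", "yeast", "salt"},
}


def co_occurring(ingredient, recipe_index):
    """Count, for each other ingredient, how many recipes pair it with `ingredient`."""
    counts = Counter()
    for ingredients in recipe_index.values():
        if ingredient in ingredients:
            counts.update(ingredients - {ingredient})
    return counts.most_common()


print(co_occurring("cardamom", recipes)[:2])  # [('milk', 3), ('sugar', 2)]
```

If the hypothesis is wrong (say, nothing meaningfully co-occurs), the counts come back flat, which is the "disappointed" outcome, and it arrives just as quickly.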
DOCUMENTATION IS FOR SUCKERS?
Or perhaps clear & helpful docs can encourage adoption.
The small lessons we learn.
I’d had a couple of recommendations and seen some incredible talks influencing which database would be worth a try. Turns out I don’t have a lot of friends messing around with graph databases, so the recommendations weren’t as strong as I’d initially hoped. I really wanted to use LevelGraph, built on LevelDB, as it has a great community of developers contributing to it. I read what docs I could find and looked at the other popular option, Neo4j. The battle was a very quick one. Neo4j not only had a wide breadth of developers who had written about using it, it had a whole manual of documentation that included examples, tutorials, and links to other community projects.
(Mistakes were made.)
The next choice I had to make in exploring this work was how the heck I would find a hearty collection of digitized recipes. There’s no Library of Congress for recipes. No central repository for
We all make mistakes. It’s a bit silly how quickly they can pile up.
Mine was choosing to maximize my daily request quotas on developer-friendly recipe APIs over the down-and-dirty job of web scraping the glorious amount of cooking spread across the far reaches of the Internet.
Let’s look at some of the API response structures (BigOven, Food2Fork, Edamam, and then my normalized API structure).
The kitchen sink of APIs
All of these had a decent start on documentation.
BigOven (the kitchen sink, with nutritional info for free)
Edamam (a useful breakdown, and much more digestible)
Food2Fork (a very friendly API, but so sparse)
My normalized API dream structure.
I deal with very deeply nested payloads in my day job with push notifications, so I try to be extra conscientious when designing this type of thing for my own projects. I also found out later on that the nesting makes Neo4j very cranky: property values can’t be nested maps.
My normalized structure accounted for a small wishlist of items that I hope to tackle in the future but that weren’t present in more than one API.
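Here is a sketch of what normalizing one nested payload could look like. The input is an invented payload loosely in the style of a recipe API response; it is NOT the actual schema of BigOven, Edamam, or Food2Fork. The `normalize` and `is_flat` helpers and all field names are hypothetical. The output is split the way a graph wants it: flat recipe-node properties, plus ingredient rows that become `CONTAINS` relationships. `is_flat` is a rough stand-in for Neo4j's rule that property values are primitives or lists of primitives, never nested maps.

```python
# Invented nested payload; field names are hypothetical, not any real API's schema.
raw = {
    "recipe": {
        "title": "Skillet Cornbread",
        "source": {"name": "Example Blog", "url": "https://example.com/cornbread"},
        "timing": {"prep_minutes": 10, "cook_minutes": 25},
        "ingredients": [
            {"food": {"name": "Cornmeal"}, "amount": {"value": 1.5, "unit": "cup"}},
            {"food": {"name": "Buttermilk"}, "amount": {"value": 1, "unit": "cup"}},
            {"food": {"name": "Bacon fat"}, "amount": {"value": 2, "unit": "tbsp"}},
        ],
    }
}


def normalize(payload):
    """Flatten one hypothetical payload into (recipe node properties, ingredient rows)."""
    r = payload["recipe"]
    source = r.get("source", {})
    timing = r.get("timing", {})
    recipe_props = {
        "title": r["title"],
        "source_name": source.get("name"),
        "source_url": source.get("url"),
        "prep_minutes": timing.get("prep_minutes"),
        "cook_minutes": timing.get("cook_minutes"),
        "cuisine": r.get("cuisine"),  # a wishlist field that is often missing
    }
    # Neo4j doesn't store null-valued properties, so leave missing fields off.
    recipe_props = {k: v for k, v in recipe_props.items() if v is not None}
    # Lower-case names so the same ingredient from different APIs maps to one node.
    ingredients = [
        {
            "name": item["food"]["name"].strip().lower(),
            "quantity": item["amount"]["value"],
            "unit": item["amount"]["unit"],
        }
        for item in r.get("ingredients", [])
    ]
    return recipe_props, ingredients


def is_flat(props):
    """Rough check: every value is a primitive or a list of primitives (no nested maps)."""
    primitives = (str, int, float, bool)
    return all(
        isinstance(v, primitives)
        or (isinstance(v, list) and all(isinstance(x, primitives) for x in v))
        for v in props.values()
    )


recipe_props, ingredients = normalize(raw)
print(recipe_props["title"], [i["name"] for i in ingredients])
```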
Maximize my laziness
by minimizing my grocery store runs.
So the ultimate goal of stage one of this project was to lock down the ingredient-to-recipe relationships. If I have a couple of items in my pantry, optimize for the recipes that will require the least purchasing when I stop by the grocery store after work.
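A minimal sketch of that ranking, assuming the ingredient-to-recipe relationships have already been loaded into a plain dict of sets (the recipes and pantry below are invented): order recipes by how many ingredients are missing from the pantry, and break ties in favor of recipes that use more of what's already on hand. The function name `rank_by_shopping` is hypothetical.

```python
# Invented data: recipe title -> set of ingredient names (the CONTAINS relationships).
recipes = {
    "Pita": {"flour", "yeast", "salt", "water"},
    "Shakshuka": {"eggs", "tomato", "onion", "pepper", "cumin"},
    "Biscuits": {"flour", "butter", "buttermilk", "salt", "baking powder"},
    "Fried rice": {"rice", "eggs", "onion", "soy sauce"},
}


def rank_by_shopping(pantry, recipe_index):
    """Fewest ingredients to buy first; ties go to recipes using more pantry items,
    then alphabetically by title."""
    ranked = []
    for title, needed in recipe_index.items():
        missing = needed - pantry
        ranked.append((len(missing), -len(needed & pantry), title, sorted(missing)))
    ranked.sort()
    return [(title, missing) for _, _, title, missing in ranked]


pantry = {"flour", "salt", "eggs", "onion", "rice", "water"}
result = rank_by_shopping(pantry, recipes)
for title, shopping_list in result:
    print(title, shopping_list)
```

Returning the shopping list alongside each title answers the practical question directly: what do I grab on the way home?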
For this, I can give extra weight to recipes that are a perfect match, but a perfect match might not be a very exciting recipe if I’ve put in five ingredients: my exact result comes up as pita. Being able to expand this to a few more ingredients and add in matches for category (dinner) or cuisine (Southern) makes for far more delicious
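One way to express that trade-off is a weighted score: allow a few missing ingredients, subtract a penalty for each one, and add bonuses when the requested category or cuisine matches. The weights, recipes, and helper names below are hypothetical and untuned, a sketch of the idea rather than the project's scoring.

```python
from dataclasses import dataclass


@dataclass
class Recipe:
    title: str
    ingredients: frozenset
    category: str
    cuisine: str


# Hypothetical weights: each missing ingredient costs a point; matching the
# requested category or cuisine earns some back.
MISSING_PENALTY = 1.0
CATEGORY_BONUS = 1.5
CUISINE_BONUS = 2.0


def score(recipe, pantry, category=None, cuisine=None):
    s = -MISSING_PENALTY * len(recipe.ingredients - pantry)
    if category and recipe.category == category:
        s += CATEGORY_BONUS
    if cuisine and recipe.cuisine == cuisine:
        s += CUISINE_BONUS
    return s


def suggest(recipes, pantry, max_missing=3, **filters):
    """Keep recipes needing at most `max_missing` purchases, best score first."""
    candidates = [r for r in recipes if len(r.ingredients - pantry) <= max_missing]
    return sorted(candidates, key=lambda r: score(r, pantry, **filters), reverse=True)


# Invented sample recipes
recipes = [
    Recipe("Pita", frozenset({"flour", "yeast", "salt", "water"}), "bread", "Mediterranean"),
    Recipe("Hoppin' John", frozenset({"rice", "black-eyed peas", "onion", "bacon", "salt"}), "dinner", "Southern"),
    Recipe("Shrimp and grits", frozenset({"grits", "shrimp", "bacon", "onion", "butter"}), "dinner", "Southern"),
    Recipe("Fried rice", frozenset({"rice", "eggs", "onion", "soy sauce"}), "dinner", "Chinese"),
]
pantry = {"flour", "yeast", "salt", "water", "rice", "onion"}

print([r.title for r in suggest(recipes, pantry)])
print([r.title for r in suggest(recipes, pantry, category="dinner", cuisine="Southern")])
```

With no preferences, the exact match (pita) wins. Asking for a Southern dinner lets a recipe with two missing ingredients outrank it, while `max_missing` keeps the shopping trip from getting out of hand.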
WHAT IT IS
what it is isn’t!
Stage 2 of this project is still being implemented, unfortunately. The limitations of the APIs I chose over web scraping left me with too small a data set around cuisine to play with. This is the problem I’m most excited to solve. Again, this notion of discovering culinary connections between cultures could mean I get to explore the world through eating instead of sitting in a library poring over culinary history. Both are really interesting, but I prefer a food experience that fires on more sensory cylinders.
Currently, this project is all command line and the Neo4j interface (which is gorgeous and a bunch of fun to work with).
what this project isn’t
a business. It’s an exploration. I’m not particularly interested in starting my own company to compete against the behemoths I mentioned earlier, who have contributed so much toward being able to cook a good meal. I am interested in encouraging these projects to rethink the way they approach the rich data they have at their fingertips.
what this project WILL be (to be continued):
open-source it: I need to clean it up, but it will live at https://github.com/hackygolucky/recipe-modeling
build a web interface so you can do fun things like throw a few ingredients into a search box and it’ll just work!
convert to web scraping, as my previous regrets have clarified
So we’ve learned about graph theory, neo4j, gaps in current recipe discovery, how helpful documentation can be, kinder data structures, and all of the mistakes that can go into building
an idea that many people have already attempted.
This is how I get to be a better programmer, and how you do too. There’s a pervasive notion in many programming communities that trying these sorts of things out is a waste of effort: that they’ve already been done well, or could be done better in another language. While that might be true, and the discouraging whining will ensue regardless, each person attempting projects a bit out of their reach (and potentially ours) is forging a path to contribute in a much larger way in the future.
Practicing failure by taking bigger and bigger risks prepares you for your eventual luck at even more awesome endeavors. I’d first considered many of my choices in this project to be failures. Experiencing failure is universal. I tried to start a bakery when I was younger and had a long, tangled adventure in having to let go. We can start a bakery, or fail a little more; either way, there you go.
Why should you care?
Sharing projects that help us explore and learn to be better programmers through the intersection of passions is a powerful means of including people from many facets of life and
getting others to try it. Like graph theory? I don’t know. How about being able to explore unique connections between ingredients or recipe cuisine? YES! Well that’s how I got here. :)
I don’t have a formal CS background, so I’m always reaching for projects that will push me to fill in these gaps of knowledge. I’d seen some really cool graph database projects but hadn’t considered myself interested until I found a connection that was meaningful to me. So please chat me up after the talk with resources you’ve been excited about that help communicate data structures, algorithms, and graph theory.
Mind of a Chef (TV series)
Google Presents Alton Brown: Good Eats 3, the Later Years
Graph Databases, by Ian Robinson, Jim Webber, and Emil Eifrem
Neo4j REST API wrapper
The biggest of
These folks deserve some cookies.