
Introduction: How do non-Western recipes get whitewashed?

Online recipe databases and food media websites such as New York Times Cooking and Bon Appetit have helped make cooking from a wide variety of cuisines increasingly accessible. However, food media remains overwhelmingly white: within the New York Times Cooking recipe archive, for example, even non-Western recipes are written mostly by white authors.[1]

At the same time, the positioning of white authors as authority figures on non-white cuisines has increasingly been challenged, with critics pointing out the whitewashing and related harms that occur. In an interview[2] with Yewande Komolafe, a Nigerian recipe developer and author, fellow recipe developer Priya Krishna discusses some of the frustrations they have experienced as BIPOC members of food media (emphasis mine):

We lamented the extra labor that non-white people are often asked to do—from including **additional explanations** of dishes in the head note to finding imperfect **substitutions** for ingredients—in service of making their recipes more accessible to a white audience.

Beyond the implied white audience they must consider when adding explanations that would be redundant for non-white readers, Krishna also observes how non-Western recipes are often framed to make them appear more approachable:

It's why modifiers like “simple” or “weeknight” that many publications (including BA) have historically added always make me laugh. Indian is my weeknight food. There's no weeknight dal and then dal. It's all just dal. **It’s as if our food needs to be made more approachable**.

I highly recommend reading the full interview, as there are many other observations that I think you'll find relatable. Meanwhile, the interview also made me wonder what empirical evidence I could find for the whitewashing of recipes, and more generally, what biases may be present in how food media talks about Western versus non-Western recipes.

Data

Overview

To empirically test some of Komolafe's and Krishna's hypotheses (such as non-Western recipes attracting words that signal "approachability") and to identify large-scale patterns in how food media's treatment of Western and non-Western recipes differs, I assembled[3] a dataset of 53K recipes from New York Times Cooking (NYT), Bon Appetit (BA), Food52 (F52), Epicurious (EPI), and Serious Eats (SE).[4] I chose these five websites for the simple reason that they are the ones I visit most frequently. For the NYT recipes, I also scraped the text of the longer articles that sometimes accompany recipes. In these articles, the author reflects on their inspirations and process for creating the recipe (see the highlighted "Featured in" box in the screenshot below for an example).

```python
from IPython.display import Image
Image("./figures/nyt_ex.png")
```

Let's plot the distribution of recipes over websites (using our food-themed color "palate": "tomato", "orange", "lime", "wheat", "plum", "honeydew", "chocolate", "olive", "bisque", and "salmon"). Apart from Serious Eats, the distribution of recipes is relatively even across the other 4 websites:

```python
Image(filename="./figures/gen_dist.png", width=500)
```

Getting cuisine labels

Using existing cuisine categories

For the analysis, I need to assign each recipe one of two labels: non-Western (nw) or Western (w). The 5 websites in our dataset provide cuisine categories such as "Chinese" and "Italian" for the majority of recipes, so I mapped each of these cuisine categories to either nw or w to label these recipes. The mapping follows a simple heuristic: w is associated with the continents and countries of Europe, North America, and Australia, and nw is associated with those of Asia, Africa, and South America. Categories that straddle the two are labeled fusion.

In total, there are 103 unique cuisine categories in use across the 5 websites, shown below. A lot can be said about the idiosyncrasies of the cuisine classification that these websites show, such as the varying granularities (e.g., asian vs. californian), overlap (e.g., asian, japanese), and presence of fusion (e.g., tex-mex).

Cuisine Category
African non_west
Afro-brazilian non_west
American fusion
Argentine non_west
Ashkenazi west
Asian non_west
Australian west
Australian-new-zealander west
Austrian west
Bangladeshi non_west
Basque west
Belgian west
Brazilian non_west
British west
Cajun fusion
Cajun-creole non_west
Californian fusion
Canadian west
Cantonese non_west
Caribbean fusion
Central-american non_west
Central-asian non_west
Central-south-american non_west
Chilean non_west
Chinese non_west
Colombian non_west
Creole fusion
Cuban fusion
Czech west
Danish west
Dutch west
East-african non_west
Eastern-european west
Eastern-european-russian west
Ecuadorian non_west
Egyptian non_west
English west
Ethiopian non_west
European west
Filipino non_west
Finnish west
French west
French-provencal west
German west
Greek west
Hungarian west
Icelandic west
Indian non_west
Indonesian non_west
Iranian non_west
Irish west
Israeli non_west
Italian west
Italian-american west
Jamaican non_west
Japanese non_west
Jewish fusion
Korean non_west
Latin-american non_west
Lebanese non_west
Malaysian non_west
Mediterranean fusion
Mexican non_west
Middle-eastern non_west
Midwestern fusion
Moroccan non_west
Moroccan-north-african non_west
New-england fusion
Nigerian non_west
North-african non_west
Northern-italian west
Norwegian west
Nuevo-latino non_west
Pacific-northwest fusion
Pakistani non_west
Persian non_west
Peruvian non_west
Polish west
Portuguese west
Provencal west
Puerto-rican non_west
Quebec west
Russian west
Scandinavian west
Scottish west
Sephardic west
South-african non_west
South-american non_west
South-asian non_west
Southeast-asian non_west
Southern fusion
Southern-italian west
Southwestern fusion
Spanish west
Spanish-portuguese west
Swedish west
Swiss west
Szechuan non_west
Taiwanese non_west
Tex-mex fusion
Thai non_west
Tibetan non_west
Turkish non_west
Tuscan west
Vietnamese non_west
Welsh west
West-african non_west

For the purposes of this project, I was concerned only with the macro-categories of "Western" (w) and "non-Western" (nw), which are of course slippery in their boundaries as well, often due to the presence of cross-cultural exchange, immigration, and other sociocultural phenomena resulting in what is broadly termed "fusion". To further simplify matters, I excluded recipes belonging to categories that are inherently at the intersection of w and nw, such as Tex-Mex and Mediterranean.
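The mapping step can be sketched as a dictionary lookup. This minimal version assumes lowercased category names as keys and includes only an excerpt of the table above. The name `macro_label` is hypothetical. Fusion categories return `None`, which is one way to drop them from the w/nw comparison.

```python
from typing import Optional

# Excerpt of the cuisine-category table above, keyed by lowercased name
# (a hypothetical subset; the full mapping would contain every row).
CUISINE_TO_MACRO = {
    "american": "fusion", "chinese": "non_west", "french": "west",
    "italian": "west", "mediterranean": "fusion", "mexican": "non_west",
    "tex-mex": "fusion",
}

def macro_label(cuisine: str) -> Optional[str]:
    """Return 'west' or 'non_west' for a website cuisine category.

    Fusion categories and unknown names return None, so those recipes
    can be excluded from the w/nw comparison.
    """
    label = CUISINE_TO_MACRO.get(cuisine.strip().lower())
    return label if label in ("west", "non_west") else None
```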

Though the majority of recipes are categorized as exactly one cuisine, about $1/3$ are missing an associated cuisine category (all of the BA recipes fall into this bin), and a handful of recipes are categorized as multiple cuisines. The chart below shows the distribution of the number of associated cuisines per recipe. It excludes one outlier recipe ("homemade adobo seasoning mix & dry rub", which is associated with 42 different cuisines according to F52).

```python
Image(filename="./figures/num_cuisine_cats_dist.png", width=700)
```

It would be a shame to discard the third of our data that is missing a cuisine label, so I trained a classifier to label this unlabeled data for the analysis.

For the recipes with multiple cuisine labels, I first inspected the categories manually to see whether some of them could be collapsed, since the categories occur at multiple granularities. A heatmap showing the overlap percentages (calculated as $\frac{N_{c1,c2}}{N_{c1}}$, where $N_{c1,c2}$ is the number of recipes tagged as both $c1$ and $c2$, and $N_{c1}$ is the number of recipes tagged as $c1$) is shown below.
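The overlap percentage can be computed directly from each recipe's set of cuisine tags. The sketch below is illustrative: the input format and the name `overlap_fractions` are assumptions. The measure is asymmetric because it divides by $N_{c1}$.

```python
from collections import Counter
from itertools import permutations

def overlap_fractions(recipe_tags):
    """N_{c1,c2} / N_{c1} for every ordered pair of co-occurring cuisine tags.

    recipe_tags: iterable with one collection of cuisine tags per recipe.
    Pairs that never co-occur are simply absent (their overlap is 0).
    """
    n_single = Counter()  # N_{c1}: recipes tagged c1
    n_pair = Counter()    # N_{c1,c2}: recipes tagged both c1 and c2
    for tags in recipe_tags:
        tags = set(tags)
        n_single.update(tags)
        n_pair.update(permutations(tags, 2))  # counts both (c1, c2) and (c2, c1)
    return {(c1, c2): n / n_single[c1] for (c1, c2), n in n_pair.items()}
```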

```python
Image(filename="./figures/cuisine_overlap.png", width=1000)
```

Overall, the overlaps make intuitive sense: for example, there is a moderate degree of overlap between is_jamaican and is_caribbean. At the same time, most recipes do not seem to be tagged with every applicable cuisine. Cuisines overlap less with their hypernyms (like is_tuscan and is_italian) than one would expect.

To find potentially collapsible cuisine categories, I applied a threshold of $75\%$: if $75\%$ of recipes categorized as $c1$ are also categorized as $c2$, then I used a single label, $c2$, to label all recipes categorized as either $c1$ or $c2$. This scheme allows me to avoid counting a cuisine from a single macro-category twice, and resulted in the merging of the following categories:

  • Central-American, Latin-American, South-American
  • Northern-Italian and Italian
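The 75% merging rule can be implemented in several ways; one is sketched below. It assumes overlap fractions keyed by ordered `(c1, c2)` pairs and relabels `c1` as `c2`. Chains such as Central-American → Latin-American → South-American are followed to their end. The text does not say which label survives a merge, so the direction here is an assumption, as are the helper names `merge_map` and `collapse`.

```python
def merge_map(overlaps, threshold=0.75):
    """Map each cuisine to the label it should be merged into.

    overlaps: {(c1, c2): N_{c1,c2} / N_{c1}}. Whenever that fraction is at
    least `threshold`, c1 is relabeled as c2. Chains (a -> b -> c) are
    followed to their end; for a mutual pair above the threshold, the
    alphabetically first tag is merged into the other (an arbitrary tie-break).
    """
    target = {}
    for (c1, c2), frac in sorted(overlaps.items()):
        if frac >= threshold and c1 not in target and target.get(c2) != c1:
            target[c1] = c2

    def resolve(c):
        seen = set()
        while c in target and c not in seen:  # `seen` guards against cycles
            seen.add(c)
            c = target[c]
        return c

    return {c: resolve(c) for c in target}


def collapse(tags, mapping):
    """Relabel a recipe's cuisine tags; merged tags collapse into one."""
    return {mapping.get(t, t) for t in tags}
```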

The resulting distribution of recipes over different cuisines is highly skewed, with recipes categorized as Italian, American, and French making up roughly 40% of all categorized recipes, as shown below:

```python
Image(filename="./figures/dist_20_freq_cuisines.png", width=800)
```

Now that I have collapsed cuisine categories where possible, I'll use the following labeling scheme for the remaining recipes with multiple cuisine labels:

  • Case 1: majority vote with a two-thirds threshold ($\approx 0.67$) (e.g., a recipe with $2/3$ or more non-Western associated cuisines would receive the label nw)
  • Case 2: tie or no two-thirds majority; assign the label f for fusion, which I will exclude from later analysis
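The two-case voting rule is sketched below. It assumes that each recipe's cuisines have already been mapped to `west`, `non_west`, or `fusion`, and that all of them count toward the denominator; the text does not say how fusion categories are counted. `vote` is a hypothetical name.

```python
from fractions import Fraction

def vote(macro_labels, threshold=Fraction(2, 3)):
    """Resolve a recipe with several cuisines to 'non_west', 'west' or 'fusion'.

    macro_labels: one macro label per associated cuisine, e.g.
    ['non_west', 'non_west', 'west']. A label wins if it covers at least
    `threshold` of the cuisines; otherwise (ties included) the recipe is 'fusion'.
    """
    n = len(macro_labels)
    for label in ("non_west", "west"):
        if n and Fraction(macro_labels.count(label), n) >= threshold:
            return label
    return "fusion"
```

The sketch compares exact fractions rather than the float 0.67 on purpose. $2/3 \approx 0.6667$ falls just below 0.67, so a float cut-off would send two-out-of-three recipes to fusion.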

The resulting distribution over macro cuisine category labels is shown below:

```python
Image(filename="./figures/dist_macro_cuisine_cats.png", width=500)
```

Comparing cuisine category-derived labels with demonym-derived labels

As a sanity check, let's compare our macro-category labels to explicit cues in the form of demonyms[5] that are present in recipe titles, e.g. "Moroccan Lamb Stew". About $\frac{2281}{53946}=4.2\%$ of recipes have a demonym in the title. I used a manually created mapping of demonyms onto the macro-categories of w and nw, as follows:

Demonym Category
Afghan non_west
Africa non_west
Alaskan west
Algerian non_west
America west
American west
Andalusian west
Argentine non_west
Armenian non_west
Asia non_west
Australian west
Austrian west
Baja non_west
Bangladeshi non_west
Bavarian west
Belgian west
Bengal non_west
Bengali non_west
Bohemian west
Bosnian west
Brazilian non_west
Breton west
Bulgarian west
Canadian west
Cantabrian west
Chilean non_west
Colombian non_west
Corsican west
Cretan west
Cuban non_west
Czech west
Dalmatian west
Dominican non_west
Ecuadorian non_west
Egyptian non_west
Ethiopian non_west
Galician west
Genovese west
Georgian west
German west
Ghanaian non_west
Guatemalan non_west
Haitian non_west
Hungarian west
Hyderabad non_west
Indian non_west
Indonesian non_west
Iranian non_west
Israel non_west
Italian west
Jamaican non_west
Japanese non_west
Javanese non_west
Jordanian non_west
Laotian non_west
Liberian non_west
Libyan non_west
Ligurian west
Lithuanian west
Macedonian west
Maharashtrian non_west
Majorcan west
Malagasy non_west
Malaysian non_west
Mexican non_west
Moroccan non_west
Nicaraguan non_west
Nigerian non_west
Norman west
Parisian west
Persian non_west
Peruvian non_west
Phoenician fusion
Pole west
Qatar non_west
Roman west
Russian west
Salvadoran non_west
Sardinian west
Serbian west
Sicilian west
Silesian west
Singaporean non_west
Slovak west
Somali non_west
Syrian non_west
Tahitian non_west
Texan west
Thai non_west
Tunisian non_west
Ukrainian west
Umbrian west
Uzbek non_west
Uzbekistani non_west
Valencian west
Venezuelan non_west
Victorian west
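Detecting a demonym in a title can be sketched as a whole-word, case-insensitive regular-expression match against the table above. The matching rule and the excerpted `DEMONYM_TO_MACRO` dictionary are both assumptions for illustration.

```python
import re

# Excerpt of the demonym table above (hypothetical subset, keys lowercased).
DEMONYM_TO_MACRO = {
    "american": "west", "canadian": "west", "sicilian": "west",
    "japanese": "non_west", "moroccan": "non_west", "syrian": "non_west",
}

# Whole-word, case-insensitive match of any demonym (an assumption).
DEMONYM_RE = re.compile(
    r"\b(" + "|".join(map(re.escape, DEMONYM_TO_MACRO)) + r")\b", re.IGNORECASE
)

def demonym_labels(title):
    """Macro-categories of all demonyms that appear in a recipe title."""
    return {DEMONYM_TO_MACRO[m.lower()] for m in DEMONYM_RE.findall(title)}
```

With the full table, `sum(bool(demonym_labels(t)) for t in titles) / len(titles)` gives the share of titles containing a demonym, the quantity reported above as 4.2%.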

Of those recipes with a demonym, the majority are nw demonyms:

```python
Image("./figures/dist_macro_demonym_cats.png", width=500)
```

As we would expect, only a very small percentage of recipes with demonyms are categorized in the category opposite to their demonym: $1.8\%$ for recipes with non-Western demonyms and $1.6\%$ for recipes with Western demonyms.

```python
Image("./figures/dem_cuisine_cat_confusion_matrix.png", width=600)
```

Some examples of disagreement between demonyms in a recipe and a website's cuisine categorization for a recipe include the following:

| Recipe title | Demonym-based label | Cuisine-category-based label |
|---|---|---|
| Hand Held Syrian Spinach Pies | nw | w |
| Roasted Japanese Eggplant With Crushed Tomato, Pecorino and Thyme | nw | w |
| Vincent Hodgins's Moroccan Spiced Gravlax | nw | w |
| Fried Rice with Canadian Bacon | w | nw |
| An American Vegetable Soup | w | nw |

As these examples of disagreement show, there are genuine reasons for considering such recipes as both nw and w. However, for better consistency, I ignored the signal provided by the demonym and continued to use solely the cuisine-categories to label these recipes.

Finally, there are 267 recipes that are missing a cuisine category but which do have a demonym, so in this case, we may as well label these recipes based on the macro-category of their demonyms. The final distribution of recipes over macro-categories is shown below:
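The precedence between the two label sources can be written as a small function. In this sketch, `final_label` is hypothetical. Leaving recipes with conflicting demonyms unlabeled, so that the classifier handles them, is an assumption that the text does not address.

```python
from typing import Optional, Set

def final_label(cuisine_label: Optional[str], demonym_macros: Set[str]) -> Optional[str]:
    """Combine the two label sources.

    cuisine_label: 'west' / 'non_west' from the website's cuisine categories,
    or None if the recipe has none. demonym_macros: macro-categories of the
    demonyms in the title (empty if there are none).
    """
    if cuisine_label is not None:
        return cuisine_label  # cuisine categories always take precedence
    if len(demonym_macros) == 1:
        return next(iter(demonym_macros))
    return None  # no demonym, or conflicting ones: leave for the classifier
```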

```python
Image("./figures/dist_macro_cuisine_cats_final.png", width=600)
```

I saved these labels for later use, including training the classifier.

Classifier

Training and eval data

I experimented with several sources of training data for a logistic regression classifier:

| Name | Description | Size |
|---|---|---|
| recipes_labeled | a random selection of $2/3$ of the labeled data from recipes (labels from the macro_cuisine_cat column) | $16018$ |
| wiki | data from the Wikipedia lists https://en.wikipedia.org/wiki/European_cuisine and https://en.wikipedia.org/wiki/Asian_cuisine, corresponding to w and nw dishes, respectively[6] | $8293$ |
| recipes_labeled_nouns | same as recipes_labeled, keeping just the nouns from the recipe names (after POS-tagging with SpaCy) | $16018$ |
| wiki_nouns | same as above, but with wiki as the base dataset | $8293$ |

I also created several different eval/dev sets:

| Name | Description | Size |
|---|---|---|
| recipes_labeled | the remaining $1/3$ split of recipes_labeled | $7783$ |
| demonyms | a subset of recipes with demonyms in the titles | $235$ |
| manual | a manually labeled subset of recipes from BA | $169$ |
| <X>_nouns | each of the 3 datasets above, restricted to noun lemmas only | |

Some notes on the data splits:

  • Sizes shown are after excluding recipes with a fusion label.
  • I did not include wiki data in any eval or dev sets because the ultimate goal is to label non-Wikipedia recipes.
  • Observations on wiki data:
    • There is some overlap ($N=113$) in w/nw dishes due to some common food terms as well as categories spanning Western and non-Western cuisine.
    • e.g.: "beer", "caviar", "fruits", "ice cream", "kasha", "noodle soups with meat"
    • I included the overlapping data in the training set (so, e.g., "ice cream" occurs twice, once with each label).
  • Since BA does not have any cuisine labels, I created labels for the eval/dev sets via manual annotation. For this reason, there is also no BA data in the training set.
  • I set all random seeds to 42.
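The 2/3 vs. 1/3 split can be sketched with scikit-learn's `train_test_split`, using the seed of 42 mentioned above. The text does not say whether the original split was stratified, so this version is not. `split_labeled` is a hypothetical name.

```python
from sklearn.model_selection import train_test_split

def split_labeled(titles, labels, seed=42):
    """Drop 'fusion' recipes, then split the rest 2/3 train, 1/3 eval.

    Returns X_train, X_eval, y_train, y_eval (scikit-learn's order).
    """
    kept = [(t, y) for t, y in zip(titles, labels) if y != "fusion"]
    X = [t for t, _ in kept]
    y = [lab for _, lab in kept]
    return train_test_split(X, y, test_size=1/3, random_state=seed)
```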

And I applied the following pre-processing steps:

  • remove punctuation and non-alpha characters (keeping diacritics)
  • lowercase
  • deduplicate: remove from the training split of recipes_labeled anything contained in manual and demonyms
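The normalization and deduplication steps might look like the following. Two choices here are assumptions: punctuation is replaced with a space rather than deleted, and the text is NFC-normalized first so that combining accents stay attached to their letters. `normalize` and `dedupe_train` are hypothetical names.

```python
import unicodedata

def normalize(text):
    """Lowercase and keep only letters (accented ones included) and single spaces."""
    # NFC first, so e + combining accent becomes one letter that passes isalpha()
    text = unicodedata.normalize("NFC", text)
    letters = "".join(ch if ch.isalpha() else " " for ch in text)
    return " ".join(letters.lower().split())


def dedupe_train(train_titles, held_out_titles):
    """Drop training titles that also occur (after normalizing) in a held-out set."""
    held_out = {normalize(t) for t in held_out_titles}
    return [t for t in train_titles if normalize(t) not in held_out]
```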

Training a logistic regression classifier

Next, I trained logistic regression (LR) classifiers using the scikit-learn library, experimenting with all the training datasets and all possible combinations of word- and character-level features, with n-gram ranges of 1- to 4-grams for word-level features. I got the best cross-validated dev accuracy (across all dev sets) using recipes_labeled as training data and word-level features up to trigrams in combination with char-level features.
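The model can be sketched as a scikit-learn `Pipeline` that concatenates word 1- to 3-gram features with character n-gram features. Several details are assumptions because the text does not specify them: the TF-IDF weighting, the `char_wb` analyzer, and the 2- to 5-character range. The defaults match the best values in the grid-search table below. `make_classifier` is a hypothetical name.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import FeatureUnion, Pipeline

def make_classifier(C=10, solver="lbfgs", max_iter=100, random_state=1639):
    """Logistic regression over word 1-3 grams plus character n-grams."""
    features = FeatureUnion([
        # word-level unigrams, bigrams and trigrams
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
        # character n-grams inside word boundaries (range is an assumption)
        ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 5))),
    ])
    clf = LogisticRegression(C=C, solver=solver, max_iter=max_iter,
                             random_state=random_state)
    return Pipeline([("features", features), ("lr", clf)])
```

Training is then `make_classifier().fit(train_titles, train_labels)`, where the argument names are placeholders. Per-class reports like the ones below come from `sklearn.metrics.classification_report`.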

I then did a grid-search over the following hyperparameters, again with recipes_labeled as the eval and training sets:

| Hyperparameter | Values | Best value |
|---|---|---|
| solver | ['newton-cg', 'lbfgs', 'liblinear'] | 'lbfgs' |
| c_value | [100, 10, 1.0, 0.1, 0.01] | 10 |
| max_iter | [100, 200, 500, 1000] | 100 |
| random_state | [1639, 18024, 16049, 14628, 9144] (5 randomly generated seeds) | 1639 |
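The grid search maps onto scikit-learn's `GridSearchCV`. In this sketch, the table's `c_value` corresponds to the `C` parameter of `LogisticRegression`. The 5-fold cross-validation and the simple word-n-gram pipeline are placeholders, not the original setup.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# The grid from the table above; "c_value" there is LogisticRegression's C.
PARAM_GRID = {
    "lr__solver": ["newton-cg", "lbfgs", "liblinear"],
    "lr__C": [100, 10, 1.0, 0.1, 0.01],
    "lr__max_iter": [100, 200, 500, 1000],
    "lr__random_state": [1639, 18024, 16049, 14628, 9144],
}

def grid_search(X, y, param_grid=None, cv=5):
    """Exhaustive search over the grid, scored by cross-validated accuracy.

    The pipeline is a placeholder (word 1-3 grams only); swap in the
    word + character feature union for a closer match to the text.
    """
    pipeline = Pipeline([
        ("features", TfidfVectorizer(ngram_range=(1, 3))),
        ("lr", LogisticRegression()),
    ])
    search = GridSearchCV(pipeline, param_grid or PARAM_GRID, cv=cv, scoring="accuracy")
    return search.fit(X, y)
```

The full grid has 3 × 5 × 4 × 5 = 300 combinations, so 5-fold cross-validation fits 1,500 models before the final refit.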

A more detailed breakdown of the best-performing classifier's performance is shown below (evaluated on the test split of recipes_labeled):

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| non_west | 0.92 | 0.88 | 0.90 | 1958 |
| west | 0.92 | 0.94 | 0.93 | 2712 |
| accuracy | | | 0.92 | 4670 |
| macro avg | 0.92 | 0.91 | 0.92 | 4670 |
| weighted avg | 0.92 | 0.92 | 0.92 | 4670 |

Moreover, the classifier does pretty well on the manually labeled test set, especially considering that this eval set comes entirely from BA data, which is not represented in the training data. Performance on nw recipes is worse (F1 of 0.67, mostly due to low precision). Even so, the overall accuracy of 0.85 is still better than a majority-class baseline ($\frac{140}{169}=82.8\%$).

| | precision | recall | f1-score | support |
|---|---|---|---|---|
| non_west | 0.53 | 0.90 | 0.67 | 29 |
| west | 0.97 | 0.84 | 0.90 | 140 |
| accuracy | | | 0.85 | 169 |
| macro avg | 0.75 | 0.87 | 0.78 | 169 |
| weighted avg | 0.90 | 0.85 | 0.86 | 169 |

Qualitative error analysis

Next, I examined the incorrect predictions ($N=384$) and noticed some recurring misclassification types that largely make sense.

| Error type | Examples | Justification |
|---|---|---|
| fusion recipes | "miso pork bread pudding", "shichimi togarashi granola", "super easy naan pizza" | there are conflicting signals, e.g. "miso", "shichimi", and "naan" are nw but "pudding", "granola", and "pizza" are w |
| unigram recipes | "ayran", "cholay" | these recipes are not represented in the training data and contain no signal in the form of adjectives or ingredients for the classifier to use |
| vague recipes | "chicken stew", "roast pork", "spring soup" | these recipes are inherently impossible to classify without more context |

We can also examine the feature weights to see which word- and char-features are most associated with each class. The top 15 features from each class are shown below.

```python
Image("./figures/15_top_features.png", width=800)
```

For the w label, we notice many demonyms and words associated with Spanish and Italian cuisine ("paella", "bruschetta"). This is unsurprising, given the frequency of Spanish and Italian recipes in the training data. For the nw label, we also notice many demonyms as well as some common dishes associated with the top-weighted demonyms ("tacos", "tamale", "chutney").

Considering the intuitive errors and feature weights, I was pretty happy with the classifier's performance, and proceeded to the analysis with the fully-labeled dataset of recipes.

Analysis

I first applied the classifier to the unlabeled portion of recipes in order to get a prediction for macro_cuisine_cat for all datapoints. I conducted analysis on the fully labeled recipes dataset (combining existing website cuisine-category- and demonym-based labels and model predictions), as well as the a priori labeled subset of recipes (i.e., excluding model predictions), finding that similar patterns hold. All subsequent plots and figures are thus based on the full dataset.

Next, I used SpaCy (https://spacy.io/) to tokenize, lemmatize, part-of-speech tag, and dependency parse all text data (recipe title, ingredients, directions, free text, and full text of the associated "featured in" article in the case of some NYT recipes).

We can now do a few different kinds of analysis:

  • General: log odds ratios to determine which words are most associated with w vs. nw recipes
  • Approachability analysis: Do nw recipes get described more often as "fast", "easy", etc. to make them more approachable, as Komolafe and Krishna hypothesized?

General words

Using the results from spaCy pre-processing, I calculated the log-odds ratio (LOR) for each word occurring with a Western vs. a non-Western recipe using an informative Dirichlet prior (following Monroe et al. (2009)[7]).
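The weighted log-odds ratio of Monroe et al. (2009) for word $w$ between corpora $i$ and $j$ is

$$\hat\delta_w = \log\frac{y_w^{i}+\alpha_w}{n^{i}+\alpha_0-y_w^{i}-\alpha_w} - \log\frac{y_w^{j}+\alpha_w}{n^{j}+\alpha_0-y_w^{j}-\alpha_w}, \qquad \sigma^2(\hat\delta_w) \approx \frac{1}{y_w^{i}+\alpha_w}+\frac{1}{y_w^{j}+\alpha_w},$$

Here $y_w^{i}$ is the count of $w$ in corpus $i$, $n^{i}$ is the size of corpus $i$, $\alpha_w$ is the prior count for $w$, and $\alpha_0=\sum_w \alpha_w$. Words are ranked by $z_w = \hat\delta_w/\sqrt{\sigma^2(\hat\delta_w)}$. The sketch below computes $z_w$ with plain counters. Three choices in it are assumptions, not details from the text: the pooled counts of both corpora serve as the informative prior, the prior is scaled by `prior_scale`, and $|z| > 1.96$ is the significance cut-off.

```python
import math
from collections import Counter

def log_odds_dirichlet(counts_i, counts_j, prior_counts=None, prior_scale=1.0):
    """z-scored log-odds ratio with an informative Dirichlet prior.

    counts_i, counts_j: word counts for corpus i (e.g. nw recipes) and
    corpus j (e.g. w recipes). Positive z means the word leans toward i.
    By default the prior is the pooled counts of both corpora (an assumption).
    """
    if prior_counts is None:
        prior_counts = Counter(counts_i) + Counter(counts_j)
    alpha = {w: prior_scale * c for w, c in prior_counts.items() if c > 0}
    alpha_0 = sum(alpha.values())
    n_i, n_j = sum(counts_i.values()), sum(counts_j.values())
    z = {}
    for w, a_w in alpha.items():
        y_i, y_j = counts_i.get(w, 0), counts_j.get(w, 0)
        delta = (math.log((y_i + a_w) / (n_i + alpha_0 - y_i - a_w))
                 - math.log((y_j + a_w) / (n_j + alpha_0 - y_j - a_w)))
        variance = 1.0 / (y_i + a_w) + 1.0 / (y_j + a_w)
        z[w] = delta / math.sqrt(variance)
    return z
```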

```python
Image("./figures/gen_deltas.png", width=800)
```

Even after filtering to words with a significant LOR value, there are many which reflect inherent differences in terms of ingredients, cooking techniques, and proper nouns that exist between W and NW cuisine (for example, "ghee", "wok", and "indian" are unsurprisingly more associated with NW cuisine, and "pasta", "italian", and "bake" are more associated with W cuisine).

Thus, I used WordNet (https://wordnet.princeton.edu/) to compile a set of ingredients (hyponyms of the synsets ingredient.n.03, food.n.01, food.n.02, carbohydrate.n.01, edible_fat.n.01, edible_seed.n.01, herb.n.01, herb.n.02), cooking-related terms (hyponyms of kitchen_utensil.n.01, tableware.n.01), and proper nouns (hyponyms and member meronyms of state.n.04, country.n.02) to additionally filter out. The resulting log odds ratio plot for words that remain is shown below.
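Collecting the filter list amounts to walking WordNet's hyponym links from each synset named above. The sketch below uses NLTK and needs the WordNet data (`nltk.download("wordnet")`). Whether member-meronym links were followed transitively is an assumption, and the helper names are hypothetical.

```python
# Synsets named in the text above; the relations followed for each group
# (hyponyms, plus member meronyms for places) are an assumption.
INGREDIENT_SYNSETS = ["ingredient.n.03", "food.n.01", "food.n.02", "carbohydrate.n.01",
                      "edible_fat.n.01", "edible_seed.n.01", "herb.n.01", "herb.n.02"]
COOKING_SYNSETS = ["kitchen_utensil.n.01", "tableware.n.01"]
PLACE_SYNSETS = ["state.n.04", "country.n.02"]

def lemmas_below(synset, follow_member_meronyms=False):
    """Lemma names of `synset` and everything reachable via hyponym links
    (and optionally member-meronym links), lowercased with spaces."""
    def related(s):
        nxt = list(s.hyponyms())
        if follow_member_meronyms:
            nxt += s.member_meronyms()
        return nxt
    names = set()
    for s in [synset, *synset.closure(related)]:
        names.update(n.replace("_", " ").lower() for n in s.lemma_names())
    return names

def build_blocklist():
    # Requires nltk and the WordNet corpus: nltk.download("wordnet")
    from nltk.corpus import wordnet as wn
    block = set()
    for name in INGREDIENT_SYNSETS + COOKING_SYNSETS:
        block |= lemmas_below(wn.synset(name))
    for name in PLACE_SYNSETS:
        block |= lemmas_below(wn.synset(name), follow_member_meronyms=True)
    return block

def drop_blocked(words, blocklist):
    """Keep only the words that are not in the blocklist."""
    return [w for w in words if w.lower() not in blocklist]
```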

```python
Image("./figures/gen_deltas_post_filtering.png", width=800)
```

Now that we have identified the words that are most associated with each cuisine type beyond these inherent cuisine characteristics, **what patterns do we see emerge among each cuisine type's words?**

To answer this question, I clustered the words into semantically coherent groups by retrieving word embeddings (GoogleNews, 300 dimensions) for each word, then applying KMeans clustering (I chose num_clusters=7 after manual inspection). The visualized results are shown below for each cuisine type:
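The clustering step is sketched below with scikit-learn's `KMeans`. The embeddings could come from gensim, e.g. `KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)`. The file name, the L2 normalization of the vectors, and the seed are assumptions, and `cluster_words` is a hypothetical name.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_words(words, vectors, n_clusters=7, seed=42):
    """Group words into n_clusters clusters by KMeans over their embeddings.

    `vectors` is any mapping from word to 1-D numpy array that supports `in`,
    e.g. a dict or a gensim KeyedVectors object. Words without an embedding
    are skipped.
    """
    kept = [w for w in words if w in vectors]
    X = np.stack([vectors[w] for w in kept])
    # L2-normalize so Euclidean distance tracks cosine similarity (assumption)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed).fit(X)
    clusters = {}
    for word, label in zip(kept, km.labels_):
        clusters.setdefault(int(label), []).append(word)
    return clusters
```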

```python
Image("./figures/NW_clusters.png")
```

```python
Image("./figures/W_clusters.png")
```

From manual inspection of the words contained in each cluster, I came up with over-arching semantic descriptions for each cuisine type's clusters. For both Western and non-Western recipes, POS-tag-related clusters emerge: both cuisine types have separate clusters for (what are predominantly) verbs and adjectives. Furthermore, both cuisine types have a cluster for proper nouns, as well as a cluster for common nouns that are unrelated to any of the other noun-clusters.

However, within similar clusters (e.g., adjectives, verbs, common nouns), we observe many contrasts between the two cuisine types, and we also observe that there are clusters unique to one cuisine type and not the other.

For example, one of the clusters unique to non-Western recipes consists of words relating to culture, family, and ethnic identity, e.g.: "mom", "family", "ancestor", "community", "paternal", "bride", "language", "indigenous", "colonial", "immigrant", "diaspora", "homesickness", "nationality". Many of the words in this cluster evoke "immigrant story"-type narratives, suggesting that recipe writers tend to couch nw foods in terms of their ethnic symbolism.

On the other hand, Western recipes have a cluster of words indexing luxury ("gastronomic", "hotel", "butler", "gourmet") and rusticity ("countryside", "village", "shepherd", "farmhouse", "harvest", "forest", "hearth", "hunter", "peasant", "rustic"), as well as terms residing in the overlap ("manor", "heirloom"). (It is perhaps no coincidence that rusticity and the use of local, seasonal ingredients currently have significant cachet in fine dining.)

Never mind that Western foods are also ethnic artifacts and intertwined in Western ethnic identities, or that non-Western recipes are also cooked by peasants and shepherds living in farmhouses and forests in their rustic countryside villages--it appears that for American food media, NW foods tend to be symbols of their cultures and the immigration that brings their cultures into relief, whereas W foods tend to be symbols of an upscale, farm-to-table aesthetic.

A similar bias is reflected in the adjective clusters for each cuisine type. Western recipes attract adjectives indexing fine dining such as "seductive", "stylistic", "grownup", "impressive", "timeless", "extravagant", "lavish", "fussy", "sexy", "sublime", "fashionable", "sleek", "fine", "fancy", "classic", and "elegant", whereas non-Western recipes attract adjectives like "cheap", "affordable", and "everyday".

Also interesting is the greater presence of adjectives and lemmas involved in comparison ("different", "unfamiliar", "reminiscent", "exotic") within non-Western recipes--this suggests that Western recipes tend to be regarded as the default, against which recipe writers compare non-Western foods, ingredients, techniques etc.

Finally, it is interesting to note that adjectives relating to taste and smell (especially some that carry negative connotations) are more likely to be found in non-Western recipes (see the 'Foods and flavors' cluster, and some entries in the main Adjectives cluster, e.g. "bold", "funky"). Of course, this bias might partly reflect the many flavorful ingredients and spices in non-Western cuisine that (stereotypically) blander Western cuisine lacks. It may also reflect a subjective perception of non-Western cuisine as less bland.

NW clusters

Proper nouns Common nouns Specialized food terms Culture, family, nationality Foods and flavors Adjectives Verbs
dal steam wrapper southeast spicy fiery skewer
char heat oriental lunar grill thinly shred
turkish pressure kernel grocery marinate shredded ferment
shanghai dipping charcoal coal masa addictive soak
latin mortar fresco supermarket aromatic exotic stir
mam market matchstick mom glutinous grind sate
monterey many seaweed cultural crispy refreshing turn
msg lid rinse southwestern cashews authentic condense
tet traditional devein imperial smoky bright combine
taiwanese available asafetida comedian flavor favorite blacken
bangkok part pod indigenous vegetarian dim splutter
lee healthy fermented family garnish easy burn
taiwan through thigh eastern flavorful wisp squeeze
philippine pad pestle westerner boneless staple find
newman smoke lotus waterman tandoor comforting adjust
yuan own asafoetida colonial salty numbing sit
lakshmi national cobs polo shiitake milder darken
sin fire clay canton cob superman distil
beijing skirt cleaver monsoon delicious bold sputter
anaheim different slurry pasha microwave silken shun
amarillo distinct cashew maternal eat reminiscent enjoy
ras unique cellophane shark tangy colorful inspire
caribbean glove agave immigrant pungent amazing replenish
seoul section barberry religious cook funky ram
bombay super shimmer grocer tasty eyed subscribe
chi smoking bonnet dynasty nutritious incendiary prepare
muslim batch cactus pantry cuisine awesome zap
tokyo sum finely communal spiciness fantastic serve
fresno street blister diaspora slurp satisfying add
chen sticker drumstick pastor skinless script subdue
lexington online floret professor cooked steamy vaporize
agua smash tilapia tiger vegan tedious flee
delhi less ml editor broil ubiquitous impart
mumbai prep fungus canal sizzle familiar grow
rajas helpful stalk peninsula steaming fusion cater
pacific aside perilla household sodium picky omit
tam patrón massage subcontinent grilling mellow chronicle
istanbul everyday nigella trader sear subtle submerge
xian spam cassia refugee basting unseasoned toss
zhou vendor fragrance cub vinegary surprising mop
chang influence lacquer palace dhal vibrant meld
han contact bruise pup gingery palatable wipe
hass flushing protein lam momos squishy discard
latino carousel packet hawker accompaniment bumpy customize
arabic scramble pleat ancestor rotisserie shrimpy inoculate
lan trading valve coastal pomfret drunken take
krishna similar filé minister yummy unfamiliar buy
info leftover tan queen unsweetened snappy build
hindi indirect hatch paternal sambar auspicious encapsulate
hindu brand briquette archipelago seared chunky explode
mr lean fin unity marinating fleshed deteriorate
diaz once soman bride sourness moody emit
ron various musky prosperity grilled lech introduce
li commonly aguas uncle flavourful fun oversimplify
naomi able bead political aroma unrefined transliterate
negro can unripe phonetic baste opaque unzip
lin weeknight jar westernize charred crooked sell
ngo mesh acidic gringo soupy numb blot
buddhist cast ginkgo iguana gobi lively wear
vega preference mosquito eatery curried shriek drain
rick also medicinal immigrate minced intriguing sterilize
kyoto slow diagonally language sweetness blasphemous recombine
punjab win kelp student capsicum rare stimulate
zulu preserved berber census piquant cheery sweeten
mao depth strainer southeastern spicing intense swirl
cuon kick desiccate menopausal deveined pliable recreate
tao balance resin elderly spiced boring borrow
yang dice reddish immigration sweetened soaking scorch
t whistle container nationalist tang mystical skip
shabu roughly mezcal pueblo coarsely spattering gobble
ba general tropic smoker nourishing intricate pronounce
eden introductory teflon homesickness tasso zippy comprise
min share fuchsia screenplay crisping warmth migrate
wichita countless alkaline worker toasted lite squirt
bach towel subtropical nationality kadai wonderful meshed
kwai excited ceramic servant comal hungry baffle
kennedy immune cylindrical music juicy gracious slake
mei burner burdock outsider broiled sizzling vent
amazon indefinitely brazier racial chewy irresistible synchronize
sally typically hibiscus sorcery spicier powerful inveigle
bali full diagonal northeastern tenderize tingle disabuse
nu domino headspace exile dishes inauthentic retrain
raja tolerance afro ancestral piquancy popping rip
bai most husked mythological zesty perfect energize
xe morton osmanthus acreage broiling crappy define
ip basket mealie gambler culinary sourest enliven
causa complete serape conscionable palate hothouse translate
ka tempering chrysanthemum griot recipe interesting depart
hindus minimal lactic machete griddled lazy
atlanta integral frizzle shrimping jellied eclectic
mama fearing insulation community fatty omnipresent
wednesday like brine mason crosswise exciting
lama version saran ecological minty slimy
chennai plank laver arid puréed mojo
xi international slit caterer crispiness saucy
doc length smell festival julienned mildly
kampong mott bikini geography citric sourer
poon travel splatter ethnicity dusty
para dynamite probiotic ethnic junky
phoenix other octagonal strait playful
austin non loofah kin wilted
mara crunch conical consul crave
pronounced sick colander jungle unstoppable
chino numerous impurity modernist sterile
god podcast pistachio singleton kid
diana ceremonial restorative franc distinctive
ramadan rangoon pummelo author synonymous
toledo finishing shade hobo sloppy
goma level souk fertility sprite
ana dear oxidant brewer untraditional
pei visit detox harem literal
shang long eddy caravan ornate
dodo copyright caffeine ranch indestructible
gao fatima milky population explosively
tod gathering alkali civil counterintuitive
jain moderately bulbous doctor bliss
lucknow handful fiddlehead psychologist unlit
ali ½ gelatinous ecumenical blistered
ra here lacquered discernment massaged
ming systematically hydrolyze intermarried licked
btu uptown proton dance slanted
ting dress hydroxyl commander unobtrusively
pb explosive muslin idleness slippery
jakarta bicycle mottled cacique occasionally
pong stall spherical autobiographical innocent
bong manchurian scissor cremation saturated
bangalore promised bin ghillie heady
tum portable wisps avocation liking
kip ensure gunpowder prayer mighty
ho alongside mouth shrine butchered
mick welcome earthen ancestry potent
som below facial physician beefy
devi huge rasp martial unreal
himalayan grapevine bud mercantile vivid
bc vein papery adolescence frightened
afghani spike thong shire wildest
anu cornerstone eelgrass shrimper sniffy
akan manually fridge travelogue nebulous
kali excerpt lb gleeful
na then crosshatch voluptuousness
rama prospective brightness unmindful
maharashtra technique gingko jarred
bollywood type adventitious stormy
qi player longan craggy
hua stack burr feverish
maha changer carport iconic
senor arranged collagen contagious
sanchez tat penicillin pleasantly
boston firecracker repellent uncomfortable
epi known granule smacking
chick maximum bacteria artful
zee glazer halo glowing
gan flare rosebud authenticity
papas inherent antiseptic versatile
mona piper unwaxed devious
ji valuable incense obscene
ge block receptor hallowed
kashmir element krill envious
lu honored lung slaked
fen eurasian plaster umpteenth
buddhism step garland grandest
shah regular yellowfin incredible
pang timely lacy impatient
rivera brewing sumac moxie
l distinguished desiccated magical
santa uncommon gloved sheer
dana usually terraced characteristic
uma essential fuschia richly
shawn legitimate digestibility reflective
lutheran live silica radiant
tehran 2 cornhusk subterranean
moslem healthiest arco unturned
lei alternatively aldehyde senseless
perth blessing maroon effortful
dharma policy sapphire indiscernible
git discouraging floss aimless
karachi illness herbaceous symbiotic
imo seventh amino brutally
rani wrapping beany brighter
wu cart lengthwise exquisiteness
tung makeshift slime tangent
pachinko traditionally spire shimmering
n chase flaked superstitious
goldman fulfilling unruly
albuquerque speedy ravenous
cornish call subdued
peri starting rawness
david irreversible hedonistic
scum trip hypnotic
moore administrative lamentably
ko prepackaged faintly
inter lonely
unheeded swish
diplomatically authentically
including heaping
cleanup stoner
complexity gregarious
exclusively stinky
requisite fishy
partially
contrast
appear
aisle
hit
concession
extensive
shopping
consistency
carom
unit
honorary
8th
disposable
unsanitary
he
grouping
weill
infusion
originally
hundred
undisturbed
viral
leant
inadequately
ctc
consultation
attrition
unfolding
disrepair
piñata
fixing
optimal
blending
formerly
symbolic
anti
hung
coincidentally
cheap
overwhelming
stature
note
chopping
outgoing
underlying
young
easily
routinely
member
backyard
roadside
inaccessible
overcrowded
suspicious
courtesy
smashed
prefer
wish
ideal
curbside
nairobi
unbrushed
theoretical
favored
refining
wraparound
cassette
gong
intensive
roulette
ratio
rut
chipped
abide
absolute
option
latter
package
utmost
steep
function
ultimate
middle
deeply
eleventh
immeasurable
declaration
notation
homecoming
unopened
affordable
effect
lock
know
remiss
kit
drum
tenet
diversion
finally
exploratory
diagraming
deranging
schematic
unrestricted
adaptive
embedded
trapped
unconverted
gravely
lob
refusal
winning
asean
feasibility
registration
waterline
interchangeably
spoken
method
ploy
appreciative
overseas
dirt
selective
cognitive
trucking
levy
football
safety
important
permission
consecutive
composite
permissible
fruitful
finder
sometime
accordingly
basic
instantly
weeklong
propane
similarity

W clusters

Proper nouns Common nouns Core cooking terms Secondary cooking terms Luxury and rusticity Adjectives Verbs
june sufficiently unflavored amber cinchona amusing stiffen
moscow hawk kneading degrease reindeer weary tremble
freeman thicket wholemeal carmine manor heaped look
portland nature poussin pearled sweater childish elude
october excuse brininess braided skiing paean leaven
crosby tune roasting grating pagan charm bother
m narrower glug tentacle vulture toweling rearrange
sir pointer diced faucet conger seductive suckle
adam peaked oxidize unglazed deer clutch scraping
berkeley dispute calorie handkerchief dory frantic destroy
sunday counter cupful nitrate carnival witty divvy
f broke runny glazed bath stylistic lure
catherine perpetual blanch spacing countryside demeanor absorb
leicester seminar panini fahrenheit pâtisserie lightened reinvent
herman unreduced floury rubber gastronomic unstrained revive
modifier placement rennet wineglass auberge grownup mull
september trapping calvados doily prefect grinding weave
jerusalem cholesterol tannic blowtorch godmother contemporary tear
cornwall area tannin knuckle restaurateur cherished subside
c damage filleted bulky furniture magnificently seduce
dublin pleased gooey dense sommelier flashy connect
atlantic strive dollop sill kahlua impressive ignite
vienna tomorrow ripened centimeter comte timeless trace
v nonetheless camembert frosted rioja rosy govern
schmidt borough gouda textured merino extraneous distribute
joel forth moisten tumbler springtime extravagant thaw
christ longer crumbly spongy mixology uneven shiver
rockefeller rounded refrigerate sinew quay twist prick
arnold fee lycopene sculpin mollusk famed miss
soho report boil stainless housewife tad hide
amish active deglaze twine lyonnaise chilling render
anderson regulation emulsion envelope pope lavish fortify
nancy self syrupy translucent chablis festive weigh
copenhagen entry yeasty eggshell cheesemonger fussy adhere
br relation spritz molten distillery energetic bind
somerset shaping tablespoonful indent sable warmed push
harry is salted perforated dogfish homey convince
mary offer gizzard needle boletus pricey sever
baldwin amateur chilled flap corpse twisted mar
arthur stride creamy incision votive celebrated crushed
albert widest fruity cottony rouge bubbling trim
kaufman workable ungreased puffy tuxedo stuff unroll
scott accidental freezer genoa deli hardened dredge
fanny pay flaxseed nonmetallic benedict sexy attach
milano exceedingly briny cork duchess sensual ooze
mls immediately curdle pane grandson damp sifted
judith perhaps emulsify colored carol lent spiral
anne trash quart mallet monastery fiddle enrich
cr away unbleached milligram woman ultra wilt
dover shipment sauteed scape yule sublime overflow
andrew costly cutlet dye village fanciful shrink
maria test browned foam delis blond lighten
michael prize foamy feathery lenten sturdy shave
hamilton residual moist scalloped alp duo blossom
bernard conversion glace soil lodge indulgence drape
keller parameter frosting indentation brut musical unfold
noel blockbuster teaspoonful superfine tasting orthodox tap
sheeps lawsuit brie disc symphony fashionable resemble
da eight stilton greenish fisherman ethereal slip
monte cruise semisweet cavity pounder enthusiastic lift
dolce reasonably rind speck hermitage shower crumble
slater sorry buttery silicone grove lusty decorate
franco indeed sangria ink mare embellishment slide
welsh minuscule drizzle particle automat creamed crimp
brittany stamp floured hole hotel boning remove
oliver prestigious gruyere cutout sucre rocky deflate
teresa preliminary dijon snip chapel simplicity loosen
allen pain marsala hollow jean shiny gather
barcelona scored buttered liner shop airy come
wellington out knead serrated noma unpredictable hook
patrick nick balsamic sediment distiller blustery greased
devon council breadcrumb rosette shepherd cornucopia freeze
halle will hazelnut thickly horseback chic poach
jonathan prolonged unsalted sifter riviera generous flatten
guinness waning sprinkle oversize dining interior lynch
monsieur convinced yeast bra farmhouse forte insert
louis mistake preheat decorative harvest impeccable pour
olivier blind bake rim forest luscious rotate
parker question cardboard hearth classy pierce
europe preparation nest salmonella superb choke
beck thoroughly flowerpot robin airtight incorporate
jacob locally wheel manzanilla luxurious scatter
les commercial swivel swan admire hold
z consumer wreath cellar circus fill
nero partisan dimple brasserie elegance arrange
lyon pattern capri bellini whisking scrape
madrid selection hipbone cloudberry underdone fit
florence subject dome patisserie zigzag shape
yorkshire emmental braid framboise autumnal sift
milan finish gill butler tepid melt
paul owner clump linen stern beat
anna idea rimless valentine blend
finnish mini thickness piazza sleek
venice rapid fluted decoration grainy
pierre faced tubular greenmarket neatly
paris grant granulate tapa simmering
rome heavily tine salade luxury
christmas title inch waiter frothy
resilient decoratively château loosely
stroke casing russet waxed
tunnel bladed geese fashion
visible metal medieval fine
cumberland oval stein touch
careful tube hunter fragile
attention shallow boar wet
appeal roller nursery polish
demand rectangular flute flaky
eighth paddle saba excellent
vision grate opera cheesy
certain spray ski sparkle
modest scraper peasant decadent
interval lattice hill fancy
trend rope heirloom boozy
6th diameter sundried glossy
motor concentric winery oiled
relentlessly dot coupe generously
base grease trotter lukewarm
suddenly tin chef beater
dean tester cirque stirring
untouched frond apprentice zest
continually elastic pub classic
blight fingertip granny stale
exhibition frost gourmet chill
suggest bulb tavern elegant
quite ounce thanksgiving crusty
nothing gently farmer lightly
recently burgundy tapas stiff
internal foil sauterne warm
very disk tent freshly
structure brush chanterelle cool
tilt pipe calf
shrinkage sprig chianti
shipping removable vineyard
notion temperature chateau
charge thick roquefort
letter rimmed butcher
stable rectangle pizzeria
tough dust provencal
core coarse alsatian
presumably surface pinot
fora plastic caesar
xiv melted bakery
incorporated parchment nordic
defiance bartender
premier processor
speech mediterranean
allow armagnac
apprenticeship bistro
watchful lavender
nightcap bordeaux
halfway pint
run countess
scout beaujolais
male madeira
deputy provence
integrity bouquet
steady baker
success cognac
overlapping gruyère
mass rustic
tour morel
hunt confectioner
reputation virgin
alternately
blemish
assembled
rate
ramp
apart
career
door
force
nineteenth
inevitable
tension
book
job
crop
intact
aid
receive
age
reduced
seasonal
portion
almost
verge
midwestern
update
uniform
meanwhile
too
concord
reception
creation
manger
child
same
stress
boss
lorraine
there
total
vacation
producer
county
size
par
today
name
morale
recommendation
domestic
cauda
moderate
over
possible
ready
accumulated
closely
spoke
definitive
aged
have
slump
professional
upside
paring
reduction
inside
financier
recall
fourth
covering
row
vigorously
flow
guide
off
yukon
holiday
price
victor
say
reserve
seem
sided
offset
strata
reverse
not
topping
volume
overwork
individual
barely
correct
return
be
work
carefully
slightly
shaving
quarter
pass
stream
hour
equal
irregular
increase
addition
quality
tie
briefly
space
rapidly
constantly
bulk
wide
excess
stretch
necessary
overlap
enough
ray
reserved
skate
court
yield
do
least
additional
remain
border
slotted
scrap
import
approximately
sharp
time
still
completely
press
clean
ahead
log
evenly
cure
continue
attachment
about
stand
weight
position
tender
slowly
overhang
extra
circle
room
low
speed
half
fold
edge
form
rise
purpose
gradually
peak
prepared
degree

Yewande and Priya's hypothesis: describing NW recipes as "fast" and "easy"

Returning to one of the motivating questions behind this project--**whether there is empirical support for Yewande and Priya's suspicion that American food media tends to qualify non-Western recipes as "fast" and "easy"** (to make them more approachable for white audiences)--I will show that non-Western recipes are indeed more likely to be associated with adjectives indexing speed and ease compared to Western recipes.

However, I also find that non-Western recipes take less time to prepare than Western recipes (according to the estimated cooking times listed in the recipes), and that non-Western recipes have fewer steps than Western recipes.

Nevertheless, when comparing recipes binned by similar preparation time and number of steps, non-Western recipes are still more likely to be associated with adjectives indexing speed and ease.

Cooking time analysis

Here is the distribution of (log) recipe preparation times for recipes with cooking time information available ($N=3326$):

In [15]:
Image("./figures/cooking_time_dist.png",width=600)
Out[15]:
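The notebook does not show how the free-text cooking times were turned into the numbers plotted above. The following is a minimal sketch, assuming times arrive as strings such as "1 hour 30 minutes" or "45 mins" and that "(log)" means the natural log. `parse_minutes` and `log_minutes` are hypothetical helpers, not the author's code.

```python
import math
import re

# Assumption: cooking times are free-text strings like "1 hour 30 minutes",
# "45 mins" or "1h30m"; the real scraped format may differ.
_UNIT_MINUTES = {
    "h": 60, "hr": 60, "hrs": 60, "hour": 60, "hours": 60,
    "m": 1, "min": 1, "mins": 1, "minute": 1, "minutes": 1,
}
_TIME_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*([a-z]+)")


def parse_minutes(text):
    """Return the total minutes in a cooking-time string, or None if nothing parses."""
    total = 0.0
    found = False
    for amount, unit in _TIME_PATTERN.findall(text.lower()):
        if unit in _UNIT_MINUTES:
            total += float(amount) * _UNIT_MINUTES[unit]
            found = True
    return total if found else None


def log_minutes(text):
    """Natural log of the parsed time; None for missing or non-positive times."""
    minutes = parse_minutes(text)
    if minutes is None or minutes <= 0:
        return None
    return math.log(minutes)
```

Recipes whose times cannot be parsed would simply be dropped, which is one way to arrive at a subset like the $N=3326$ recipes with timing information.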

Here is the distribution of (log) recipe preparation times, disaggregating W from NW recipes:

In [16]:
Image("./figures/cooking_time_dist_w_v_nw.png",width=600)
Out[16]:

Here are the cooking times broken down across the 15 most common cuisines:

In [17]:
Image("./figures/cooking_time_boxplots.png",width=700)
Out[17]:
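A per-cuisine breakdown like the one above could be computed roughly as follows. This is a sketch under stated assumptions: each recipe is reduced to a hypothetical `(cuisine, log_time)` pair, and "most common" is counted only over recipes that have timing information. The boxplots show full distributions; this sketch summarizes each cuisine by its median.

```python
from collections import Counter, defaultdict
from statistics import median


def top_cuisine_medians(recipes, k=15):
    """Median log cooking time for the k most frequent cuisines.

    `recipes` is assumed to be an iterable of (cuisine, log_time) pairs,
    with log_time set to None when no cooking time is available.
    """
    timed = [(c, t) for c, t in recipes if c and t is not None]
    counts = Counter(c for c, _ in timed)
    top = [c for c, _ in counts.most_common(k)]
    by_cuisine = defaultdict(list)
    for cuisine, log_time in timed:
        by_cuisine[cuisine].append(log_time)
    # Dict order follows frequency, most common cuisine first.
    return {c: median(by_cuisine[c]) for c in top}
```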

A one-tailed t-test indicates a statistically significant difference between the preparation times of W and NW recipes ($t=3.4, p<0.001$), with W recipes taking longer. But are NW recipes still more likely to be described as quick when compared with W recipes of similar preparation time? To check, I binned recipes into one of 4 bins based on the cooking time provided in the recipe notes, as shown below:

In [18]:
Image("./figures/cooking_time_binned.png",width=700)
Out[18]:
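The text does not say which t-test variant was used or how the 4 bin boundaries were chosen. This minimal sketch assumes a Student's (pooled-variance) independent t-test on the log times and quartile-based bins, using only the standard library. It computes the t statistic only; `scipy.stats.ttest_ind(a, b, alternative="greater")` would also return the one-tailed p-value. Both function names are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import mean, quantiles, variance


def pooled_t_statistic(a, b):
    """Student's independent two-sample t statistic with pooled variance.

    A positive value means mean(a) > mean(b). For the one-tailed alternative
    mean(a) > mean(b), the p-value is the upper tail of a t distribution with
    len(a) + len(b) - 2 degrees of freedom. Each sample needs at least 2 values.
    """
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    standard_error = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(a) - mean(b)) / standard_error


def quartile_bins(values):
    """Assign each value to bin 0-3 using the quartiles of all values (assumed binning)."""
    cuts = quantiles(values, n=4)  # three cut points
    return [bisect_right(cuts, v) for v in values]
```

With `a` as the W log times and `b` as the NW log times, a positive statistic corresponds to W recipes taking longer on average.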

Among the words indexing speed, only "instant" differs significantly between W and NW recipes, and only in certain bins. However, "instant" is always more likely to be applied to NW recipes, even when their cooking times are similar to those of W recipes.

In [19]:
Image("./figures/lor_boxplots_cooking_time.png",width=400)
Out[19]:
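Footnote [7] cites Monroe et al., whose weighted log-odds ratio with an informative Dirichlet prior is a standard way to compare word use between two groups. Assuming the log-odds ratios (LOR) in these figures are computed along those lines, a minimal sketch for one bin might look like this. `log_odds_z` is a hypothetical name. The prior is taken from background word counts, commonly the pooled counts of both groups.

```python
import math
from collections import Counter


def log_odds_z(counts_i, counts_j, prior_counts, prior_scale=None):
    """z-scored log-odds ratio with an informative Dirichlet prior (Monroe et al.).

    counts_i, counts_j: Counters of word frequencies in the two groups (e.g. NW, W).
    prior_counts: Counter of background frequencies; rescaled to sum to
        prior_scale if given, otherwise used as-is.
    Returns {word: z}; positive z means the word leans toward group i.
    """
    raw_total = sum(prior_counts.values())
    scale = 1.0 if prior_scale is None else prior_scale / raw_total
    alpha = {w: c * scale for w, c in prior_counts.items()}
    alpha0 = sum(alpha.values())
    n_i, n_j = sum(counts_i.values()), sum(counts_j.values())
    scores = {}
    for word, a_w in alpha.items():
        if a_w <= 0:
            continue
        y_i, y_j = counts_i[word], counts_j[word]  # Counter returns 0 if missing
        delta = (math.log((y_i + a_w) / (n_i + alpha0 - y_i - a_w))
                 - math.log((y_j + a_w) / (n_j + alpha0 - y_j - a_w)))
        variance = 1.0 / (y_i + a_w) + 1.0 / (y_j + a_w)
        scores[word] = delta / math.sqrt(variance)
    return scores
```

Running this separately on the recipes in each time bin gives one score per word per bin. Those per-bin scores can then be compared against a significance threshold such as $|z| > 1.96$.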

Cooking ease

I used the number of steps as a proxy for ease of a recipe (the fewer steps, the easier the recipe). The distribution of recipes over (log) number of steps is shown below.

In [20]:
Image("./figures/cooking_ease_dist.png",width=500)
Out[20]:
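As a sketch of the step-count proxy described above, assuming each recipe's directions are stored either as a list of step strings or as one newline-separated string (the scraped format is not shown here), the count and its natural log could be computed as follows. Both helpers are hypothetical.

```python
import math


def count_steps(directions):
    """Count non-blank steps; fewer steps is taken to mean an easier recipe.

    Assumes `directions` is a list of step strings or a single string with
    one step per line.
    """
    if isinstance(directions, str):
        directions = directions.splitlines()
    return sum(1 for step in directions if step.strip())


def log_steps(directions):
    """Natural log of the step count; None when a recipe has no steps."""
    n = count_steps(directions)
    return math.log(n) if n > 0 else None
```

One caveat of this proxy is that sites differ in how much text they pack into a single step, so step counts are only roughly comparable across sources.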

Here is the distribution of (log) number of steps, disaggregating W from NW recipes:

In [21]:
Image("./figures/cooking_ease_dist_w_v_nw.png",width=600)
Out[21]:

And here is cooking ease (number of steps) broken down across the 15 most common cuisines:

In [22]:
Image("./figures/cooking_ease_boxplots.png",width=700)
Out[22]:

A one-tailed independent t-test also indicates that W recipes have significantly more steps ($t=18, p<10^{-71}$). But again, are NW recipes still more likely to be described as easy when compared with W recipes that have a similar number of steps? I binned recipes into one of 4 bins based on the number of steps in the recipe directions, as shown below:

In [23]:
Image("./figures/cooking_ease_binned.png",width=600)
Out[23]:

Again, I found that among the words indexing ease that are significantly more likely to be applied to either NW or W recipes, all are more likely to be applied to NW recipes, even though the recipes within each bin have a similar number of steps.

In [24]:
Image("./figures/lor_boxplots_cooking_ease.png",width=400)
Out[24]:

Footnotes

[1] http://www.intersectionalanalyst.com/intersectional-analyst/2017/1/7/who-gets-to-be-an-authority-on-ethnic-cuisines

[2] https://www.bonappetit.com/story/recipe-writing-whitewashed

[3] I adapted code from recipe-box and a recipe scrapers module to retrieve the URLs of recipes in each site's archive, along with each recipe's title, author, ingredients, directions, free text, and cuisine category, when available. I regularized all author and title text by keeping only alphanumeric characters (i.e. removing punctuation) and lowercasing, but crucially retained diacritics, as I assumed diacritics might be a useful signal when classifying recipes into a cuisine category. I deduplicated recipes using the regularized author and title strings together with the recipe's source website, so no two recipes in my dataset share the same author, title, and website. I further removed duplicate recipes from Bon Appetit, which appeared twice because a second listing credited the photographer as the author. I also removed "test" recipes from Food52, where users can freely create and upload recipes and so often submit recipes just to try out the interface. These test recipes can be identified by manually compiling a set of frequently occurring recipe titles to discard, such as "test" and "cake".
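The regularization and deduplication rules in this footnote can be sketched as follows. Python's `str.isalnum()` is Unicode-aware, so accented letters are kept. Three details are assumptions beyond the footnote: NFC normalization so that decomposed accents are preserved, keeping collapsed whitespace between words, and the hypothetical `"author"`, `"title"` and `"site"` field names.

```python
import unicodedata


def normalize(text):
    """Lowercase, drop punctuation, keep letters (including diacritics) and digits.

    NFC first composes characters like "e" + combining grave into "è", so that
    isalnum() keeps them instead of discarding a lone combining mark.
    """
    text = unicodedata.normalize("NFC", text).lower()
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())  # collapse runs of whitespace (assumption)


def deduplicate(recipes):
    """Keep the first recipe per (author, title, site) key after normalization."""
    seen = set()
    unique = []
    for recipe in recipes:
        key = (normalize(recipe["author"]), normalize(recipe["title"]), recipe["site"])
        if key not in seen:
            seen.add(key)
            unique.append(recipe)
    return unique
```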

[4] Analysis data, figures, and code notebooks for this project are available at this repository.

[5] I compiled a list of demonyms from Wikipedia (https://en.wikipedia.org/wiki/Demonym#Suffixation), excluding the demonyms "hamburger" and "frankfurter" as these have more prominent culinary meanings.

[6] For each list, I used the process below to extract names of individual dishes:

    1. Get all hyperlinks with anchor text matching the pattern "X cuisine".
    2. For each cuisine hyperlink:
       A. Grab all items occurring in a list (individual dishes/ingredients).
       B. Add them to a dictionary tracking the dishes/ingredients of that cuisine.
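The procedure in this footnote can be sketched with the standard-library HTML parser. This is a hypothetical reconstruction, not the author's code. Page fetching is left to a caller-supplied `fetch(href)` function so the sketch needs no network access, and each `<li>` contributes only its own text, excluding any nested list items.

```python
import re
from html.parser import HTMLParser

CUISINE_LINK = re.compile(r"^(.+) cuisine$", re.IGNORECASE)


class _LinkAndListParser(HTMLParser):
    """Collect (anchor text, href) pairs and the text of every <li> item."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.items = []
        self._href = None
        self._anchor_text = []
        self._li_stack = []  # one text buffer per open <li>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._anchor_text = []
        elif tag == "li":
            self._li_stack.append([])

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append(("".join(self._anchor_text).strip(), self._href))
            self._href = None
        elif tag == "li" and self._li_stack:
            text = " ".join("".join(self._li_stack.pop()).split())
            if text:
                self.items.append(text)

    def handle_data(self, data):
        if self._href is not None:
            self._anchor_text.append(data)
        if self._li_stack:
            self._li_stack[-1].append(data)


def cuisine_links(html):
    """Step 1: hyperlinks whose anchor text matches "X cuisine", as {X: href}."""
    parser = _LinkAndListParser()
    parser.feed(html)
    links = {}
    for text, href in parser.links:
        match = CUISINE_LINK.match(text)
        if match:
            links[match.group(1)] = href
    return links


def list_items(html):
    """Step 2A: text of every list item on a cuisine page."""
    parser = _LinkAndListParser()
    parser.feed(html)
    return parser.items


def build_cuisine_dictionary(index_html, fetch):
    """Step 2B: map each cuisine to the list items found on its linked page."""
    return {cuisine: list_items(fetch(href))
            for cuisine, href in cuisine_links(index_html).items()}
```

Because list items on real Wikipedia pages include navigation boxes and references, the extracted items would still need filtering before being treated as dishes or ingredients.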

[7] http://languagelog.ldc.upenn.edu/myl/Monroe.pdf
