Restaurant Point of Sale Analysis

An exploratory analysis on the point of sales order data from a restaurant chain with 10 locations across the region. This was the first time this information had been examined across the entire company since each restaurant’s data was housed locally on site. After gathering the data and normalizing the menu dictionaries, I was able to pull insights about sales trends over time, menu preferences, and order correlations. The video at right is a presentation of my findings, and the articles below detail the technical work behind the project.

Extracting the Data (MS Access, MS Excel)
Normalizing the Data (Python)
Visualizing Sales Trends Over Time (Tableau)
Analyzing Delivery Preferences (Independent Chi Square)
Analyzing Order Correlations (Market Basket Analysis)

EXTRACTING THE DATA

PROJECT: RESTAURANT POINT OF SALE ANALYSIS

GATHERING THE DATA

The point of sales data for each restaurant was housed locally on site at each restaurant. My first step was to visit the local branch to find out what it would take to export each system's information. While the software did have some export utilities, these were limited to exporting the contents of various built-in reports and did not give me access to the raw order-level data I needed. In the end, I went with a full database backup, which generated a copy of the entire Microsoft Access (.mdb) database that the system was built on.

FINDING A WORKAROUND

When I attempted to open the system backup directly in MS Access, I discovered that the file was in a protected format. This makes sense - if you could tap directly into database you could potentially manipulate your sales data. The POS system blocks access to prevent fraud.

But this meant that I needed to find a workaround if this was going to be a viable project. After some investigation, I found that you can open a .mdb file straight from Excel. Since a .mdb file contains many tables, Excel prompted me to choose which table to load...

...and then asked me how I wanted to load it.

This worked decently, and it allowed me to at least get started with poking around the data. But the load time was pretty long on some of the larger tables, and I was really getting more than I needed and in a pretty inefficient way.

STUMBLING ONTO A BETTER WAY

Sometime later I wanted to isolate some data to hone in on and I just happened to notice a tool under the "Data" tab in Excel called (appropriately) "Get Data." I gave it a click, and to my delight there was an option for importing from MS Access:

This brought me to a far more robust data import prompt. Now I could preview the first five records of any table, get a feel for what's inside, and even apply some data transformations during the extraction process.

TRANSFORMING THE DATA

I was rather impressed with Excel's suite of data transformation tools. I was able to identify which columns of data to keep and which to exclude from the import. It also identified the relational nature of the database and allowed me to perform some joins on connected fields.

The "Split Columns" option was particularly intriguing. For example, if I wanted to split the OrderDateTime column into two separate Date and Time fields, I could choose the "By Delimiter" option:

The space between the Date and Time suggested using the space character as the delimiter. Since there was another space between the numerical time and the AM/PM value, I chose to only split by the left-most delimiter and got exactly the result I was looking for, separation of Date and Time into two columns.

Once all transformations were complete, "Close & Load" finished the job by loading the reshaped data into an Excel spreadsheet. Most of the 10 database backups I was working with had about 300,0000-500,000 rows which Excel was able to load successfully. From there I could save as a CSV to prepare the data for the more complex data transformations I would complete in Python.

NORMALIZING THE DATA

PROJECT: RESTAURANT POINT OF SALE ANALYSIS

THE CHALLENGE

The Point of Sale data for this project was gathered from 10 individual branches of the restaurant chain. While each share the same menu, the POS systems were not in sync with regard to the naming of the menu items (or their numeric ID’s). Before I could do any analyses, it was necessary to first normalize each dataset’s menu dictionary.

THE STRATEGY

It was clear that most of the restaurants’ databases were set up by copying the menu schema from another, which meant that there was a good amount of overlap in general. I used the original restaurant’s menu as the starting point and created from that a menukey table that would serve as the master menu dictionary.

I then compared each other restaurant’s menu dictionary against this master and recoded each menu item into a new variable using the following methodology:

For each item in that restaurant’s menu dictionary, check it against the menukey master dictionary.
If there is an exact match, keep that name.
If there is a unique close match, use that close match as the recoded name.
If there is no adequately close match, keep the name and add that to menukey as a new item to update the master menu dictionary.

Here is what that looked like in my Python code:

def MenuRecode(MenuItemsCSV):
    recodetally = 0
    newkeytally = 0
    locationcode = MenuItemsCSV.at[0,'LocationCode']
    
    MenuItemsCSV['MenuItemText'] = MenuItemsCSV['MenuItemText'].str.lower()
    
    for itemID in MenuItemsCSV['MenuItemID']:
        name1 = MenuItemsCSV.loc[MenuItemsCSV.MenuItemID == itemID, 'MenuItemText'].values[0]
        name2 = menukey.loc[menukey.MenuItemID == itemID, 'MenuItemText'].values[0]
        rowindex = getindex(MenuItemsCSV, itemID)
                
        if itemID in menukey['MenuItemID'] and name1 == name2:
            MenuItemsCSV.at[rowindex,'MenuItemID_R'] = itemID
        else:
            a,b = findbestmatch(name1, rowindex, menukey, MenuItemsCSV)
            recodetally = recodetally + a
            newkeytally = newkeytally + b
    
    MenuItemsCSV['MenuItemID_R'] = MenuItemsCSV['MenuItemID_R'].astype(int)
    print(str(locationcode) + " | Recodes: " + str(recodetally) + " | New Keys: " + str(newkeytally))

Lines 13-14 deal with the simplest case when there is an exact match in the master menu dictionary (both the item name and the item ID had to match). The findbestmatch function is called to look for a reasonable match when an exact match is not available.

FINDBESTMATCH function

The get_close_matches function from the difflib package came in clutch here. Here is the syntax:

difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6)

word - the string that you are looking for
possibilities - the list where you are looking for that string
n - how many results you want returned (max)
cutoff - how close a match you require (0-1, with 1 being an exact match)

The cutoff is where you can really yield the most control. Essentially it’s asking, how sensitive do you want your test to be? After a few test runs, I found that a cutoff of 0.8 was giving me optimal results. Anything less was giving me too many false matches, and anything above that failed to pick up desirable matches. I set n to 1 since I was only interested in the best match (if there were any matches at all). Here is what the whole function looked like:

def findbestmatch(MenuItemToFind, rowindex, menukey, MenuItemsCSV):
    
    bestmatch = list(difflib.get_close_matches(MenuItemToFind, menukey.MenuItemText, 1, 0.8))

    if len(bestmatch) == 0:
        newID = max(menukey.MenuItemID) + 1
        MenuItemsCSV.at[rowindex,'MenuItemID_R'] = int(newID)
        newkey = [newID, MenuItemToFind]
        menukey.loc[max(menukey.index) + 1] = newkey
        return(0,1)
    else:
        bestname = bestmatch[0]
        MenuItemsCSV.at[rowindex,'MenuItemText'] = bestname            
        MenuItemsCSV.at[rowindex,'MenuItemID_R'] = menukey.loc[menukey.MenuItemText == bestname, 'MenuItemID'].values[0]            
        return(1,0)

THE RESULTS

You can see the results below after running the MenuRecode function on each of the 10 restaurants’ datasets. The recode and new key tallies helped to verify that for about half of my datasets there was a relatively small amount of alteration to the menu dictionaries, meaning there was already a decent amount of overlap (each dataset had about 1000 menu items in total). R1 was used as the basis of the original menukey. I ran the MenuRecode function on this mostly as a control measure to make sure the function to verify my results. This helped me to troubleshoot some of my coding errors along the way since there should not be any changes to R1 if all is working properly.

After running the data normalization code above, there was still some manual work that I had to do to finish bringing these menu dictionaries into the best possible alignment, but I was pleased to find a solution that would do the bulk of the work for me.

VISUALIZING SALES TRENDS

PROJECT: RESTAURANT POINT OF SALE ANALYSIS

SALES TRENDS BY TIME

Before diving into product-level investigations, I wanted to get a sense for how the restaurants were doing as a whole, and particularly in relation to the impact of COVID on overall sales. The data available for most of the restaurants spanned from just before the beginning of 2019 up through the end of July 2021. This meant that I was fortunate enough to have 2019 as a full “normal” year prior to the onset of COVID in 2020. The Tableau visualizations below represent gross orders by volume for the top 100 selling entrees or main dishes.

SALES BY DAY OF THE WEEK

I first took a look at sales trends by day of the week. The three layered clusters of line graphs represent one year each, 2019, 2020, and 2021. Each restaurant branch’s data is represented by the variously colored lines (restaurant names/locations have been coded for anonymity).

For the most part, the gross sales trends tell us what we would generally expect for a mostly dine-in restaurant - weekday sales Monday through Thursday remain fairly steady and then rise substantially over the weekend. Most restaurants peak on Friday with Saturday being a close second.

Only M3 (in light teal) has a significant increase from Friday to Saturday. Since M3 is located in a popular weekend getaway destination, it makes sense that visitors would travel on Friday and use Saturday for their evening to dine out.

C1 in dark blue also catches the attention as the only branch that has better sales on Sunday than on Saturday. It would be interesting to explore this further to learn what is the reason for the increase.

SALES BY TIME OF DAY

Slicing the sales data from another direction, we can see similarly predictable results for sales volume by time of day, with the dinner and lunch rushes being represented by the two humps in the line graphs below.

A closer look at the shaded region shows some interesting changes between 2019 and the subsequent years. In 2019, the dinner rush had a clear peak at 6:00 PM for most of the restaurants. In 2020 and 2021, the sales volume at 5:00 PM is frequent comparable if not greater than the sales volume at 6:00 PM. What accounts for this shift to an earlier dinner rush?

One plausible theory has to do with the significant shift to delivery and take-out orders during various stretches of COVID shutdowns, when dine-in options were significantly restricted. If most people prefer to eat dinner at 6:00 PM, then perhaps they plan ahead by placing delivery and take-out orders earlier so that they can be ready to eat at 6:00 PM.

While the order data generally does not distinguish between dine-in vs. take-out, there are three restaurants that use a third-party service for deliveries. Because of this, we could identify delivery orders based on the delivery commissions coded into the sales. By charting the non-delivery orders (in gold) and delivery orders (in blue) against each other, the hypothesis seems to be validated - delivery orders peak at 5:00 PM rather than 6:00 PM. A similar pattern is now also visible for lunch orders, although lunch-time deliveries are significantly less popular.

SALES BY MONTH, YEAR-OVER-YEAR

We already got a glimpse of one way COVID-19 affected ordering trends by seeing its impact on delivery, but a much more dramatic picture emerges when we look at overall sales trends from 2019 through mid-2021, represented by the three line graphs below. In the first of these we see a very flat 2019 - not a lot of fluctuation up or down except for a normal summer surge in the resort-town M3 restaurant.

The attention-grabber is, of course, the roller coaster of sales in 2020. Each dip corresponds to various shut-down orders throughout the year (March, July, and November). While 2020’s graph catches the attention, what happened in 2020 is ultimately not the most interesting part of the story, but it is what we see after 2020.

When we put these graphs side-by-side, we see that there is real progress being made in terms of sales volume in 2021. The shaded band represents the range of monthly sales volumes for all of 2019. Four restaurants have beat 2019’s top performer for monthly sales at least once in 2021, and nearly every restaurant displays consistent upward growth.

As hard hit as restaurants were in 2020, this chain at least is coming out of that tumultuous year stronger than before! Here is a closer look how each restaurant’s 2021 mid-year sales volumes stack up against their 2019 numbers.

WHERE DO THINGS GO FROM HERE?

Things are clearly moving in a positive direction for this restaurant chain. Oftentimes we can get overly focused on diagnosing problems, but right now is a perfect time for them to be asking the question, “What’s going right and why?” COVID-19 pushed all of us outside our comfort zones and forced businesses to come up with creative solutions when the old way of doing things just wasn’t an option anymore. As the pandemic restrictions begin to lift and we return to some sense of normalcy, we have the benefit of hanging onto these new toolsets and strategies going forward. It is important that we identify what they are instead of simply defaulting back to what was once familiar.

ANALYZING DELIVERY PREFERENCES

PROJECT: RESTAURANT POINT OF SALE ANALYSIS

After looking at overall sales trends, I turned my attention to order-level data and menu preferences. I started by running some queries on the top-sellers for each of the three years. When looking at the results for 2019, combos (in yellow) stand out as the strong favorites, making up 8 of the top 13 top-sellers with the “Grande 1” and “Poco 1” leading the way.

Given the disruptions of COVID-19 shutdowns in 2020, I was interested to see how menu preferences might have shifted. When we stack up 2019’s top sellers against 2020’s, we can see combos losing some of their dominance, now making up just 5 of the top 13.

Order volume as a whole was understandably down from 2019 to 2020, as shown below.

But not all menu items were affected equally. Three items in particular saw minimal decreases in sales from 2019 to 2020. Remarkably, all three were some kind of burrito.

So what accounts for the strong performance of burritos in a year when overall sales dropped roughly 25% as a whole? One theory is that ordering habits shifted as a result of the restrictions on in-restaurant dining and the increased reliance on delivery and take-out. Perhaps burritos are seen as a more take-out friendly option among the restaurants’ selections. This was the theory that I set out to test.

FORMATTING THE DATA

There were some obstacles to testing this theory since the Point of Sale data generally did not distinguish between take-out and dine-in orders. Three restaurants, however, utilized a third-party service for deliveries, which meant that delivery orders at these restaurants included a delivery commission that could be used to identify them as deliveries.

And so my first task was to flag delivery vs. non-delivery orders. The delivery commission could be found as a particular discount type for these orders. After loading in the data, I ran a function to recode all discounts into a new “Delivery” variable with Y / N values.

col_list = ["OrderID","DiscountID"]
M1Deliver = pd.read_csv("Data/M1/M1OrderHeaders.csv", usecols = col_list)

def deliveryM1 (series):
    if series == "8.0":
        return "Y"
    elif series == "1.0":
        return "Y"
    else:
        return "N"

M1Deliver['Delivery'] = M1Deliver['DiscountID'].apply(deliveryM1)

The next step was to flag each time a particular item was ordered for delivery. In the original relational database, each order was tracked in the OrderHeaders table, and the individual items appearing on that order were housed in a separate OrderTransactions table, with the two being related by the OrderID variable in a one-to-many relationship.

Merging these two tables on OrderID (equivalent to an inner join) added the Y / N values from the Delivery variable to each OrderTransaction, thereby making it possible to identify every time an individual item was ordered for delivery. Finally, I isolated just those variables that would be needed to perform my analysis.

M1 = pd.merge(M1OrderTransactions, M1Deliver, on='OrderID')

M1 = M1[['LocationCode','MenuItemID_R','MenuItemText_R','Delivery']]

INDEPENDENT CHI-SQUARE ANALYSIS

The goal for this portion of the project was to identify if certain menu items were more or less likely to be ordered for delivery vs. dine-in orders. I chose to employ an Independent Chi-Square test for this analysis since it is suited for testing correlations between two categorical variables. In my case, those variables were MenuItemText_R, or the name of the menu item ordered, and Delivery, which marks each order with a Y / N value.

The first step was to use the Pandas crosstab function to create a cross-tabulation table summarizing the number of Y and N responses for each menu item.

M1_crosstab = pd.crosstab(M1['MenuItemText_R'], M1['Delivery'])

As you can see in the table preview below, each menu item is listed with two columns indicating the number of times it was ordered for delivery (Y) vs. non-delivery (N).

From here, the crosstab was loaded into the chi2_contingency function from the Stats package, which gives the following output:

There are two figures that we are interested in from these results. The first is the p-value for the Independent Chi-Square test, here showing as “0.0”. This points to our test having significance; that is, it indicates that there is a true difference in ordering habits between delivery and non-delivery orders.

The second is the array of expected values for N and Y responses. This is a wonderfully simple calculation that looks at the overall proportion of delivery vs. non-delivery orders for all menu items combined, and then applies that proportion to each menu item to calculate how many delivery vs. non-delivery orders we should expect for that item if it behaved just like the overall trend.

For instance, the first line of the array shows expected N and Y values of 289.17 (or 2.8917e+02) and 6.82, corresponding to the first menu item in the crosstab, “arroz c/ pollo.” If we view the number of actual orders from our previous crosstab, we see that there were 289 non-delivery and 7 delivery orders for this item. In other words, this particular item performs much like we would expect for delivery compared to the overall trend.

With a reported p-value of “0.0” we should expect to see some menu items significantly bucking the overall trend, so the final step was to compile these results into a more digestible format and take a closer look at the results. The table below takes the crosstab results with actual Y / N results for delivery orders and combines them with the expected Y / N figures from the Chi-Square contingency table.

Finally, I divided the actual number of delivery orders (Y column) by the expected number of delivery orders (Exp Y column) to get a quick reference for how much each item under- or over-performed expectations with regard to deliveries. Values less than 1 represent items that are ordered less than expected for delivery, and values greater than 1 represent items that are ordered more than expected for delivery.

VISUALIZING THE RESULTS

Loading the above table into Tableau allows us to observe delivery ordering trends visually. Each dot on the scatter plot below represents one menu item from one of the three restaurants where delivery data was available: C1, M1, and M5 for short. The x-axis represents the expected number of delivery orders for that item based on the contingency table from our Chi-Square test. The y-axis represents the actual number of delivery orders for that item. The further to the right (that is, along the x-axis) an item appears, the more popular that item is in general.

The dashed line, a line with slope of 1 starting at (0,0), establishes our baseline. This represents where we would expect a dot to fall if that menu item performed exactly as expected. Dots falling above this line represent items that were ordered for delivery more often than the overall trend, while dots falling below this line represent items that were ordered for delivery less often than the overall trend. The size of the dot represents to what degree that item over- or underperformed on delivery orders, or its relative distance above or below the baseline.

OBSERVATION 1: GRANDE VS. POCO COMBOS

As was seen in the first part of this article, the Grande 1 and Poco 1 combos are the dominant favorites for overall sales. But when we isolate our data, we can see that these two behave very differently when it comes to delivery orders. The Grande 1 combo is ordered significantly more often for delivery than expected, while the Poco 1 combo is a much less popular delivery choice.

To put concrete numbers to it, the Grande 1 combo is ordered for delivery 1.79 times more often than expected, while the Poco 1 combo is ordered for delivery 0.53 times less often than expected for all three restaurants combined.

What accounts for the difference in performance for the two most popular menu items? One plausible theory is that perhaps the Grande 1, a large combo, is more likely to be shared when ordered for delivery.

OBSERVATION 2: BURRITOS, FAJITAS, AND TACOS

After the Grande 1 and Poco 1 combos, the next most popular menu items are the various burrito, fajita, and taco dish options. The chart below zooms in on the lower left quadrant of our scatter plot to get a closer look at these. When we highlight the burrito, fajita, and taco items, we see right off the bat that these are generally outperforming expectations for delivery orders.

Particularly noticeable are the Macho Burrito orders for C1 and M5 standing alone near the top of our chart. The Macho Burritos stand out even against the Fajitas and Mexico City Tacos, which have roughly equivalent numbers of total orders (as represented by where they fall on the x-axis). Even so, the Macho Burritos are ordered for delivery between 22% and 38% more often than the Fajitas and Mexico City Tacos.

This observation helps to validate the theory that sent us down this line of inquiry. The question we were asking was why did the order volume for burritos hold steady in 2020, a year when overall sales were down significantly, compared to 2019. The evidence suggests that burritos are viewed as a more delivery-friendly dish, and so the shift to delivery and take-out orders during the COVID-19 shutdowns meant that burrito orders took less of a hit in 2020 than the other menu-item types.

IMPLICATIONS

What impact might these observations have on business operations? Broadly speaking, I think that it is worth paying attention to delivery trends even for a predominantly sit-down restaurant such as this client. The shutdowns of 2020 were disruptive across many industries and the impact of those disruptions is likely to persist into the future. Much like remote work became viewed as a more viable employment arrangement for many companies post-pandemic, I suspect that a large segment of the population grew accustomed to delivery and take-out in a way that they weren’t before 2020. This might open new possibilities for restaurants that don’t traditionally cater to the delivery market.

Insights like the ones above can therefore be harnessed to more effectively target delivery customers. For example, the current online menu for this restaurant mimics the physical dine-in menu in terms of layout, with burritos, fajitas, and tacos appearing in the blue menu categories shown at right. This means that potential customers have to scroll through half or even two-thirds of the menu before seeing the burrito and fajita options, respectively. It may be desirable to feature these popular delivery items more prominently on the online menu so as to draw the customers’ attention when ordering.