Evaluating Market Size: TAM, SAM, SOM

Nov 11, 2021

Preface

In this article, I’m going to share a methodology for evaluating the market size for a fictitious B2B SaaS product named “Geospatial Crop Intelligence” (GCI for short). This same approach can be used to evaluate the opportunity for launching a new product at an established company or for a startup that has a single product and needs to quantify the market size as part of a pitch to VCs or angel investors.

Before we start, it’s crucial that I make an important point: every market-sizing exercise leads to numbers that are, at best, educated guesses. Nobody has a crystal ball. Successful companies usually start with a single product that, upon seeing some success, leads to an expanded product line. As that product line grows, it leads to ancillary products and services that significantly expand the market size for the company overall. Often, the follow-on products were not anticipated in the early days and would never have been taken into account when evaluating the market. A good example of this is Amazon’s expansion into web and data services, which now generates 15% of their revenue.

A second word of caution: EVERYBODY you show this to will disagree with some aspect of it… or they’ll suggest that there’s a better way of doing it for your specific product. That may be true. Thus, it’s critical that you carefully think through and consider all of the options; compile a pros and cons list for each of the different methodologies that make sense. That way, you can decide how best to proceed. And, if you do use this approach, you can explain your reasoning and why you chose it over the alternatives when the conversation inevitably comes up (usually at the worst possible time in the middle of your pitch or presentation).

Introduction

Our objective is to evaluate the size of the market opportunity, expressed in dollars (USD) of potential revenue, for the Geospatial Crop Intelligence (GCI) product line (including existing products and those that will be developed in the near-term). “Near-term” is defined here (for convenience) as the remainder of 2021 and the entire calendar year of 2022.

We’ll use a common framework for doing so, starting with an approximation of the Total Addressable Market (TAM), then deriving our Serviceable Addressable Market (SAM) from our TAM, and finally quantifying our Serviceable Obtainable Market (SOM). We will estimate TAM and SAM using a top-down approach and then use a bottom-up strategy to quantify SOM in order to create realistic estimates and to ensure that the two approaches agree with each other (within a reasonable amount of error).

The SOM will be evaluated using two different approaches. The first approach is based on the obtainable market via Direct Sales (aka “Inside Sales”). The second method uses the Bass Diffusion model, which is applicable to forecasting new product and technology adoption in a market; this approach was included because it is demonstrative of potential growth using a channel distribution model (as opposed to exclusively relying on a Marketing Funnel and Direct Sales). Ultimately, the actual/realized obtainable market opportunity will be a function of how suitably our products are built for large-scale distribution and consumption, how fervently we pursue channel distribution versus other go-to-market options, the level of product-market fit that we achieve, the rate at which we develop and deploy value-added products, our marketing mix (STP), the annual growth rate of the overall Geospatial Intelligence market, the competitive landscape, and myriad other factors.

Keep in mind that any modeling exercise produces only an estimate of the target, the accuracy of which depends on many things, including but not limited to: the quality of our data (GIGO principle), the rigor of our analyses, and the assumptions that we make. Given that assumptions are subjective, they tend to be a point of contention. This can be handled by building flexibility into the model and expressing assumptions as parameters that can be adjusted in order to create modeling scenarios such as: baseline, optimistic, and pessimistic. This analysis will attempt to use conservative assumptions so as not to overestimate the market size.

Step 1: Total Addressable Market (TAM)

In general, the Total Addressable Market is meant to quantify the total demand for a product from all potential purchasers of that product in a one-year period. It is the highest-level market sizing metric. Think of TAM as the theoretical maximum amount of revenue that a product could generate annually in a monopolized market.

TAM is generally expressed as an amount of currency spent on the product of interest annually. In some cases, it may be more appropriate to express TAM in other units (e.g. potential number of users of a social network) but for our purposes, expressing it as $USD spent per year is appropriate.

TAM necessarily includes a geography consideration. A best practice is to include all geographies in which the product is (or could be) sold. However, due to practical considerations (data availability, sales and marketing plans, IP export laws, etc.) it may be more appropriate to select a specific geography for which the evaluation will be made.

In this evaluation, we are going to define GCI’s TAM as the following:

All companies and business entities for which Earth Observation (EO) and Geospatial Intelligence (GI) data and/or derivations (of EO/GI data) could provide value or a competitive advantage.

“Value” can take many forms including: increased revenue, improved margin, reduced COGS, strategic knowledge of competitors’ actions or inventories, ability to forecast something (weather, market prices, bottlenecks, etc.), supply-chain insight, and many more. The key thing here is that it doesn’t matter exactly what the value is, it just matters that it is derived from EO/GI and is a core competency of the GCI team, which includes Product, Engineering, and Data Science personnel.

Having defined GCI’s TAM, our goal now is to quantify it. Given that GCI is a B2B product, we are going to start with a universe of companies, each of which will then be qualified in order to remain in our TAM. Our universe will be based on the Russell 3000 index, which includes 3000 of the largest publicly-traded companies incorporated in the United States. The combined market capitalization of the companies in the index comprises roughly 98% of the public equity market in the U.S. and the constituents represent a wide cross-section of industries, sectors, and company sizes. Wikipedia: Russell 3000.

Ideally, our analysis would include all public and privately held business entities worldwide. However, this exercise is meant to be practical in nature and thus we are balancing our level of effort (LOE), cost, and data availability with the desired fidelity of the result. By using the selected universe, we are being conservative in our overall estimate, which can be revised/expanded in the future if necessary.

Our methodology for quantifying TAM will be to estimate how much is spent annually on EO and GI data by companies fitting our definition above. We start with a list of all companies in our universe and then iteratively sort and filter it by Sector Name, Industry Name, and GICS Name while removing all companies that do not fit our TAM definition; i.e., we remove all companies that we don’t think could gain anything by using EO/GI data. After doing so, we are left with a list of 1003 companies, or roughly a third of what we started with.
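The filtering step above can be sketched in a few lines of Python. This is only an illustration: the `Company` record, the tickers, and the `RELEVANT_INDUSTRIES` allow-list are hypothetical stand-ins for the judgment calls made while reviewing the Sector, Industry, and GICS columns by hand.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    ticker: str
    sector: str
    industry: str


# Hypothetical allow-list: industries we believe could gain value from EO/GI data.
RELEVANT_INDUSTRIES = {"Agricultural Products", "Packaged Foods", "Railroads"}


def filter_universe(universe, relevant_industries):
    """Keep only the companies whose industry is on the allow-list."""
    return [c for c in universe if c.industry in relevant_industries]


# Made-up universe rows; the real list would hold the Russell 3000 constituents.
universe = [
    Company("AAA", "Consumer Staples", "Agricultural Products"),
    Company("BBB", "Information Technology", "Application Software"),
    Company("CCC", "Industrials", "Railroads"),
]

tam_candidates = filter_universe(universe, RELEVANT_INDUSTRIES)
print([c.ticker for c in tam_candidates])
```

Keeping the allow-list as explicit data makes the subjective part of the exercise easy to review and to challenge.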

Next, we move on to estimating the annual spend on EO and GI by each of the companies remaining on our list. This data isn’t readily available from most companies’ SEC filings or annual reports (10-K reports), so we must back it out from the data that we can acquire. Many companies do report their annual R&D spend; however, this number often includes M&A activity as well as new product development budgets and a host of other expenditures. Thus, it isn’t reliable or useful for our purposes.

We will perform the following calculations:

Calculate Gross Profit as: Gross Profit = Revenue - COGS
Calculate Operating Expense (Opex): Opex = Gross Profit - EBIT
Estimate IT Spend as being a percentage of Opex: IT Spend = %IT * Opex
Estimate 3rd-Party Data Sourcing Spend as being a percentage of IT Spend: 3rd-Party Data Sourcing Spend = %Data * IT Spend
Estimate GI Data Spend as being a percentage of 3rd-Party Data Sourcing Spend: GI Data Spend = %GI * 3rd-Party Data Sourcing Spend

Figure 1: ADM’s Consolidated Earnings Report from their 10-K for FY2020. Downloadable from ADM’s website here: ADM SEC Filings

From ADM’s Consolidated Earnings Report (or database containing such info):

Gross Profit = 64,355 - 59,902 = 4,453
Opex = 4,453 - 1,883 = 2,570
IT Spend = 0.05 * 2,570 = 128.5
3rd-Party Data Sourcing Spend = 0.28 * 128.5 = 36.0
GI Data Spend = 0.05 * 36.0 = 1.8

Note: Units are millions of USD.
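The same chain of calculations can be written as a small Python function. This is a minimal sketch: the function and parameter names are mine, and the default percentages are simply the values used in the ADM example above.

```python
def estimate_gi_data_spend(revenue, cogs, ebit,
                           pct_it=0.05, pct_data=0.28, pct_gi=0.05):
    """Back out an estimated GI Data Spend from income-statement figures.

    All monetary inputs and outputs share the same unit (here, millions of USD).
    """
    gross_profit = revenue - cogs
    opex = gross_profit - ebit
    it_spend = pct_it * opex
    data_spend = pct_data * it_spend
    gi_spend = pct_gi * data_spend
    return {
        "gross_profit": gross_profit,
        "opex": opex,
        "it_spend": it_spend,
        "data_sourcing_spend": data_spend,
        "gi_data_spend": gi_spend,
    }


# ADM, FY2020, using the figures quoted above (millions of USD).
adm = estimate_gi_data_spend(revenue=64_355, cogs=59_902, ebit=1_883)
for name, value in adm.items():
    print(f"{name}: {value:,.1f}")
```

Exposing the three percentages as parameters makes it straightforward to rerun the estimate under different assumptions.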

Figure 2: Spreadsheet calculation of ADM’s GI Data Spend

Thus, we arrive at a final estimated value, $1.8M, of ADM’s spend on 3rd-party GI data for 2020. This estimation methodology (as well as the %Data value being estimated as 0.28) is based on an article published by McKinsey. Note that we estimated the %GI value as 0.05, which is equivalent to a company splitting its total 3rd-Party Data Sourcing budget evenly across twenty different providers, one of which (5% of the budget) supplies EO/GI data.

This methodology is repeated for all companies in our list/spreadsheet in order to estimate each of their annual GI Data Spend using financial records and data from the same year (2020) for each company. Then, we sum the GI Data Spend for all companies in our TAM universe and arrive at the following: $1.5B.
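As a rough sketch of how the per-company estimates roll up into a TAM, and of how the assumptions can be exposed as scenario parameters, consider the Python below. The baseline percentages are the ones used for ADM; the pessimistic and optimistic values, and the `HYPO1` company, are hypothetical and only there to show the mechanics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    pct_it: float    # IT Spend as a share of Opex
    pct_data: float  # 3rd-Party Data Sourcing Spend as a share of IT Spend
    pct_gi: float    # GI Data Spend as a share of 3rd-Party Data Sourcing Spend


SCENARIOS = {
    "pessimistic": Scenario(0.04, 0.28, 0.04),  # hypothetical
    "baseline": Scenario(0.05, 0.28, 0.05),     # values used in the ADM example
    "optimistic": Scenario(0.06, 0.28, 0.06),   # hypothetical
}

# (revenue, COGS, EBIT) in millions of USD for FY2020.
companies = {
    "ADM": (64_355, 59_902, 1_883),
    "HYPO1": (10_000, 7_000, 1_000),  # made-up company for illustration
}


def total_gi_spend(companies, scenario):
    """Sum the estimated GI Data Spend over every company in the list."""
    total = 0.0
    for revenue, cogs, ebit in companies.values():
        opex = (revenue - cogs) - ebit
        total += scenario.pct_gi * scenario.pct_data * scenario.pct_it * opex
    return total


for name, scenario in SCENARIOS.items():
    print(f"{name}: ${total_gi_spend(companies, scenario):.2f}M")
```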

A reasonable objection at this point may be “how do you know that all companies in the list actually spent money on GI Data in 2020?” The answer is that we don’t know that. However, our list has already been filtered down to 1003 companies (from the original 3000) based on the likelihood that they are using GI data and that it provides value to them. A sanity check of this number can be performed by comparing it to publicly available data on total annual GI Spend, which for 2020 was approximately $60B (references: BusinessWire, Globe NewsWire, PR NewsWire). Our $1.5B TAM estimate is a small fraction of that total, so it seems reasonable; logically, it’s conservative due to our universe selection and the assumptions that we made.

Geospatial Crop Intelligence’s TAM is $1.5B

Step 2: Serviceable Addressable Market (SAM)

SAM can be thought of as the intersection of our products and the addressable market; it’s where product-market fit exists: the portion of the TAM that our suite of products is uniquely suited to address or to “service.” It is generally a filtered selection of the TAM and should be represented in the same units: annual dollars spent by all potential customers contained within it. In the literature, SAM is also referred to as “Served Available Market” and “Serviceable Available Market”; all of these terms mean the same thing.

In this evaluation, we’re going to define GCI’s SAM as the following:

The entities within our TAM that have a pain point or business-area of optimization that is specifically targeted or improved by the GCI team’s core competencies or existing products.

For consistency, “existing” will be inclusive of planned releases in our near-term product development roadmap. The GCI team’s core competencies are considered to be the following:

Supply & demand modeling for agricultural commodities
Agricultural crop insights (e.g. vegetation indices; crop acreage, yield, and health)
Supply chain and logistics: monitoring, modeling, and optimization
Extreme weather modeling and forecasting

In order to perform this evaluation, we will start by applying a categorical flag, “GCI_SAM,” to each of the companies in our TAM. The flag will be represented as an integer value and its purpose is to indicate the level of product-market fit between the existing GCI product and the needs of each of the TAM constituents. GCI_SAM is defined as follows:

After applying the flag, we then filter the results and retain only companies which have a GCI_SAM flag equal to 1 or 2. In essence, we’re selecting only those companies for which we already have an applicable product or for which we could easily deploy a product based on something very similar that we have done in the past for other clients. The result is a list of 219 distinct companies, each being a potential future customer.

Figure 3: Application of GCI_SAM flag to downselect from companies in TAM.

The next step is to refine the revenue estimation for the 219 companies remaining in our SAM. This is an optional step (hence “refine”) that can be done to improve our estimation. We didn’t perform this step on our TAM because the constituent list (of 1000+ companies) was too large to make it feasible.

We will apply a factor called “GI_RELEVANCE_SCORE” to each of the companies in our SAM. This factor is meant to be a relative measure of how relevant our products are to each company within our SAM. The factor will then be used as a multiplier to the baseline GI_DATA_SPEND value in order to capture the reality that companies within our SAM will have a dispersion in the amount that they will spend on EO/GI-derived insights based on how relevant it is to their business fundamentals and outcomes. GI_RELEVANCE_SCORE is defined as follows:

Table 2: GI_RELEVANCE_SCORE key-value pairs.

To look at it another way, previously we assumed that each company in our TAM spent 5% (= %GI) of its 3rd-Party Data Sourcing budget on EO/GI data. Now that we have down-selected to a much smaller number of companies for which this type of data is especially applicable, it’s logical that those companies might allocate a larger percentage of that budget to EO/GI. As a simple approximation of this, we’ll multiply %GI by GI_RELEVANCE_SCORE (for each company), which means that there will be three categories of budget allocated to EO/GI data by companies within our SAM: 5% (baseline), 10%, and 15% (maximum allocation).

Figure 4: Application of GI_RELEVANCE_SCORE and consequent refinement of GI Data Spend.

This computation is performed for all companies in our SAM in order to refine the estimate of their annual GI Data Spend. Then, as we did for TAM, we sum the GI Data Spend for all companies in our SAM and arrive at the following: $817M. This is our SAM.
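The SAM roll-up can be sketched as follows. The company rows are made up, the flag value 3 is assumed to mean “not retained,” and the relevance scores 1, 2, and 3 are inferred from the 5%, 10%, and 15% allocations described above rather than copied from the original tables.

```python
# Each row carries the baseline GI Data Spend from the TAM step (millions of USD),
# the GCI_SAM flag, and the GI_RELEVANCE_SCORE multiplier. All rows are hypothetical.
companies = [
    {"name": "A", "gi_data_spend": 1.8, "gci_sam": 1, "gi_relevance_score": 3},
    {"name": "B", "gi_data_spend": 2.0, "gci_sam": 2, "gi_relevance_score": 1},
    {"name": "C", "gi_data_spend": 5.0, "gci_sam": 3, "gi_relevance_score": 2},
]


def serviceable_market(companies, keep_flags=(1, 2)):
    """Sum the refined GI Data Spend over companies whose GCI_SAM flag is retained."""
    return sum(
        c["gi_data_spend"] * c["gi_relevance_score"]
        for c in companies
        if c["gci_sam"] in keep_flags
    )


print(f"SAM: ${serviceable_market(companies):.1f}M")
```

Multiplying the baseline spend by the score is equivalent to multiplying %GI by the score, because GI Data Spend is linear in %GI.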

Geospatial Crop Intelligence’s SAM is $817M.

Step 3: Serviceable Obtainable Market (SOM)

SOM is meant to describe the potential market share for our products subject to the real-world constraints that we have with respect to sales, marketing, and distribution. It would be unrealistic to assume that we could capture 100% of our SAM in one year (or even in five years) because there are costs associated with marketing and selling a product; moreover, there is competition in the marketplace and our potential customers have alternatives.

Whereas TAM and SAM leveraged a top-down sizing approach, evaluating SOM requires a bottom-up approach and will use our recent historical sales performance and market share to estimate future performance. Like TAM and SAM, the units of SOM will be in dollars; however, it is customary to sum the estimates for the next five years (rather than one year) and provide this cumulative value as the estimate of SOM. Also, the value of SOM will take on a new meaning: it’s an estimate of OUR potential revenue from products sold rather than an estimate of total potential spend of an aggregation of similar companies. One of the latent goals of taking a bottom-up approach to evaluating SOM is to provide a sanity check on the top-down sizing approach; i.e. we’d like for the SOM estimate to be a reasonable portion/percentage of our SAM.

We’re going to employ two approaches to estimating SOM. Each is meant to quantify a different method of sales/distribution and growth engine.

First (in Step 3a below), we’ll estimate our SOM using a Direct Sales approach. This will be based on historical sales and existing sales-pipeline data. Second (in Step 3b below), we’ll estimate our SOM using a product/technology adoption framework that utilizes the Bass Diffusion Model and may be more indicative of potential growth under a channel partnership/distribution scenario.

Step 3a: SOM — Direct Sales

For this step in the process, I’m going to use some fictitious sales data that I generated. In real-world practice, you should pull this data from Salesforce, Zoho, SAP, HubSpot, or whatever CRM tool your company uses.

First, we acquire a list of all leads for the GCI product from our CRM tool. Second, we look up (via web search) last year’s revenue for each of the companies in the list of leads. Note: this is the revenue that each of those companies generated for themselves last year. As an example, if Whole Earth Brands was on the list of leads, I’d Google “Whole Earth Brands revenue 2020” and find that their revenue in 2020 was approximately $275M. Third, we sum last year’s revenue for all of the companies on our list of leads. For this example, let’s set that value equal to $2.712T.

Next, we need to convert the overall summed revenue for the companies in our list of leads to an estimate of their annual spend on EO/GI products. We do this by creating a conversion factor based on our SAM spreadsheet (Figure 4 above). The conversion factor is equal to the total estimated GI Data Spend from our SAM analysis ($817M, which is computed as the sum of the “gi_data_spend” column in Figure 4), divided by the total revenue of all companies in our SAM ($2,490,188M, or roughly $2.490T, which is computed as the sum of the “revenue” column in Figure 4). Basically, we’re just trying to correlate total corporate revenue to GI Spend in an easy way. The conversion factor’s value is 0.0328%.

Now, we multiply the summation of our leads’ revenue in 2020 ($2.712T) by our conversion factor (0.0328%) in order to arrive at a value of $890M, which represents the extrapolated value of GI Data Spend that was in our pipeline in 2020. We’ll call this value our GCI pipeline value.

Next, we need to estimate our sales conversion rate. For this, we can use historical sales data for this product (if some version of it has been sold in the past) or we can use sales data from similar products that we’ve sold in the past. For this exercise, let’s assume that we’ve been selling a simplified version of this product for the past few years and that last year, the revenue generated from sales of the GCI product was $2.785M. Dividing our actual revenue last year ($2.785M) by our GCI pipeline value ($890M) gives us an estimate of what percentage of the potential revenue in our pipeline was closed due to direct-sales activity (= $2.785M / $890M). This equates to approximately 0.313%.

At this point, we can do a quick sanity check on our previously calculated SAM. If we sum the revenue for all companies in our final SAM list, we find that it’s equal to $2.490T, which is comparable to the summation of our leads’ revenue of $2.712T (from our CRM tool).

Figure 5: Calculation of conversion factor (red), percentage of sales pipeline closed (blue), and sanity check of SAM vs pipeline (green).
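The three quantities in Figure 5 can be reproduced with a few lines of Python. This is a sketch with variable names of my own choosing. All amounts are in millions of USD, and the leads’ summed revenue is entered as 2,712,000 because that is the magnitude that reproduces the $890M pipeline value with a 0.0328% conversion factor.

```python
# Inputs, in millions of USD.
sam_revenue = 2_490_188.0      # sum of the "revenue" column for the SAM companies
sam_gi_spend = 817.0           # sum of the "gi_data_spend" column (our SAM)
leads_revenue = 2_712_000.0    # summed 2020 revenue of the leads in the CRM
gci_revenue_last_year = 2.785  # actual revenue from the GCI product last year

# GI Data Spend per dollar of corporate revenue, derived from the SAM spreadsheet.
conversion_factor = sam_gi_spend / sam_revenue

# Extrapolated GI Data Spend represented by the leads in the pipeline.
pipeline_value = leads_revenue * conversion_factor

# Share of the pipeline value that was actually closed through direct sales.
close_rate = gci_revenue_last_year / pipeline_value

print(f"conversion factor: {conversion_factor:.4%}")
print(f"pipeline value:    ${pipeline_value:,.0f}M")
print(f"close rate:        {close_rate:.3%}")
```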

Next, we create a pro-forma table (columns by year) for the next five years (2022 to 2026) with three rows including: 1. Estimated GI Data Spend for all companies in our SAM, 2. Estimated percentage of pipeline deals closed, and 3. Estimated revenue from deals closed (equal to the product of the first two rows). We’ll also include a column for 2021 as a baseline for future years’ estimates.

Figure 6: Pro-forma table of revenue estimates for Direct Sales approach over a five-year period from 2022 to 2026.

We start 2021 with the total Estimated GI Spend, and each year thereafter we grow the previous year’s GI Spend by the expected CAGR of the GI market (i.e., we multiply it by 1 + CAGR). Note that this value, 14.8%, is the average of three sources (references: BusinessWire, Globe NewsWire, PR NewsWire). We do this to reflect the fact that the market is growing quickly and is projected to continue growing at a high rate over the next five years; accordingly, our TAM, SAM, and SOM are likely to grow over this period. We also start 2021 with our actual deal-close percentage (0.313%) and assume that it will grow slightly over the years as we refine/improve the mechanics of our sales process. Note that we have assumed a CAGR for the direct-sales close rate of only 1.25%, which is meant to reflect improvements but not additions; e.g. if we increased the size of our sales force (by hiring more Sales Executives), this number should be increased to reflect that we will have the potential to close a larger percentage of our pipeline annually.

In order to compute our SOM from the pro-forma table, we simply sum the estimated revenue from 2022 through 2026; this amounts to $20.5M over the next five years. Thus, our SOM for the direct-sales approach is $20.5M, which represents approximately 2.5% of our SAM (a realistic portion when comparing SOMs and SAMs).
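The pro-forma table can be rebuilt in Python from the parameters stated above: a 2021 baseline GI Data Spend of $817M (our SAM), a 0.313% close rate, a 14.8% market CAGR, and a 1.25% CAGR on the close rate. The function below is a minimal sketch with names of my own choosing, not the spreadsheet itself.

```python
def direct_sales_pro_forma(gi_spend_2021=817.0, close_rate_2021=0.00313,
                           market_cagr=0.148, close_rate_cagr=0.0125,
                           first_year=2022, last_year=2026):
    """Return (year, GI spend, close rate, revenue) rows; amounts in millions of USD."""
    rows = []
    spend, rate = gi_spend_2021, close_rate_2021
    for year in range(first_year, last_year + 1):
        spend *= 1 + market_cagr      # the market grows each year
        rate *= 1 + close_rate_cagr   # the sales process improves slightly
        rows.append((year, spend, rate, spend * rate))
    return rows


rows = direct_sales_pro_forma()
for year, spend, rate, revenue in rows:
    print(f"{year}: spend ${spend:,.0f}M, close rate {rate:.3%}, revenue ${revenue:.2f}M")

som_direct = sum(revenue for _, _, _, revenue in rows)
print(f"Five-year SOM (Direct Sales): ${som_direct:.1f}M")
```

Summing the five yearly revenue figures gives roughly $20.5M, in line with the value quoted above.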

Geospatial Crop Intelligence’s SOM via the Direct Sales approach is $20.5M

Step 3b: SOM — Adoption Model

The Bass Diffusion Model provides an intuitive forecasting methodology to describe and quantify how new products and technologies are adopted by a market. It’s based on the premise that potential adopters can be classified into two groups: innovators and imitators. The innovators start the process of adoption and contribute to growth in the early stages of a product release, and the imitators gradually begin adopting and contribute to growth as the original population of innovators wanes (due to them having already adopted); eventually, when there are a negligible number of innovators or imitators remaining, the growth of the product in the market declines.

A simple description of the model and the concept of adoption via diffusion is the following:

“The probability of adopting by those who have not yet adopted is a linear function of those who had previously adopted.”

A mathematical derivation yields a first-order ordinary differential equation, which has an explicit solution. It can be expressed in many different forms including the following:

S(t) = P(t)[M - Y(t)] where:
P(t) = p + q*(Y(t)/M)
S(t) is the number of new adoptions at time t
P(t) is the probability of adoption at time t, given not yet adopted
[M - Y(t)] gives the number of potential adopters remaining in the market at time t
Y(t) is the cumulative number of adoptions at time t
p = coefficient of innovation
q = coefficient of imitation
M = total number of potential adopters
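To make the expression concrete, here is a minimal Python sketch of the model in discrete (period-by-period) form, alongside the closed-form cumulative adoption fraction from the continuous-time version. The values of p, q, and M in the example are illustrative placeholders only; they are not the coefficients chosen for this evaluation.

```python
import math


def bass_new_adoptions(p, q, m, periods):
    """Discrete-time Bass model: S(t) = (p + q * Y(t) / M) * (M - Y(t))."""
    cumulative = 0.0  # Y(t): adoptions before the current period
    new_adoptions = []
    for _ in range(periods):
        probability = p + q * cumulative / m    # P(t)
        new = probability * (m - cumulative)    # S(t)
        new_adoptions.append(new)
        cumulative += new
    return new_adoptions


def bass_cumulative_fraction(p, q, t):
    """Closed-form share of the market that has adopted by continuous time t."""
    decay = math.exp(-(p + q) * t)
    return (1 - decay) / (1 + (q / p) * decay)


# Placeholder coefficients for illustration only.
sales = bass_new_adoptions(p=0.03, q=0.38, m=1000, periods=5)
print([round(s, 1) for s in sales])
print(round(bass_cumulative_fraction(0.03, 0.38, 5), 3))
```

The discrete and continuous forms will not match exactly period for period; the discrete version is the one that maps naturally onto a yearly pro-forma table.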

The expression is sensitive to the values chosen for p and q (coefficients of innovation and imitation, respectively); thus, ideally, these coefficients should be determined using regression techniques on empirical data of historical adoptions for similar products. Having said that, because the Bass Diffusion Model has been widely used for several decades, the values of p and q can generally be ascertained by searching through existing literature, and then adjusting for the desired level of conservativeness in the evaluation. The table below lists typical values for the coefficients as well as the values chosen for our evaluation.

Table 3: Bass Diffusion Model coefficient values.

References: Bass-3, Bass-4, Bass-5, Bass-6. The Visual Basic script(s) used to compute the diffusion model adoption values were taken from the open-source code found here: Bass Model Excel Visual Basic Functions

Similar to our approach for Direct Sales, we can create a pro-forma table that computes the estimated revenue by year (from 2022 to 2026) according to the Bass Diffusion Model. The Estimated GI Data Spend is the same for each year as it was in our Direct Sales approach but instead of simply using this value in the Bass Model, we have debited cumulative past sales from Direct Sales to be consistent with our real-world operating assumption that we will continue to perform direct selling activity, which will reduce the potential pool of adopters available to our Bass Model. To be conservative, we have debited the Direct Sales up to and including the present year, which essentially means that we assume those sales happened immediately at the start of the year. Similarly, we have reduced the pool of adopters in each year by the cumulative number of past adoptions that occurred due to the Bass Diffusion Model itself.
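One possible reading of the debiting logic described above is sketched below in Python. It is an interpretation, not the spreadsheet’s actual formulas: the yearly GI Data Spend and Direct Sales inputs are rounded illustrative figures, the coefficients p and q are placeholders, and using the year’s GI Data Spend as the market size M in the imitation term is an assumption of this sketch.

```python
def bass_pro_forma(gi_spend_by_year, direct_sales_by_year, p, q):
    """Yearly Bass-model revenue with the adopter pool debited as described above.

    Amounts are in millions of USD. The pool for each year is that year's
    GI Data Spend minus cumulative Direct Sales (including the current year)
    minus cumulative prior Bass-model revenue.
    """
    rows = []
    cumulative_direct = 0.0
    cumulative_bass = 0.0
    for spend, direct in zip(gi_spend_by_year, direct_sales_by_year):
        cumulative_direct += direct  # conservative: debit this year's sales up front
        pool = max(spend - cumulative_direct - cumulative_bass, 0.0)
        probability = p + q * cumulative_bass / spend  # assumption: M = this year's spend
        bass_revenue = probability * pool
        rows.append({"pool": pool, "bass_revenue": bass_revenue})
        cumulative_bass += bass_revenue
    return rows


# Rounded, illustrative inputs for 2022 to 2026; p and q are placeholders.
gi_spend = [938.0, 1077.0, 1236.0, 1419.0, 1629.0]
direct_sales = [3.0, 3.5, 4.0, 4.7, 5.4]
rows = bass_pro_forma(gi_spend, direct_sales, p=0.01, q=0.3)
for year, row in zip(range(2022, 2027), rows):
    print(f"{year}: pool ${row['pool']:,.1f}M, Bass revenue ${row['bass_revenue']:.2f}M")
```

Because the result is sensitive to p and q, the output of this sketch should not be expected to match the figures in the pro-forma table.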

The SOM estimated from the Bass Diffusion Model is simply the sum of the five-year revenue estimates, which amounts to $90.0M. In the table below, the estimates for both approaches (Direct Sales and Bass Diffusion) are shown both individually and combined; combined, they amount to $110.6M. This value would be most representative of our SOM if we were to pursue both distribution methods in earnest simultaneously. It is thought that the most likely outcome will be the continuation of Direct Sales, and a gradual introduction of adopters/imitators due to channel partnership distribution (“MCDA”). Thus, the most likely revenue estimate will be somewhere in between the values forecasted for Direct Sales and the Bass Diffusion Model.

Figure 7: Pro-forma table for Bass Diffusion Model adoption estimates (red); and combined Bass Diffusion and Direct Sales revenue estimates (blue) over a five-year period from 2022 to 2026.

The charts below graphically depict estimated revenue over the five-year forecast horizon (2022 to 2026) for both approaches.

Figure 8: Charts of pro-forma values for Bass Diffusion Model adoption estimates (red); and combined Bass Diffusion and Direct Sales revenue estimates (blue) over a five-year period from 2022 to 2026.

Step 4: Summary of Results

Everything above can (and should) be summarized in a single graphic like Figure 9 below, which can be created in Google Slides, Keynote, or PowerPoint. Stakeholders, business owners, and investors only care about the takeaways; they’re not interested in the minutiae of the calculations. So, summarize it all on a single slide, and put the supporting material in appendices that you can refer to if need be.