Retail space segmentation

Retail space segmentation model

Automatically determines which business segments a particular outlet (checkout) belongs to

  • Built into the OFD data processing pipeline
  • Lets you create panel samples of outlets with controlled quality
  • Lets you analyze the dynamics of individual markets (FMCG, Beauty, Fashion, etc.)
  • 0.75

    guaranteed model accuracy, verified by Algorithmics specialists

  • 10,000+

    labeled sales outlets in the training sample

  • ~140

    segments in the final single-level list that we use with the model

We have a system for preparing, managing, and balancing training samples, a staff of specially trained annotators, and a distributed computing infrastructure. Together these let us process large data volumes of hundreds of millions of rows.

  • At the input

    data for a specific month

  • Segmentation model
  • At the output

    monthly assignment of sales outlets to segments

Our model uses many features for prediction:

  • Econometric indicators of the sales outlet (average receipt, number of receipts per day, sales value, etc.)
  • Semantics of sales in receipts (text cleaning, lemmatization, text corpus, token processing, etc.)
  • OKVEDs* of sales outlets (main and additional)

*Russian Classification of Economic Activities or Russian National Classifier of Types of Economic Activity
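
As an illustration of the econometric indicators listed above, here is a minimal sketch of how they could be derived from raw receipts. The input layout (tuples of outlet ID, date, and receipt total) and the function name `outlet_indicators` are assumptions made for this example, not the production schema. Receipts per day is computed over days with at least one receipt, which is also an assumption.

```python
from collections import defaultdict
from datetime import date


def outlet_indicators(receipts):
    """Compute per-outlet indicators from (outlet_id, day, total) tuples.

    The tuple layout is hypothetical; real receipt data will differ.
    """
    totals = defaultdict(float)   # sales value per outlet
    counts = defaultdict(int)     # number of receipts per outlet
    days = defaultdict(set)       # distinct active days per outlet
    for outlet_id, day, total in receipts:
        totals[outlet_id] += total
        counts[outlet_id] += 1
        days[outlet_id].add(day)
    return {
        outlet_id: {
            "sales_value": totals[outlet_id],
            "average_receipt": totals[outlet_id] / counts[outlet_id],
            # Averaged over days on which the outlet issued a receipt.
            "receipts_per_day": counts[outlet_id] / len(days[outlet_id]),
        }
        for outlet_id in totals
    }


# Toy data for illustration only.
receipts = [
    ("A", date(2024, 1, 1), 100.0),
    ("A", date(2024, 1, 1), 300.0),
    ("A", date(2024, 1, 2), 200.0),
    ("B", date(2024, 1, 1), 50.0),
]
indicators = outlet_indicators(receipts)
```

Indicators like these would then sit alongside the text features and OKVED codes as inputs to the model.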

How do we solve problems?

The “Smart Cashiers” company has a large amount of cash receipt data (hundreds of thousands of cash registers, daily data updates, billions of transactions).

The company set us the following task: to determine the type of each sales outlet (grocery store, car service, spa, etc.; 150 types in total) based on sales data from cash receipts at multiple outlets.

We conducted exploratory data analysis (EDA) and found several limitations, each of which we were able to overcome:

The outlet data alone does not show which business segment an outlet belongs to.
To solve this, we developed an ML model that determines the outlet type from econometric data and the text names of goods in receipts.
But the names of goods in receipts are entered by the sellers themselves: they are neither uniform nor structured, and 50% of line items have no name at all. How does the model achieve such high accuracy?
Sales data in receipts is first carefully prepared: the semantics are analyzed for various problems (inconsistent descriptions of goods and services, incorrect data entry, anomalies, etc.), and then the data goes through a text cleaning algorithm (removing noise digits from the text, forming a "white" list of words, etc.).
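
The text cleaning step described above can be sketched roughly as follows. This is an illustrative simplification, not the actual algorithm: the whitelist contents and the function name `clean_item_name` are hypothetical, and lemmatization is omitted.

```python
import re

# Hypothetical whitelist; a real one would be built from the receipt corpus.
WHITELIST = {"milk", "bread", "shampoo", "oil", "filter"}

# Runs of letters only (Unicode-aware): digits, punctuation, and
# underscores act as separators and are dropped.
TOKEN_RE = re.compile(r"[^\W\d_]+")


def clean_item_name(name, whitelist=WHITELIST):
    """Lowercase an item name, strip noise, and keep whitelisted words."""
    if not name:
        # Many receipt line items have no name at all.
        return []
    tokens = TOKEN_RE.findall(name.lower())
    return [t for t in tokens if t in whitelist]
```

For example, `clean_item_name("MILK 3.2% 1L #4471")` keeps only `"milk"`, and an empty or missing name yields an empty token list.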
In addition, the sample of outlets in the obtained data is not balanced relative to the general population, and regional indicators are shifted relative to the country as a whole.
To make the data more representative and accurate and to eliminate possible errors, we calibrate the data (adaptive RIM weighting), extrapolate sales from the sample to the city, region, and country level, and apply our dynamic data quality management system before running the model.
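
To give a feel for the calibration step, here is a minimal sketch of plain RIM weighting (raking, also known as iterative proportional fitting). It is not the adaptive variant mentioned above. The dimensions, category names, and target shares are hypothetical and chosen only for the example.

```python
def rim_weights(rows, targets, iterations=200, tol=1e-9):
    """Rake unit weights so weighted shares match targets on each dimension.

    rows: list of dicts, e.g. {"region": "north", "size": "small"}
    targets: {dimension: {category: share}}, shares sum to 1 per dimension
    """
    weights = [1.0] * len(rows)
    for _ in range(iterations):
        max_shift = 0.0
        for dim, shares in targets.items():
            total = sum(weights)
            # Current weighted total of each category on this dimension.
            current = {}
            for row, w in zip(rows, weights):
                current[row[dim]] = current.get(row[dim], 0.0) + w
            # Rescale each row so its category hits the target share.
            for i, row in enumerate(rows):
                factor = shares[row[dim]] * total / current[row[dim]]
                weights[i] *= factor
                max_shift = max(max_shift, abs(factor - 1.0))
        if max_shift < tol:
            break  # all margins already match within tolerance
    return weights


# Toy sample: north and small outlets are over-represented.
rows = [
    {"region": "north", "size": "small"},
    {"region": "north", "size": "small"},
    {"region": "north", "size": "large"},
    {"region": "south", "size": "small"},
]
targets = {
    "region": {"north": 0.5, "south": 0.5},
    "size": {"small": 0.6, "large": 0.4},
}
weights = rim_weights(rows, targets)
```

After raking, the weighted share of each region and each size class matches its target, so the over-represented north and small outlets no longer dominate the sample.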

As a result, our segmentation model for the “Smart Cashiers” company achieved a segment determination accuracy of 86% to 97%, depending on the type of sales outlet (150 types of outlets in total).

Now, with sales outlet segmentation in place, we can build sales analytics by channel, track category trends, and analyze the dynamics of individual markets (FMCG, Beauty, Fashion, etc.) to solve new problems.

We will solve your problem too!