How I Used K-means clustering at Invoicemate

How I Used K-means Clustering at Invoicemate to Increase Conversions

After the launch of the invoicing platform for Invoicemate (https://www.invoicemate.net/), I needed to understand the user behaviors that led some buyers to opt for the ‘Invoice Financing’ platform as well. I also wanted to understand how some important Invoicemate features were being used.

The Problem

Segmenting important metrics by a particular characteristic (e.g. industry or portfolio size) is easy and helps answer some product-related questions. However, product teams can face “analysis paralysis” when the segments become complicated and narrowly defined (e.g. average daily invoicing/transaction volume by industry, business size, and city).

The Solution

To develop a sound understanding of the user, I built a K-means clustering model. This helped me pinpoint categories of users based on their similarities. I chose a “representative” from each category and built an ‘archetypal customer’ around it to understand my customers better. This helped us improve the user experience accordingly.

The Method, or as I say the ‘Modus Operandi’

An often-used method for applying unsupervised learning to a set of data is ‘centroid-based clustering’. Clustering takes a large set of observations as input and divides them into groups based on commonalities.
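To make ‘centroid-based’ concrete, here is a minimal sketch of the classic K-means loop in plain Python. It is only an illustration, not the code my team ran: the function names, the toy points, and the random initialization are all assumptions made for this example.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=100, seed=0):
    """Assign each point to the nearest of k centroids, then move each
    centroid to the mean of its members, until assignments stop changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k random points
    assignments = None
    for _ in range(iterations):
        new_assignments = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # converged: no point changed cluster
        assignments = new_assignments
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, assignments

# Made-up toy data: two obvious groups of (feature_1, feature_2) points.
toy_points = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.1), (8.0, 8.0), (8.5, 7.5), (9.0, 8.2)]
toy_centroids, toy_labels = k_means(toy_points, k=2)
```

On real data, features with very different scales would usually be standardized first, since the distance calculation treats every feature equally.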

The data used for the K-means model comprised seller-level information (number of locations, number of employees, number of invoices, etc.). It also included product usage trends (invoice submissions, transactions, registration, and creating/editing/deleting items). Data was included from both the ‘invoicing’ and ‘invoice financing’ verticals, for sellers with a platform usage time of at least 30 days.

The numerical data was aggregated in the following ways:

The average over the lifetime (in the invoicing and invoice financing modules) of a seller
The sum over the seller’s first 30-day period (for all intents and purposes, their trial period)
The maximum amount over the lifetime (in a free trial or paid subscription state) of a seller
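The three aggregations above can be sketched roughly as follows. This is a hedged illustration: the record layout (seller id, days since signup, value), the function name, and the sample numbers are hypothetical, and the ‘lifetime average’ here is simply the mean over the days that have a record.

```python
from collections import defaultdict

def aggregate_seller_metric(daily_records, trial_days=30):
    """Build per-seller features from (seller_id, day_since_signup, value) rows."""
    by_seller = defaultdict(list)
    for seller_id, day, value in daily_records:
        by_seller[seller_id].append((day, value))

    features = {}
    for seller_id, rows in by_seller.items():
        values = [v for _, v in rows]
        features[seller_id] = {
            # mean over all recorded days for this seller
            "lifetime_avg": sum(values) / len(values),
            # days 0..29 count as the first 30-day (trial) period
            "first_30d_sum": sum(v for d, v in rows if d < trial_days),
            "lifetime_max": max(values),
        }
    return features

# Made-up daily invoice counts for two hypothetical sellers.
sample_records = [
    ("s1", 0, 4), ("s1", 10, 6), ("s1", 45, 11),
    ("s2", 3, 2), ("s2", 29, 0),
]
seller_features = aggregate_seller_metric(sample_records)
```

Each metric (invoice submissions, transactions, item edits, and so on) would be run through the same aggregation to produce one row of features per seller.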

Using the Elbow Method to determine the number of clusters

How many clusters should K-means use? I used the Elbow method to answer that. It plots the percentage of variance explained as a function of the number of clusters. The method picks the number of clusters (k) at which adding another cluster (k+1) yields only a small marginal gain in the percentage of variance explained.
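As a rough sketch of that rule: given the within-cluster sum of squares (WCSS) for each k, the variance explained is 1 - WCSS(k) / WCSS(1), and the elbow is the first k where the next cluster adds less than some chosen gain. The WCSS values and the 5% cut-off below are made up for illustration; in practice the elbow is usually judged by eye from the plot.

```python
def variance_explained(wcss_by_k):
    """Fraction of total variance explained for each k.
    Assumes wcss_by_k contains k=1, whose WCSS equals the total sum of squares."""
    total = wcss_by_k[1]
    return {k: 1 - wcss / total for k, wcss in sorted(wcss_by_k.items())}

def pick_elbow(wcss_by_k, min_gain=0.05):
    """Return the first k where moving to the next k gains less than min_gain.
    Assumes the keys are consecutive integers starting at 1."""
    explained = variance_explained(wcss_by_k)
    ks = sorted(explained)
    for k, next_k in zip(ks, ks[1:]):
        if explained[next_k] - explained[k] < min_gain:
            return k
    return ks[-1]  # no elbow found within the range tried

# Hypothetical WCSS values, not the real Invoicemate numbers.
example_wcss = {1: 1000.0, 2: 520.0, 3: 250.0, 4: 215.0, 5: 200.0}
example_explained = variance_explained(example_wcss)
example_elbow = pick_elbow(example_wcss)
```

With these made-up values the gain from three to four clusters is only about 3.5 percentage points, so the rule stops at three.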

While Figure 2 above showed that three clusters would likely be ideal, I considered both three and four clusters for my model. After comparing the variance of individual features in the three-cluster and four-cluster models, I ultimately decided that four was the most representative.

I cleaned my dataset by removing strongly correlated features (such features make it difficult to interpret the model’s results) and features that showed no variance between clusters.
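One simple way to drop strongly correlated features is to keep a feature only if its Pearson correlation with every feature already kept stays below a threshold. The sketch below assumes a 0.9 cut-off and uses invented column names and values; it does not cover the second step of removing features that show no variance between clusters.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / sqrt(var_x * var_y)

def drop_correlated(columns, threshold=0.9):
    """Greedily keep features whose |correlation| with all kept features
    is below the threshold. `columns` maps feature name -> list of values."""
    kept = []
    for name in columns:
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Made-up seller features: transactions closely track invoices.
example_columns = {
    "invoices": [10, 20, 30, 40, 50],
    "transactions": [11, 19, 31, 42, 48],
    "locations": [1, 3, 1, 2, 1],
}
kept_features = drop_correlated(example_columns)
```

Because the loop is greedy, which feature of a correlated pair survives depends on column order, so it is worth putting the more interpretable feature first.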

Interpreting the Results

One important requirement of K-means clustering is that the input data must be numerical, so categorical data cannot be fed into the model directly. Nevertheless, after running the model and getting the clusters, I used categorical data (industry, business size) to put the clusters in context. It helped me understand why the data separated into particular patterns.

My team used Python (pandas) to filter the data, construct the model, and assemble the final feature set of clustering data, cluster assignments, and demographic data. The finalized dataset was imported into Tableau for visualization.

From then onwards, I studied both the numerical and categorical characteristics of each cluster to figure out the ‘seller archetype’ from each cluster.
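A small sketch of how categorical data can be layered on top of the cluster assignments: count each category within each cluster and take the most common one. The labels and industries below are invented for illustration, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def most_common_category(cluster_labels, categories):
    """For each cluster, return the category that appears most often.
    The two inputs are parallel lists with one entry per seller."""
    counts = defaultdict(Counter)
    for cluster, category in zip(cluster_labels, categories):
        counts[cluster][category] += 1
    return {cluster: counter.most_common(1)[0][0] for cluster, counter in counts.items()}

# Made-up cluster assignments and industries for seven sellers.
example_labels = [1, 1, 2, 2, 2, 4, 4]
example_industries = [
    "grocery", "grocery",
    "garments", "garments", "grocery",
    "electronics", "electronics",
]
top_industry = most_common_category(example_labels, example_industries)
```

The same counting works for any other categorical field, such as business size.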

Final Results

These are the results of the ‘Invoice Financing’ data (example data)

The majority of Invoice Financing sellers are in Cluster 1, while the other clusters represent small percentages. To better understand how these clusters differ, I segmented the dataset by the aforementioned demographic data.

The examples below show how the clusters vary across some of these demographics.

1. Size of Business

Cluster 3 holds the larger sellers. This insight helped us develop product features for such a complex user base.

2. Invoicing Conversion

It is pretty apparent that Clusters 3 and 4 are the most likely to convert to ‘Invoice Financing’ after the trial. In terms of feature value, they are getting the most value from the free trial of Invoice Financing. Targeted marketing campaigns for this cohort of sellers gave me 47% higher conversion.

Sellers in Clusters 3 and 4 show the highest conversion rates for ‘Invoice Financing.’

3. Feature Interaction

I wanted to understand how each cluster interacted with the different features of the platform. It turned out that Cluster 2 sellers interacted with items daily.

4. Average Invoice Submission Per Day

Cluster 3 took the lead when it came to average invoice submission per day.
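Comparisons like the ones above come down to simple per-cluster summaries. As a hedged example, the sketch below computes the trial-to-paid conversion rate for each cluster; the seller rows are invented and do not reflect the real Invoicemate figures.

```python
def conversion_rate_by_cluster(sellers):
    """Share of sellers in each cluster that converted after the trial.
    `sellers` is an iterable of (cluster, converted) pairs."""
    totals, converted = {}, {}
    for cluster, did_convert in sellers:
        totals[cluster] = totals.get(cluster, 0) + 1
        converted[cluster] = converted.get(cluster, 0) + int(did_convert)
    return {cluster: converted[cluster] / totals[cluster] for cluster in totals}

# Made-up (cluster, converted) rows for ten hypothetical sellers.
example_sellers = [
    (1, False), (1, False), (1, True), (1, False),
    (3, True), (3, True), (3, False), (3, True),
    (4, True), (4, False),
]
conversion_rates = conversion_rate_by_cluster(example_sellers)
```

The same pattern, with a mean instead of a ratio, gives metrics such as average invoice submissions per day for each cluster.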

Archetype Sellers

With the help of the above sample data, I highlighted the important traits observed in each cluster so that these mock findings could be presented to leadership and business stakeholders:

1. Cluster 1 contains smaller retailers distinguished by their interest in using Invoice Financing. However, they had the lowest levels of feature interaction and conversion.

2. Cluster 2 had above-average levels of interaction with features, which is typical for a garments store (a common vertical in this cluster).

3. Cluster 3 represents only 11% of Invoice Financing sellers, but they were the most active in terms of invoice submissions. They might have a dedicated resource, which would explain the higher product adoption.

4. Cluster 4 had 27% of sellers, with electronics & accessories being the most common vertical. This cluster had the highest engagement with key product features.

Conclusions

Based on this mock data, the clustering model produced four distinct clusters of Invoice Financing sellers. The results offered important learnings to inform the product team’s decision-making and roadmap:

Cluster 3 has larger sellers who frequently use the Invoicemate platform at different locations. This suggests that:

⏯️ The product should be differentiated enough from Invoicemate’s free offering to attract more complex sellers who wouldn’t have considered Invoicemate in the past.

⏯️ More resources need to be allocated to ‘Invoice Financing’.

Cluster 4 contains more than twice as many sellers as Cluster 3. Given their high engagement with features, high average invoice submission size, and high conversion rate from the free trial, Cluster 4 should drive feature prioritization in the future.

In a nutshell, using K-means clustering for a new product feature like ‘Invoice Financing’ helped my product team better understand the customer. An added advantage of this analysis is that once the model has been built, it can be reused for future product features and enhancements to drive more conversions.

Please feel free to give suggestions/feedback. I look forward to it. 😀
