Database analysis

In addition to looking at external sources of information, Market Intelligence can also be gathered by linking and cross-analysing existing database information. Many companies now have vast stores of information in their databases that covers customer behaviour over time, but often this is locked away and difficult to use.

Database analysis is the process of extracting this data, cleaning it, potentially merging it with other data and performing a statistical analysis to better understand customers and customer behaviour.


Database analysis extends across a wide range of skills from basic analysis to understand who your customers are and what they buy, to more complex statistical analysis, data-mining or using tools such as Geographic Information Systems (GIS). Generally we are trying to understand customer information in terms of  recency, frequency and value, looking to identify key groups of customers and mechanisms for increasing sales.

The basic procedures for analysing database information are:

For on-going database analysis, automating as many of these tasks as possible is important, both to ensure that the data is of the same quality for each run of analysis and to save time and effort repeating the same work with each data snapshot.

Once you have analysed the data, it is then a question of what response you should make, but these are more issues about developing market strategies.


Extracting information

Many databases grow and develop through use and contingency and consequently the first stage of extracting the data can be complicated, not just from the scale of the task, but because the database itself is poorly documented, data is missing or has been moved.

Surprisingly enough, obtaining database information can be one of the most time-consuming tasks involved in database analysis. This isn't necessary because it is difficult, but most information resides on transaction databases that are continually being used, updated and changed as orders come in and go out.

Often obtaining the data needs time to be scheduled and a spec to be written to pull the necessary information out without interfering with the day-to-day running of the database.

Once the data has been extracted it is normally held in another database for analysis. In sophisticated systems this second database is also known as a datawarehouse, or for smaller amounts of data a datamart. However, it is not necessary to have a datawarehouse in order to perform database analysis and much can be carried out on a PC using simple tools.

Much database analysis is on an ad hoc basis, so a single extract from one point in time is analysed and used for modelling. However, it is important that models and analysis is reviewed regularly. Using out-of-date models may be worse than using no models at all.


Cleaning

Once data has been obtained, it normally has to be cleaned for statistical analysis. Many databases tend to build up inaccuracies over time. For instance as addresses change, postcodes are entered incorrectly, or there may be duplication of records again often caused by mistaken data entry.

It is also often the case that the data has to be simplified or re-grouped. For instance each product on a database will have a separate product code, but for analysis several individual products will need to be grouped together.

The process of cleaning can be automated to a certain degree including de-duplication and cleaning up missing or bad data, but often there is an element that needs doing by hand. Typically changes done for analysis should not be repeated on the main transactional database to avoid losing data integrity.

Where data extraction and analysis is a routine task as is often the case for large datawarehousing projects, cleaning will need to be fully automated to ensure accurate repetition of the cleaning process and for speed purposes.


Merging

Once the data is cleaned it can be merged with other data sources. For instance, many organisations actually have a number of different databases that need to be combined before analysis can take place. Alternatively, external sources can be used in order to allow for analysis by industrial classification for instance.

Merging again is not entirely straightforward as some allowance may be necessary for the same customer having a different name on different databases. For instance Acme Building Contractors might also be known as ABC. Consequently, there may also be a second period of cleaning necessary once the data has been merged.

It is common on consumer sets to add classification data from external agencies such as the ACORN or MOSAIC geographic classification systems or to link in external data from consumer data companies such as Experian.


Analysis

There are many different types of analysis that you can carry out on the data from the database. The most common analysis is normally to rank customers by value to see who the most valuable customers are and what their purchasing characteristics are (also known as pareto or decile analysis).

The standard database measures are recency, frequency and value. So who bought in the last month, 3 months, 6 months? Who has bought once a year, twice a year etc? How much did they spend?

The aim of the analysis is to try and identify those people who need contacting, or those people who would be most likely to respond to a particular communication. For instance, chasing up lapsed customers or making more offers to frequent buyers. In particular you want to identify who are the core purchasing groups and who are those that are most likely to respond to advertising or communications.

More sophisticated analysis starts to look at cross-sale rates (the % of buyers of screws that also bought nails), basket size and common product bundles. The aim being to increase the amount each customer buys. Where there are large numbers of products (eg supermarkets), the range and number of combinations can make such analysis extremely complicated.

At a more sophisticated level, the data can be used to attempt predictive modelling. For instance looking to see which customers would be most likely to respond to a piece of direct mail or a new product launch. The aim is usually to reduce the size of your mailing whilst maximising the number of people who will respond and so reducing the cost per response and increasing the value per response.

In addition, the database data can be used for segmentation. A major difference between segmentation via the database as opposed to market research segmentation is that the results can be marked back onto the database, so that each customer is labelled with their segment. This means that if you need to mail a particular segment from the database this is entirely possible, whereas for market research you are typically taking a second level guess. The difficulty is you are normally limited in the number of variables you have available for segmentation.

For help and advice on the effective use of database marketing contact info@dobney.com