Database analysis
In addition to looking at external sources of information, Market Intelligence can also be gathered by linking and cross-analysing existing database information. Many companies now have vast stores of information in their databases that covers customer behaviour over time, but often this is locked away and difficult to use.
Database analysis is the process of extracting this data, cleaning it, potentially merging it with other data and performing a statistical analysis to better understand customers and customer behaviour. In very large businesses such as retailers, there are often teams of analysts drilling into the database and producing live data for market segmentation, targeting of offers and a range of other business intelligence uses.
Database analysis extends across a wide range of skills from basic analysis to understand who your customers are and what they buy, to more complex statistical analysis, data-mining or using tools such as Geographic Information Systems (GIS). Generally we are trying to understand customer information in terms of recency, frequency and value, looking to identify key groups of customers and mechanisms for increasing sales, looking to identify clusters of customers who might have similar needs and looking for predictors of behaviour.
Many businesses have customer and contact databases, but it is not uncommon for there to be many separate databases in the organisation each holding different information. In some cases a unified database system is used which makes analysis easier, but this is counterbalanced by the occasional need to pull and use subsets of data which might be quicker and faster done separately from the main customer files, particularly for marketing departments or where responses are being measured. The IT overhead of maintaining the database sometimes slows down the actual use of the database for practical tasks.
For an analyst, the basic procedures for analysing database information are:
For on-going database analysis, automating as many of these tasks as possible is important, both to ensure that the data is of the same quality for each run of analysis and to save time and effort repeating the same work with each data snapshot.
Once you have analysed the data, it is then a question of what response you should make, but these are more issues about developing market strategies.
Extracting information
Many databases grow and develop through use and contingency and consequently the first stage of extracting the data can be complicated, not just from the scale of the task, but because the database itself is poorly documented, data is missing or has been moved.
Surprisingly enough, obtaining database information can be one of the most time-consuming tasks involved in database analysis. This isn't necessary because it is difficult, but most information resides on transaction databases that are continually being used, updated and changed as orders come in and go out.
Often obtaining the data needs time to be scheduled and a spec to be written to pull the necessary information out without interfering with the day-to-day running of the database.
Once the data has been extracted it is normally held in another database for analysis. In sophisticated systems this second database is also known as a datawarehouse, or for smaller amounts of data a datamart. However, it is not necessary to have a datawarehouse in order to perform database analysis and much can be carried out on a PC using simple tools.
Much database analysis is on an ad hoc basis, so a single extract from one point in time is analysed and used for modelling. However, it is important that models and analysis is reviewed regularly. Using out-of-date models may be worse than using no models at all.
Cleaning
Once data has been obtained, it normally has to be cleaned for statistical analysis. Many databases tend to build up inaccuracies and duplications over time. For instance as addresses change, postcodes are entered incorrectly, or there may be duplication of records sometimes caused by mistaken data entry, but more often than not, because customers have changed and duplicate records have been created (in a typical business-to-business database 20-25% of the data will be out of date after a year just because of people changing jobs).
Deduping is an important, but sometimes challenging task. Usually data is checked by name, address (eg postcode) and telephone number to see if there are duplications or matches between records. This task can be automated, but even automated systems are not 100% perfect and some level of decision is required to decide which record should be kept and which discarded. In smaller databases of up to 20,000 records some level of 'eyeballing' can be done to check the quality of the dedupe. In larger datasets this is much more difficult.
When looking at aspects such as purchase histories, it is often the case that the data has grouped up and reclassified. For instance each product on a database will have a separate product code, but for analysis several individual products may need to be grouped together.
The process of cleaning can be automated to a certain degree including de-duplication and cleaning up missing or bad data, but often there is an element that needs doing by hand. Typically changes done for analysis should not be repeated on the main transactional database to avoid losing data integrity.
Where data extraction and analysis is a routine task as is often the case for large datawarehousing projects, cleaning will need to be fully automated to ensure accurate repetition of the cleaning process and for speed purposes.
Merging
Once the data is cleaned it can be merged with other data sources. For instance, many organisations actually have a number of different databases that need to be combined before analysis can take place. Alternatively, external sources can be used in order to allow for analysis by industrial classification for instance.
Merging again is not entirely straightforward as some allowance may be necessary for the same customer having a different name on different databases. For instance Acme Building Contractors might also be known as ABC. Consequently, there may also be a second period of cleaning necessary once the data has been merged.
It is common on consumer sets to add classification data from external agencies such as the ACORN or MOSAIC geographic classification systems or to link in external data from consumer data companies such as Experian. These provide an additional layer of classification or segmentation data on top of the existing data.
Analysis
There are many different types of analysis that you can carry out on the data from the database. The most common analysis is normally to rank customers by value to see who the most valuable customers are and what their purchasing characteristics are (also known as pareto or decile analysis).
The standard database measures are recency, frequency and value. So who bought in the last month, 3 months, 6 months? Who has bought once a year, twice a year etc? How much did they spend?
The aim of the analysis is to try and identify those people who need contacting, or those people who would be most likely to respond to a particular communication. For instance, chasing up lapsed customers or making more offers to frequent buyers. In particular you want to identify who are the core purchasing groups and who are those that are most likely to respond to advertising or communications.
More sophisticated analysis starts to look at cross-sale rates (the % of buyers of screws that also bought nails), basket size and common product bundles. The aim being to increase the amount each customer buys. Where there are large numbers of products (eg supermarkets), the range and number of combinations can make such analysis extremely complicated.
At a more sophisticated level, the data can be used to attempt predictive modelling. For instance looking to see which customers would be most likely to respond to a piece of direct mail or a new product launch. The aim is usually to reduce the size of your mailing whilst maximising the number of people who will respond and so reducing the cost per response and increasing the value per response.
In addition, the database data can be used for segmentation. A major difference between segmentation via the database as opposed to market research segmentation is that the results can be marked back onto the database, so that each customer is labelled with their segment. This means that if you need to mail a particular segment from the database this is entirely possible, whereas for market research you are typically taking a second level guess. The difficulty is you are normally limited in the number of variables you have available for segmentation.
For help and advice on the effective use of database marketing contact info@dobney.com
