What is the best online digital business

IT for marketing

In principle, data-driven decisions have the potential to be better than decisions people make from their individual experience. However, this holds only if the recorded data is large enough and relevant enough. For all the faith in data, it is easy to forget that people have a major advantage over machines: contextual knowledge and semantic understanding.

People understand the connections. Someone looking at a set of clothing items can say with fairly high accuracy whether they are bought by young women or by older men, without having observed a single purchase.

As the amount of data grows, the machine gets better. People, by contrast, tend to be biased: they remain convinced of their ideas even when the statistics clearly point in a different direction. Ideally, objective machine analysis should therefore go hand in hand with context-aware human interpretation. Either way, the foundation is a well-maintained database.

Master data and transaction data

As is well known, two types of data are relevant for analyses: transaction data (formerly also called movement data) and master data. Transaction data covers everything generated by individual events: an invoice, a purchase, even a single mouse click. Each record documents a completed event and is therefore not changed after it has been recorded. A single mouse click does not change; the next click on another product is a new record (see also: This is how master data management works).

Transaction data refers to master data, which describes objects or subjects. A product in an online shop is described by a master data record, as is a customer. A "mouse click" transaction record in the online shop states, for example, that customer no. 4711 (Heinz Mustermann) clicked on product 123456 (iPhone cover, leather, black, € 19.90) on June 17, 2015 at 2:51 pm. He had previously searched for "iPhone cases" and found the product in fourth position.

In addition to a description, the master data for product 123456 contains, for example, the color, the manufacturer name, the price and the stock level, and often further information. In addition to name and address, the master data for customer 4711 may contain the date of birth, bank details and email address.

In contrast to transaction data, master data can change. The shop can raise or lower the price of product no. 123456, and the stock level changes after each purchase. Customer 4711 can move house or change banks. While the volume of transaction data can swell into big data territory, master data rarely amounts to more than a few million records.
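The distinction can be sketched in code. The following is a minimal illustration, not a reference data model: the class and field names are assumptions, and the stock level of 50 is invented for the example. Master data is modeled as a mutable record, transaction data as a frozen (immutable) one that only refers to master data by ID.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Product:
    """Master data: describes an object and may change over time."""
    product_id: int
    description: str
    color: str
    price_eur: float
    stock: int


@dataclass(frozen=True)
class ClickEvent:
    """Transaction data: documents a completed event, immutable once recorded."""
    customer_id: int      # reference to customer master data
    product_id: int       # reference to product master data
    timestamp: datetime
    search_term: str
    result_position: int


# Values taken from the example in the text; the stock level is hypothetical.
product = Product(123456, "iPhone cover leather", "black", 19.90, 50)
click = ClickEvent(4711, 123456, datetime(2015, 6, 17, 14, 51), "iPhone cases", 4)

# Master data can change: the shop lowers the price.
product.price_eur = 17.90

# Transaction data cannot: assigning to a frozen dataclass raises an error.
try:
    click.result_position = 1
    click_is_immutable = False
except AttributeError:  # dataclasses.FrozenInstanceError subclasses AttributeError
    click_is_immutable = True
```

A second click on another product would simply be a second `ClickEvent`, never a modification of the first.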

The master data is crucial

So if master data is so clearly outnumbered, why is it so important? Because transaction data always refers to master data. Errors in the master data are multiplied by the number of these references. If the wrong color is stored in the master data for the iPhone cover mentioned above, every click on this product will feed a wrong color preference for the respective customer into the analysis.

Quite apart from that, the return rate would rise sharply. If Mr. Mustermann is incorrectly listed as Ms. Mustermann in the database, the analysis would detect an increased interest among women in razor blades and technical toys, with corresponding consequences for the automated recommendation engine.
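How a single master data error multiplies can be shown with a small sketch. The click log, the second product and the wrongly stored color "red" are hypothetical. Color preferences are derived by joining each click to the product master data, so one wrong master record distorts every click that refers to it.

```python
from collections import Counter

# Hypothetical master data: product_id -> color.
# Assume product 123456 is really black but was stored as "red" by mistake.
product_colors = {123456: "red", 222222: "blue"}

# Hypothetical click log as (customer_id, product_id) pairs.
clicks = [(4711, 123456)] * 5 + [(4711, 222222)] * 2


def color_preferences(clicks, product_colors, customer_id):
    """Count clicks per color for one customer by joining clicks to master data."""
    return Counter(
        product_colors[product_id]
        for cid, product_id in clicks
        if cid == customer_id
    )


# One wrong master record yields five wrong preference signals.
wrong = color_preferences(clicks, product_colors, 4711)

# Correcting the single master record fixes all five at once.
product_colors[123456] = "black"
fixed = color_preferences(clicks, product_colors, 4711)
```

The same lever works in both directions: fixing one master record repairs every analysis result that depends on it.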

Normalizing the product data is important

Manufacturers like to present their products as something very special by describing them in flowery terms. Colors are no longer red or blue, but "Volcano" or "Deep Ocean". The material cotton becomes "Fil d’ecosse" or "pure cotton", which sounds a lot more valuable, doesn't it? However, for the personalization engine in the online shop to recognize that a customer likes to wear cotton, "cotton" has to be entered as the product attribute everywhere. That sounds boring, but it works better.

  Ten Commandments for More Data Quality
    When it comes to data quality, many companies have a guilty conscience. It smells like a big project with an uncertain return on investment. But the effort pays off if you keep a few rules in mind.
  1. You should recognize that you are affected!
    Databases are not static structures; they are subject to constant change. If they are not maintained, uncontrolled growth creeps in, for example through incorrect or duplicate entries, different spellings, or uncontrolled merging of databases. Every company is affected.
  2. You should name those responsible for data quality!
    Data quality is only achievable if there are employees who are aware of the importance of data maintenance and who take care of this task permanently. To this end, one person must be appointed as primarily responsible, who reviews data quality at regular intervals, evaluates the reports from data quality tools and takes action where necessary.
  3. You should guard and enrich your data treasure!
    The cleaned database must be protected from new contamination. Data quality tools that check every new database entry can help with this. They make it possible to find out whether a record already exists (error-tolerant duplicate comparison), whether name, address and so on are correct and plausible (comparison with reference databases), and whether customers or suppliers violate compliance regulations (comparison with sanctions lists).
  4. You should make your data accessible and easy to find!
    Even the best-maintained database is of no use if the information hidden in it cannot be found quickly enough when needed. To ensure that records can be found promptly, a fault-tolerant search function is required that quickly locates the desired information even in huge amounts of data.
  5. You should automate data quality processes!
    Databases often contain hundreds of thousands or even millions of records. It would be inefficient to handle data cleansing and ongoing quality maintenance manually. Many of the processes and tasks mentioned can run automatically with the appropriate software in service-oriented architectures (SOA).
  6. You should understand data quality as an international task!
    Data quality is increasingly becoming a cross-border challenge. In mergers and acquisitions, international master data has to be brought together. In addition, more and more companies are extending their purchasing to global markets.
  7. You should rely on expert knowledge!
    There is no point in simply running data through an analysis tool. Dealing with master data requires know-how. This concerns the basic objective and approach, the parameterization of the operational processes, the evaluation of the results and the setup of automated routines for sustainable quality maintenance.
  8. You should improve the quality of your data step by step!
    It is best to start data quality processes in just one area, namely where the benefit is greatest. This approach has proven itself many times in practice. It delivers measurable small-scale success in a short time, for example in the CRM system. In addition, the strategy of small steps provides planning security.
  9. You should always keep the goals of your data quality activities in mind!
    Ultimately, data quality serves the one major goal of making all processes in the company more efficient in order to maximize profit. So that this goal does not drop out of sight in day-to-day data quality work, it is advisable to define company-specific key performance indicators (KPIs).
  10. You should reap the benefits of high data quality!
    Those who address their customers without errors convey professionalism and competence, avoid complaints or even cancellations, and keep process costs manageable. Those who have clean accounts payable and material master data reduce administrative effort and are able to optimize purchasing processes and, for example, consistently exploit volume discounts.
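The error-tolerant duplicate comparison mentioned in the third commandment can be approximated with Python's standard library. This is a rough sketch, not how commercial data quality tools work: it uses `difflib.SequenceMatcher` as a generic string similarity measure, and the addresses and the 0.8 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count."""
    return " ".join(text.lower().split())


def similarity(a: str, b: str) -> float:
    """Similarity between 0.0 and 1.0 based on matching character blocks."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_duplicates(new_entry: str, existing: list[str], threshold: float = 0.8) -> list[str]:
    """Return existing entries that are probably the same as new_entry."""
    return [entry for entry in existing if similarity(new_entry, entry) >= threshold]


# Hypothetical customer records.
existing = [
    "Heinz Mustermann, Hauptstrasse 1",
    "Erika Musterfrau, Nebenweg 2",
]

# A misspelled, abbreviated variant of the first record.
matches = find_duplicates("Heinz Musterman,  Hauptstr. 1", existing)
```

In practice the threshold has to be tuned on real data, and hits would typically be queued for manual review rather than merged automatically.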

Store the normalized value (for example "cotton") in a separate field alongside the manufacturer's fantasy name. For conversion, however, the emotionally charged fantasy name probably works better, so the product description should still include a sentence such as "Available in the colors Volcano, Deep Ocean and Spring Blossom".
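A minimal sketch of this normalization, assuming products are held as dictionaries: the field names and the mapping tables are hypothetical, and the assignments of "Volcano" to red and "Deep Ocean" to blue are guesses made for illustration. The fantasy name stays untouched and the normalized value goes into an additional field.

```python
# Hypothetical lookup tables from fantasy names to normalized values.
COLOR_MAP = {"volcano": "red", "deep ocean": "blue"}
MATERIAL_MAP = {"fil d'ecosse": "cotton", "pure cotton": "cotton"}


def _key(value: str) -> str:
    """Make lookups tolerant of case, surrounding spaces and curly apostrophes."""
    return value.strip().lower().replace("\u2019", "'")


def add_normalized_fields(product: dict) -> dict:
    """Return a copy of the product with normalized attributes added.

    Unknown names map to None so they can be reviewed manually.
    """
    enriched = dict(product)
    enriched["color_normalized"] = COLOR_MAP.get(_key(product.get("color", "")))
    enriched["material_normalized"] = MATERIAL_MAP.get(_key(product.get("material", "")))
    return enriched


shirt = add_normalized_fields({"color": "Volcano", "material": "Fil d\u2019ecosse"})
unknown = add_normalized_fields({"color": "Spring Blossom", "material": "pure cotton"})
```

Because "Spring Blossom" is not in the table, its normalized color is `None`, which flags it for manual follow-up instead of guessing.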

Make sure it is complete!

If a field is empty, there is nothing to analyze. Product attributes are the most important factor for personalization. If they are missing, personalization is impossible or possible only to a limited extent. In practice, however, information is often missing: many fields stay empty because the supplier does not provide the information.
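Completeness can be measured before it is fixed. The sketch below assumes products are dictionaries with string attributes; the list of required fields and the sample products are hypothetical. It reports which required fields a product lacks and the fill rate per field across the catalog.

```python
# Hypothetical set of attributes the personalization depends on.
REQUIRED_FIELDS = ("color", "material", "brand")


def missing_fields(product: dict, required=REQUIRED_FIELDS) -> list[str]:
    """Return the required fields that are absent, None, or blank."""
    return [f for f in required if not str(product.get(f) or "").strip()]


def fill_rates(products: list[dict], required=REQUIRED_FIELDS) -> dict[str, float]:
    """Share of products (0.0 to 1.0) that have a value for each required field."""
    if not products:
        return {f: 0.0 for f in required}
    return {
        f: sum(1 for p in products if f not in missing_fields(p, required)) / len(products)
        for f in required
    }


# Hypothetical sample data.
products = [
    {"color": "red", "material": "cotton", "brand": "Schiesser"},
    {"color": "", "material": None, "brand": "Logitech"},
]

gaps = missing_fields(products[1])
rates = fill_rates(products)
```

A low fill rate for an attribute indicates where supplier data needs to be requested or extracted from other sources.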

It is often worth taking a closer look here. Relevant information is frequently contained in the running text: "This beautiful summer shirt made of pure cotton has a tailored shape without patch pockets ..." and a human reader immediately knows: material: cotton, fit: tailored, number of pockets: 0. Machines can do that today, too. But more on that later (see also: Ten Commandments for More Data Quality).
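A very simple way to approximate this is keyword matching. The sketch below is a rule-based illustration only; the vocabularies and the "without ... pockets" rule are assumptions, and real data quality software is likely to use far more robust methods.

```python
import re

# Hypothetical controlled vocabularies for two attributes.
MATERIALS = ["cotton", "wool", "leather"]
FITS = ["tailored", "regular", "loose"]


def extract_attributes(text: str) -> dict:
    """Extract material, fit and pocket count from a product description."""
    lowered = text.lower()
    attrs = {}
    for material in MATERIALS:
        if re.search(rf"\b{material}\b", lowered):
            attrs["material"] = material
            break
    for fit in FITS:
        if re.search(rf"\b{fit}\b", lowered):
            attrs["fit"] = fit
            break
    # "without ... pockets" within one sentence is read as zero pockets.
    if re.search(r"\bwithout\b[^.]*\bpockets?\b", lowered):
        attrs["pockets"] = 0
    return attrs


description = (
    "This beautiful summer shirt made of pure cotton has a tailored shape "
    "without patch pockets"
)
attributes = extract_attributes(description)
```

Rules like these break easily on negations and unusual phrasing, so extracted values should be spot-checked before they are written to the master data.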

Which data is relevant?

Which data is relevant depends on the application. For personalization in the online shop (or in in-store solutions or customer-specific printed catalogs), a few central attributes indicate a user's particular preferences. "I only buy underwear from Schiesser" or "We buy computer accessories from Logitech" are typical statements that some users would make. These are preferences for the brand attribute.

Like the price segment, brand is a relevant attribute that is relatively independent of the assortment. In addition, some users only buy from a very specific product segment, for example only accessories for an electrical device but not the device itself. If shop operators want to increase their share of wallet here, they should not leave control at this point to the self-learning machine.

[Image gallery: eleven GfK charts on e-commerce (source: GfK, July 2015), covering e-commerce shares by product group for 2014 and 2025, online sales shares 2008-2025, eCommerce sales 2009-2014 by industry, the diffusion model as a theoretical framework, assortment-related purchasing power, rational online versus emotional offline buying, saturation tendencies, sociodemographics, and growth drivers in online retail.]

Mind you, the above applies to personalization. Other measures require different data. For a recommendation engine that suggests complementary products (cross-selling), it makes sense to have the engine learn the product segment as well. This is often not done, with the result that, for example, a customer who has just bought a television is offered another television. Some recommendation engines are not up to this challenge, or the product data is not assigned to clear product categories.
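The television example can be illustrated with a deliberately simple rule-based filter. This is not a learning recommendation engine: the catalog, the segment names and the hand-written complement rules are all hypothetical stand-ins for what an engine would learn from purchase data. The point is only that clear product segments make it possible to exclude the segment that was just bought.

```python
# Hypothetical catalog in which every product has exactly one clear segment.
CATALOG = [
    {"id": 1, "name": "55-inch television", "segment": "television"},
    {"id": 2, "name": "65-inch television", "segment": "television"},
    {"id": 3, "name": "TV wall mount", "segment": "tv_accessory"},
    {"id": 4, "name": "HDMI cable", "segment": "cable"},
    {"id": 5, "name": "Razor blades", "segment": "personal_care"},
]

# Hypothetical complement rules; a real engine would learn these relationships.
COMPLEMENTS = {"television": {"tv_accessory", "cable"}}


def cross_sell(purchased_id: int, catalog=CATALOG, complements=COMPLEMENTS) -> list[str]:
    """Suggest products from complementary segments, never from the purchased one."""
    by_id = {product["id"]: product for product in catalog}
    segment = by_id[purchased_id]["segment"]
    wanted = complements.get(segment, set()) - {segment}
    return [product["name"] for product in catalog if product["segment"] in wanted]


suggestions = cross_sell(1)
```

Without a reliable `segment` value in the master data, neither a rule like this nor a learned model can tell a second television from a wall mount.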

Manual or automated maintenance? Both!

What should you do if, as described, much of the data is not entered properly in the fields of the product master data but appears only in the running text? Manual maintenance is recommended if the database is small (a few thousand products, only a few hundred of them with gaps) and does not change often. In that case it is the cheapest way.

Check which fields are empty (Excel is sufficient for this) and whether the information can be found elsewhere. For example, copy the color field, build a table of all colors and run "Find & Replace". Do the same with every other field you want to normalize. Check the results again manually. Do not underestimate the effort: this procedure can take ten minutes per record. If data changes or new records are added, repeat the procedure accordingly.

Data quality software can automate processes that extract information from text, normalize colors and sizes, convert millimeters into centimeters, and so on. The automated method is recommended when large amounts of data are involved, especially when the data changes frequently. Besides cost savings, speed is the second major advantage: new data is optimized immediately, so new products are taken into account right away. It helps to have a suitable advisor at your side, as many processes are more complex than they initially appear.
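One of the automated steps mentioned, converting lengths to a common unit, might look like the sketch below. The function name, the accepted input format and the choice of centimeters as the target unit are assumptions for illustration; a real product feed will contain more formats than this handles.

```python
import re

# Conversion factors to centimeters for the units this sketch supports.
_TO_CM = {"mm": 0.1, "cm": 1.0, "m": 100.0}

_LENGTH = re.compile(r"\s*(\d+(?:[.,]\d+)?)\s*(mm|cm|m)\s*")


def length_to_cm(value: str) -> float:
    """Parse strings such as '150 mm' or '1,5 m' and return centimeters."""
    match = _LENGTH.fullmatch(value.lower())
    if match is None:
        raise ValueError(f"Unrecognized length: {value!r}")
    number = float(match.group(1).replace(",", "."))  # accept decimal commas
    return round(number * _TO_CM[match.group(2)], 4)


widths_cm = [length_to_cm(v) for v in ["150 mm", "12.5cm", "1,5 m"]]
```

Values that cannot be parsed raise an error instead of being silently guessed, so they can be collected and handled manually.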