Outliers are Special
Plenty of articles have been written about cleansing data. I do not want to
repeat here why data needs to be cleansed, which techniques are used for it,
and so on. There is a general perception (and rightly so) that outliers sway
future predictions in the wrong direction and hence should be handled
carefully. Outliers are often demonised, and the usual advice is to get rid of
them. But in my view outliers are not always bad. They sometimes help planners
in more ways than they might think.
Depending on the outlier, one can remove it completely from the historical
data or replace it with an appropriate number, but one MUST NOT ignore it.
Every outlier must be analysed properly and considered for future predictions.
Outliers are more than just anomalies. They sometimes give a hint of the
things that are going to come in the future.
Point Outliers
When a single data point is entirely out of context, it can be termed a Point
Outlier. For example, all the stores of brand XYZ do a certain average
business of, say, $10,000 a month. Last year, in June, one particular store
reported sales of $20,000, and then about $10,000 a month on average
thereafter. It is definitely a Point Outlier and may be dropped when this
history is used for future predictions. But before dropping it, one must dig
deeper and find the reasons behind it. Was it a case of mis-reporting? Were
there some store-level promotions on offer? Or was there a corporate order
that the store manager secured from some organization? If the latter, there
may be repeat orders. Getting more details about this outlier (instead of
simply ignoring it) will help the planner predict and plan better for the
coming season.
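To make the store example concrete, here is a minimal sketch of how such a point could be flagged for review. The post does not prescribe a detection method, so the modified z-score (median and median absolute deviation, with the common rule-of-thumb threshold of 3.5) is my assumption, and the monthly figures and the function name flag_point_outliers are made up for illustration.

```python
from statistics import median

def flag_point_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold.

    Assumption: the modified z-score, 0.6745 * |x - median| / MAD, is used
    here only as one reasonable way to spot a point that is out of context.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Every typical value equals the median, so anything else stands out.
        return [i for i, v in enumerate(values) if v != med]
    return [
        i for i, v in enumerate(values)
        if 0.6745 * abs(v - med) / mad > threshold
    ]

# Hypothetical monthly sales for one store, January to December.
monthly_sales = [9800, 10100, 10050, 9900, 10200, 20000,
                 9950, 10000, 10150, 9850, 10100, 9900]

for index in flag_point_outliers(monthly_sales):
    # Flagging is only the start: the planner still has to ask why.
    print(f"Month {index + 1}: {monthly_sales[index]} needs a closer look")
```

Note that the sketch only points at the month. Whether it was mis-reporting, a promotion or a corporate order is still a question for the planner.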
Group Outliers
When a group of consecutive data points is significantly out of the pattern,
they can be termed Group Outliers. For example, returns of goods sold by brand
XYZ run at, say, 5% of items per week. However, for all the weeks of November
and December this increased to 20%. Again, these are Group Outliers for sure,
but the planner must analyse whether this spike was just because of rushed
shopping during the Holiday Season, or whether there is some issue with their
winter clothing. If the former, they have to account for similar returns next
year as well. It remains an anomaly, but it can no longer be ignored. If the
latter, they can ignore the numbers but had better get their winter clothing
suppliers to shape up.
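The returns example can be sketched in the same spirit. Again, the post does not name a method, so the rule below (a run of at least three consecutive weeks above twice the median rate) is purely my assumption, as are the weekly figures and the function name find_group_outliers.

```python
from statistics import median

def find_group_outliers(values, factor=2.0, min_run=3):
    """Return (start, end) index pairs for runs of at least min_run
    consecutive values that are above factor * median(values).

    Assumption: "twice the median" and "three in a row" are illustrative
    choices, not a rule taken from the article.
    """
    cutoff = factor * median(values)
    runs = []
    start = None
    for i, v in enumerate(values):
        if v > cutoff:
            if start is None:
                start = i          # a new run begins here
        elif start is not None:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = None           # the run has ended
    # Close a run that lasts until the final value.
    if start is not None and len(values) - start >= min_run:
        runs.append((start, len(values) - 1))
    return runs

# Hypothetical weekly return rates in percent: 43 ordinary weeks,
# followed by 9 weeks covering November and December.
weekly_returns = [5.0] * 43 + [20.0] * 9

for first, last in find_group_outliers(weekly_returns):
    print(f"Weeks {first + 1} to {last + 1} form a group outlier")
```

A single bad week would not be reported by this sketch. Only a sustained stretch is, which is what separates a Group Outlier from a Point Outlier.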
There are many such examples to show that outliers are not ‘just another’
abnormality in the pattern; they are sometimes worth much more than that.
Hence, you may like or dislike outliers, but you must not make the mistake of
ignoring them!
