Saturday, March 4, 2017

Rebels and Transformation

As an independent consultant, the biggest challenge I face is clients' inability to see the big picture and to be patient while building a logical workflow that involves people, process and technology.
Due to this lack of patience, clients often resort to "distress sourcing": simply defined as "sourcing of all my distresses to a vendor", or finding a technology that will take over the communication issues that are part of their people problem.
I came across the following paragraph while reading a book on spirituality:
"The people who cannot rebel ask for guidance, want to be followers. Their psychology is that to be a follower relieves them of all responsibility: the guide, the master, the leader, the messiah become responsible for everything. All that is needed of the follower is just to have faith, and just to have faith is another name of spiritual slavery" - The Rebel
I believe the above paragraph aptly summarises the issue. The following are some interpretations I drew from the above excerpt.
  • Organizations that love buzzwords in the boardroom but have not assessed their own cultural fabric look for guidance from consulting firms/vendors. That in itself is a fast way of learning, provided that the people have the ability and motivation to learn.
  • Some of the organisations I have worked with focus on hiring resources with an exact skill set rather than evaluating learnability. This then threatens their ability to cope with change.
  • Before starting the journey of being a rebel/transformation/change it is critical that an organization clearly knows the "for what?". That is, what they are taking a particular journey for.
  • I usually find the management to be very clear about their goals for starting a particular journey. The real challenge is downward communication to the people in the trenches, and the ability of people at each level to translate the bigger goal into their own goals (a.k.a. metrics/KPIs/KRAs).
If you agree with the above points then I am sure you will agree that a change/transformation journey needs the following four actors, and not just the vendor with a contract:
  • The Guide: The vendor or consultant who is hired to avoid the usual pitfalls and to help shape the change into a transformation and not a trial.
  • The Master: The internal owner trusted by the management for her/his judgement and who in turn banks on the learnability of the people in the organization and steers the transformation.
  • The Leader: I believe that it is not just one person who is the leader but the complete management team. The role of the leaders is to make sure that they prepare their own divisions and teams to digest the change.
         This not only gives their own divisions and teams the confidence to accept the change but also gives the master the ability to have small failures and open-heartedly course-correct towards the change that will work for the people in the organization. This allows the change to be adapted so that it fits right into the current processes and works for the people. It also avoids implementing the blueprint exactly as suggested by the vendor.
  • The Messiah: Most people think that an evangelist plays this role or, even worse, some people outsource this responsibility, thinking that it can be quantified in person-hours.
       I believe that it's not merely the responsibility of certain individuals to become the messiah. It is the responsibility of everyone in the organization to become the messiah of the change and thus welcome the transformation. When everyone in the organization welcomes a change it makes every other colleague's life easier, and thus the organization assimilates the change much faster.
If you expect to embark on a journey of transformation and expect the vendor to make it happen by themselves, then you are forcing your people to become slaves of the change.

Friday, April 15, 2016

Learning so-called Hadoop: where to start?

It is confusing to decide what book to read and what tutorials or courses to take. Right now the system has been split into different modules. Picking up material on Hadoop will often take you straight to HDFS, MapReduce/YARN and programming. This can be confusing for the analyst community, as you are trying to learn more about analysis and not system/infrastructure maintenance.

So then the question is: where does an analyst start? In my opinion you can look at the following blocks and start with any of the query tools.

In the current Hadoop ecosystem, HDFS is still the major storage option. On top of it, Snappy compression and the RCFile, Parquet and ORC file formats can be used for storage optimisation. Hadoop 2.0 introduced YARN, which separates cluster resource management from MapReduce for better performance and scalability. Spark and Tez, as solutions for faster, near-real-time processing, can run on YARN to work closely with Hadoop. HBase is a leading NoSQL database, especially when there is a need for a NoSQL database on already deployed Hadoop clusters. Sqoop is still one of the leading and most mature tools for exchanging data between Hadoop and relational databases. Flume is a mature, distributed and reliable log-collecting tool for moving or collecting data into HDFS. Impala and Presto query the data on HDFS directly for better performance.

So if you are an analyst like me then Hive, Pig, Impala, Presto, Sqoop and HBase can be a good flow to start taming the beast. Just like in the good ol' days, you can become an analyst first and then, depending on your interest in the infrastructure and admin side, you can jump into other systems.

To start learning Hive - one needs to install it. So I would recommend following this URL (this one is the best of the couple available out there)

Friday, March 11, 2016

Customer Loyalty

What does a loyal customer mean?
  - Someone who makes repeat/regular purchases
  - Someone who purchases across product/service lines/categories
  - Someone who refers others
  - Someone who demonstrates immunity to the competition


  • R: Recency
  • F : Frequency
  • M: Monetary Value
And then Customer Life Time Value (LTV)
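
The three RFM measures above can be computed from a plain list of orders. The following is a minimal sketch, assuming each order is a (customer id, order date, order value) record; the sample orders, the `rfm` function name and the "as of" date are made up for illustration. It does not attempt LTV, which needs assumptions (margin, expected lifetime) that these notes do not state.

```python
from datetime import date

# Hypothetical order records: (customer_id, order_date, order_value)
ORDERS = [
    ("C1", date(2016, 1, 5), 120.0),
    ("C1", date(2016, 2, 20), 80.0),
    ("C2", date(2015, 11, 2), 40.0),
    ("C3", date(2016, 3, 1), 300.0),
    ("C3", date(2016, 3, 8), 150.0),
    ("C3", date(2016, 1, 15), 50.0),
]


def rfm(orders, as_of):
    """Return {customer_id: (recency_days, frequency, monetary_value)}."""
    summary = {}
    for customer_id, order_date, value in orders:
        last, count, total = summary.get(customer_id, (None, 0, 0.0))
        # Track the most recent order date seen for this customer.
        if last is None or order_date > last:
            last = order_date
        summary[customer_id] = (last, count + 1, total + value)
    return {
        customer_id: ((as_of - last).days, count, total)
        for customer_id, (last, count, total) in summary.items()
    }


RFM = rfm(ORDERS, as_of=date(2016, 3, 11))
for customer_id, (recency, frequency, monetary) in sorted(RFM.items()):
    print(customer_id, recency, frequency, monetary)
```

A low recency (few days since the last order) together with a high frequency and monetary value points to the customers at the top of the value pyramid.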

Value Pyramid of Customers/ Where is the opportunity to create loyal customers?
  • Know your best customers: Who places high-value orders? Who is a repeat buyer?
  • What is the expected value in your segment?
  • What is it about your service and product right now that makes the customer buy?
  • Where does the top percentage of your revenue come from? (Ticket size, geo, category, sub-cat, brand)
  • What are they buying, when are they buying, how are they buying?
  • When looking at top buyers, do look into returns and other data silos.
  • Has the purchase/order value grown over time?
  • A) What are their unsolved problems? B) What are their headaches? C) What keeps them up at night?
  • Make it easy for them to try or buy your new products and services.
  • Are you sending birthday/New Year promotion cards? Do you use this opportunity to ask for feedback?
  • "What is one thing we could have done better?"
  • Seek out employee feedback. Make sure you empower employees.
  • Communicate the vision of promotion to the front line.
Create such a visualisation:

Total Revenue    80% of Revenue
XXXX             0.8R

CUST ID    REVENUE    80% REVENUE
8          R          0.8R
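
One way to fill in a table like the one above, as a minimal sketch: rank customers by revenue and keep adding them until the running total reaches 80% of total revenue. Reading the table as a cumulative (Pareto-style) cut is an assumption about the intent of these notes, and the customer IDs, revenue figures and function name below are made up for illustration.

```python
def top_customers_for_share(revenue_by_customer, share=0.8):
    """Return (total, target, rows) where rows lists the highest-revenue
    customers needed to reach `share` of total revenue.

    Each row is (customer_id, revenue, running_total).
    """
    total = sum(revenue_by_customer.values())
    target = share * total
    ranked = sorted(
        revenue_by_customer.items(), key=lambda item: item[1], reverse=True
    )
    rows = []
    running = 0.0
    for customer_id, revenue in ranked:
        if running >= target:
            break  # the customers collected so far already cover the target
        running += revenue
        rows.append((customer_id, revenue, running))
    return total, target, rows


# Hypothetical revenue per customer
REVENUE = {"A": 500.0, "B": 250.0, "C": 150.0, "D": 60.0, "E": 40.0}

TOTAL, TARGET, ROWS = top_customers_for_share(REVENUE)
print("Total revenue:", TOTAL, "80% of revenue:", TARGET)
for customer_id, revenue, running in ROWS:
    print(customer_id, revenue, running)
```

The customers that come out of this cut are the ones worth studying first for the questions listed earlier (what, when and how they buy).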

Strategic Planning: Steps

These are notes from my scribe. They won't make sense to most readers who have reached here randomly. If they do, well and good!

Steps involved in strategic planning

Communicate and prepare
       - Announce process
       - Identify resource -> set out the work they do

Meeting #1
      - Explain process
      - Work on the content
                 - SWOT
                 - Mission
                 - Vision
                 - Principles
                 - Goals
                 - Strategic filters

Homework Assignment
       Come back with which initiatives they are going to work on

Meeting #2
     - Why everyone rated the initiatives the way they did as per the strategic filters
     - Initial prioritisation list
     - Initial Owners assigned

Homework Assignment 2
      Analyse high priority
            - Market Validation
            - Financial Analysis
            - Execution Considerations

Meeting #3 - Resource Planning
      - Validate priorities
      - Identify resource
      - Allocate resources

Resource Matrix: Initiative| Cost| Resource

Tools for analysis
- Five Forces

Good matrix:


Thursday, March 10, 2016

Why do analytics, data science, big data and digital transformation initiatives fail?

An excerpt from the book The Rebel by Osho explains nicely why most analytics, data science, big data and digital transformation initiatives fail. It's simply because data alone can't change the organization; it is the culture of the organization that needs to change - and no, not after the new is built.

Old and New: Such is human mind

"I have heard about an old church: it was so ancient that people had stopped going there because even strong wind and the church would start swaying. It was so fragile, any moment it could fall. Even the priest had started giving his sermons outside the church, far away in the open ground.

Finally, the board of trustees had a meeting; something had to be done. But the trouble was that the church was very ancient - it was the glory of the town; their town was famous far and wide because of the old church: perhaps it was the oldest church in the world. It was not possible to demolish it and to make a new one. But it was also dangerous to let it remain as it was - it was going to kill someone. Nobody had been going in for years; even the priest was not courageous enough to go in because who knew at what moment the church would simply collapse? So something had to be done.

The board was in a very great dilemma: something had to be done, and nothing should be done because that church is so ancient, and man has been in such deep attachment with things that are ancient. So they passed a resolution with four clauses in it. The first was: "We will make a new church, but it will be exactly the same as the old. It will be made of the same material the old is made of - nothing new will be used in it, so it remains ancient. It will be made in the same place where the old church stands because that place has become holy by ancientness."

The last thing in their resolution was, "we will not demolish the old church until the new is ready." They were all happy that they had come to a conclusion. But who was going to ask those idiots, "how are you going to do it?" The old should not be demolished till the new was ready. And the new had to be made of everything the old was made of, in the same place where the old was standing, with exactly same architecture the old had. Nothing new could be added to it: the same doors, the same windows, the same glass, the same bricks - everything that needed to be used had to be of the old church.

And finally, they decided that the old should not be touched till the new was ready. "When the new is ready, then we can demolish the old."

Such is the human mind: it clings to old, it also wants the new, and then it tries to find some compromise - that at least the new should be like the old. But a few things are impossible, nature just won't allow them."

Wednesday, October 14, 2015

My 2¢ worth on Big Data

I recently left my role at an Internet market research company. At this company I helped manage pre-sales and post-sales for the enterprise web-analytics platform business. I have worked with unstructured data (collected from the web using GET requests) for the last 9 years and understand the business need and data collection methods in depth. In the past few years “Big Data” has become a buzzword in the field of technology.

To keep this post relevant, I am going to avoid writing things that you can read in other sources.

Genesis: The idea of big data and projects in this area got popular after Google published a paper on their distributed file system and how it could be used to collect, store and analyze large volumes of data on commodity hardware.

Purpose: The need to store data in large quantities has been around since banking, telecommunications, airlines and power transmission digitized their data on computers. Typically these cash-rich companies would spend on mainframes and high-availability machines to host this data. These were costly machines and could result in a couple of million dollars' worth of hardware, software license and personnel costs. What made it still okay to spend so much was that the data was essential: each transaction had a commercial value or was recorded for regulatory compliance, and thus missing the data was not an option. (Short history of the IBM Mainframes)

Early 2000s: Saw the rise of the internet and online applications where the end user was actually interacting with computers. Thus the purpose of having a computer went beyond record keeping. This also resulted in an explosion in the volume of data generated. While the data in the logs was useful, not every single action was of commercial value; instead there was value in understanding what a collection of these logs would tell about the customer and their behavioral journey.

So this was the challenge that large internet companies like Yahoo!, Google, MSN et al. were trying to solve. This resulted in the creation of systems similar to, and including, GFS (the Google File System). These systems allowed the use of commodity hardware for storing and querying data, thus reducing the cost of maintaining a data collection and analysis system.

My encounter with big data and the challenge in learning: As a web analytics consultant I helped companies collect, ingest and analyze web traffic logs using software built by companies like Adobe (Omniture/WebSideStory/Visual Sciences), WebTrends, Coremetrics, comScore and Google (Analytics). These applications worked nicely to satisfy the reporting needs of the executives and worked as a system on the side, without interfering with the primary ways in which the core web services worked.

Change in recent years: In recent years the internet has become an inescapable part of our lifestyle, making companies like Facebook, Twitter, LinkedIn and Google a major part of one's day. This also means that these companies have access to 1bn+ online audiences who can be fed online advertising, thus fueling the online commerce channels. Instead of paying web analytics companies for an analytics system, the engineers at these tech companies have resorted to the use of Hadoop (a.k.a. big data) systems to collect, store and analyze the traffic logs.

What it sparked: Now that there is a way to collect, store and analyze hordes of data, application engineers have also figured out ways to store data from clinical research, operations or any other activity that produces data purely through logging. This data is then mined using statistical analysis, predictive analysis, natural language processing, artificial intelligence and machine learning. Such applications provide data analysts with a magnifying glass to look at large volumes of data and find macro trends and insights, which weren't possible earlier because there weren't cheap ways of performing the analysis and the value of the insights didn't generate savings/profits greater than the cost of the systems.

Where it's going: The Internet of Things (IoT) and mobile technology have facilitated the automation of data collection, thus further fuelling growth in the amount of data collected.

What exactly is big data? There is a lot of hoopla about what big data is and what it is not. In simple words, it's a way to store large amounts of data and to process, query and analyze it using cost-efficient hardware. The software that has become synonymous with big data is Hadoop, along with other utilities that allow manipulating or querying the data.

How do you explain Hadoop? Hadoop is sort of a misnomer for a collection of software, and anyone who is knowledgeable about the components will actually be willing to speak in specifics about them. People who are bullshitting their way around will stop at the keyword ‘hadoop’.

    Hadoop Common: The common utilities that support the other Hadoop modules.
    Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
    Hadoop YARN: A framework for job scheduling and cluster resource management.
    Hadoop MapReduce: A YARN-based system for parallel processing of large data sets.

Once you have installed HDFS you have a cluster where the file system looks like one big volume/drive but is actually sharded across the various units that form your cluster. The tasks, usually written in Java or Python, that query the shards and then aggregate the results are called MapReduce programs. One may say that before the MapReduce logic is written there is no data model for the data; it's the MapReduce program that defines the data model and query model for the underlying data.
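
To make the "query the shards, then aggregate" idea concrete, here is a minimal sketch in plain Python. It does not use any Hadoop API; it only simulates the map, shuffle and reduce steps in a single process. The log lines, the two in-memory "shards" and the function names are all made up for illustration.

```python
from collections import defaultdict

# Hypothetical web log lines, split across two "shards" the way a
# distributed file system would split one large file across machines.
SHARDS = [
    ["GET /home", "GET /pricing", "GET /home"],
    ["GET /home", "GET /contact"],
]


def map_phase(line):
    """Map: turn one raw log line into (key, value) pairs.

    This is where the data model is imposed on otherwise raw text:
    the key is the requested path and the value is a count of 1.
    """
    _method, path = line.split(" ", 1)
    yield path, 1


def shuffle(pairs):
    """Shuffle: group all values that share a key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: aggregate the grouped values for one key."""
    return key, sum(values)


def run_job(shards):
    # On a real cluster each shard would be mapped on the machine holding it.
    mapped = [
        pair for shard in shards for line in shard for pair in map_phase(line)
    ]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


PAGE_VIEWS = run_job(SHARDS)
print(PAGE_VIEWS)
```

Changing only `map_phase` (say, to emit the HTTP method instead of the path) changes the question being asked of the same raw data, which is the sense in which the MapReduce logic defines the query model.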

Besides this there are a couple of other utilities that help you manage the big data system. They can be listed as follows:
·      Ambari: A web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters which includes support for Hadoop HDFS, Hadoop MapReduce, Hive, HCatalog, HBase, ZooKeeper, Oozie, Pig and Sqoop. Ambari also provides a dashboard for viewing cluster health such as heatmaps and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner.
·      Avro: A data serialization system.
·      Cassandra: A scalable multi-master database with no single points of failure.
·      Chukwa: A data collection system for managing large distributed systems.
·      HBase: A scalable, distributed database that supports structured data storage for large tables.
·      Hive: A data warehouse infrastructure that provides data summarization and ad hoc querying.
·      Mahout: A scalable machine learning and data mining library.
·      Pig: A high-level data-flow language and execution framework for parallel computation.
·      Spark: A fast and general compute engine for Hadoop data. Spark provides a simple and expressive programming model that supports a wide range of applications, including ETL, machine learning, stream processing, and graph computation.
·      Tez: A generalized data-flow programming framework, built on Hadoop YARN, which provides a powerful and flexible engine to execute an arbitrary DAG of tasks to process data for both batch and interactive use-cases. Tez is being adopted by Hive™, Pig™ and other frameworks in the Hadoop ecosystem, and also by other commercial software (e.g. ETL tools), to replace Hadoop™ MapReduce as the underlying execution engine.
·      ZooKeeper: A high-performance coordination service for distributed applications.

If you are a non-technical, data-savvy business person then more than likely the “how do you explain Hadoop?” section is where you lose interest and dismiss the details as gibberish. The next thing you will want to do is hire a person who takes care of all the details and runs the big data project for you. And now you have a requirement out there in the market looking for a person with 50 business analytics skills, all the technical skills, and who has been in touch with your business for the last 10 years. Well, if you believe it's one person who could do this, then you have it wrong.

In general, from my understanding, this is how I would divide the big data team:
1)    Make the system work: Traditionally these people have the job title of UNIX system administrator. They make the basic infrastructure work and make the so-called Hadoop file system work with other applications. The KPI for these resources is ‘system availability’.
2)    Business analyst: These people have traditionally been called business analysts. The key skill of these social people is to find all the data sources and detail the information contained in them. They also have to be tech savvy enough to understand the APIs and data models that allow marrying the datasets for a holistic view of the KPIs on which the organization is run. These resources can usually be the old hands who have been in your company for a while, understand the political boundaries and can negotiate their way to make things happen. People like me, who have worked with web data and integrated offline sources to create meaningful reporting frameworks, can be bucketed here.
3)    Team of analysts: These are the people who can write SQL and VBA scripts and have excellent skills in creating spreadsheet dashboards and PowerPoint presentations.

I have spent some time trying to understand these mystery systems and will continue to read more… As the title of my post says, this is my 2 cents' worth. Hopefully you enjoyed this post.

Monday, August 18, 2014

Lessons from my motorcycle ride in Ladakh

I just came back from a 900 km+ motorcycle trip in Ladakh. The route covered various hills and passes in the western Himalayas (Ladakh range). The roads are said to be amongst the most dangerous in the world to drive on, and Khardung La is often billed as the highest motorable pass in the world, with Chang La as the third highest.

A few things that I learnt through this trip are:
1) Learn to breathe: Breathing happens unconsciously and we don't realise its importance. At high altitude breathing is a task by itself; it has to be performed as a conscious activity and doesn't just happen. So before you set out to do any other activity you have to remember to breathe! If you breathe right (both inhaling and exhaling are important) then the activity becomes easy. I'm sure this is very important at places with normal altitude as well, and focusing on breathing may help you control yourself in emotional, stressful or rage-inducing situations.
On the flip side, at 18k feet, when breathing was really difficult, I also realized what old age must feel like. Breathing becomes a difficult task at that altitude, and that helps you understand the importance of enjoying life while young. Old age is tough, so don't wait to enjoy life until you turn old, because the most difficult thing then might be to just breathe ☺

2) Focus on your goals: Motorcycle riding has to start early in the morning to avoid high water levels from the melting snow. Sound sleep is important, and the neighbors in tents or hotel rooms may not always agree with your schedule and may be noisy. It's important to focus on your own goals and let others do what they want. They may not always block you from the things that you want to do, but they may deter you from focusing on your goals.
You'll also find lots of inconsiderate drivers who share the road with you and might annoy you and make you lose your temper, but if you focus on enjoying your ride and making it to your destination then it becomes a whole lot easier and more enjoyable. So if you just focus on your own goals and help colleagues with similar goals to focus, then the noise disappears and the goals seem feasible.

3) Sometimes the journey is equally or more important: On most of the days we went riding there was little to do at the destination. The destination often had awe-inspiring scenery but no activity after dark. I quickly realized that it wasn't just the destination that was beautiful: the whole journey offered awe-inspiring views, and riding a motorcycle offered the freedom to stop anywhere and enjoy them, making the journey more important than just reaching the destination.

So here are some photos from the trip for you to enjoy, and if you agree with the above without going through the same experience, then please learn to breathe, stay focused on your goals and enjoy the journey of life!