Big Data Approach

In the new world of business, enterprises embark on the journey to become truly data-driven. According to Forbes, by 2020 a whopping 1.7 megabytes of new information will be created every second for every human being on this planet, with one-third of it passing through the cloud. This massive data proliferation is challenging enterprises to explore new ways to effectively monitor and manage their Big Data strategy. Ref: Big data stats by Forbes.

  • 83% of businesses say data and analytics are making existing services and products more profitable
  • 60% of businesses claim their data is generating revenue within their organizations
  • 59% of businesses consider data and analytics to be “vital” to the running of their organizations

Many products and open source projects are making big data even bigger in terms of benefits. Apache Hadoop is the one most IT people know about, but there are more interrelated open source projects, such as Apache Spark, Apache Kafka, Pig, Hive, Sqoop, NiFi, and Ambari, that help in getting results from a big data platform.

There are also enterprises that use the Hadoop framework to provide products or services in the market. One well-known example is Hortonworks.

Recently Hortonworks launched its HDP product on AWS, where you can deploy a cluster within minutes using AWS CloudFormation templates. The templates are designed so that you can start using them directly, instead of having your internal staff design and implement a deployment, which can take considerable time, even a week, to get market ready. Hortonworks has customized the Hadoop framework for better results, and its CloudFormation templates automatically create or modify security groups and AWS IAM users and groups as required. Hortonworks has two products, which handle data in transit and data at rest.

Hortonworks powers the modern data architecture to manage data in the cloud and in the data center. The Hortonworks DataFlow (HDF) and Hortonworks Data Platform (HDP) deliver a connected data suite to gain insight from data-in-motion and data-at-rest. Hortonworks Data Platform addresses the needs of data-at-rest. It provides an enterprise-ready data platform that enables organizations to build modern data applications.

Hortonworks DataFlow addresses the needs of data-in-motion, such as data ingestion and real-time streaming capabilities. It is a cornerstone technology for the Internet of Things. Together, HDP and HDF offer the maximum business value from data in motion and data-at-rest as a complete solution. HDP and HDF provide an industry leading, open, innovative and enterprise-ready connected platform so you can access your data anywhere – on-premises or in the cloud.


Fig 1: Hortonworks HDF and HDP

As we enter the Internet of Anything age, applications such as edge analytics, stream analytics, historical analysis or machine learning will be scattered everywhere. You want the applications to act on, analyze, and drive value from the data in the most opportune place, wherever it happens to be. The Connected Data architecture manages the data flow and logistics, the metadata, and the corresponding security and governance policies. Digital transformation is fueled by the connected data logic to deliver continuous insights.

A small IT services company named Progressive is doing good-quality big data work using AWS Kinesis, applying different designs and techniques to handle data in motion and data at rest. AWS Kinesis helps organizations do data analytics on the fly: it takes in a stream of transactional data, processes it, and transforms it into final output that can be stored on AWS storage (S3). I got this information by attending one of their webinars on AWS and their service offerings built on it.
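The stream-process-store flow described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: it does not call AWS Kinesis or S3, and the sample records, the batch size, and the names `transaction_stream`, `batches`, `summarize`, and `run_pipeline` are all hypothetical stand-ins. A dictionary plays the role of the object store.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List


def transaction_stream() -> Iterator[dict]:
    """Hypothetical stand-in for transactional records arriving on a stream."""
    sample = [
        {"store": "A", "amount": 120.0},
        {"store": "B", "amount": 75.5},
        {"store": "A", "amount": 30.0},
        {"store": "C", "amount": 210.0},
        {"store": "B", "amount": 24.5},
    ]
    yield from sample


def batches(records: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Group incoming records into small batches, as a stream consumer might."""
    batch: List[dict] = []
    for record in records:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


def summarize(batch: List[dict]) -> Dict[str, float]:
    """Processing step: total the transaction amount per store."""
    totals: Dict[str, float] = defaultdict(float)
    for record in batch:
        totals[record["store"]] += record["amount"]
    return dict(totals)


def run_pipeline(records: Iterable[dict], batch_size: int,
                 sink: Dict[str, dict]) -> Dict[str, dict]:
    """Write one summary per batch into `sink` (a dict standing in for S3)."""
    for index, batch in enumerate(batches(records, batch_size)):
        sink[f"batch-{index}"] = summarize(batch)
    return sink


object_store = run_pipeline(transaction_stream(), batch_size=2, sink={})
for key, summary in object_store.items():
    print(key, summary)
```

In a real deployment the generator would be replaced by a stream consumer and the dictionary by writes to object storage; the batching and summarizing steps in between keep the same shape.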


Fig 2: Processing of data using different AWS services


Fig 3: AWS Kinesis and further services under it.

Many consulting companies are benefiting from the availability of big data handling tools in the market. McKinsey, for example, has its own team for advanced analytics that uses a range of tools to handle big data, such as Tableau for reporting, and SAS 9.2 and SPSS for various internal consulting purposes, where it receives terabytes of data from the client concerned for analysis.

A retail consulting company named Revionics uses its own algorithms and tools for least-cost optimization, markdown, and promotional services on retail products, helping its clients gain a bigger market share than their competitors and earn more profit. It also has its own data science team that provides services on an ad hoc basis, helping clients get more insight from transaction data so they can set prices on their products. It has deployed Highly Distributed Computing (HDC) on an internally built grid system that helps run the batch jobs that release new prices.


Fig 4: Big data use cases as per progressive info system.

The organizations mentioned here are only a few examples; today companies of all kinds rely on big data to grow their business, and more of them are starting to use analytics services to boost it. The Forbes post cited above lists impressive figures suggesting that analytics services can help a business grow by leaps and bounds in a short period of time. Big data and its related tools help people find answers and gain more insight; for example, a recent NASA project generates 100 TB of data per day, which scientists analyze to produce results. More IT companies, such as Cloudera, are entering this market to gain market share and are investing heavily in tool development, and expertise in such products is in huge demand in the IT job market.

Data analytics is among the highest-paid profiles in the IT industry, and highly skilled technical people are in huge demand. In countries like India, the data science role did not exist 3 to 4 years ago. Now many people have this skill set, but there is still a huge shortage of such skilled people relative to industry requirements.

Apache is contributing a lot to the big data domain through its many projects, such as:

Apache Kafka, which can run as a single cluster that serves as the central data backbone for a large organization.

Apache Mahout, a scalable machine learning library.

Apache NiFi, an easy-to-use, powerful, and reliable system to process and distribute data.

Apache Spark, a fast and general engine for large-scale data processing. It offers high-level APIs in Java, Scala, and Python, and a set of libraries for stream processing, machine learning, and graph analytics.

Apache Pig, which consists of a compiler that produces sequences of MapReduce programs.
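To give a feel for what a MapReduce program looks like, here is a minimal word-count sketch in plain Python. It is not the output of the Pig compiler and uses no Hadoop API; it only mimics the three phases (map, shuffle, reduce) on an in-memory list, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce: collapse each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases, as a single MapReduce job would."""
    return reduce_phase(shuffle(map_phase(lines)))


counts = word_count(["big data tools", "big data platform"])
print(counts)
```

On a real cluster each phase runs in parallel across many machines, and a tool like Pig lets you describe the job at a higher level instead of writing these phases by hand.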

Storage cost is going down day by day. The transition from tape to magnetic storage, and then from magnetic storage to SSD and flash drives, is happening very fast, and cost is falling by multiples. For example, AWS Glacier charges about .2 paisa (INR) per GB a month for data archival, and AWS S3 object storage charges $0.0314 per GB a month, which comes to around Rs 2.10 a month. AWS keeps lowering its prices and offers multiple storage options at different rates, which helps companies store more and more data in the cloud, perform their analysis, and delete the data once done. Other cloud vendors such as VMware, Microsoft Azure, and Google are also players in this market with their own offerings, and it is worth consulting further before taking up their services.
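The S3 conversion above is simple arithmetic, and a small sketch makes it easy to redo for other volumes. The per-GB price is the one quoted in this post; the exchange rate of 66.9 INR per USD is an assumption, chosen because it is roughly what the quoted Rs 2.10 figure implies, and the function names are hypothetical. Check current pricing and exchange rates before relying on any of these numbers.

```python
S3_PRICE_USD_PER_GB_MONTH = 0.0314  # price quoted in this post
ASSUMED_INR_PER_USD = 66.9          # assumption: rate implied by the Rs 2.10 figure


def monthly_cost_usd(size_gb: float, price_per_gb: float) -> float:
    """Monthly storage cost in USD for `size_gb` gigabytes."""
    return size_gb * price_per_gb


def monthly_cost_inr(size_gb: float, price_per_gb: float,
                     inr_per_usd: float) -> float:
    """Same cost converted to INR at the given exchange rate."""
    return monthly_cost_usd(size_gb, price_per_gb) * inr_per_usd


one_gb_inr = monthly_cost_inr(1, S3_PRICE_USD_PER_GB_MONTH, ASSUMED_INR_PER_USD)
one_tb_usd = monthly_cost_usd(1000, S3_PRICE_USD_PER_GB_MONTH)

print(f"1 GB for a month: Rs {one_gb_inr:.2f}")
print(f"1000 GB for a month: ${one_tb_usd:.2f}")
```

The same two functions work for any other storage tier or vendor once you plug in its per-GB price.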

A study by IDC found the following results on how Amazon Web Services (AWS) impacts organizations across a cross-section of industries. It found that organizations of all sizes are realizing:

  • 560% ROI over five years
  • $1.5M in benefits per application over five years (average)
  • 64% reduction in total cost of ownership

Source: Amazon Web Services

I will be going into more technical depth on the usage and implementation of these products, and I will be publishing more posts under this category so that system admins and solution architects can get a better, more practical implementation approach for these products and services while consulting with clients.
