Big Data Analytics - Methodology

In terms of methodology, big data analytics differs significantly from the traditional statistical approach of experimental design. Analytics starts with data. Normally we model the data in a way that explains a response. The objective of this approach is to predict the response behavior or to understand how the input variables relate to the response. In statistical experimental design, by contrast, an experiment is developed first and data is collected as a result. This makes it possible to generate data in a form that a statistical model can use, where certain assumptions hold, such as independence, normality, and randomization.

In big data analytics, we are presented with the data. We cannot design an experiment that fulfills the assumptions of our favorite statistical model. In large-scale applications of analytics, a large amount of work (normally 80% of the effort) is needed just to clean the data so that it can be used by a machine learning model.
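To make the cleaning step concrete, here is a minimal sketch using only the Python standard library. The field names (`id`, `age`, `income`), the plausibility range for age, and the sample rows are all assumptions made for illustration; they do not come from the tutorial. Real projects typically need far more rules than this.

```python
import math


def clean_records(raw_rows):
    """Return rows with numeric 'age' and 'income', dropping unusable ones.

    The field names and the 0-120 age range are illustrative assumptions.
    """
    cleaned = []
    seen = set()
    for row in raw_rows:
        try:
            age = float(row["age"])
            # Strip thousands separators such as "52,000" before parsing.
            income = float(str(row["income"]).replace(",", ""))
        except (KeyError, TypeError, ValueError):
            continue  # missing or non-numeric value
        if math.isnan(age) or math.isnan(income):
            continue  # "nan" parses as a float but carries no information
        if not (0 <= age <= 120):
            continue  # implausible value, treated here as a data entry error
        key = (row.get("id"), age, income)
        if key in seen:
            continue  # exact duplicate of a row already kept
        seen.add(key)
        cleaned.append({"id": row.get("id"), "age": age, "income": income})
    return cleaned


# Hypothetical raw input with typical problems.
raw = [
    {"id": 1, "age": "34", "income": "52,000"},
    {"id": 2, "age": "", "income": "48000"},     # missing age
    {"id": 3, "age": "250", "income": "61000"},  # implausible age
    {"id": 1, "age": "34", "income": "52,000"},  # duplicate row
    {"id": 4, "age": "41", "income": None},      # missing income
    {"id": 5, "age": "29", "income": "39000.5"},
]

cleaned = clean_records(raw)
print(f"kept {len(cleaned)} of {len(raw)} rows")
```

Dropping bad rows is only one possible policy. Depending on the problem, imputing missing values or flagging them for review may be more appropriate.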

There is no single methodology to follow in real large-scale applications. Normally, once the business problem is defined, a research stage is needed to design the methodology to be used. However, some general guidelines are worth mentioning and apply to almost all problems.

One of the most important tasks in big data analytics is statistical modeling, meaning supervised and unsupervised classification or regression problems. Once the data is cleaned, preprocessed, and available for modeling, care should be taken to evaluate different models with reasonable loss metrics. Once the chosen model is implemented, further evaluation should be carried out and the results reported. A common pitfall in predictive modeling is to just implement the model and never measure its performance.
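As a rough illustration of evaluating different models with a loss metric, the sketch below compares two candidate regression models on held-out data using mean squared error. The synthetic data, the 75/25 split, and the two candidates (a mean baseline and a one-variable least-squares line) are assumptions chosen to keep the example self-contained. They stand in for whatever models and metrics a real project would select.

```python
import random


def fit_mean(xs, ys):
    """Baseline model: always predict the mean of the training responses."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_linear(xs, ys):
    """Ordinary least squares for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def mse(model, xs, ys):
    """Mean squared error of a fitted model on the given data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Synthetic data (an assumption for illustration): y = 3 + 2x + noise.
rng = random.Random(0)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3.0 + 2.0 * x + rng.gauss(0, 1.0) for x in xs]

# Hold out the last 25% of the rows; the models never see them during fitting.
split = 150
train_x, test_x = xs[:split], xs[split:]
train_y, test_y = ys[:split], ys[split:]

candidates = {"mean_baseline": fit_mean, "linear": fit_linear}
results = {
    name: mse(fit(train_x, train_y), test_x, test_y)
    for name, fit in candidates.items()
}
best = min(results, key=results.get)

for name, loss in results.items():
    print(f"{name}: test MSE = {loss:.3f}")
print("selected:", best)
```

Including a trivial baseline is a deliberate choice: a model is only worth deploying if its measured loss on unseen data is clearly better than the simplest alternative.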
