I agree with Edmon. I view automated scaling as a requirement. Automated scaling requires accurate predictive knowledge of how the big data system reacts at network scale (e.g., city traffic over a work day). Bottlenecks are a design concern. Maximum throughput saturation must also be estimated (e.g., Google has hit the wall several times and re-architected each time). Model before build.
Professors Flajolet and Sedgewick have developed the required mathematics: analytic combinatorics. This goes well beyond big-O order-of-magnitude predictions, using generating functions and asymptotic analysis to obtain quantitative estimates. Such modeling predicts the circulatory flows and hard limits.
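To illustrate the difference between an order-of-magnitude bound and a generating-function estimate, here is a minimal sketch using a textbook case rather than a big data workload: counting binary trees with n internal nodes (the Catalan numbers). The generating function is C(z) = (1 - sqrt(1 - 4z)) / (2z), and analysis of its singularity at z = 1/4 gives the leading-term estimate 4^n / (sqrt(pi) * n^(3/2)), constant included. The function names below are hypothetical and chosen only for this illustration.

```python
from math import comb, pi, sqrt


def catalan_exact(n: int) -> int:
    """Exact coefficient [z^n] of C(z) = (1 - sqrt(1 - 4z)) / (2z)."""
    return comb(2 * n, n) // (n + 1)


def catalan_asymptotic(n: int) -> float:
    """Leading-term estimate 4^n / (sqrt(pi) * n^1.5) from the singularity at z = 1/4."""
    return 4.0 ** n / (sqrt(pi) * n ** 1.5)


def relative_error(n: int) -> float:
    """Relative gap between the asymptotic estimate and the exact count."""
    exact = catalan_exact(n)
    return abs(catalan_asymptotic(n) - exact) / exact


# Illustrative sizes only; 4.0 ** n overflows a float for n above roughly 500.
for n in (10, 100, 500):
    print(n, f"{relative_error(n):.4%}")
```

The relative error shrinks as n grows, so the estimate predicts actual magnitudes rather than only a growth class. Modeling a real system in this way would require deriving its own generating function, which this sketch does not attempt.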
Let's not use marketing buzzwords. Buzzwords have little engineering significance and harm meaningful dialog with CIO-, CTO-, and program-manager-level employees. A vetted, controlled vocabulary is important for communications with customers globally. Big data is networked data at potentially global scale.
On Jul 25, 2013, at 8:51 AM, “Edmon Begoli” <firstname.lastname@example.org> wrote:
> I think that horizontal vs. vertical scaling are terms that had importance in three-tier days when
> it was a big deal that one could cluster a system.
> Today we are really talking about linear scaling although I would argue that it might be more realistic to think
> of some general metric, almost like asymptotic notation (Big-Oh) for big data and assigning a desired ratio
> for scaling of big data systems.
> As we like to keep things between log N and N log N for algorithms, maybe we can
> say that ‘Big Data’ architectures should sustain unlimited growth in volume while retaining the linear or even
> logarithmic ratio with performance.