Month: May 2016

The Lambda Architecture and Big Data Quality

  In my previous post about data quality in the Big Data era, we saw some of the challenges raised by the recently born data operating system that came with Hadoop 2.0 and YARN. In Part 2 of this series, I’d like to explore how this new framework changes the traditional landscape of the data quality dimensions. […]

Artificial Intelligence is no Longer Science Fiction, It’s a Reality

  Artificial intelligence (AI) is one of the most evocative and confusing terms in technology. It seems there are new announcements almost every day about the advancements of machines and their ability to ‘think’. We have seen a machine master the complex game of Go, previously thought to be the most difficult challenge of artificial processing. […]

Talend and “The Data Vault”

  In my previous blog “Beyond ‘The Data Vault’” I examined various data storage options and a practical architecture/design for an Enterprise Data Vault Warehouse. As you may have realized by now, I am quite smitten with this innovative data modeling methodology and recommend to anyone who is developing a ‘Data Lake’ or Data Warehouse […]

Stop Chasing Perfection in Analytics. Here’s Why

A while back I wrote a blog about another favorite topic of mine, DevOps, in which I discussed the notion of perfection being the enemy of ‘good enough’. After some conversations over the last few weeks, I have reaffirmed my stance and broadened it to include everything, especially analytics. The things I hear time and […]

Introduction to Apache Beam

  This blog is the first in a series of posts explaining the overarching goal and purpose of the Apache Beam project. In future posts, we will explain how to use Apache Beam to implement data processing jobs. When you have an existing big data platform, the continuous evolution of that platform is important. […]