Month: December 2015

Software Development’s Fountain of Youth

You can view software development as being very similar to any other highly mechanized process designed to build things, rather like a factory with all of the interrelated processes that go into making the final product. Like any manufacturing process, the software development life cycle starts with analysis and requirements and then moves on to […]


Letting Your Data Quality Software Understand Your Data

When profiling your data, sometimes you just don’t know which columns you should look into or what you should validate in them. How much would it help if your data profiling software understood your data and helped you select the relevant quality indicators? This is exactly what the new Talend 6 semantic wizard is […]


Spoiler Alert! Talend 6.1 Hits the ‘Big Screen’

Back in September I talked about how excited I was about the new Star Wars movie coming out. Well, that day is upon us. Yes, Thursday, Dec 17, is the pre-ordained day in my part of the world (Ireland). Have you ever been hit by a spoiler? You know, an older sister who breaks the news about […]


When it Comes To Big Data – Speed Matters

Talend vs Informatica – The Big Data Benchmark

If you’ve spoken to a Talend sales representative or read some of my team’s marketing material, then you’ve undoubtedly heard our claims that when it comes to Big Data, Talend offers some significant speed advantages over the competition. Concerned that some folks might dismiss this content […]


What’s Next for IoT: 4 Things to Watch

What’s next for IoT? There’s no doubt that there are a lot more “connected things” these days and that means a lot more data. Specifically, technology is moving out of the consumer’s hands and into Healthcare, Oil & Gas, Transportation, Aviation and more. The spread of smart devices and sensors creates new forms of value and […]


Talend “Job Design Patterns” and Best Practices

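Talend developers, from beginners to the very experienced, often face the same question: “What is the best way to write this job?” We know it should be efficient, easy to read, easy to write, and above all (in most cases) easy to maintain. We also know that Talend Studio is a free-form “canvas” with a comprehensive, rich set of components, repository objects, metadata, and linkage options that we use to “paint” our code. How, then, can we be sure that a job design follows best practices?

Job Design Patterns

Since Talend version 3.4, every time I have used the product I have appreciated how much job design matters to me. At first I did not think about patterns while developing jobs. I had used Microsoft SSIS and other similar tools before, so a visual editor like Talend was nothing new to me. Instead, I focused mainly on basic functionality and code reusability, then on canvas layout, and finally on naming conventions. Today, having developed hundreds of Talend jobs for a variety of use cases, I find my code has become more refined, more reusable, and more consistent, and only now has the significance of patterns started to emerge.

After joining Talend in January this year, I had many opportunities to look at jobs developed by customers, which confirmed my view: for every developer, each use case has multiple solutions. I believe this is the problem for many of us. As developers we all accept this, yet when building a particular job we tend to believe our way is the best or the only way, even while a quiet voice keeps whispering that there may be a better one. That is when we look for, or ask for, best practices: in this case, Job Design Patterns!

Formulating the Basics

When considering what it takes to produce the best possible job code, a few fundamental precepts are usually at work. They come from years of lessons learned from failure and from accumulated success. They are essential, they lay a solid foundation on which to build code, and I personally believe they deserve to be taken very seriously. I believe these precepts include (in no particular order of importance):

– Readability: creating code that is clear and easy to understand
– Writability: creating straightforward, simple code in the least amount of time
– Maintainability: settling on appropriate complexity while minimizing the impact of change
– Functionality: creating code that delivers on the requirements
– Reusability: creating sharable objects and atomic units of work
– Conformity: creating real discipline across teams, projects, repositories, and code
– Pliability: creating code that will bend but not break
– Scalability: creating elastic modules that adjust throughput on demand
– Consistency: establishing commonality across everything
– Efficiency: creating optimized data flows and component utilization
– Compartmentalization: creating atomic, focused modules that serve a single purpose
– Optimization: creating the most functionality with the least amount of code
– Performance: creating effective modules that provide the fastest throughput

The real key is achieving a balance across these precepts, especially the first three, because they are in constant contradiction with one another: you can often satisfy two only by sacrificing the third. Try ordering all of these precepts by importance, if you can.

Guidelines, Not Rigid Standards: It’s About Discipline!

Before we really dive into Job Design Patterns, and in conjunction with the basic precepts just described, we should first make sure we understand some additional details worth considering. I often find standards so rigid that they leave no room for the unexpected situations that contradict them. At other times I find the opposite: different developers producing inflexible, unkempt, incongruent code that does essentially the same thing, or worse, spreading job designs that are disjointed, unplanned, or outright chaotic. Frankly, I find this sloppy and misguided, and it really does not take much effort to avoid.

For these and other fairly obvious reasons, the first step is to establish documented […]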


IT stuff for free! – 3 Zero-Cost Integration Projects

According to Gartner forecasts, IT spend is likely to be close to $4 trillion in 2015. It’s probably very welcome then to hear that you can still complete some IT projects for free. Case in point: Data Integration. There are free open solutions available that are a great alternative to either hand coding all your […]