Month: March 2016

Talend Job Design Patterns and Best Practices ~ Part 2

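I was glad to see the positive response to my previous post, "Talend 'Job Design Patterns' and Best Practices", and I want to thank all of you who read it. If you have not read it yet, I recommend doing so first, because this second part builds on it. Now seems like the right time to move on to some deeper topics. Let's take it one step at a time and get started! You will see some more interesting cases along the way.

Job Design Primer

As an experienced Talend developer, I am always curious about how others build their jobs. Do they use the features correctly? Do they follow a style I recognize, or one I have never seen before? Have they come up with a unique and clever solution? Or, given the abstract nature of the canvas, component data, and workflow, are they simply at a loss over what to do next? Whatever the answers, I think it is very important to use tools designed for the purpose. To that end, I began looking into "Job Design Patterns" and the best practices associated with them. As I see it, even once you know all of Talend's features and functions, the fundamental need stays the same: finding the best way to build jobs.

Logically, the "business use case" is the fundamental driver of any Talend job. In fact, I see many variations both within the same workflow and across different workflows. Most of these use cases start from a basic premise: the simplest form of a data integration job extracts data from a source and processes it, possibly transforming it along the way, and finally loads it into a target. ETL/ELT code is therefore indispensable, and that is what Talend developers work on. We won't belabor the point; instead, let's broaden our view and widen the discussion.

Without question, the latest version of Talend (6.1.1) is the best version I have used. The new components all bring strong advantages; typical examples include big data, Spark, machine learning, a modernized UI, and automated continuous integration/deployment. This release represents the most powerful, feature-rich data integration technology on the market today. That may sound a little biased, but we always put ourselves in our customers' shoes, and I hope you too will look beyond the surface, find the value, and make your own judgment.

Three Foundations for a Successful DI Project

We all agree that a round stool needs three legs to stand, right? The same goes for software development. Building and delivering a successful data integration project takes three basic elements:

Use case – clearly defined business data/workflow requirements
Technology – the tools for creating, deploying, and running the solution
Methodology – an industry-accepted way of doing things

With these elements in mind, plus a solid "Development Guidelines" document (Have you read my earlier post? Have you written this document for your own project?), we take this as the premise for the discussion.

Extending the Basics

If a Talend "job" embodies the technology within a "use case" workflow, then "Job Design Patterns" are the best-practice "methodology" for building them. Even if nothing else I share in these posts is of value to you, please at least be consistent in how you build your jobs. If you have found a better way that works well for you, great; there is no need to change. But if you struggle with performance, reusability, and maintainability, or have to keep reworking code to meet changing requirements, then these best practices will serve you, the Talend developer, well!

9 more best practices to consider:

Software Development Life Cycle (SDLC)

"People, product, and process" are regarded as the three key factors that decide the success or failure of any business. I strongly agree. The SDLC process is critical for any software development team. Getting it right matters; neglecting it can seriously stall a project or even lead to disaster. Talend's SDLC "Best Practices Guide" looks in depth at the concepts, principles, conventions, and details of the continuous integration and deployment features available to Talend developers. I strongly recommend that software development teams incorporate SDLC best practices into their "Development Guidelines" document (described in the previous post in this series) and then follow them.

Managing the Workspace

When you install Talend Studio on a laptop/workstation (assuming you have administrator rights), a default "workspace" directory is usually created on the local disk drive and, as with many software installations, this default location sits in the directory that holds the executable. I think this is a poor practice. Why? Local copies of the project files (jobs and repository metadata) are stored in this "workspace", and if this is via […]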


Talend Integration Cloud Spring ’16: Making Leaps with Spark, Amazon Redshift, and EMR Integration

Several years ago, the cloud was a concept that many forward-looking businesses were just beginning to think about and that many feared. In fact, even as recently as three years ago, particularly for enterprises, there were looming concerns around security, reluctance to relinquish one’s data to a third party, periodic incidents of massive cloud provider […]


The Five Phases of Hybrid Integration—Part II

The challenges emerging from digital business, and the demand for greater agility, are forcing changes to integration approaches. Integration leaders can assess technology providers by understanding the five key phases of hybrid integration projects and their needs as an organization. In our last installment, we looked at the first two phases of hybrid integration, which by […]


Why Marketing Teams Need Data Prep Tools!

Guest blog by Charles Parat, Strategy & Innovation Director, Micropole Group. In today’s world, business workers must utilize a range of enterprise applications in order to complete some of the more routine tasks associated with their job title. When these applications work well and help streamline repetitive tasks, workers have time to focus on more strategic endeavors […]


Apache Solr High Speed Data Integration Plugin

Guest blog post from The Digital Group. T/DG, which stands for The Digital Group, has been working on Talend-based data source integration for quite some time. The Digital Group recently launched the 3RDi (Third Eye) Enterprise Search Discovery and Analytics Platform, which uses Talend for all data integration layers. 3RDi is a comprehensive suite […]


The Five Phases of Hybrid Integration—Part I

Today’s digitally driven businesses intensify the data integration challenges that IT departments face. With a greater number of SaaS and on-premises applications, machine data, and mobile apps, we are seeing the rise of complex value-chain ecosystems that are proliferating to support these digital business initiatives. IT leaders need to incorporate a portfolio-based approach and combine […]


Big Data: Why You Must Consider Open Source

Guest blog by Bernard Marr, Founder and CEO of The Advanced Performance Institute. A quiet revolution has been taking place in the technology world in recent years. The popularity of open source software has soared as more and more businesses have realized the value of moving away from walled-in, proprietary technologies of old. And […]