Enhance Data Lake Creation with Talend Big Data

The ideas behind a data lake seem simple: securely store all of your data in a raw format and apply a schema on read. Indeed, the first description of a data lake compared it to a ‘large body of water in a more natural state’, whereas a data mart could be thought of as a ‘store of bottled water, cleansed and packaged and structured for easy consumption’.
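To make "schema on read" concrete, here is a minimal sketch in plain Python. It is only an illustration: the sample events, the `PURCHASE_SCHEMA` mapping and the `read_with_schema` helper are hypothetical names invented for this example, not part of Hadoop or Talend. The raw records are stored untouched, and a schema is chosen only when someone reads them.

```python
import json

# Raw events kept exactly as they arrived (hypothetical sample data).
RAW_EVENTS = [
    '{"user": "u1", "amount": "19.90", "ts": "2016-03-01T10:00:00"}',
    '{"user": "u2", "amount": "5", "extra": "ignored by this schema"}',
]

# A schema chosen at read time (an assumption for this sketch):
# field name -> function that casts the raw value.
PURCHASE_SCHEMA = {"user": str, "amount": float}


def read_with_schema(raw_lines, schema):
    """Parse raw JSON lines and project each record onto the schema."""
    rows = []
    for line in raw_lines:
        record = json.loads(line)
        # Keep only the fields the schema names; missing fields become None.
        rows.append({
            name: cast(record[name]) if name in record else None
            for name, cast in schema.items()
        })
    return rows


rows = read_with_schema(RAW_EVENTS, PURCHASE_SCHEMA)
print(rows)
```

A different analysis could read the same raw lines with a different schema, for example one that keeps `ts`, without rewriting anything already stored.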

A data lake is a bet against the future: you don’t know what analysis you might want to do, so why not just keep everything to give yourself the best chance of satisfying any requirement that comes along?

If you spend some time reading about data lakes, you quickly come across another term: the data swamp. Some organizations find their lakes are filled with unregulated and unknown content. Avoiding a data swamp may seem impossible: how do you collect all of the data that your organization produces and keep it organized? How will you ever find it again? How do you keep your data lake clean?

This article discusses Apache Hadoop, its capabilities as a data platform, and how it can work with Talend Big Data to deliver integration projects up to 10 times faster than hand-coding MapReduce.

Talend simplifies the integration of big data so you can respond to business demands without having to write or maintain complicated Apache Hadoop code. With Talend Big Data, you can easily integrate all your data sources for use cases including data warehouse optimization, sentiment analysis, web log analysis, predictive analytics, fraud detection or building an enterprise data lake.

An enterprise data lake provides the following core benefits to an enterprise:

  • New efficiencies for data architecture through a significantly lower cost of storage, and through optimization of data processing workloads such as data transformation and integration.
  • New opportunities for business through flexible “schema-on-read” access to all enterprise data, and through multi-use and multi-workload data processing on the same sets of data, from batch to real-time.

Apache Hadoop provides these benefits through a technology core consisting of:

  • Hadoop Distributed File System (HDFS). HDFS is a Java-based file system that provides scalable, reliable data storage and is designed to span large clusters of commodity servers.
  • Apache Hadoop YARN. YARN provides a pluggable architecture and resource management for data processing engines to interact with data stored in HDFS.


Talend Big Data generates native, optimized Hadoop code and can load, transform, enrich and cleanse data inside Hadoop for maximum scalability. Its easy-to-use graphical development environment speeds up design, deployment and maintenance. It supports simple, advanced and custom transformations. Talend describes Talend Big Data as the only solution that natively runs data quality rules on Hadoop, scaling with the cluster, to parse, cleanse and match all of your data.
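As a rough illustration of what "cleanse and match" means in practice, the sketch below applies two simple data quality rules in plain Python. It is not code generated by Talend and does not run on Hadoop: the `cleanse` and `match` helpers, the sample customers and the choice of email as the match key are all assumptions made for this example.

```python
def cleanse(record):
    """Trim whitespace and normalize case so equivalent values compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def match(records):
    """Group cleansed records that share the same email (a simple match key)."""
    groups = {}
    for rec in map(cleanse, records):
        groups.setdefault(rec["email"], []).append(rec)
    return groups


# Hypothetical customer records with inconsistent formatting.
CUSTOMERS = [
    {"name": "  ada   lovelace", "email": "Ada@Example.com "},
    {"name": "Ada Lovelace", "email": "ada@example.com"},
    {"name": "Alan Turing", "email": "alan@example.com"},
]

matched = match(CUSTOMERS)
for email, recs in matched.items():
    print(email, len(recs))
```

Here the first two records collapse into one group once whitespace and letter case are normalized. A real matching rule would usually be fuzzier than an exact key comparison.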

Features and benefits of Talend:

• 800+ components and connectors to all data sources and applications including big data and NoSQL

• Support for ETL and ELT, real-time delivery and event-driven delivery

• YARN and Hadoop 2.0 support for better resource optimization

• Talend code generation for better scalability and portability

• Visually optimize MapReduce jobs before production for faster development

• A large collaborative community for support


“With Talend and Hadoop, the online retailer can predict with 90% certainty customer interest, conversion, and whether a customer will buy or abandon their shopping cart.”

A global retailer with 12 billion euros in annual turnover was looking for a way to improve revenue. The firm was experiencing a high rate of shopping cart abandonment and could not quickly adjust prices based on demand, inventory and competition. In the highly competitive online retail sector, buyers can easily compare prices and the competition is just one click away. The retailer needed a better understanding of consumers' online activity and a way to correlate it with historical buying patterns.

Doing this, however, required analyzing terabytes of data in real time and acting before the buyer left the website. The retailer selected Talend Big Data and Hadoop to glue all of their applications, data silos and data formats together and gain new insight into their business and online buyer behavior. With Talend, the retailer is now able to analyze live and historical clickstream data (over 5 terabytes) and provide sub-second responses, such as advertisements or dynamic price changes, while customers are shopping online. They can predict with 90% certainty whether someone will abandon their shopping cart. Additionally, they are able to reduce the amount of leftover merchandise by 20% through more thorough historical analysis and better forecasting techniques.