Taking full advantage of a cloud data lake involves both moving data and ensuring its quality.
For the global payment and money transfer service provider Western Union, data quality is a central element of its cloud data lake efforts.
Over the past two years, Western Union has embarked on an effort to consolidate its data warehouses in the cloud.
With locations and customers all over the world, Western Union has accumulated a large volume of data, which it uses to improve its own business. Thomas Mazzaferro, the company’s Corporate Data Manager, helps lead Western Union’s data efforts.
“We’ve done a lot of consolidation over the past 18 months,” Mazzaferro said, noting that the company has migrated over 20 petabytes and now has over 90% of its data in the cloud.
Western Union’s path to data quality
Mazzaferro explained that the company consolidated multiple data warehouses into a single data lake. Western Union uses AWS for its data lake, with Snowflake to enable its cloud data architecture.
To help move the data, Mazzaferro and his team use Talend and its suite of data tools for data ingestion; extraction, transformation and loading; and data quality.
Mazzaferro noted that because Western Union operates internationally, it must be able to understand the quality of data wherever it is located.
Mazzaferro said Talend gives Western Union the ability to bring data metrics into a centralized location, where data quality results can be visualized.
“Talend makes data visible because if you don’t have it in the right place, you can’t visualize it correctly,” Mazzaferro said. “Talend really helps us streamline and optimize our processes and capabilities to support our customers.”
Define data quality
For Mazzaferro, the first element of an effective data quality strategy is the ability to actually measure the data. With this capability comes a need for metrics to understand the data used by the organization’s processes and applications.
Beyond the ability to measure data, an effective data quality strategy also involves accountability. Mazzaferro emphasized the importance of aligning accountability and ownership of data with the people who are responsible for a given set of information or processes.
Finally, when issues arise with data quality, he said it’s important to develop a plan designed by both business and technology teams to correct the process or improve the overall data flow.
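As a rough illustration of the first two elements, measurement and ownership, the sketch below computes a simple completeness metric for a dataset and attaches an accountable owner to the result. It is plain Python, not Talend or Snowflake code, and every name in it (QualityReport, measure_completeness, the "transfers" dataset, the "payments-data-team" owner) is hypothetical rather than taken from Western Union's setup.

```python
from dataclasses import dataclass


@dataclass
class QualityReport:
    dataset: str         # hypothetical dataset name
    owner: str           # hypothetical team accountable for the dataset
    completeness: float  # share of required field values that are populated


def measure_completeness(rows, required_fields):
    """Return the fraction of required field values that are present and non-empty."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 1.0  # nothing to check, so nothing is missing
    filled = sum(
        1
        for row in rows
        for field in required_fields
        if row.get(field) not in (None, "")
    )
    return filled / total


def build_report(dataset, owner, rows, required_fields):
    """Pair the measured metric with the owner who answers for it."""
    return QualityReport(dataset, owner, measure_completeness(rows, required_fields))


# Made-up sample rows, for illustration only.
sample_rows = [
    {"transfer_id": "T1", "amount": 100.0, "currency": "USD"},
    {"transfer_id": "T2", "amount": None, "currency": "EUR"},
]
report = build_report(
    "transfers",
    "payments-data-team",
    sample_rows,
    ["transfer_id", "amount", "currency"],
)
print(report)
```

Keeping the owner on the same record as the metric is one way to make sure a low score always points to someone who can act on it.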
Western Union does not measure data quality in real time.
Mazzaferro said his team didn’t want to slow down the flow of production data or take actions that could decrease real-time performance for users.
That said, he noted that Western Union examines data quality in what it calls “near real-time,” which can be minutes or hours after the data has moved. According to Mazzaferro, this allows the company to quickly resolve potential data quality issues.
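One way to picture a near real-time check is a job that runs after the data has landed and inspects only the records loaded since the previous run, so the production data flow itself is never blocked. The sketch below is a minimal plain-Python illustration under that assumption. The record layout, the loaded_at field and the 30-minute interval are all hypothetical, not details reported by Western Union.

```python
from datetime import datetime, timedelta


def records_landed_since(records, last_checked):
    """Select only the records that arrived after the previous check."""
    return [r for r in records if r["loaded_at"] > last_checked]


def find_issues(records, required_fields):
    """Return (record id, missing fields) for each record with empty required fields."""
    issues = []
    for record in records:
        missing = [f for f in required_fields if record.get(f) in (None, "")]
        if missing:
            issues.append((record["id"], missing))
    return issues


# Made-up records with a fixed "now" so the example is reproducible.
now = datetime(2021, 1, 1, 12, 0)
records = [
    {"id": "A", "loaded_at": now - timedelta(hours=3), "amount": None},
    {"id": "B", "loaded_at": now - timedelta(minutes=20), "amount": 50.0},
    {"id": "C", "loaded_at": now - timedelta(minutes=5), "amount": None},
]

# Assume the previous check ran 30 minutes ago; record A was covered by an earlier run.
last_checked = now - timedelta(minutes=30)
batch = records_landed_since(records, last_checked)
issues = find_issues(batch, ["amount"])
print(issues)
```

Because the check reads data that has already been loaded, it adds no latency to the pipeline. The trade-off is that an issue surfaces only after the next run.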
“Our goal for the new year is to scale, extend, modernize and improve our trade policies through data-driven insights and data quality,” he said.