Extract, load, transform

Extract, load, transform (ELT) is a data pipeline model,^[1] an alternative to extract, transform, load (ETL), that is commonly used with data lake implementations. In ELT, data is not transformed on entry to the data lake; it is stored in its original raw format, which shortens loading times. The trade-off is that the data processing engine must have enough processing power to carry out transformations on demand and still return results in a timely manner. Because the data is not processed on entry, neither the queries nor the schema need to be defined in advance. In practice the schema is often available at load time anyway, since many data sources are extracts from databases or similar structured systems and so carry an associated schema.
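The load-first, transform-later ordering can be illustrated with a minimal sketch. It assumes an in-memory SQLite table standing in for the data lake; the records and the names `RAW_EVENTS`, `load_raw`, and `total_by_user` are hypothetical and chosen only for illustration. The load step stores each record untouched, and parsing and aggregation happen only when a result is requested.

```python
import json
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_EVENTS = [
    '{"user": "ana", "amount": "12.50", "currency": "USD"}',
    '{"user": "ben", "amount": "7.25", "currency": "USD"}',
    '{"user": "ana", "amount": "3.00", "currency": "USD"}',
]


def load_raw(conn, lines):
    """Load step: store each record unchanged, with no parsing or schema."""
    conn.execute("CREATE TABLE IF NOT EXISTS raw_events (payload TEXT)")
    conn.executemany(
        "INSERT INTO raw_events (payload) VALUES (?)",
        [(line,) for line in lines],
    )
    conn.commit()


def total_by_user(conn):
    """Transform step: parse and aggregate only when a result is requested."""
    totals = {}
    for (payload,) in conn.execute("SELECT payload FROM raw_events"):
        record = json.loads(payload)
        user = record["user"]
        totals[user] = totals.get(user, 0.0) + float(record["amount"])
    return totals


# An in-memory database stands in for the data lake (an assumption).
conn = sqlite3.connect(":memory:")
load_raw(conn, RAW_EVENTS)
print(total_by_user(conn))  # {'ana': 15.5, 'ben': 7.25}
```

In an ETL pipeline the parsing and type conversion in `total_by_user` would instead run before the insert, so the stored table would already have typed columns and the load would take correspondingly longer.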

Cloud data lake components

Common storage options

AWS
- Simple Storage Service (S3)
- Amazon RDS
Azure
- Azure Blob Storage
GCP
- Google Cloud Storage (GCS)

Querying

AWS
- Redshift Spectrum
- Athena
- EMR (Presto)
Azure
- Azure Data Lake
GCP
- BigQuery
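The query services listed above read files in place from object storage and apply a schema at read time. The sketch below imitates that idea on a local temporary directory, which is only a stand-in for an object store; the file layout, column names, and the functions `write_raw_files` and `query_total` are hypothetical and do not correspond to the API of any of these services.

```python
import csv
import tempfile
from pathlib import Path


def write_raw_files(root):
    """Land two raw CSV extracts, unchanged, under date-partitioned paths."""
    files = {
        "orders/date=2024-01-01/part-0.csv":
            "order_id,region,total\n1,eu,10.0\n2,us,4.5\n",
        "orders/date=2024-01-02/part-0.csv":
            "order_id,region,total\n3,eu,2.5\n",
    }
    for relative_path, text in files.items():
        path = Path(root) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)


def query_total(root, region):
    """Scan the raw files, applying types and the filter at read time."""
    total = 0.0
    for path in sorted(Path(root).glob("orders/date=*/part-*.csv")):
        with path.open(newline="") as handle:
            for row in csv.DictReader(handle):
                if row["region"] == region:
                    total += float(row["total"])  # type applied on read
    return total


# A temporary directory stands in for the object store (an assumption).
with tempfile.TemporaryDirectory() as lake:
    write_raw_files(lake)
    eu_total = query_total(lake, "eu")

print(eu_total)  # 12.5
```

The stored files are never rewritten: a different question, such as a total per date, would only need a different read-time function over the same raw files.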

References

[1] "Using Redshift Spectrum to load data pipelines". deductive.com, January 17, 2018. Archived 2021-01-18 at the Wayback Machine. Retrieved April 3, 2019.

External links

Dull, Tamara, "The Data Lake Debate: Pro is Up First", smartdatacollective.com, March 20, 2015.
ELT: Extract, Load, and Transform A Complete Guide | Astera Software

This article is issued from Wikipedia. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.
