Forem

# dataengineering

Posts

Migrate the legacy Greenplum to Apache Cloudberry with cbcopy
7 min read
Apache Dev List Digest: Iceberg, Polaris, Arrow & Parquet (Dec 9th - Dec15th, 2025)
7 min read
Building a dbt-UI I Wish Existed
3 min read
Unpacking the Google File System Paper: A Simple Breakdown
3 min read
Day 15: Running Spark in the Cloud - Dataproc vs Databricks
2 min read
How to Sync Data from an Oracle Table to Elasticsearch using Kafka Connect
5 min read
The Myth of Distributed Computing as a Silver Bullet for Big Data
10 min read
Rethinking Stream-Batch Unification: Real-Time Processing with Incremental Materialized Views in Apache Cloudberry
5 min read
Data Engineering Processes: From Raw Data to Cleaned, Processed, Analytics-Ready Data.
5 min read
Navigating the Future: Top Data Engineering Trends Shaping 2024 and Beyond
4 min read
Day 14: Building a Real Retail Analytics Pipeline Using Spark Window Functions
1 min read
Day 13: Window Functions in PySpark
2 min read
Why Idempotency Is So Important in Data Engineering
6 min read
REST API Calls for Data Engineers: A Practical Guide with Examples
3 min read
Is CsvPath an easy or hard language?
16 min read
Understanding Salesforce Data 360 Objects: The Core of the Unified Customer Profile
3 min read
Day 12: UDF vs Pandas UDF
2 min read
The Data Engineers Descent Into Datetime Hell
5 min read
Day 11: Choosing the Right File Format in Spark
2 min read
Navigating the Future: Key Data Engineering Trends for 2024 and Beyond
6 min read
Day 7: Mastering Joins, Unions, and GroupBy in PySpark - The Core ETL Operations
2 min read
map
1 min read
Data Engineering in 30 Days - Day 2
2 min read
Why Frontend Teams Should Care About Data Modeling for Real-Time Dashboards
2 min read
Shine in Your Next Data Engineering Interview with Pandas
10 min read