Describe what you want
In Copilot chat or the dedicated Generate pipeline flow, write what the pipeline should accomplish. Effective briefs include:
- Sources (systems, buckets, tables, APIs)
- Transformations (parsing, deduplication, typing, aggregations)
- Destination and load strategy (truncate, merge, append)
- Freshness (batch vs near-real-time) if it changes node choice
Generated nodes and edges
The assistant proposes a node set wired in order: extract, flatten, validate, load. It may add:
- Parser nodes for semi-structured formats
- Column mapping or type enforcement nodes
- Data quality rules (null checks, uniqueness)
- Basic orchestration placeholders (manual trigger until you attach a schedule)
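To make the "nodes and edges" idea concrete, here is a minimal sketch of how a generated draft could be represented and checked outside the product. The node names and the helper functions are hypothetical, chosen only to mirror the extract, flatten, validate, load order described above; the actual graph format is not specified here.

```python
from graphlib import TopologicalSorter

# Hypothetical edges (upstream, downstream); a real draft may name nodes differently.
edges = [
    ("extract", "flatten"),
    ("flatten", "validate"),
    ("validate", "load"),
]

def execution_order(edges):
    """Return node names in dependency order, upstream nodes first."""
    sorter = TopologicalSorter()
    for upstream, downstream in edges:
        sorter.add(downstream, upstream)  # downstream depends on upstream
    return list(sorter.static_order())

def missing_steps(edges, required):
    """Return the required node names that do not appear anywhere in the graph."""
    present = {node for edge in edges for node in edge}
    return [name for name in required if name not in present]

order = execution_order(edges)
gaps = missing_steps(edges, ["extract", "validate", "dedupe", "load"])
```

A check like `missing_steps` reflects the review you do by eye on the graph preview: if a step you asked for is absent, the draft needs another pass.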
Generate the draft
Submit your brief and wait for the graph preview. If the proposed topology misses a critical step, decline the draft and rephrase the brief.
Run preview
Execute a limited preview to validate parsing and schema. Adjust SQL or mappings based on errors.
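The kind of check a limited preview performs can be sketched as follows. This is an illustration only, assuming rows arrive as dictionaries; the schema, the column names, and the `preview_errors` helper are hypothetical and not part of the product.

```python
from itertools import islice

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": int, "email": str, "amount": float}

def preview_errors(rows, schema, limit=100):
    """Check only the first `limit` rows; return (row_index, column, problem) tuples."""
    errors = []
    for index, row in enumerate(islice(rows, limit)):
        for column, expected_type in schema.items():
            if column not in row:
                errors.append((index, column, "missing"))
            elif row[column] is not None and not isinstance(row[column], expected_type):
                errors.append((index, column, f"expected {expected_type.__name__}"))
    return errors

sample = [
    {"order_id": 1, "email": "a@example.com", "amount": 9.5},
    {"order_id": "2", "email": "b@example.com", "amount": 3.0},  # id parsed as text
    {"order_id": 3, "amount": 1.0},                              # email column absent
]

errors = preview_errors(sample, EXPECTED_SCHEMA)
```

Errors of this shape point at what to adjust: a type mismatch suggests a mapping or cast fix, while a missing column suggests the parser or SQL dropped a field.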
Enhance existing pipelines
Open an existing pipeline and ask Copilot to add or refactor sections:
- “Insert a null-safe email normalization step before the warehouse load.”
- “Split the JSON array line_items into a child table load with keys from the parent.”
- “Add a failure branch that writes bad rows to a quarantine bucket.”
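For reference when reviewing what Copilot generates, the first two requests above could look roughly like this in plain Python. This is a sketch under assumptions: the function names, the `order_id` parent key, and the `line_no` column are hypothetical, and the generated nodes may implement the same logic in SQL or another form.

```python
def normalize_email(value):
    """Return a trimmed, lowercased email, or None when the input is null or blank."""
    if value is None:
        return None
    cleaned = str(value).strip().lower()
    return cleaned or None

def explode_line_items(order, parent_key="order_id"):
    """Split one order into (parent_row, child_rows); each child carries the parent key."""
    parent = {k: v for k, v in order.items() if k != "line_items"}
    children = [
        {parent_key: order[parent_key], "line_no": n, **item}
        for n, item in enumerate(order.get("line_items") or [], start=1)
    ]
    return parent, children

order = {
    "order_id": 7,
    "email": "  Ada@Example.COM ",
    "line_items": [{"sku": "A", "qty": 2}, {"sku": "B", "qty": 1}],
}
parent, children = explode_line_items(order)
parent["email"] = normalize_email(parent["email"])
```

"Null-safe" here means a null or blank input yields None instead of raising, so the step never fails the load on a missing email.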
Quality checklist
Keys and idempotency
Confirm merge keys and incremental cursors so reruns do not duplicate production data.
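Why merge keys matter for reruns can be shown with a small sketch. It models the target table as a dictionary keyed by the merge key, which is an assumption made for illustration; the `merge_rows` and `next_cursor` helpers and the `id` and `updated_at` columns are hypothetical.

```python
def merge_rows(target, batch, key="id"):
    """Upsert each batch row into `target`, a dict keyed by the merge key."""
    for row in batch:
        target[row[key]] = row  # same key overwrites instead of appending
    return target

def next_cursor(batch, current, column="updated_at"):
    """Advance the incremental cursor to the newest value seen so far."""
    return max([current] + [row[column] for row in batch])

batch = [
    {"id": 1, "updated_at": "2024-01-01", "status": "new"},
    {"id": 2, "updated_at": "2024-01-02", "status": "paid"},
]

table = {}
merge_rows(table, batch)
merge_rows(table, batch)  # rerun of the same batch
cursor = next_cursor(batch, "2023-12-31")
```

After the rerun the table still holds two rows. An append-style load with no key would hold four, which is the duplication this check guards against.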
Cost
Default cluster sizes may be oversized for development; scale compute down in Compute settings when running tests.
Documentation
Add a short pipeline description for teammates; Copilot text is not a substitute for owned runbooks.