Pipeline
Pipeline manages a directed acyclic graph of primitive PTransforms and the PCollections that they consume and produce. Each Pipeline is self-contained and isolated from any other Pipeline.NewPipeline
Creates a new empty pipeline.A new pipeline instance ready for transform construction
Root
Returns the root scope of the pipeline.The root scope used for inserting transforms
Build
Validates the pipeline and returns a lower-level representation for execution. Called by runners only.Graph edges, nodes, and any validation error
Scope
Scope is a hierarchical grouping for composite transforms. Scopes can be enclosed in other scopes and form a tree structure.Type Definition
IsValid
Returns true if the Scope is valid. Any use of an invalid Scope will panic.Whether the scope is valid for use
Scope
Returns a sub-scope with the given name.Name for the sub-scope (may be augmented for uniqueness)
A new sub-scope nested within this scope
WithContext
Creates a named subscope with an attached context for the represented composite transform.Context to attach to the scope
Name for the sub-scope
A new sub-scope with the attached context
PCollection
PCollection is an immutable collection of values. A PCollection can contain either a bounded or unbounded number of elements.Type Definition
IsValid
Returns true if the PCollection is valid and part of a Pipeline.Whether the PCollection is valid
Type
Returns the full type of the elements in the PCollection.The concrete type of elements (e.g., int, string, KV<int,string>)
Coder
Returns the coder for the collection.The coder used for encoding/decoding elements
SetCoder
Sets the coder for the collection.The coder to use for this PCollection
Error if the coder type doesn’t match the PCollection type
Pipeline Execution
Run
Executes the pipeline using the selected registered runner.Context for pipeline execution
Name of the registered runner (e.g., “direct”, “dataflow”, “flink”)
The pipeline to execute
Pipeline result with metrics and job ID, or error if execution fails
PipelineResult
Interface returned by pipeline execution containing metrics and job information.Metrics
Returns the metrics collected during pipeline execution.JobID
Returns the job identifier from the runner.Create
Inserts a fixed set of values into the pipeline.Create
Creates a PCollection from a fixed non-empty set of values.The scope to insert the transform into
One or more values of the same type
A PCollection containing the specified values
CreateList
Creates a PCollection from a slice or array. Supports empty collections.The scope to insert the transform into
A slice or array of values
A PCollection containing the list elements
Best Practices
Pipeline Construction
Pipeline Construction
- Create one pipeline per program execution
- Use meaningful scope names for debugging and monitoring
- Build the entire pipeline before calling Run
- Pipelines can safely be executed concurrently
Scope Organization
Scope Organization
- Use Scope() to create logical groupings of transforms
- Scope names appear in monitoring and visualization tools
- Nested scopes form namespaces for transform identification
PCollection Usage
PCollection Usage
- PCollections are immutable - transforms create new ones
- Check IsValid() when handling PCollections conditionally
- PCollections cannot be shared between pipelines