The Pipeline API provides the foundation for building Apache Beam data processing workflows. A Pipeline manages a directed acyclic graph of PTransforms and the PCollections they consume and produce.Documentation Index
Fetch the complete documentation index at: https://mintlify.com/apache/beam/llms.txt
Use this file to discover all available pages before exploring further.
Pipeline Class
ThePipeline class represents a data processing workflow.
Creating a Pipeline
create()
Creates a pipeline from default PipelineOptions.create(PipelineOptions)
Creates a pipeline from the provided PipelineOptions.Configuration options for the pipeline, including runner specification
Applying Transforms
apply(PTransform)
Adds a root PTransform to the pipeline.The root transform to apply (e.g., Read or Create)
The output PCollection or POutput from the transform
apply(String, PTransform)
Adds a named root PTransform to the pipeline.The name for this transform node in the pipeline graph
The root transform to apply
The output PCollection or POutput from the transform
Running the Pipeline
run()
Runs the pipeline using the PipelineOptions specified during creation.The result of the pipeline execution
run(PipelineOptions)
Runs the pipeline using the specified PipelineOptions.The pipeline options to use for execution
The result of the pipeline execution
Other Methods
begin()
Returns a PBegin for this pipeline, used as input for root transforms.A PBegin owned by this pipeline
getOptions()
Returns the PipelineOptions for this pipeline.The pipeline options
getCoderRegistry()
Returns the CoderRegistry for this pipeline.The coder registry for encoding/decoding data
getSchemaRegistry()
Returns the SchemaRegistry for this pipeline.The schema registry for schema-aware transforms
PipelineResult Interface
Represents the result of a pipeline execution.Methods
getState()
Retrieves the current state of the pipeline execution.The current state (UNKNOWN, STOPPED, RUNNING, DONE, FAILED, CANCELLED, UPDATED, UNRECOGNIZED)
cancel()
Cancels the pipeline execution.The state after cancellation attempt
waitUntilFinish()
Waits until the pipeline finishes and returns the final status.The final state of the pipeline
waitUntilFinish(Duration)
Waits until the pipeline finishes with a timeout.The maximum time to wait (less than 1ms means infinite wait)
The final state of the pipeline, or null on timeout
metrics()
Returns metrics from the pipeline execution.The metrics from pipeline execution
PipelineOptions Interface
Configuration options for pipeline execution. Created usingPipelineOptionsFactory.
Example: