Skip to main content
The Pipeline API provides the foundation for building Apache Beam data processing workflows. A Pipeline manages a directed acyclic graph of PTransforms and the PCollections they consume and produce.

Pipeline Class

The Pipeline class represents a data processing workflow.

Creating a Pipeline

create()

Creates a pipeline from default PipelineOptions.
public static Pipeline create()
Returns: A new Pipeline instance Example:
Pipeline pipeline = Pipeline.create();

create(PipelineOptions)

Creates a pipeline from the provided PipelineOptions.
public static Pipeline create(PipelineOptions options)
options
PipelineOptions
required
Configuration options for the pipeline, including runner specification
Returns: A new Pipeline instance Example:
PipelineOptions options = PipelineOptionsFactory.create();
Pipeline pipeline = Pipeline.create(options);

Applying Transforms

apply(PTransform)

Adds a root PTransform to the pipeline.
public <OutputT extends POutput> OutputT apply(
    PTransform<? super PBegin, OutputT> root)
root
PTransform<? super PBegin, OutputT>
required
The root transform to apply (e.g., Read or Create)
output
OutputT
The output PCollection or POutput from the transform
Example:
PCollection<String> lines = pipeline.apply(
    TextIO.read().from("gs://bucket/dir/file*.txt")
);

apply(String, PTransform)

Adds a named root PTransform to the pipeline.
public <OutputT extends POutput> OutputT apply(
    String name,
    PTransform<? super PBegin, OutputT> root)
name
String
required
The name for this transform node in the pipeline graph
root
PTransform<? super PBegin, OutputT>
required
The root transform to apply
output
OutputT
The output PCollection or POutput from the transform
Example:
PCollection<String> lines = pipeline.apply(
    "ReadInputFiles",
    TextIO.read().from("gs://bucket/dir/file*.txt")
);

Running the Pipeline

run()

Runs the pipeline using the PipelineOptions specified during creation.
public PipelineResult run()
result
PipelineResult
The result of the pipeline execution
Example:
Pipeline p = Pipeline.create();
// ... build pipeline ...
PipelineResult result = p.run();

run(PipelineOptions)

Runs the pipeline using the specified PipelineOptions.
public PipelineResult run(PipelineOptions options)
options
PipelineOptions
required
The pipeline options to use for execution
result
PipelineResult
The result of the pipeline execution

Other Methods

begin()

Returns a PBegin for this pipeline, used as input for root transforms.
public PBegin begin()
pbegin
PBegin
A PBegin owned by this pipeline

getOptions()

Returns the PipelineOptions for this pipeline.
public PipelineOptions getOptions()
options
PipelineOptions
The pipeline options

getCoderRegistry()

Returns the CoderRegistry for this pipeline.
public CoderRegistry getCoderRegistry()
registry
CoderRegistry
The coder registry for encoding/decoding data

getSchemaRegistry()

Returns the SchemaRegistry for this pipeline.
public SchemaRegistry getSchemaRegistry()
registry
SchemaRegistry
The schema registry for schema-aware transforms

PipelineResult Interface

Represents the result of a pipeline execution.

Methods

getState()

Retrieves the current state of the pipeline execution.
State getState()
state
State
The current state (UNKNOWN, STOPPED, RUNNING, DONE, FAILED, CANCELLED, UPDATED, UNRECOGNIZED)

cancel()

Cancels the pipeline execution.
State cancel() throws IOException
state
State
The state after cancellation attempt

waitUntilFinish()

Waits until the pipeline finishes and returns the final status.
State waitUntilFinish()
state
State
The final state of the pipeline
Example:
PipelineResult result = pipeline.run();
State finalState = result.waitUntilFinish();
if (finalState == State.DONE) {
    System.out.println("Pipeline completed successfully!");
}

waitUntilFinish(Duration)

Waits until the pipeline finishes with a timeout.
State waitUntilFinish(Duration duration)
duration
Duration
required
The maximum time to wait (less than 1ms means infinite wait)
state
State
The final state of the pipeline, or null on timeout

metrics()

Returns metrics from the pipeline execution.
MetricResults metrics()
metrics
MetricResults
The metrics from pipeline execution

PipelineOptions Interface

Configuration options for pipeline execution. Created using PipelineOptionsFactory. Example:
PipelineOptions options = PipelineOptionsFactory.create();
options.setRunner(DataflowRunner.class);
options.setTempLocation("gs://bucket/temp");

Pipeline pipeline = Pipeline.create(options);

Complete Example

import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.options.PipelineOptions;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.transforms.Count;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WordCount {
    public static void main(String[] args) {
        // Create pipeline options
        PipelineOptions options = PipelineOptionsFactory
            .fromArgs(args)
            .create();
        
        // Create the pipeline
        Pipeline p = Pipeline.create(options);
        
        // Build the pipeline
        p.apply("ReadLines", TextIO.read().from("input.txt"))
         .apply("CountWords", Count.perElement())
         .apply("FormatResults", MapElements
             .into(TypeDescriptors.strings())
             .via(kv -> kv.getKey() + ": " + kv.getValue()))
         .apply("WriteResults", TextIO.write().to("output"));
        
        // Run the pipeline
        p.run().waitUntilFinish();
    }
}