Overview
Apache Flink is a distributed stream processing framework that excels at:- Stream Processing: True streaming with low latency and high throughput
- Exactly-Once Processing: Strong consistency guarantees with checkpointing
- State Management: Distributed stateful computations
- Event Time Processing: Native support for event time and watermarks
When to Use FlinkRunner
Best For
- Real-time streaming applications
- Stateful stream processing
- Low-latency requirements
- Exactly-once processing semantics
- Complex event processing
- Existing Flink infrastructure
Consider Alternatives
- Simple batch jobs (DirectRunner)
- GCP-based workloads (DataflowRunner)
- Existing Spark clusters (SparkRunner)
- Local development (PrismRunner)
Setup and Configuration
Prerequisites
- Apache Flink cluster (1.15 or later recommended)
- Java 8 or later
- Network access to Flink JobManager
Dependencies
- Java
- Python
- Go
Add to your For Gradle:
pom.xml:Replace
1.18 with your Flink version. Supported versions include 1.15, 1.16, 1.17, and 1.18.Flink Cluster Setup
Local Flink Cluster
Flink on Kubernetes
Running a Pipeline
Basic Example
- Java
- Python
- Go
Execution Modes
Local Mode
Run with an embedded Flink cluster:Cluster Mode
Submit to an existing Flink cluster:Auto Mode
FlinkPipelineOptions
Core Options
Address of the Flink JobManager. Can be:
host:port- Connect to remote cluster[local]- Start local embedded cluster[auto]- Auto-detect based on environment
Degree of parallelism for the pipeline. -1 uses Flink’s default.
Maximum degree of parallelism. Sets upper limit for dynamic scaling.
Checkpointing Options
Interval in milliseconds for triggering checkpoints. -1 disables checkpointing.
Checkpointing mode:
EXACTLY_ONCE or AT_LEAST_ONCE.Maximum time in milliseconds for a checkpoint to complete.
Minimum pause in milliseconds between checkpoints.
Maximum number of concurrent checkpoints.
State Backend Options
State backend for storing state. Options:
filesystem, rocksdb, memory.Directory for storing checkpoints.
Streaming Options
Enable streaming mode for unbounded sources.
Interval for automatic watermark emission in milliseconds.
Advanced Configuration
Exactly-Once Processing
Configure for exactly-once semantics:State Management
Savepoints
Start from a savepoint:Window Configuration
Batch vs Streaming
Batch Pipeline
Streaming Pipeline
Monitoring and Debugging
Flink Web UI
Access the Flink web UI athttp://jobmanager-host:8081:
- View running and completed jobs
- Monitor task metrics
- Inspect checkpoints and savepoints
- View logs and exceptions
- Track watermarks and event time
Metrics
Beam metrics are exposed as Flink metrics:Logging
- Flink Web UI (per task)
- TaskManager log files
- Configured log aggregation system
Performance Tuning
Parallelism
Memory Configuration
Network Buffers
State Backend Tuning
Best Practices
Checkpointing Strategy
-
Enable checkpointing for production
-
Use appropriate intervals
- Shorter intervals: Lower data loss, higher overhead
- Longer intervals: Higher data loss, lower overhead
-
Configure timeout appropriately
State Management
-
Choose the right state backend
- Memory: Small state, fast access
- Filesystem: Medium state
- RocksDB: Large state, slower access
-
Use incremental checkpoints
-
Monitor state size
- Check in Flink UI
- Set up alerts for growth
Resource Management
-
Right-size workers
-
Use appropriate parallelism
- Generally: slots_per_tm * num_tm
- Consider data skew
-
Configure backpressure handling
- Monitor in Flink UI
- Increase parallelism if needed
Troubleshooting
Job fails to start
Job fails to start
Check:
- Flink cluster is running
- JobManager address is correct
- JAR contains all dependencies
- Sufficient resources available
Checkpoint failures
Checkpoint failures
- Increase checkpoint timeout
- Check state backend configuration
- Verify checkpoint directory is accessible
- Monitor state size growth
Out of memory errors
Out of memory errors
- Increase TaskManager memory
- Reduce parallelism
- Use RocksDB state backend
- Enable incremental checkpoints
High latency
High latency
- Check for backpressure in UI
- Increase parallelism
- Optimize transforms
- Reduce checkpoint frequency
Watermark not advancing
Watermark not advancing
- Check for stuck sources
- Verify watermark generation
- Look for slow tasks in UI
- Check for data skew
Runner Capabilities
Supported Features
- ✅ Batch and streaming
- ✅ Exactly-once processing
- ✅ Event time processing
- ✅ State and timers
- ✅ Side inputs
- ✅ All window types
- ✅ Custom triggers
- ✅ Savepoints
Limitations
- Limited support for some Beam transforms
- Requires Flink cluster management
- State size limited by backend choice
Next Steps
Flink Documentation
Learn more about Apache Flink
SparkRunner
Alternative distributed runner
State & Timers
Advanced stateful processing
Windowing
Learn about windowing strategies