
1. What is Apache StreamPark?
Apache StreamPark is a rapid-development framework for stream processing and a cloud-native platform for unified stream-batch and lakehouse architectures: a one-stop stream processing platform.
2. Introduction
2.1 Features

2.2 Architecture

2.3 Zeppelin vs. StreamPark
3. Related Links
Apache StreamPark official documentation
https://streampark.apache.org/zh-CN/
Flink 1.14.4 official documentation
https://nightlies.apache.org/flink/flink-docs-release-1.14/zh
StreamPark 2.1.0 on GitHub
https://github.com/apache/incubator-streampark/tree/release-2.1.0
Local debugging, startup, and build guide
https://z87p7jn1yv.feishu.cn/docx/X4UfdZ8cdoeK8ExQ7sUc1UHknps
https://mp.weixin.qq.com/s/N1TqaLaqGCDRH9jnmhvlzg
4. Deployment
4.1 Building the Binary Package
tar -zxvf incubator-streampark-2.1.0.tar.gz

Then run:
./build.sh
4.2 Building the Image
Dockerfile:
FROM alpine:3.16 as deps-stage
COPY . /
WORKDIR /
RUN tar zxvf apache-streampark_2.12-2.1.0-incubating-bin.tar.gz \
&& mv apache-streampark_2.12-2.1.0-incubating-bin streampark
FROM docker:dind
WORKDIR /streampark
COPY --from=deps-stage /streampark /streampark
ENV NODE_VERSION=16.1.0
ENV NPM_VERSION=7.11.2
# Note: chain packages with ';' when building on Windows and with '&&' on Linux; the wrong separator makes this step fail.
RUN apk add openjdk8 ; \
apk add maven ; \
apk add wget ; \
apk add vim ; \
apk add bash; \
apk add curl
ENV JAVA_HOME=/usr/lib/jvm/java-1.8-openjdk
ENV MAVEN_HOME=/usr/share/java/maven-3
ENV PATH $JAVA_HOME/bin:$PATH
ENV PATH $MAVEN_HOME/bin:$PATH
RUN wget "https://nodejs.org/dist/v$NODE_VERSION/node-v$NODE_VERSION-linux-x64.tar.gz" \
&& tar zxvf "node-v$NODE_VERSION-linux-x64.tar.gz" -C /usr/local --strip-components=1 \
&& rm "node-v$NODE_VERSION-linux-x64.tar.gz" \
&& ln -s /usr/local/bin/node /usr/local/bin/nodejs \
&& curl -LO https://dl.k8s.io/release/v1.23.0/bin/linux/amd64/kubectl \
&& install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
RUN mkdir -p ~/.kube
EXPOSE 10000
Build command:
docker build -f Dockerfile -t my_streampark:2.1.0 .
# Push to the Aliyun image registry (omitted)
For convenience, here is the image I built:
registry.cn-hangzhou.aliyuncs.com/bigfei/zlf:streampark2.1.0
4.3 Initializing the SQL

Two errors come up during execution:
-- 1. Unknown column 'launch' in 't_flink_app'
--    (raised by an ALTER TABLE on t_flink_app)
-- 2. drop index 'inx_state' fails: comment out that line
The flink_app table in version 2.1.0 is simply missing this column and index. Both errors can be ignored, or you can add the launch column to the table; neither affects the 2.1.0 deployment below that uses this database's SQL data.
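If you prefer to patch the table instead of ignoring the errors, a minimal sketch of the workaround (the column type and default here are assumptions; check your 2.1.0 schema before applying):

```sql
-- Hypothetical fix: add the missing `launch` column so the legacy SQL runs cleanly.
ALTER TABLE `t_flink_app` ADD COLUMN `launch` tinyint DEFAULT 1;
-- And simply skip (comment out) the failing index drop:
-- DROP INDEX `inx_state` ON `t_flink_app`;
```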
The streampark database then looks like this:

4.4 Deployment
4.4.1 docker-compose.yaml Deployment Script
version: '2.1'
services:
  streampark-console:
    image: my_streampark:2.1.0
    command: ${RUN_COMMAND}
    ports:
      - 10000:10000
    env_file: .env
    volumes:
      - flink:/streampark/flink/${FLINK}
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/hosts:/etc/hosts:ro
      - ~/.kube:/root/.kube:ro
    privileged: true
    restart: unless-stopped
  jobmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    command: "jobmanager.sh start-foreground"
    ports:
      - 8081:8081
    volumes:
      - ./conf:/opt/flink/conf
      - /tmp/flink-checkpoints-directory:/tmp/flink-checkpoints-directory
      - /tmp/flink-savepoints-directory:/tmp/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    depends_on:
      - jobmanager
    command: "taskmanager.sh start-foreground"
    volumes:
      - ./conf:/opt/flink/conf
      - /tmp/flink-checkpoints-directory:/tmp/flink-checkpoints-directory
      - /tmp/flink-savepoints-directory:/tmp/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
volumes:
  flink:
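The compose file above reads `${RUN_COMMAND}` and `${FLINK}` from a `.env` file. A minimal sketch of that file, assuming StreamPark's standard `bin/startup.sh` launcher and the Flink 1.14.4 directory name (both values are assumptions; adjust them to your setup):

```shell
# Write the .env file docker-compose reads; both values below are placeholders.
cat > .env <<'EOF'
RUN_COMMAND=/streampark/bin/startup.sh
FLINK=flink-1.14.4
EOF
cat .env
```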
4.4.2 Preparing the Configuration Files
In the deploy folder:

The conf folder:


Take the official file, tweak it, and mount it; in my case it did not seem to take effect.
The related files are shared at the end of this article.
4.4.3 Flink Startup Configuration
Official Flink memory configuration docs:
https://nightlies.apache.org/flink/flink-docs-release-1.14/zh/docs/deployment/memory/mem_setup_tm/
4.4.4 StreamPark Startup Configuration
flink-conf.yaml configuration:
jobmanager.rpc.address: jobmanager
blob.server.port: 6124
query.server.port: 6125
state.backend: filesystem
state.checkpoints.dir: file:///tmp/flink-checkpoints-directory
state.savepoints.dir: file:///tmp/flink-savepoints-directory
heartbeat.interval: 1000
heartbeat.timeout: 5000
rest.flamegraph.enabled: true
web.backpressure.refresh-interval: 10000
classloader.resolve-order: parent-first
taskmanager.memory.managed.fraction: 0.1
taskmanager.memory.process.size: 2048m
jobmanager.memory.process.size: 7072m
4.4.5 Problems Encountered
My earlier Flink deployment was misconfigured on a bridge network, so running the previous CDC job through Flink's sql-client.sh failed with:
java.net.UnknownHostException: jobmanager: Temporary failure in name resolution

java.util.concurrent.CompletionException: org.apache.flink.runtime.jobmanager.scheduler.NoResourceAvailableException: Could not acquire the minimum required resources.

This was also a consequence of the broken bridge-network setup, which prevented the jobmanager from starting. With the correct startup method above and the flink-conf.yaml settings, the taskmanager and jobmanager resource and memory configuration is:
taskmanager.memory.managed.fraction: 0.1
taskmanager.memory.process.size: 2048m
jobmanager.memory.process.size: 7072m
Set the memory parameters according to the official Flink memory docs and give them generous headroom, then delete the broken containers and restart; after that, all three containers came up normally.
5. CDC in Practice
5.1 Verifying That Flink Is Healthy
When Flink starts normally and no job is running, the home page shows the number of available slots:

After a normal start, the Task Managers page shows the task information:


Job Manager information:

5.2 StreamPark Console Configuration
The default StreamPark username and password are admin/streampark.
5.2.1 Flink Home Configuration

5.2.2 Flink Cluster Configuration

5.2.3 Adding the CDC SQL and Uploading a JAR or Adding Dependencies

5.3 A Successful CDC Run
For the CDC details, see:
https://mp.weixin.qq.com/s/N1TqaLaqGCDRH9jnmhvlzg
On the StreamPark side:
When starting the job in StreamPark, do not select a savepoint; otherwise Flink throws an error.

On the Flink side:

2023-06-14 15:48:58 2023-06-14 07:48:58,551 WARN org.apache.flink.runtime.jobmaster.JobMaster [] - Error while processing AcknowledgeCheckpoint message
2023-06-14 15:48:58 org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 3. Failure reason: Failure to finalize checkpoint.
2023-06-14 15:48:58 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1227) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1100) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
2023-06-14 15:48:58 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
2023-06-14 15:48:58 at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2023-06-14 15:48:58 Caused by: java.io.IOException: Mkdirs failed to create file:/tmp/flink-checkpoints-directory/acb95418d91e34f6cce478337154dd4f/chk-3
2023-06-14 15:48:58 at org.apache.flink.core.fs.local.LocalFileSystem.create(LocalFileSystem.java:262) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.state.filesystem.FsCheckpointMetadataOutputStream.<init>(FsCheckpointMetadataOutputStream.java:65) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.state.filesystem.FsCheckpointStorageLocation.createMetadataOutputStream(FsCheckpointStorageLocation.java:109) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.checkpoint.PendingCheckpoint.finalizeCheckpoint(PendingCheckpoint.java:323) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1210) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:48:58 ... 6 more
2023-06-14 15:49:01 2023-06-14 07:49:01,533 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator [] - Triggering checkpoint 4 (type=CHECKPOINT) @ 1686728941531 for job acb95418d91e34f6cce478337154dd4f.
2023-06-14 15:49:01 2023-06-14 07:49:01,557 WARN org.apache.flink.runtime.jobmaster.JobMaster [] - Error while processing AcknowledgeCheckpoint message
2023-06-14 15:49:01 org.apache.flink.runtime.checkpoint.CheckpointException: Could not finalize the pending checkpoint 4. Failure reason: Failure to finalize checkpoint.
2023-06-14 15:49:01 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.completePendingCheckpoint(CheckpointCoordinator.java:1227) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:49:01 at org.apache.flink.runtime.checkpoint.CheckpointCoordinator.receiveAcknowledgeMessage(CheckpointCoordinator.java:1100) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:49:01 at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$acknowledgeCheckpoint$1(ExecutionGraphHandler.java:89) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:49:01 at org.apache.flink.runtime.scheduler.ExecutionGraphHandler.lambda$processCheckpointCoordinatorMessage$3(ExecutionGraphHandler.java:119) ~[flink-dist_2.12-1.14.4.jar:1.14.4]
2023-06-14 15:49:01 at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [?:1.8.0_322]
2023-06-14 15:49:01 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [?:1.8.0_322]
2023-06-14 15:49:01 at java.lang.Thread.run(Thread.java:750) [?:1.8.0_322]
2023-06-14 15:49:01 Caused by: java.io.IOException: Mkdirs failed to create file:/tmp/flink-checkpoints-directory/acb95418d91e34f6cce478337154dd4f/chk-4

Afterwards I granted permissions on everything under both directories:
[root@DESKTOP-QF29H8K tmp]# chmod -R 777 flink-savepoints-directory/
[root@DESKTOP-QF29H8K tmp]# chmod -R 777 flink-checkpoints-directory/
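To avoid the `Mkdirs failed to create file:...` error in the first place, the two host directories can be pre-created and opened up before the containers start, along these lines:

```shell
# Pre-create the host-side checkpoint/savepoint directories (paths taken from
# the compose file above) and make them world-writable so the Flink containers
# can create their job subdirectories inside them.
mkdir -p /tmp/flink-checkpoints-directory /tmp/flink-savepoints-directory
chmod -R 777 /tmp/flink-checkpoints-directory /tmp/flink-savepoints-directory
ls -ld /tmp/flink-checkpoints-directory /tmp/flink-savepoints-directory
```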
docker-compose-windows.yaml (the same stack, but with relative ./tmp mounts):
version: '2.1'
services:
  streampark-console:
    image: my_streampark:2.1.0
    command: ${RUN_COMMAND}
    ports:
      - 10000:10000
    env_file: .env
    volumes:
      - flink:/streampark/flink/${FLINK}
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/hosts:/etc/hosts:ro
      - ~/.kube:/root/.kube:ro
    privileged: true
    restart: unless-stopped
  jobmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    command: "jobmanager.sh start-foreground"
    ports:
      - 8081:8081
    volumes:
      - ./conf:/opt/flink/conf
      - ./tmp/flink-checkpoints-directory:/tmp/flink-checkpoints-directory
      - ./tmp/flink-savepoints-directory:/tmp/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    depends_on:
      - jobmanager
    command: "taskmanager.sh start-foreground"
    volumes:
      - ./conf:/opt/flink/conf
      - ./tmp/flink-checkpoints-directory:/tmp/flink-checkpoints-directory
      - ./tmp/flink-savepoints-directory:/tmp/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
volumes:
  flink:
docker-compose -f docker-compose-windows.yaml up -d
On docker-compose mount directories:
https://blog.csdn.net/SMILY12138/article/details/130305102



# the savepoint path is written as file:/tmp/flink-savepoints-directory

Savepoint location when stopping the job:

On restart, choose the last savepoint to start from:

version: '2.1'
services:
  streampark-console:
    image: my_streampark:2.1.0
    command: ${RUN_COMMAND}
    ports:
      - 10000:10000
    env_file: .env
    volumes:
      - flink:/streampark/flink/${FLINK}
      - /var/run/docker.sock:/var/run/docker.sock
      - /etc/hosts:/etc/hosts:ro
      - ~/.kube:/root/.kube:ro
    privileged: true
    restart: unless-stopped
  jobmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    command: "jobmanager.sh start-foreground"
    ports:
      - 8081:8081
    volumes:
      - ./webUpDir:/usr/local/flink/upload
      - ./webTepDir:/usr/local/flink/tmpdir
      - ./conf:/opt/flink/conf
      - ./tmp/flink-checkpoints-directory:/usr/local/flink/flink-checkpoints-directory
      - ./tmp/flink-savepoints-directory:/usr/local/flink/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
  taskmanager:
    image: apache/flink:1.14.4-scala_2.12-java8
    depends_on:
      - jobmanager
    command: "taskmanager.sh start-foreground"
    volumes:
      - ./webUpDir:/usr/local/flink/upload
      - ./webTepDir:/usr/local/flink/tmpdir
      - ./conf:/opt/flink/conf
      - ./tmp/flink-checkpoints-directory:/usr/local/flink/flink-checkpoints-directory
      - ./tmp/flink-savepoints-directory:/usr/local/flink/flink-savepoints-directory
    environment:
      - JOB_MANAGER_RPC_ADDRESS=jobmanager
volumes:
  flink:
flink-conf.yaml with two new settings added:
jobmanager.rpc.address: jobmanager
blob.server.port: 6124
query.server.port: 6125
state.backend: filesystem
state.checkpoints.dir: file:///tmp/flink-checkpoints-directory
state.savepoints.dir: file:///tmp/flink-savepoints-directory
heartbeat.interval: 1000
heartbeat.timeout: 5000
rest.flamegraph.enabled: true
web.backpressure.refresh-interval: 10000
classloader.resolve-order: parent-first
taskmanager.memory.managed.fraction: 0.1
taskmanager.memory.process.size: 2048m
jobmanager.memory.process.size: 7072m
# the two new settings
web.upload.dir: /usr/local/flink/upload
web.tmpdir: /usr/local/flink/tmpdir
Official documentation for these two parameters:
https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/config/
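Since this compose file bind-mounts `./webUpDir`, `./webTepDir`, `./conf`, and the relative `./tmp` directories from the host, make sure those directories exist next to the compose file before starting, for example:

```shell
# Create the host directories the jobmanager/taskmanager bind mounts expect;
# directory names come from the compose file above.
mkdir -p ./webUpDir ./webTepDir ./conf \
         ./tmp/flink-checkpoints-directory ./tmp/flink-savepoints-directory
ls -d ./webUpDir ./webTepDir ./conf
```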

Flink standalone cluster troubleshooting notes:
https://blog.csdn.net/LeoGanlin/article/details/124692129

webTepDir:

webUpDir:


6. Resources
Link: https://pan.baidu.com/s/1ajAAcjsMOxYR9-uQW0jzmw
Extraction code: c3nv
Contents of the resource pack:

Deployment folder:

7. Latest Official StreamPark Trial Binaries

Trial StreamPark binary packages:
apache-streampark 2.11: https://pan.baidu.com/s/1O_YSE-7Jqb4O2A3H9lHT3A  (extraction code: 7cm6)
apache-streampark 2.12: https://pan.baidu.com/s/1pRqMXP1PbZcgSJ5Dt1g68A  (extraction code: ce00)