2022-11-21 13:26:46

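1. Problems in a Microservice Architecture

In a large system built on a microservice architecture, the system is split into many modules. Each module is responsible for a different function, and together they make up the system and deliver its full feature set. In such an architecture, a single request often involves multiple services. Internet applications are built on different sets of software modules, which may be developed by different teams, implemented in different programming languages, and deployed on thousands of servers spanning multiple data centers. This raises several questions: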

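How can problems be detected quickly?
How can the blast radius of a failure be determined?
How can service dependencies be mapped out, and how can we judge whether they are reasonable?
How can performance problems along the call chain be analyzed, and how can capacity be planned?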

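Distributed tracing reconstructs a distributed request into a call chain, logs it, monitors its performance, and presents the call details of that request in one place: for example, the time spent on each service node, which machine the request reached, and the request status at each service node.

Popular tracing systems in the industry include Twitter's Zipkin, Alibaba's EagleEye, Meituan's Mtrace, and Dianping's CAT. Most of them are based on the Dapper paper published by Google. Dapper describes the technical details of tracing in distributed systems, especially microservice architectures: the concepts, data representation, instrumentation, propagation, collection, storage, and presentation.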

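2. Sleuth Overview

2.1 Introduction

Spring Cloud Sleuth provides a tracing solution for distributed systems and is compatible with Zipkin. You only need to add the corresponding dependency to the pom file.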

2.2 Key Concepts

Spring Cloud Sleuth provides the distributed tracing solution for Spring Cloud. It borrows heavily from the design of Google's Dapper, including its terminology. The main terms and concepts in Sleuth are described below.

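Span: the basic unit of work. For example, sending an RPC is a new span, as is sending a response to an RPC. A span is identified by a unique 64-bit ID, and the trace it belongs to is identified by another 64-bit ID. A span also carries other data, such as a description, timestamped events, key-value annotations (tags), the ID of the span that caused it, and a process ID (usually an IP address).
Trace: a set of spans forming a tree structure. For example, if you run a distributed big-data store, a trace might be formed by a single PUT request.
Annotation: records the occurrence of an event at a point in time. A few core annotations mark the start and end of a request:
cs - Client Sent: the client has issued a request. This annotation marks the start of the span.
sr - Server Received: the server has received the request and is about to process it. Subtracting the cs timestamp from the sr timestamp gives the network latency.
ss - Server Sent: the server has finished processing the request and is sending the response back to the client. Subtracting the sr timestamp from the ss timestamp gives the time the server needed to process the request.
cr - Client Received: the client has successfully received the server's response. This annotation marks the end of the span. Subtracting the cs timestamp from the cr timestamp gives the total time the client needed to get a response from the server.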
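The arithmetic behind these four annotations can be sketched in a few lines of Java. This is only an illustration: the class name `SpanTimings` and its fields are hypothetical and not part of Sleuth or Zipkin, and the timestamps are assumed to be epoch microseconds.

```java
/**
 * Hypothetical helper that derives the three durations described above
 * from the four core annotation timestamps (assumed epoch microseconds).
 */
final class SpanTimings {
    final long networkLatencyMicros;   // sr - cs
    final long serverProcessingMicros; // ss - sr
    final long clientTotalMicros;      // cr - cs

    SpanTimings(long cs, long sr, long ss, long cr) {
        this.networkLatencyMicros = sr - cs;
        this.serverProcessingMicros = ss - sr;
        this.clientTotalMicros = cr - cs;
    }
}
```

Keep in mind that sr - cs compares timestamps taken on two different machines, so the result is only as accurate as the clock synchronization between them.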

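[Figure: Spring Cloud Sleuth distributed request tracing]
A span identifies where a call in the chain comes from. Put simply, a span is the information for a single request.
A trace can be understood as the entire call chain.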
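To make the relationship concrete, here is a minimal sketch of how spans that share one trace ID link into a tree through their parent IDs. `SimpleSpan` and `TraceTree` are hypothetical names used for illustration only; Sleuth's own span types carry much more data.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical, stripped-down span: just IDs, a parent link, and a name. */
final class SimpleSpan {
    final long traceId;  // shared by every span in the same trace
    final long spanId;   // unique per span
    final Long parentId; // null for the root span
    final String name;

    SimpleSpan(long traceId, long spanId, Long parentId, String name) {
        this.traceId = traceId;
        this.spanId = spanId;
        this.parentId = parentId;
        this.name = name;
    }
}

final class TraceTree {
    /** Returns the span without a parent, i.e. the entry point of the trace. */
    static SimpleSpan root(List<SimpleSpan> spans) {
        for (SimpleSpan span : spans) {
            if (span.parentId == null) {
                return span;
            }
        }
        throw new IllegalArgumentException("trace has no root span");
    }

    /** Groups child spans under their parent span ID so the tree can be walked. */
    static Map<Long, List<SimpleSpan>> childrenByParent(List<SimpleSpan> spans) {
        Map<Long, List<SimpleSpan>> children = new HashMap<>();
        for (SimpleSpan span : spans) {
            if (span.parentId != null) {
                children.computeIfAbsent(span.parentId, id -> new ArrayList<>()).add(span);
            }
        }
        return children;
    }
}
```

For a request that enters one service and calls another, the root span would belong to the first service and the second service's span would list the root's span ID as its parent.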

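3. Zipkin Overview

Zipkin is an open-source project from Twitter, based on Google's Dapper. It collects timing data from services to help troubleshoot latency problems in microservice architectures, covering data collection, storage, lookup, and presentation. We can use it to collect trace data for request chains on each server and query that data through its REST API. This lets us build monitoring for a distributed system, detect rising latency promptly, and locate the root of performance bottlenecks. Besides the developer-facing API, it also provides a UI component for searching trace information and analyzing request chain details visually, for example to look up how long user requests took to process within a given time window. Zipkin offers pluggable storage: In-Memory, MySQL, Cassandra, and Elasticsearch.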
[Figure: Zipkin basic architecture]
Zipkin's basic architecture consists of four core components:

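Collector: the collector component. It handles trace information sent from external systems and converts it into the Span format that Zipkin uses internally, so that it can be stored, analyzed, and displayed later.
Storage: the storage component. It handles the trace information received by the collector. By default this information is kept in memory. The storage strategy can be changed so that trace information is stored in a database by using a different storage component.
RESTful API: the API component. It provides the external access interface, for example to show trace information to clients or to let external systems access the data for monitoring.
Web UI: the UI component, an application built on top of the API component. Through the UI, users can query and analyze trace information conveniently and visually.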

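Zipkin has two sides: the Zipkin server and the Zipkin client. The client is the microservice application itself.
The client is configured with the server's URL. Whenever a call between services occurs, the Sleuth listener configured in the microservice picks it up, generates the corresponding Trace and Span information, and sends it to the server.
There are two main ways to send the data: over HTTP, or over a message bus such as RabbitMQ.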
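As a rough sketch of the HTTP option: Zipkin Server accepts a JSON array of spans via POST on its /api/v2/spans endpoint. The code below builds such a request by hand using only the JDK (Java 11+ `java.net.http`). In a real application Sleuth does this for you. The class and method names are hypothetical, and the hand-rolled JSON assumes span and service names that need no escaping.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.Locale;

/** Hypothetical illustration of reporting one span to Zipkin over HTTP. */
final class ZipkinHttpReportSketch {

    /** Encodes a single span as a Zipkin v2 JSON array (IDs as 16-char lower hex). */
    static String toJson(long traceId, long spanId, String name, String serviceName,
                         long timestampMicros, long durationMicros) {
        // Assumption: name and serviceName contain no characters that need JSON escaping.
        return String.format(Locale.ROOT,
                "[{\"traceId\":\"%016x\",\"id\":\"%016x\",\"name\":\"%s\","
                        + "\"timestamp\":%d,\"duration\":%d,"
                        + "\"localEndpoint\":{\"serviceName\":\"%s\"}}]",
                traceId, spanId, name, timestampMicros, durationMicros, serviceName);
    }

    /** Builds the POST request; baseUrl plays the role of spring.zipkin.base-url. */
    static HttpRequest buildRequest(String baseUrl, String json) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/api/v2/spans"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json))
                .build();
    }
}
```

The request could then be sent with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding())` against a running Zipkin Server.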

Whichever way is used, we need:

A Eureka service registry
A Zipkin server
Several microservices, each configured as a Zipkin client

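4. Deploying and Configuring Zipkin Server

4.1 Downloading Zipkin Server

Since Spring Boot 2.0, building your own Zipkin Server for tracing is no longer officially supported. A precompiled jar is provided instead. The Zipkin Server jar (which includes the Web UI) can be downloaded from: https://dl.bintray.com/openzipkin/maven/io/zipkin/java/zipkin-server/. The version used here is zipkin-server-2.12.9-exec.jar.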

4.2 Startup

Run java -jar zipkin-server-2.12.9-exec.jar on the command line to start Zipkin Server.

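Zipkin Server listens on port 9411 by default.
The startup parameters of Zipkin Server can be looked up in the official yml configuration file.
Open http://127.0.0.1:9411 in a browser to reach the Zipkin Server admin console.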

5. Integrating Zipkin + Sleuth on the Client Side

5.1 Adding the Dependency to Clients payment8001 and order80

<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

5.2 Modifying the Client Configuration Files

Modify application.yml of services 8001 and 80 to add the Zipkin and Sleuth settings.

spring:
  zipkin:
    base-url: http://127.0.0.1:9411  # address of the Zipkin server
    sender:
      type: web  # transport; by default trace data is sent to the Zipkin server over HTTP
  sleuth:
    sampler:
      probability: 1.0  # sampling ratio

The full configuration is as follows.
8001:

server:
  port: 8001
spring:
  application:
    name: cloud-payment-service
  zipkin:
    base-url: http://127.0.0.1:9411  # address of the Zipkin server
    sender:
      type: web  # transport; by default trace data is sent to the Zipkin server over HTTP
  sleuth:
    sampler:
      probability: 1.0  # sampling ratio
  datasource:
    type: com.alibaba.druid.pool.DruidDataSource  # data source type
    driver-class-name: com.mysql.cj.jdbc.Driver  # MySQL driver
    url: jdbc:mysql://localhost:3306/db2019?useUnicode=true&characterEncoding=utf-8&useSSL=false
    username: root
    password: root
mybatis:
  mapper-locations: classpath:mapper/*.xml
  type-aliases-package: com.atguigu.springcloud.entities  # package containing all entity alias classes
eureka:
  client:
    register-with-eureka: true  # whether to register this instance with Eureka Server; defaults to true
    fetch-registry: true  # whether to fetch existing registry information from Eureka Server; defaults to true. Irrelevant for a single node, but a cluster must set it to true to use Ribbon load balancing
    service-url:
      defaultZone: http://localhost:7001/eureka
  instance:
    instance-id: payment8001
    prefer-ip-address: true
    lease-renewal-interval-in-seconds: 1  # interval at which the client sends heartbeats to the server, in seconds (default 30)
    lease-expiration-duration-in-seconds: 2  # maximum time the server waits after the last heartbeat, in seconds (default 90); the instance is evicted on timeout

80：

server:
  port: 80
spring:
  application:
    name: cloud-order-service
  zipkin:
    base-url: http://127.0.0.1:9411  # address of the Zipkin server
    sender:
      type: web  # transport; by default trace data is sent to the Zipkin server over HTTP
  sleuth:
    sampler:
      probability: 1.0  # sampling ratio
eureka:
  client:
    register-with-eureka: true
    fetch-registry: true
    service-url:
      defaultZone: http://localhost:7001/eureka
  instance:
    instance-id: order80
    prefer-ip-address: true

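spring.zipkin.base-url specifies the address of the Zipkin server.
spring.sleuth.sampler.probability specifies the fraction of requests to sample. The default is 0.1, that is, 10%. It is set to 1 here so that all Sleuth information is recorded and more data is collected (for testing only). In a distributed system, sampling too frequently hurts performance, so an appropriate value should be chosen.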
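The idea of probability-based sampling can be illustrated with a small sketch. This is not Sleuth's actual sampler: `ProbabilitySamplerSketch` is a hypothetical class that maps a trace ID into 10,000 buckets and keeps the trace if its bucket falls below the configured fraction, so the decision for a given trace ID is always the same.

```java
/** Hypothetical sampler: keeps roughly `probability` of all traces, decided per trace ID. */
final class ProbabilitySamplerSketch {
    private static final long BUCKETS = 10_000L;
    private final long threshold;

    ProbabilitySamplerSketch(double probability) {
        if (probability < 0.0 || probability > 1.0) {
            throw new IllegalArgumentException("probability must be between 0.0 and 1.0");
        }
        // 1.0 -> every bucket is sampled, 0.1 -> the first 1,000 buckets, 0.0 -> none
        this.threshold = Math.round(probability * BUCKETS);
    }

    /** Returns true if the trace with this ID should be recorded and reported. */
    boolean isSampled(long traceId) {
        // floorMod keeps the bucket in [0, BUCKETS) even for negative trace IDs
        return Math.floorMod(traceId, BUCKETS) < threshold;
    }
}
```

With a probability of 1.0, as configured above, every trace is kept. With 0.1, about one trace in ten is kept, assuming trace IDs are spread evenly across the buckets.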

5.3 Testing

Start eureka7001, 8001, and 80 in order, visit http://127.0.0.1/consumer/payment/get/31 in a browser, and open the Zipkin Server console.
Click the trace to see the details of the request.

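6. Persisting Trace Data

By default, Zipkin Server keeps trace data in memory, which is unsuitable for production: once the server is shut down, restarted, or crashes, the historical data is lost. Zipkin supports persisting trace data to a MySQL database or storing it in Elasticsearch. MySQL is used as the example here.

6.1 Preparing the Database

The database script for persisting Zipkin Server data to MySQL can be found on the official site.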

CREATE DATABASE IF NOT EXISTS zipkin;
USE zipkin;

CREATE TABLE IF NOT EXISTS zipkin_spans (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL,
  `id` BIGINT NOT NULL,
  `name` VARCHAR(255) NOT NULL,
  `parent_id` BIGINT,
  `debug` BIT(1),
  `start_ts` BIGINT COMMENT 'Span.timestamp(): epoch micros used for endTs query and to implement TTL',
  `duration` BIGINT COMMENT 'Span.duration(): micros used for minDuration and maxDuration query'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_spans ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `id`) COMMENT 'ignore insert on duplicate';
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`, `id`) COMMENT 'for joining with zipkin_annotations';
ALTER TABLE zipkin_spans ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTracesByIds';
ALTER TABLE zipkin_spans ADD INDEX(`name`) COMMENT 'for getTraces and getSpanNames';
ALTER TABLE zipkin_spans ADD INDEX(`start_ts`) COMMENT 'for getTraces ordering and range';

CREATE TABLE IF NOT EXISTS zipkin_annotations (
  `trace_id_high` BIGINT NOT NULL DEFAULT 0 COMMENT 'If non zero, this means the trace uses 128 bit traceIds instead of 64 bit',
  `trace_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.trace_id',
  `span_id` BIGINT NOT NULL COMMENT 'coincides with zipkin_spans.id',
  `a_key` VARCHAR(255) NOT NULL COMMENT 'BinaryAnnotation.key or Annotation.value if type == -1',
  `a_value` BLOB COMMENT 'BinaryAnnotation.value(), which must be smaller than 64KB',
  `a_type` INT NOT NULL COMMENT 'BinaryAnnotation.type() or -1 if Annotation',
  `a_timestamp` BIGINT COMMENT 'Used to implement TTL; Annotation.timestamp or zipkin_spans.timestamp',
  `endpoint_ipv4` INT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_ipv6` BINARY(16) COMMENT 'Null when Binary/Annotation.endpoint is null, or no IPv6 address',
  `endpoint_port` SMALLINT COMMENT 'Null when Binary/Annotation.endpoint is null',
  `endpoint_service_name` VARCHAR(255) COMMENT 'Null when Binary/Annotation.endpoint is null'
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_annotations ADD UNIQUE KEY(`trace_id_high`, `trace_id`, `span_id`, `a_key`, `a_timestamp`) COMMENT 'Ignore insert on duplicate';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`, `span_id`) COMMENT 'for joining with zipkin_spans';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id_high`, `trace_id`) COMMENT 'for getTraces/ByIds';
ALTER TABLE zipkin_annotations ADD INDEX(`endpoint_service_name`) COMMENT 'for getTraces and getServiceNames';
ALTER TABLE zipkin_annotations ADD INDEX(`a_type`) COMMENT 'for getTraces';
ALTER TABLE zipkin_annotations ADD INDEX(`a_key`) COMMENT 'for getTraces';
ALTER TABLE zipkin_annotations ADD INDEX(`trace_id`, `span_id`, `a_key`) COMMENT 'for dependencies job';

CREATE TABLE IF NOT EXISTS zipkin_dependencies (
  `day` DATE NOT NULL,
  `parent` VARCHAR(255) NOT NULL,
  `child` VARCHAR(255) NOT NULL,
  `call_count` BIGINT
) ENGINE=InnoDB ROW_FORMAT=COMPRESSED CHARACTER SET=utf8 COLLATE utf8_general_ci;

ALTER TABLE zipkin_dependencies ADD UNIQUE KEY(`day`, `parent`, `child`);

6.2 Configuring and Starting the Server

java -jar zipkin-server-2.12.9-exec.jar --STORAGE_TYPE=mysql --MYSQL_HOST=127.0.0.1 --MYSQL_TCP_PORT=3306 --MYSQL_DB=zipkin --MYSQL_USER=root --MYSQL_PASS=111111

STORAGE_TYPE: storage type
MYSQL_HOST: MySQL host address
MYSQL_TCP_PORT: MySQL port
MYSQL_DB: MySQL database name
MYSQL_USER: MySQL user name
MYSQL_PASS: MySQL password

After configuring the server, send a few requests from the browser. Checking the database then shows that the data has been persisted to MySQL.

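7. Collecting Data via Message Middleware

By default, the Zipkin client and server communicate over HTTP (that is, with synchronous requests). With network fluctuations or server-side failures, trace information may not be collected in time. Zipkin can integrate with RabbitMQ to transmit the messages asynchronously.

With a message queue added, the communication flow is as follows:
[Figure: communication flow between clients, RabbitMQ, and Zipkin Server]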

7.1 Installing and Starting RabbitMQ

Omitted.

7.2 Starting the Server

java -jar zipkin-server-2.12.9-exec.jar --RABBIT_ADDRESSES=127.0.0.1:5672

RABBIT_ADDRESSES: the RabbitMQ address
RABBIT_USER: user name (default guest)
RABBIT_PASSWORD: password (default guest)

After Zipkin Server starts, the RabbitMQ console shows one additional queue.
The queue named zipkin is the one created for us automatically.

7.3 Client Configuration

7.3.1 Adding Dependencies

<dependency>
    <groupId>org.springframework.amqp</groupId>
    <artifactId>spring-rabbit</artifactId>
</dependency>
<dependency>
    <groupId>org.springframework.cloud</groupId>
    <artifactId>spring-cloud-starter-zipkin</artifactId>
</dependency>

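The spring-rabbit dependency is Spring's wrapper around RabbitMQ. Based on the configuration, the client automatically produces messages and sends them to the target queue.

7.3.2 Configuring the RabbitMQ Address and Related Settings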

spring:
  zipkin:
    base-url: http://127.0.0.1:9411  # address of the Zipkin server
    sender:
      type: rabbit
      # type: web  # transport; by default trace data is sent to the Zipkin server over HTTP
  sleuth:
    sampler:
      probability: 1.0  # sampling ratio
  rabbitmq:
    host: localhost
    port: 5672
    username: guest
    password: guest
    listener:  # retry policy is configured here
      direct:
        retry:
          enabled: true
      simple:
        retry:
          enabled: true

Change the message delivery method to rabbit.
Add the RabbitMQ-related configuration.

7.3.3 Creating a Configuration Class

package com.atguigu.springcloud.config;

import com.rabbitmq.client.ConnectionFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.autoconfigure.amqp.RabbitProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import zipkin2.reporter.Sender;
import zipkin2.reporter.amqp.RabbitMQSender;

@Configuration
public class ZipkinConfig {

    // Holds the spring.rabbitmq.* settings from application.yml
    @Autowired
    private RabbitProperties rabbitProperties;

    // Registers the Sender bean that reports spans to the "zipkin" queue
    @Bean
    Sender rabbitSender2() {
        ConnectionFactory connectionFactory = new ConnectionFactory();
        connectionFactory.setHost(rabbitProperties.getHost());
        connectionFactory.setPort(rabbitProperties.getPort());
        connectionFactory.setPassword(rabbitProperties.getPassword());
        connectionFactory.setUsername(rabbitProperties.getUsername());
        return RabbitMQSender.newBuilder()
                .connectionFactory(connectionFactory)
                .queue("zipkin")
                .addresses(rabbitProperties.getHost() + ":" + rabbitProperties.getPort())
                .build();
    }
}

Without this configuration class, services 8001 and 80 fail at startup with the following error:

Description:

Parameter 2 of method reporter in org.springframework.cloud.sleuth.zipkin2.ZipkinAutoConfiguration required a bean of type 'zipkin2.reporter.Sender' that could not be found.

The following candidates were found but could not be injected:
    - Bean method 'kafkaSender' in 'ZipkinKafkaSenderConfiguration' not loaded because @ConditionalOnClass did not find required class 'org.apache.kafka.common.serialization.ByteArraySerializer'
    - Bean method 'rabbitSender' in 'ZipkinRabbitSenderConfiguration' not loaded because @ConditionalOnBean (types: org.springframework.amqp.rabbit.connection.CachingConnectionFactory; SearchStrategy: all) did not find any beans of type org.springframework.amqp.rabbit.connection.CachingConnectionFactory
    - Bean method 'restTemplateSender' in 'ZipkinRestTemplateSenderConfiguration' not loaded because ZipkinSender org.springframework.cloud.sleuth.zipkin2.sender.ZipkinRestTemplateSenderConfiguration rabbit sender type

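Note that the changes above must be applied to both services 8001 and 80.

7.4 Testing

Shut down Zipkin Server and send some requests. The RabbitMQ management console shows that the messages have been pushed to RabbitMQ. When Zipkin Server starts, it automatically fetches and consumes the messages from RabbitMQ and displays the trace data.
The following effects can be observed: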

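Request latency no longer shows sudden, unusually long spikes.
When Zipkin Server is unavailable (for example, shut down or unreachable over the network), trace information is not lost. It stays on the RabbitMQ server until Zipkin Server becomes available again and retrieves the messages accumulated in the meantime.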
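The buffering behavior described above can be mimicked with a plain in-memory queue. In this sketch a `LinkedBlockingQueue` stands in for the zipkin queue on RabbitMQ. The class `BufferedReportingSketch` and its methods are hypothetical and only illustrate the decoupling; they are not how Sleuth or Zipkin are implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical stand-in for the client -> queue -> Zipkin Server flow. */
final class BufferedReportingSketch {
    // Plays the role of the "zipkin" queue on the RabbitMQ server
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    /** Client side: publishing succeeds whether or not a consumer is running. */
    void report(String encodedSpan) {
        queue.offer(encodedSpan);
    }

    /** Number of spans waiting for the server. */
    int pending() {
        return queue.size();
    }

    /** Server side: on (re)start, consume everything that accumulated, in order. */
    List<String> drain() {
        List<String> consumed = new ArrayList<>();
        queue.drainTo(consumed);
        return consumed;
    }
}
```

Calling `report` several times while nothing consumes the queue simply increases `pending()`. A later `drain()` returns all of those spans, which corresponds to Zipkin Server catching up after it comes back.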