一、背景
线上服务监控告警某个服务接口大量失败,先访问出问题的机器,拿到异常堆栈,然后临时紧急重启后正常。
异常堆栈信息如下:[DUBBO] Thread pool is EXHAUSTED! Thread Name: DubboServerHandler-xxxx:20880, Pool Size: 200 (active: 200, core: 200, max: 200, largest: 200) ...
初步分析是Dubbo线程池满了导致的报错。
二、具体分析
Dubbo线程池满了,可能的原因有:
- 突发流量
- 阻塞
1、突发流量
首先查看服务数据监控,数据曲线正常,首先排除突发流量的原因。
2、阻塞
再通过jstack
工具获取Dubbo线程堆栈信息,可以看到大量线程阻塞在join()方法。
"DubboServerHandler-thread-15" #423 daemon prio=5 os_prio=0 tid=0x00007f0e101d7800 nid=0x4f5f waiting on condition [0x00007f0d30047000]
java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for <0x0000000730d4fb98> (a java.util.concurrent.ComletableFuture$Siganaller)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
at java.util.concurrent.CompletableFuture$Signaller.block(CompletableFuture.java:1693)
at java.util.concurrent.ForkJoinPool.managedBlock(ForkJoinPool.java:3323)
at java.util.concurrent.CompletableFuture.waitingGet(CompletableFuture.java:1729)
at java.util.concurrent.CompletableFuture.join(CompletableFuture.java:1934)
at com.yy.test.service.impl.TestServiceImpl.executeAfter(TestServiceImpl.java:386)
at com.yy.test.service.impl.TestServiceImpl.executeFirst(TestServiceImpl.java:104)
查看对应TestServiceImpl类代码,简化如下:
@Service
public class TestServiceImpl {
// 线程池
private ExecutorService executorService = Executors.newFixedThreadPool(8);
public void executeFirst() throws Exception {
// ...
executorService.execute(this.executeAfter());
// ...
}
public void executeAfter() throws Exception{
// ...
// 启动两个线程同时执行任务
CompletableFuture<String> future1 = CompletableFuture.supplyAsync(()->{...}, executorService);
CompletableFuture<String> future2 = CompletableFuture.supplyAsync(()->{...}, executorService);
// 同步等待线程执行完成并获取值
String result1 = future1.get();
String result2 = future2.get();
//...
}
}
当executeBefore()同时收到8个请求时,线程池的8个工作线程都被分配给executeBefore()去异步调用executeAfter(),这时候由于线程池工作线程已满,所以executeAfter()里的future调用只能阻塞,所以就形成了:
- executeBefore()持有线程,等待executeAfter()执行完释放线程
- executeAfter()由于没有空间的线程而阻塞,进而导致executeBefore()阻塞
3、结论
业务线程池的线程全都阻塞住,导致上层的Dubbo线程也阻塞住。
三、总结
问题:Dubbo线程池耗尽
排查:业务线程阻塞,导致上层的Dubbo线程阻塞
解决方案:做好线程池隔离,即不同使用场景尽量使用不同的线程池,避免父子任务共用一个线程池而导致互相阻塞的问题。相关优化代码如下
@Service
public class TestServiceImpl {
// 线程池
private ExecutorService executorServiceFirst = Executors.newFixedThreadPool(8);
private ExecutorService executorServiceAfter = Executors.newFixedThreadPool(8);
public void executeFirst() throws Exception {
// ...
executorServiceFirst.execute(this.executeAfter());
// ...
}
public void executeAfter() throws Exception{
// ...
// 启动两个线程同时执行任务
CompletableFuture<String> future1 = CompletableFuture.supplyAsync(()->{...}, executorServiceAfter);
CompletableFuture<String> future2 = CompletableFuture.supplyAsync(()->{...}, executorServiceAfter);
// 同步等待线程执行完成并获取值
String result1 = future1.get();
String result2 = future2.get();
//...
}
}