博客
关于我
强烈建议你试试无所不能的chatGPT,快点击我
php进程导致服务器cpu 100问题追查过程
阅读量:6786 次
发布时间:2019-06-26

本文共 6116 字,大约阅读时间需要 20 分钟。

前段时间,出现了一次服务器cpu 占用100的问题。以下为追查原因的过程。仅当抛砖引玉,欢迎拍砖。
查看占用cpu高的进程

想找出占用cpu高的进程,用top命令就可以搞定。

$top.....此处省略n多行...10434 root     20   0  509m 174m 1528 R 99.7  0.5   8:42.43 php                                                                                                         5638 root     20   0  509m 174m 1528 R 99.1  0.5   9:12.35 php                                                                                                        16390 root     20   0  541m 182m 5244 R 98.4  0.6   8:40.92 php

此时,轻轻按下C键。就会看到如下信息。

10434  root     20   0  540m 182m 5244 R 101.0  0.6   9:57.39 php /home/gearman_manager/pecl-manager.php -P /home/gearman_manager/manager.pid -u root -d -v5638 root     20   0  509m 174m 1528 R 101.0  0.5  10:15.28 php /home/gearman_manager/pecl-manager.php -P /home/gearman_manager/manager.pid -u root -d -v16390 root     20   0  509m 174m 1528 R 99.5  0.5   9:41.18 php /home/gearman_manager/pecl-manager.php -P /home/gearman_manager/manager.pid -u root -d -vv

找出进程占用cpu高的原因

进程占用cpu高,一般是由于进程长时间占用cpu,又没有主动释放占用。如果想主动释放cpu,可以调用sleep。在写程序的时候,尤其要注意while 等循环的地方。

找出php进程在执行那段代码

$sudo gdb -p 10434(gdb) print (char *)executor_globals.active_op_array->filename$13 = 0x2924118 "/home/gearman_manager/pecl-manager.php"(gdb) print executor_globals->current_execute_data->opline->lineno$14 = 55(gdb) cContinuing.^CProgram received signal SIGINT, Interrupt.0x00000031d32306d0 in sigprocmask () from /lib64/libc.so.6(gdb) print executor_globals->current_execute_data->opline->lineno$15 = 71(gdb) cContinuing.^CProgram received signal SIGINT, Interrupt.0x00000000006250e1 in zend_hash_find ()(gdb) print executor_globals->current_execute_data->opline->lineno$16 = 53

如果对上面的命令有疑问,可以查看 []

根据上面的信息,我们可以知道,cpu高时,正在执行/home/gearman_manager/pecl-manager.php文件。并且正在执行53和71行附近的代码。

分析相关代码

php的相关代码如下:

while(!$this->stop_work){ //此处是第51行             if(@$thisWorker->work() ||               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||               $thisWorker->returnCode() == GEARMAN_NOT_CONNECTED ||               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {                                 if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;                 if (!@$thisWorker->wait()){                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){                        sleep(5);                    }                }             }             /**             * Check the running time of the current child. If it has             * been too long, stop working.             */            //此处是第71行            if($this->max_run_time > 0 && time() - $start > $this->max_run_time) {                $this->log("Been running too long, exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);                $this->stop_work = true;            }             if(!empty($this->config["max_runs_per_worker"]) && $this->job_execution_count >= $this->config["max_runs_per_worker"]) {                $this->log("Ran $this->job_execution_count jobs which is over the maximum({$this->config['max_runs_per_worker']}), exiting", GearmanManager::LOG_LEVEL_WORKER_INFO);                $this->stop_work = true;            }         }

看来作者已经考虑到某些情况可能导致cpu 100,因此在代码中使用了sleep方法。现在php进程占用cpu 100,看来是没执行sleep(5)。如果$thisWorker->work() 和 $thisWorker->returnCode() 方法中没有IO等操作,能释放cpu的占用,又没有调用sleep的情况下,很容易导致进程占用cpu 100。

根据代码我们可以得出如下几个结论:
- 肯定没执行slepp(5)
- $thisWorker->work() 和 $thisWorker->returnCode() 方法中没有IO等操作,能释放cpu的占用。

进一步跟踪代码中可疑部分

分析 $thisWorker->returnCode() 的返回值。如果很容易复现cpu 100的话,你可以在程序中echo $thisWorker->returnCode() 来获得返回值。但是现在cpu 100的问题,复现比较麻烦。所以,还是使用gdb 来搞吧。

$gdb -p 10434(gdb)b zif_gearman_worker_return_code Breakpoint 1, zif_gearman_worker_return_code (ht=0, return_value=0x2a1b648, return_value_ptr=0x0, this_ptr=0x2a14090, return_value_used=1)    at /home/php_src/pecl/gearman-1.1.1/php_gearman.c:31863186    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p return_value$3 = (zval *) 0x2a1b648(gdb) p *$3$4 = {value = {lval = 0, dval = 0, str = {val = 0x0, len = 0}, ht = 0x0, obj = {handle = 0, handlers = 0x0}}, refcount__gc = 1, type = 0 '\0', is_ref__gc = 0 '\0'}(gdb) s3189    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p *$3$5 = {value = {lval = 0, dval = 0, str = {val = 0x0, len = 0}, ht = 0x0, obj = {handle = 0, handlers = 0x0}}, refcount__gc = 1, type = 0 '\0', is_ref__gc = 0 '\0'}(gdb) s3186    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p *$3$6 = {value = {lval = 0, dval = 0, str = {val = 0x0, len = 0}, ht = 0x0, obj = {handle = 0, handlers = 0x0}}, refcount__gc = 1, type = 0 '\0', is_ref__gc = 0 '\0'}(gdb) s3189    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p *$3$7 = {value = {lval = 0, dval = 0, str = {val = 0x0, len = 0}, ht = 0x0, obj = {handle = 0, handlers = 0x0}}, refcount__gc = 1, type = 0 '\0', is_ref__gc = 0 '\0'}(gdb) s3190    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p *$3$8 = {value = {lval = 0, dval = 0, str = {val = 0x0, len = 0}, ht = 0x0, obj = {handle = 0, handlers = 0x0}}, refcount__gc = 1, type = 0 '\0', is_ref__gc = 0 '\0'}(gdb) s3191    in /home/php_src/pecl/gearman-1.1.1/php_gearman.c(gdb) p *$3$9 = {value = {lval = 25, dval = 1.2351641146031164e-322, str = {val = 0x19 
, len = 0}, ht = 0x19, obj = {handle = 25, handlers = 0x0}}, refcount__gc = 1, type = 1 '\001', is_ref__gc = 0 '\0'}
从最后的lval = 25 和 type =1 我们可以看出 returnCode()方法的最后返回值是 25 。
根据文档,返回值25 对应的是 GEARMAN_NOT_CONNECTED。
## 解决问题
既然出问题时, returnCode()方法的最后返回值是 25,而现在程序中对这种情况又没进行出来,导致了cpu 100。那我们只要在出现这种情况时,sleep几秒就ok了。

if(@$thisWorker->work() ||               $thisWorker->returnCode() == GEARMAN_IO_WAIT ||               $thisWorker->returnCode() == GEARMAN_NOT_CONNECTED ||               $thisWorker->returnCode() == GEARMAN_NO_JOBS) {                //下面一行是新增的                if ($thisWorker->returnCode() == GEARMAN_NOT_CONNECTED) sleep(5);                if ($thisWorker->returnCode() == GEARMAN_SUCCESS) continue;                 if (!@$thisWorker->wait()){                    if ($thisWorker->returnCode() == GEARMAN_NO_ACTIVE_FDS){                        sleep(5);                    }                }             }

转载地址:http://nucgo.baihongyu.com/

你可能感兴趣的文章
MySQL基础【MySQL运维实践】
查看>>
人工智能教程001:什么是人工智能以及相关知识要求
查看>>
30Mysql 的配置
查看>>
关于摄影的技巧,摄影爱好者们都好好学习吧
查看>>
Mac tips - 隐藏窗口及恢复
查看>>
dvbbs论坛的安装
查看>>
linux管道
查看>>
Apache web目录修改
查看>>
Android存储子系统
查看>>
kafka学习笔记五
查看>>
CentOS 6 VNC详细配置
查看>>
html返回json响应键盘?
查看>>
Ubuntu SSH Algorithm negotiation failed
查看>>
使用Custom.pll修改标准Form的LOV
查看>>
Redis 主从复制
查看>>
lucene的使用详解
查看>>
java每日小算法(14)
查看>>
Docker 命令行和后台参数
查看>>
英雄王座的数据库表格部分成功转换
查看>>
AbstractQueuedSynchronizer的介绍和原理分析
查看>>