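A few days ago, for four days in a row, I left the controller running overnight and checked it the next morning. Each time the shell had died, while every other process was still fine. Looking closely at the output of the ps command, I found a "zombie" shell listed right after one of the processes: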
......
125 685 S monitor
126 Z [sh]
......
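I searched online for ways to deal with zombie processes
and found two:
1. Ignore the signal: signal(SIGCHLD, SIG_IGN);
2. Write a signal handler that reaps the children, which is what I did: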
- void check_ps()
- {
- char buf[1024];
- int ps_map[EXEC_FILES_NUM];
- int nptr=0;
- int i;
- FILE *bc;
- char str_cmd[200];
- int retu=0;
- char tmpchar[30]={0};
- char debug_message[128]={0};
-
- FILE *fp;
- char line_buffer[128]=
- { 0 };
- int i_len=0;
-
-
- char run_file[10][20];
- memset(run_file, 0, sizeof(run_file));
-
-
- memset(cSTAR, 0, sizeof(cSTAR));
-
-
- ..........
-
- bc= popen("ps", "r");
- if (!bc)
- {
- printf("WATCH::failed to open the process table...\n");
- return;
- }
-
- memset(buf, 0, sizeof(buf));
- while (readline(bc, buf))
- {
- for (i=0; i< i_len; i++)
- {
- retu=chuli_run_file(run_file[i],tmpchar);
- nptr = str_find(buf, tmpchar);
- if (nptr == 0)
- {
- if (ps_map[i]==0)
- {
- ps_map[i]=0;
- }
-
- }
- else
- {
- ps_map[i]=1;
- }
- }
-
- }
- memset(buf, 0, sizeof(buf));
-
- for (i=0; i< i_len; i++)
- {
- if (ps_map[i]==0)
- {
- memset(str_cmd, 0, sizeof(str_cmd));
- strcat(str_cmd, "exec ");
- strcat(str_cmd, run_file[i]);
- printf("WATCH::module [%s] is not running, restarting it...\n", str_cmd);
-
- strcpy(cSTAR[iSTAR],str_cmd);
- iSTAR++;
- break;
- }
- }
-
- for (i=0; i< EXEC_FILES_NUM; i++)
- ps_map[i]=0;
- i_len=0;
- }
-
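- /* SIGCHLD handler: reap every child that has already exited, without blocking. */
- void process_zombie(int signo)
- {
- pid_t pid;
- int stat;
- (void)signo;
- while((pid = waitpid(-1, &stat, WNOHANG)) > 0);
-
- return;
- }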
-
- void init_sigaction2()
- {
- struct sigaction act;
- act.sa_handler=process_zombie;
- act.sa_flags=0;
- sigemptyset(&act.sa_mask);
- sigaction(SIGCHLD, &act, NULL);
- }
-
- int main(int argc, char *argv[])
- {
-
- int res01;
-
- pthread_t a_thread01;
-
- int fdd,cmp=0;
- char out[20]={0};
- ........
-
-
- init_sigaction2();
-
- _procid=getpid();
- cfg_InitSIGNINFO();
- InitShmInfo(1);
- init_wdog();
- init_sigaction();
- set_STIMER(10);
- CREATE_PUB_PIPE(FIFO_N);
- res01 = pthread_create(&a_thread01, NULL, thread_PIPE_RECV, FIFO_N);
- if (res01 != 0)
- {
- perror("Thread creation failed");
- }
-
- while (1)
- {
- int i=0;
- watchdog();
- if(iSTAR!=0)
- {
- for(i=0;i<iSTAR;i++)
- {
- system(cSTAR[i]);
- sleep(1);
- }
- iSTAR=0;
- }
- }
- return 0;
- }
After recompiling, I downloaded the program to the controller again and ran it.
Sure enough,
126 Z [sh]
was gone.
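For comparison, the first option from the list (ignoring SIGCHLD), which I did not use, could be sketched roughly as below. The helper names ignore_sigchld and child_status_is_lost are made up for illustration and are not part of the controller code. The sketch assumes a Linux-like system, where ignoring SIGCHLD makes the kernel discard exited children instead of keeping them as zombies.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: ignore SIGCHLD so that exited children are
 * discarded by the kernel instead of lingering as zombies.
 * Returns 0 on success, -1 on failure. */
int ignore_sigchld(void)
{
    return signal(SIGCHLD, SIG_IGN) == SIG_ERR ? -1 : 0;
}

/* Hypothetical check: fork a child that exits at once, then ask for
 * its exit status. With SIGCHLD ignored the status has been thrown
 * away, so waitpid() is expected to fail with ECHILD.
 * Returns 1 in that case, 0 otherwise. */
int child_status_is_lost(void)
{
    pid_t pid = fork();

    if (pid < 0)
        return 0;               /* fork failed, nothing to observe */
    if (pid == 0)
        _exit(0);               /* child: exit immediately */

    errno = 0;
    return waitpid(pid, NULL, 0) == -1 && errno == ECHILD;
}
```

Calling ignore_sigchld() once at the start of main() would be the whole change. The price is the one that child_status_is_lost() demonstrates: exit statuses can no longer be collected, so on such systems I would expect system() and pclose() to report failure even when the command itself ran fine.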
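Thinking it over, though, nothing here forks a child process, so where was the zombie coming from? I went through the code carefully and at first suspected system(). I added debug output to the code above and found that a SIGCHLD signal was being caught at regular intervals, which did not match how often system() actually ran. After work I went home and read up in "Advanced Programming in the UNIX Environment", and only then realized that popen also forks a child process, and that my code never calls pclose to close it down properly. I am fairly sure this is the problem; tomorrow I will add the pclose call and try again.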
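The popen/pclose pairing described above can be sketched as follows. This is only an illustration, not the controller code: count_matching_lines is a hypothetical helper, and it uses the standard fgets and strstr in place of my own readline and str_find.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: run `cmd` through popen(), count how many
 * output lines contain `needle`, and always pclose() the stream so
 * that the child shell is reaped.
 * Returns the count, or -1 on failure. */
int count_matching_lines(const char *cmd, const char *needle)
{
    char line[1024];
    int hits = 0;
    int status;
    FILE *fp = popen(cmd, "r");

    if (fp == NULL)
        return -1;              /* nothing to read, nothing to close */

    while (fgets(line, sizeof(line), fp) != NULL) {
        if (strstr(line, needle) != NULL)
            hits++;
    }

    /* pclose() waits for the child started by popen(). Without it,
     * the child stays in the process table as a zombie once it exits. */
    status = pclose(fp);
    if (status == -1)
        return -1;

    return hits;
}
```

In check_ps() the equivalent change would be a pclose(bc) after the read loop. One caveat, as far as I understand it: if a SIGCHLD handler that calls waitpid(-1, ...) is also installed, the handler may collect the child first, in which case pclose() returns -1 with errno set to ECHILD. That would also explain why the zombie disappeared once the handler was added.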
(yufangbo)