Ten Questions to Thoroughly Understand How Linux epoll Works (Part 2)


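When multiple processes watch the same epoll instance, which one is woken first after an event occurs, which one later, or are they all woken together? This leads to a problem known as the "thundering herd" effect.
Question 3: What is the epoll thundering herd? Answer: It is the situation where multiple processes wait on ep->wq, all of them are woken when an event fires, but only one of them can actually proceed. The processes that were woken for nothing do useless work, which can drive system load up. The following code gives a direct feel for the epoll thundering herd: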
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/epoll.h>
#include <sys/wait.h>
#include <netinet/in.h>
#include <netdb.h>
#include <string.h>
#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
#include <errno.h>

#define PROCESS_NUM 10
#define MAXEVENTS 64

static int create_and_bind(char *port)
{
    int fd = socket(PF_INET, SOCK_STREAM, 0);
    struct sockaddr_in serveraddr;

    if (fd == -1)
        return -1;
    /* Zero the struct so no field is left uninitialized */
    memset(&serveraddr, 0, sizeof(serveraddr));
    serveraddr.sin_family = AF_INET;
    serveraddr.sin_addr.s_addr = htonl(INADDR_ANY);
    serveraddr.sin_port = htons(atoi(port));
    if (bind(fd, (struct sockaddr *)&serveraddr, sizeof(serveraddr)) == -1) {
        perror("bind");
        close(fd);
        return -1;
    }
    return fd;
}

static int make_socket_non_blocking(int sfd)
{
    int flags, s;

    flags = fcntl(sfd, F_GETFL, 0);
    if (flags == -1) {
        perror("fcntl");
        return -1;
    }
    flags |= O_NONBLOCK;
    s = fcntl(sfd, F_SETFL, flags);
    if (s == -1) {
        perror("fcntl");
        return -1;
    }
    return 0;
}

int main(int argc, char *argv[])
{
    int sfd, s;
    int efd;
    struct epoll_event event;
    struct epoll_event *events;

    sfd = create_and_bind("8001");
    if (sfd == -1)
        abort();
    s = make_socket_non_blocking(sfd);
    if (s == -1)
        abort();
    s = listen(sfd, SOMAXCONN);
    if (s == -1) {
        perror("listen");
        abort();
    }
    efd = epoll_create(MAXEVENTS);
    if (efd == -1) {
        perror("epoll_create");
        abort();
    }
    event.data.fd = sfd;
    //event.events = EPOLLIN | EPOLLET;
    event.events = EPOLLIN;
    s = epoll_ctl(efd, EPOLL_CTL_ADD, sfd, &event);
    if (s == -1) {
        perror("epoll_ctl");
        abort();
    }

    /* Buffer where events are returned */
    events = calloc(MAXEVENTS, sizeof event);

    int k;
    for (k = 0; k < PROCESS_NUM; k++) {
        pid_t pid = fork();
        if (pid == 0) {
            /* The event loop: every child waits on the same epoll instance */
            while (1) {
                int n, i;
                n = epoll_wait(efd, events, MAXEVENTS, -1);
                printf("process %d return from epoll_wait!\n", getpid());
                for (i = 0; i < n; i++) {
                    if ((events[i].events & EPOLLERR) ||
                        (events[i].events & EPOLLHUP) ||
                        (!(events[i].events & EPOLLIN))) {
                        /* An error has occurred on this fd, or the socket is not
                           ready for reading (why were we notified then?) */
                        fprintf(stderr, "epoll error\n");
                        close(events[i].data.fd);
                        continue;
                    } else if (sfd == events[i].data.fd) {
                        /* We have a notification on the listening socket, which
                           means one or more incoming connections. */
                        struct sockaddr in_addr;
                        socklen_t in_len;
                        int infd;

                        in_len = sizeof in_addr;
                        infd = accept(sfd, &in_addr, &in_len);
                        if (infd == -1) {
                            printf("process %d accept failed!\n", getpid());
                            break;
                        }
                        printf("process %d accept succeeded!\n", getpid());
                        /* The demo only cares about which process wins accept(),
                           so the connection is closed right away. */
                        close(infd);
                    }
                }
            }
        }
    }

    int status;
    wait(&status);
    free(events);
    close(sfd);
    return EXIT_SUCCESS;
}

The server's listening socket fd is added to the epoll interest set, so a client's connection attempt triggers an event and epoll_wait returns. If 10 processes are blocked in epoll_wait on the same epoll instance at that moment, the thundering herd appears: all 10 processes are woken up, but only one of them succeeds in accept.

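To address the epoll thundering herd, later kernels added the EPOLLEXCLUSIVE flag (Linux 4.5) and the SO_REUSEPORT socket option (Linux 3.9). My personal understanding of how the two approaches differ is this: EPOLLEXCLUSIVE takes effect at wake-up time and wakes only the single process at the head of the wait queue, whereas SO_REUSEPORT takes effect when connections are distributed. With SO_REUSEPORT it is as if each process had its own independent epoll instance, and the kernel decides which epoll instance receives each connection.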
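Question 4: What is ep->poll_wait for? Answer: ep->poll_wait is a second wait queue inside an epoll instance. When the file being monitored is itself an epoll instance, this wait queue is needed to handle the recursive wake-up.
While reading the kernel code, I found ep->wq fairly easy to understand, but I noticed that each wake-up of ep->wq is accompanied by a wake-up of ep->poll_wait. For example, the following snippet appears many times in eventpoll.c: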
/* If the file is already "ready" we drop it inside the ready list */
if ((revents & event->events) && !ep_is_linked(&epi->rdllink)) {
	list_add_tail(&epi->rdllink, &ep->rdllist);

	/* Notify waiting tasks that events are available */
	if (waitqueue_active(&ep->wq))
		wake_up_locked(&ep->wq);
	if (waitqueue_active(&ep->poll_wait))
		pwake++;
}

spin_unlock_irqrestore(&ep->lock, flags);

atomic_long_inc(&ep->user->epoll_watches);

/* We have to call this outside the lock */
if (pwake)
	ep_poll_safewake(&ep->poll_wait);
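From user space, the case that ep->poll_wait serves can be reproduced by nesting one epoll instance inside another. The following is a minimal sketch under that assumption. The function name nested_epoll_ready and its parameters are hypothetical, chosen only for illustration. A pipe is watched by an inner epoll instance, and the inner epoll fd is in turn watched by an outer one. When data arrives on the pipe, the inner instance gains a ready item and, as far as I understand the kernel code above, the wake-up on the inner instance's ep->poll_wait is what lets the outer instance see the inner fd as readable.

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical demo helper. Builds pipe -> inner epoll -> outer epoll.
 * If write_first is non-zero, one byte is written to the pipe before
 * waiting. Returns the number of events the OUTER epoll reports within
 * timeout_ms, or -1 if setup fails or an unexpected fd is reported. */
int nested_epoll_ready(int write_first, int timeout_ms)
{
    int pfd[2];
    int inner = -1, outer = -1, n = -1;
    struct epoll_event ev, out;

    assert(timeout_ms >= 0);   /* a negative timeout would block forever */
    if (pipe(pfd) == -1)
        return -1;
    inner = epoll_create1(0);
    outer = epoll_create1(0);
    if (inner == -1 || outer == -1)
        goto done;

    /* The inner epoll watches the read end of the pipe */
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = pfd[0];
    if (epoll_ctl(inner, EPOLL_CTL_ADD, pfd[0], &ev) == -1)
        goto done;

    /* The outer epoll watches the inner epoll fd itself */
    ev.data.fd = inner;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
        goto done;

    /* Make the pipe readable; this event must propagate inner -> outer */
    if (write_first && write(pfd[1], "x", 1) != 1)
        goto done;

    n = epoll_wait(outer, &out, 1, timeout_ms);
    if (n == 1 && out.data.fd != inner)
        n = -1;
done:
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    close(pfd[0]);
    close(pfd[1]);
    return n;
}
```

Calling nested_epoll_ready(1, 1000) should report one event on the outer instance, while nested_epoll_ready(0, 100) should time out with zero events, since nothing has made the inner instance ready.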

