CategoryResourceRepost/极客时间专栏/网络编程实战/第四模块：实战篇/34 | 自己动手写高性能HTTP服务器（三）：TCP字节流处理和HTTP协议实现.md

<audio id="audio" title="34 | 自己动手写高性能HTTP服务器（三）：TCP字节流处理和HTTP协议实现" controls="" preload="none"><source id="mp3" src="https://static001.geekbang.org/resource/audio/f6/fb/f6ee76bc2b5a07fce8b463339e5a27fb.mp3"></audio>

你好，我是盛延敏，这里是网络编程实战第34讲，欢迎回来。

这一讲，我们延续第33讲的话题，继续解析高性能网络编程框架的字节流处理部分，并为网络编程框架增加HTTP相关的功能，在此基础上完成HTTP高性能服务器的编写。

## buffer对象

你肯定在各种语言、各种框架里面看到过不同的buffer对象，buffer，顾名思义，就是一个缓冲区对象，缓存了从套接字接收来的数据以及需要发往套接字的数据。

如果是从套接字接收来的数据，事件处理回调函数在不断地往buffer对象增加数据，同时，应用程序需要不断把buffer对象中的数据处理掉，这样，buffer对象才可以空出新的位置容纳更多的数据。

如果是发往套接字的数据，应用程序不断地往buffer对象增加数据，同时，事件处理回调函数不断调用套接字上的发送函数将数据发送出去，减少buffer对象中的写入数据。

可见，buffer对象是同时可以作为输入缓冲（input buffer）和输出缓冲（output buffer）两个方向使用的，只不过，在两种情形下，写入和读出的对象是有区别的。

这张图描述了buffer对象的设计。

<img src="https://static001.geekbang.org/resource/image/44/bb/44eaf37e860212a5c6c9e7f8dc2560bb.png" alt=""><br>
下面是buffer对象的数据结构。

```
//数据缓冲区
struct buffer {
    char *data;          //实际缓冲
    int readIndex;       //缓冲读取位置
    int writeIndex;      //缓冲写入位置
    int total_size;      //总大小
};

```

buffer对象中的writeIndex标识了当前可以写入的位置；readIndex标识了当前可以读出的数据位置，图中红色部分从readIndex到writeIndex的区域是需要读出数据的部分，而绿色部分从writeIndex到缓存的最尾端则是可以写出的部分。

随着时间的推移，当readIndex和writeIndex越来越靠近缓冲的尾端时，前面部分的front_space_size区域变得会很大，而这个区域的数据已经是旧数据，在这个时候，就需要调整一下整个buffer对象的结构，把红色部分往左侧移动，与此同时，绿色部分也会往左侧移动，整个缓冲区的可写部分就会变多了。

make_room函数就是起这个作用的，如果右边绿色的连续空间不足以容纳新的数据，而最左边灰色部分加上右边绿色部分一起可以容纳下新数据，就会触发这样的移动拷贝，最终红色部分占据了最左边，绿色部分占据了右边，右边绿色的部分成为一个连续的可写入空间，就可以容纳下新的数据。下面的一张图解释了这个过程。

<img src="https://static001.geekbang.org/resource/image/63/80/638e76a9f926065a72de9116192ef780.png" alt=""><br>
下面是make_room的具体实现。

```
void make_room(struct buffer *buffer, int size) {
    if (buffer_writeable_size(buffer) &gt;= size) {
        return;
    }
    //如果front_spare和writeable的大小加起来可以容纳数据，则把可读数据往前面拷贝
    if (buffer_front_spare_size(buffer) + buffer_writeable_size(buffer) &gt;= size) {
        int readable = buffer_readable_size(buffer);
        int i;
        for (i = 0; i &lt; readable; i++) {
            memcpy(buffer-&gt;data + i, buffer-&gt;data + buffer-&gt;readIndex + i, 1);
        }
        buffer-&gt;readIndex = 0;
        buffer-&gt;writeIndex = readable;
    } else {
        //扩大缓冲区
        void *tmp = realloc(buffer-&gt;data, buffer-&gt;total_size + size);
        if (tmp == NULL) {
            return;
        }
        buffer-&gt;data = tmp;
        buffer-&gt;total_size += size;
    }
}

```

当然，如果红色部分占据过大，可写部分不够，会触发缓冲区的扩大操作。这里我通过调用realloc函数来完成缓冲区的扩容。

下面这张图对此做了解释。

<img src="https://static001.geekbang.org/resource/image/9f/ba/9f66d628572b0ef5b7d9d5989c7a14ba.png" alt="">

## 套接字接收数据处理

套接字接收数据是在tcp_connection.c中的handle_read来完成的。在这个函数里，通过调用buffer_socket_read函数接收来自套接字的数据流，并将其缓冲到buffer对象中。之后你可以看到，我们将buffer对象和tcp_connection对象传递给应用程序真正的处理函数messageCallBack来进行报文的解析工作。这部分的样例在HTTP报文解析中会展开。

```
int handle_read(void *data) {
    struct tcp_connection *tcpConnection = (struct tcp_connection *) data;
    struct buffer *input_buffer = tcpConnection-&gt;input_buffer;
    struct channel *channel = tcpConnection-&gt;channel;

    if (buffer_socket_read(input_buffer, channel-&gt;fd) &gt; 0) {
        //应用程序真正读取Buffer里的数据
        if (tcpConnection-&gt;messageCallBack != NULL) {
            tcpConnection-&gt;messageCallBack(input_buffer, tcpConnection);
        }
    } else {
        handle_connection_closed(tcpConnection);
    }
}

```

在buffer_socket_read函数里，调用readv往两个缓冲区写入数据，一个是buffer对象，另外一个是这里的additional_buffer，之所以这样做，是担心buffer对象没办法容纳下来自套接字的数据流，而且也没有办法触发buffer对象的扩容操作。通过使用额外的缓冲，一旦判断出从套接字读取的数据超过了buffer对象里的实际最大可写大小，就可以触发buffer对象的扩容操作，这里buffer_append函数会调用前面介绍的make_room函数，完成buffer对象的扩容。

```
int buffer_socket_read(struct buffer *buffer, int fd) {
    char additional_buffer[INIT_BUFFER_SIZE];
    struct iovec vec[2];
    int max_writable = buffer_writeable_size(buffer);
    vec[0].iov_base = buffer-&gt;data + buffer-&gt;writeIndex;
    vec[0].iov_len = max_writable;
    vec[1].iov_base = additional_buffer;
    vec[1].iov_len = sizeof(additional_buffer);
    int result = readv(fd, vec, 2);
    if (result &lt; 0) {
        return -1;
    } else if (result &lt;= max_writable) {
        buffer-&gt;writeIndex += result;
    } else {
        buffer-&gt;writeIndex = buffer-&gt;total_size;
        buffer_append(buffer, additional_buffer, result - max_writable);
    }
    return result;
}

```

## 套接字发送数据处理

当应用程序需要往套接字发送数据时，即完成了read-decode-compute-encode过程后，通过往buffer对象里写入encode以后的数据，调用tcp_connection_send_buffer，将buffer里的数据通过套接字缓冲区发送出去。

```
int tcp_connection_send_buffer(struct tcp_connection *tcpConnection, struct buffer *buffer) {
    int size = buffer_readable_size(buffer);
    int result = tcp_connection_send_data(tcpConnection, buffer-&gt;data + buffer-&gt;readIndex, size);
    buffer-&gt;readIndex += size;
    return result;
}

```

如果发现当前channel没有注册WRITE事件，并且当前tcp_connection对应的发送缓冲无数据需要发送，就直接调用write函数将数据发送出去。如果这一次发送不完，就将剩余需要发送的数据拷贝到当前tcp_connection对应的发送缓冲区中，并向event_loop注册WRITE事件。这样数据就由框架接管，应用程序释放这部分数据。

```
//应用层调用入口
int tcp_connection_send_data(struct tcp_connection *tcpConnection, void *data, int size) {
    size_t nwrited = 0;
    size_t nleft = size;
    int fault = 0;

    struct channel *channel = tcpConnection-&gt;channel;
    struct buffer *output_buffer = tcpConnection-&gt;output_buffer;

    //先往套接字尝试发送数据
    if (!channel_write_event_registered(channel) &amp;&amp; buffer_readable_size(output_buffer) == 0) {
        nwrited = write(channel-&gt;fd, data, size);
        if (nwrited &gt;= 0) {
            nleft = nleft - nwrited;
        } else {
            nwrited = 0;
            if (errno != EWOULDBLOCK) {
                if (errno == EPIPE || errno == ECONNRESET) {
                    fault = 1;
                }
            }
        }
    }

    if (!fault &amp;&amp; nleft &gt; 0) {
        //拷贝到Buffer中，Buffer的数据由框架接管
        buffer_append(output_buffer, data + nwrited, nleft);
        if (!channel_write_event_registered(channel)) {
            channel_write_event_add(channel);
        }
    }

    return nwrited;
}

```

## HTTP协议实现

下面，我们在TCP的基础上，加入HTTP的功能。

为此，我们首先定义了一个http_server结构，这个http_server本质上就是一个TCPServer，只不过暴露给应用程序的回调函数更为简单，只需要看到http_request和http_response结构。

```
typedef int (*request_callback)(struct http_request *httpRequest, struct http_response *httpResponse);

struct http_server {
    struct TCPserver *tcpServer;
    request_callback requestCallback;
};

```

在http_server里面，重点是需要完成报文的解析，将解析的报文转化为http_request对象，这件事情是通过http_onMessage回调函数来完成的。在http_onMessage函数里，调用的是parse_http_request完成报文解析。

```
// buffer是框架构建好的，并且已经收到部分数据的情况下
// 注意这里可能没有收到全部数据，所以要处理数据不够的情形
int http_onMessage(struct buffer *input, struct tcp_connection *tcpConnection) {
    yolanda_msgx(&quot;get message from tcp connection %s&quot;, tcpConnection-&gt;name);

    struct http_request *httpRequest = (struct http_request *) tcpConnection-&gt;request;
    struct http_server *httpServer = (struct http_server *) tcpConnection-&gt;data;

    if (parse_http_request(input, httpRequest) == 0) {
        char *error_response = &quot;HTTP/1.1 400 Bad Request\r\n\r\n&quot;;
        tcp_connection_send_data(tcpConnection, error_response, sizeof(error_response));
        tcp_connection_shutdown(tcpConnection);
    }

    //处理完了所有的request数据，接下来进行编码和发送
    if (http_request_current_state(httpRequest) == REQUEST_DONE) {
        struct http_response *httpResponse = http_response_new();

        //httpServer暴露的requestCallback回调
        if (httpServer-&gt;requestCallback != NULL) {
            httpServer-&gt;requestCallback(httpRequest, httpResponse);
        }

        //将httpResponse发送到套接字发送缓冲区中
        struct buffer *buffer = buffer_new();
        http_response_encode_buffer(httpResponse, buffer);
        tcp_connection_send_buffer(tcpConnection, buffer);

        if (http_request_close_connection(httpRequest)) {
            tcp_connection_shutdown(tcpConnection);
            http_request_reset(httpRequest);
        }
    }
}

```

还记得[第16讲中](https://time.geekbang.org/column/article/132443)讲到的HTTP协议吗？我们从16讲得知，HTTP通过设置回车符、换行符作为HTTP报文协议的边界。

<img src="https://static001.geekbang.org/resource/image/6d/5a/6d91c7c2a0224f5d4bad32a0f488765a.png" alt=""><br>
parse_http_request的思路就是寻找报文的边界，同时记录下当前解析工作所处的状态。根据解析工作的前后顺序，把报文解析的工作分成REQUEST_STATUS、REQUEST_HEADERS、REQUEST_BODY和REQUEST_DONE四个阶段，每个阶段解析的方法各有不同。

在解析状态行时，先通过定位CRLF回车换行符的位置来圈定状态行，进入状态行解析时，再次通过查找空格字符来作为分隔边界。

在解析头部设置时，也是先通过定位CRLF回车换行符的位置来圈定一组key-value对，再通过查找冒号字符来作为分隔边界。

最后，如果没有找到冒号字符，说明解析头部的工作完成。

parse_http_request函数完成了HTTP报文解析的四个阶段:

```
int parse_http_request(struct buffer *input, struct http_request *httpRequest) {
    int ok = 1;
    while (httpRequest-&gt;current_state != REQUEST_DONE) {
        if (httpRequest-&gt;current_state == REQUEST_STATUS) {
            char *crlf = buffer_find_CRLF(input);
            if (crlf) {
                int request_line_size = process_status_line(input-&gt;data + input-&gt;readIndex, crlf, httpRequest);
                if (request_line_size) {
                    input-&gt;readIndex += request_line_size;  // request line size
                    input-&gt;readIndex += 2;  //CRLF size
                    httpRequest-&gt;current_state = REQUEST_HEADERS;
                }
            }
        } else if (httpRequest-&gt;current_state == REQUEST_HEADERS) {
            char *crlf = buffer_find_CRLF(input);
            if (crlf) {
                /**
                 *    &lt;start&gt;-------&lt;colon&gt;:-------&lt;crlf&gt;
                 */
                char *start = input-&gt;data + input-&gt;readIndex;
                int request_line_size = crlf - start;
                char *colon = memmem(start, request_line_size, &quot;: &quot;, 2);
                if (colon != NULL) {
                    char *key = malloc(colon - start + 1);
                    strncpy(key, start, colon - start);
                    key[colon - start] = '\0';
                    char *value = malloc(crlf - colon - 2 + 1);
                    strncpy(value, colon + 1, crlf - colon - 2);
                    value[crlf - colon - 2] = '\0';

                    http_request_add_header(httpRequest, key, value);

                    input-&gt;readIndex += request_line_size;  //request line size
                    input-&gt;readIndex += 2;  //CRLF size
                } else {
                    //读到这里说明:没找到，就说明这个是最后一行
                    input-&gt;readIndex += 2;  //CRLF size
                    httpRequest-&gt;current_state = REQUEST_DONE;
                }
            }
        }
    }
    return ok;
}

```

处理完了所有的request数据，接下来进行编码和发送的工作。为此，创建了一个http_response对象，并调用了应用程序提供的编码函数requestCallback，接下来，创建了一个buffer对象，函数http_response_encode_buffer用来将http_response中的数据，根据HTTP协议转换为对应的字节流。

可以看到，http_response_encode_buffer设置了如Content-Length等http_response头部，以及http_response的body部分数据。

```
void http_response_encode_buffer(struct http_response *httpResponse, struct buffer *output) {
    char buf[32];
    snprintf(buf, sizeof buf, &quot;HTTP/1.1 %d &quot;, httpResponse-&gt;statusCode);
    buffer_append_string(output, buf);
    buffer_append_string(output, httpResponse-&gt;statusMessage);
    buffer_append_string(output, &quot;\r\n&quot;);

    if (httpResponse-&gt;keep_connected) {
        buffer_append_string(output, &quot;Connection: close\r\n&quot;);
    } else {
        snprintf(buf, sizeof buf, &quot;Content-Length: %zd\r\n&quot;, strlen(httpResponse-&gt;body));
        buffer_append_string(output, buf);
        buffer_append_string(output, &quot;Connection: Keep-Alive\r\n&quot;);
    }

    if (httpResponse-&gt;response_headers != NULL &amp;&amp; httpResponse-&gt;response_headers_number &gt; 0) {
        for (int i = 0; i &lt; httpResponse-&gt;response_headers_number; i++) {
            buffer_append_string(output, httpResponse-&gt;response_headers[i].key);
            buffer_append_string(output, &quot;: &quot;);
            buffer_append_string(output, httpResponse-&gt;response_headers[i].value);
            buffer_append_string(output, &quot;\r\n&quot;);
        }
    }

    buffer_append_string(output, &quot;\r\n&quot;);
    buffer_append_string(output, httpResponse-&gt;body);
}

```

## 完整的HTTP服务器例子

现在，编写一个HTTP服务器例子就变得非常简单。

在这个例子中，最主要的部分是onRequest callback函数，这里，onRequest方法已经在parse_http_request之后，可以根据不同的http_request的信息，进行计算和处理。例子程序里的逻辑非常简单，根据http request的URL path，返回了不同的http_response类型。比如，当请求为根目录时，返回的是200和HTML格式。

```
#include &lt;lib/acceptor.h&gt;
#include &lt;lib/http_server.h&gt;
#include &quot;lib/common.h&quot;
#include &quot;lib/event_loop.h&quot;

//数据读到buffer之后的callback
int onRequest(struct http_request *httpRequest, struct http_response *httpResponse) {
    char *url = httpRequest-&gt;url;
    char *question = memmem(url, strlen(url), &quot;?&quot;, 1);
    char *path = NULL;
    if (question != NULL) {
        path = malloc(question - url);
        strncpy(path, url, question - url);
    } else {
        path = malloc(strlen(url));
        strncpy(path, url, strlen(url));
    }

    if (strcmp(path, &quot;/&quot;) == 0) {
        httpResponse-&gt;statusCode = OK;
        httpResponse-&gt;statusMessage = &quot;OK&quot;;
        httpResponse-&gt;contentType = &quot;text/html&quot;;
        httpResponse-&gt;body = &quot;&lt;html&gt;&lt;head&gt;&lt;title&gt;This is network programming&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;Hello, network programming&lt;/h1&gt;&lt;/body&gt;&lt;/html&gt;&quot;;
    } else if (strcmp(path, &quot;/network&quot;) == 0) {
        httpResponse-&gt;statusCode = OK;
        httpResponse-&gt;statusMessage = &quot;OK&quot;;
        httpResponse-&gt;contentType = &quot;text/plain&quot;;
        httpResponse-&gt;body = &quot;hello, network programming&quot;;
    } else {
        httpResponse-&gt;statusCode = NotFound;
        httpResponse-&gt;statusMessage = &quot;Not Found&quot;;
        httpResponse-&gt;keep_connected = 1;
    }

    return 0;
}


int main(int c, char **v) {
    //主线程event_loop
    struct event_loop *eventLoop = event_loop_init();

    //初始tcp_server，可以指定线程数目，如果线程是0，就是在这个线程里acceptor+i/o；如果是1，有一个I/O线程
    //tcp_server自己带一个event_loop
    struct http_server *httpServer = http_server_new(eventLoop, SERV_PORT, onRequest, 2);
    http_server_start(httpServer);

    // main thread for acceptor
    event_loop_run(eventLoop);
}

```

运行这个程序之后，我们可以通过浏览器和curl命令来访问它。你可以同时开启多个浏览器和curl命令，这也证明了我们的程序是可以满足高并发需求的。

```
$curl -v http://127.0.0.1:43211/
*   Trying 127.0.0.1...
* TCP_NODELAY set
* Connected to 127.0.0.1 (127.0.0.1) port 43211 (#0)
&gt; GET / HTTP/1.1
&gt; Host: 127.0.0.1:43211
&gt; User-Agent: curl/7.54.0
&gt; Accept: */*
&gt;
&lt; HTTP/1.1 200 OK
&lt; Content-Length: 116
&lt; Connection: Keep-Alive
&lt;
* Connection #0 to host 127.0.0.1 left intact
&lt;html&gt;&lt;head&gt;&lt;title&gt;This is network programming&lt;/title&gt;&lt;/head&gt;&lt;body&gt;&lt;h1&gt;Hello, network programming&lt;/h1&gt;&lt;/body&gt;&lt;/html&gt;%

```

<img src="https://static001.geekbang.org/resource/image/71/a5/719804f279f057a9a12b5904a39e06a5.png" alt="">

## 总结

这一讲我们主要讲述了整个编程框架的字节流处理能力，引入了buffer对象，并在此基础上通过增加HTTP的特性，包括http_server、http_request、http_response，完成了HTTP高性能服务器的编写。实例程序利用框架提供的能力，编写了一个简单的HTTP服务器程序。

## 思考题

和往常一样，给你布置两道思考题：

第一道， 你可以试着在HTTP服务器中增加MIME的处理能力，当用户请求/photo路径时，返回一张图片。

第二道，在我们的开发中，已经有很多面向对象的设计，你可以仔细研读代码，说说你对这部分的理解。

欢迎你在评论区写下你的思考，也欢迎把这篇文章分享给你的朋友或者同事，一起交流一下。