优雅的关闭和清理

列表 21-20 中的代码通过使用线程池异步响应请求,正如我们所期望的那样。我们得到了一些关于 workersidthread 字段的警告,这些字段我们没有直接使用,这提醒我们没有清理任何内容。当我们使用不太优雅的 ctrl-c 方法来停止主线程时,所有其他线程也会立即停止,即使它们正在处理请求。

接下来,我们将实现Drop trait,以便在池中的每个线程上调用join,使它们能够在关闭之前完成正在处理的请求。然后,我们将实现一种方法来通知线程它们应该停止接受新请求并关闭。为了演示这段代码,我们将修改我们的服务器,使其仅接受两个请求后优雅地关闭其线程池。

我们需要注意的一点是:这不会影响处理执行闭包的代码部分,因此即使我们使用线程池为异步运行时,这里的一切也会完全相同。

ThreadPool上实现Drop特性

让我们从在我们的线程池上实现Drop开始。当池被丢弃时,我们的线程应该全部加入以确保它们完成工作。 清单 21-22 显示了Drop实现的第一次尝试;这段代码还不能完全工作。

Filename: src/lib.rs
use std::{ sync::{Arc, Mutex, mpsc}, thread, }; pub struct ThreadPool { workers: Vec<Worker>, sender: mpsc::Sender<Job>, } type Job = Box<dyn FnOnce() + Send + 'static>; impl ThreadPool { /// Create a new ThreadPool. /// /// The size is the number of threads in the pool. /// /// # Panics /// /// The `new` function will panic if the size is zero. pub fn new(size: usize) -> ThreadPool { assert!(size > 0); let (sender, receiver) = mpsc::channel(); let receiver = Arc::new(Mutex::new(receiver)); let mut workers = Vec::with_capacity(size); for id in 0..size { workers.push(Worker::new(id, Arc::clone(&receiver))); } ThreadPool { workers, sender } } pub fn execute<F>(&self, f: F) where F: FnOnce() + Send + 'static, { let job = Box::new(f); self.sender.send(job).unwrap(); } } impl Drop for ThreadPool { fn drop(&mut self) { for worker in &mut self.workers { println!("Shutting down worker {}", worker.id); worker.thread.join().unwrap(); } } } struct Worker { id: usize, thread: thread::JoinHandle<()>, } impl Worker { fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker { let thread = thread::spawn(move || { loop { let job = receiver.lock().unwrap().recv().unwrap(); println!("Worker {id} got a job; executing."); job(); } }); Worker { id, thread } } }
Listing 21-22: Joining each thread when the thread pool goes out of scope

首先,我们遍历线程池中的每个 workers。我们使用 &mut,因为 self 是一个可变引用,同时我们也需要能够修改 worker。对于每个工作线程,我们打印一条消息,说明这个特定的 Worker 实例正在关闭,然后我们调用该 Worker 实例线程的 join 方法。如果 join 调用失败,我们使用 unwrap 使 Rust 发生恐慌并进入非优雅的关闭状态。

当我们编译这段代码时,会出现以下错误:

$ cargo check Checking hello v0.1.0 (file:///projects/hello) error[E0507]: cannot move out of `worker.thread` which is behind a mutable reference --> src/lib.rs:52:13 | 52 | worker.thread.join().unwrap(); | ^^^^^^^^^^^^^ ------ `worker.thread` moved due to this method call | | | move occurs because `worker.thread` has type `JoinHandle<()>`, which does not implement the `Copy` trait | note: `JoinHandle::<T>::join` takes ownership of the receiver `self`, which moves `worker.thread` --> file:///home/.rustup/toolchains/1.85/lib/rustlib/src/rust/library/std/src/thread/mod.rs:1876:17 | 1876 | pub fn join(self) -> Result<T> { | ^^^^ For more information about this error, try `rustc --explain E0507`. error: could not compile `hello` (lib) due to 1 previous error

错误告诉我们,我们不能调用join,因为我们只有每个worker的可变借用,而join需要拥有其参数。为了解决这个问题,我们需要将线程从拥有threadWorker实例中移出,以便join可以消耗线程。一种方法是采用我们在清单18-15中使用的方法。如果Worker持有一个Option<thread::JoinHandle<()>>,我们可以在Option上调用take方法,将值从Some变体中移出,并在其位置留下一个None变体。换句话说,正在运行的Worker将在thread中有一个Some变体,而当我们想要清理一个Worker时,我们会用None替换Some,这样Worker就不会有线程可以运行。

然而,这只有在丢弃Worker时才会出现。作为交换,我们在访问worker.thread的任何地方都必须处理一个Option<thread::JoinHandle<()>>。惯用的Rust代码大量使用Option,但当你发现自己将某个已知始终存在的东西包装在Option中作为变通方法时,最好寻找替代方法。它们可以使你的代码更干净,更不容易出错。

在这种情况下,存在一个更好的替代方案:Vec::drain 方法。它接受一个范围参数来指定要从 Vec 中移除的项,并返回这些项的迭代器。传递 .. 范围语法将移除 Vec 中的 每个 值。

所以我们需要更新 ThreadPooldrop 实现如下:

Filename: src/lib.rs
#![allow(unused)] fn main() { use std::{ sync::{Arc, Mutex, mpsc}, thread, }; pub struct ThreadPool { workers: Vec<Worker>, sender: mpsc::Sender<Job>, } type Job = Box<dyn FnOnce() + Send + 'static>; impl ThreadPool { /// Create a new ThreadPool. /// /// The size is the number of threads in the pool. /// /// # Panics /// /// The `new` function will panic if the size is zero. pub fn new(size: usize) -> ThreadPool { assert!(size > 0); let (sender, receiver) = mpsc::channel(); let receiver = Arc::new(Mutex::new(receiver)); let mut workers = Vec::with_capacity(size); for id in 0..size { workers.push(Worker::new(id, Arc::clone(&receiver))); } ThreadPool { workers, sender } } pub fn execute<F>(&self, f: F) where F: FnOnce() + Send + 'static, { let job = Box::new(f); self.sender.send(job).unwrap(); } } impl Drop for ThreadPool { fn drop(&mut self) { for worker in self.workers.drain(..) { println!("Shutting down worker {}", worker.id); worker.thread.join().unwrap(); } } } struct Worker { id: usize, thread: thread::JoinHandle<()>, } impl Worker { fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker { let thread = thread::spawn(move || { loop { let job = receiver.lock().unwrap().recv().unwrap(); println!("Worker {id} got a job; executing."); job(); } }); Worker { id, thread } } } }

这解决了编译器错误,并且不需要对我们的代码进行任何其他更改。

向线程发出停止监听任务的信号

随着我们所做的所有更改,我们的代码编译时没有任何警告。 然而,坏消息是这段代码还没有按我们希望的方式工作。关键在于由Worker实例的线程运行的闭包中的逻辑:目前,我们调用join,但这不会关闭线程,因为它们会无限循环地寻找任务。如果我们尝试使用当前实现的drop来删除我们的ThreadPool,主线程将永远阻塞,等待第一个线程完成。

为了解决这个问题,我们需要更改 ThreadPooldrop 实现,然后更改 Worker 循环。

首先我们将更改 ThreadPooldrop 实现,以在等待线程完成之前显式地释放 sender。列表 21-23 显示了对 ThreadPool 的更改,以显式地释放 sender。与线程不同,这里我们 确实 需要使用 Option,以便能够使用 Option::takesenderThreadPool 中移出。

Filename: src/lib.rs
use std::{ sync::{Arc, Mutex, mpsc}, thread, }; pub struct ThreadPool { workers: Vec<Worker>, sender: Option<mpsc::Sender<Job>>, } // --snip-- type Job = Box<dyn FnOnce() + Send + 'static>; impl ThreadPool { /// Create a new ThreadPool. /// /// The size is the number of threads in the pool. /// /// # Panics /// /// The `new` function will panic if the size is zero. pub fn new(size: usize) -> ThreadPool { // --snip-- assert!(size > 0); let (sender, receiver) = mpsc::channel(); let receiver = Arc::new(Mutex::new(receiver)); let mut workers = Vec::with_capacity(size); for id in 0..size { workers.push(Worker::new(id, Arc::clone(&receiver))); } ThreadPool { workers, sender: Some(sender), } } pub fn execute<F>(&self, f: F) where F: FnOnce() + Send + 'static, { let job = Box::new(f); self.sender.as_ref().unwrap().send(job).unwrap(); } } impl Drop for ThreadPool { fn drop(&mut self) { drop(self.sender.take()); for worker in self.workers.drain(..) { println!("Shutting down worker {}", worker.id); worker.thread.join().unwrap(); } } } struct Worker { id: usize, thread: thread::JoinHandle<()>, } impl Worker { fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker { let thread = thread::spawn(move || { loop { let job = receiver.lock().unwrap().recv().unwrap(); println!("Worker {id} got a job; executing."); job(); } }); Worker { id, thread } } }
Listing 21-23: Explicitly drop sender before joining the Worker threads

释放 sender 会关闭通道,这表示不会再发送更多消息。当这种情况发生时,Worker 实例在无限循环中对 recv 的所有调用将返回错误。在清单 21-24 中,我们更改了 Worker 循环,以便在这种情况下优雅地退出循环,这意味着当 ThreadPooldrop 实现调用 join 时,线程将结束。

Filename: src/lib.rs
use std::{ sync::{Arc, Mutex, mpsc}, thread, }; pub struct ThreadPool { workers: Vec<Worker>, sender: Option<mpsc::Sender<Job>>, } type Job = Box<dyn FnOnce() + Send + 'static>; impl ThreadPool { /// Create a new ThreadPool. /// /// The size is the number of threads in the pool. /// /// # Panics /// /// The `new` function will panic if the size is zero. pub fn new(size: usize) -> ThreadPool { assert!(size > 0); let (sender, receiver) = mpsc::channel(); let receiver = Arc::new(Mutex::new(receiver)); let mut workers = Vec::with_capacity(size); for id in 0..size { workers.push(Worker::new(id, Arc::clone(&receiver))); } ThreadPool { workers, sender: Some(sender), } } pub fn execute<F>(&self, f: F) where F: FnOnce() + Send + 'static, { let job = Box::new(f); self.sender.as_ref().unwrap().send(job).unwrap(); } } impl Drop for ThreadPool { fn drop(&mut self) { drop(self.sender.take()); for worker in self.workers.drain(..) { println!("Shutting down worker {}", worker.id); worker.thread.join().unwrap(); } } } struct Worker { id: usize, thread: thread::JoinHandle<()>, } impl Worker { fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker { let thread = thread::spawn(move || { loop { let message = receiver.lock().unwrap().recv(); match message { Ok(job) => { println!("Worker {id} got a job; executing."); job(); } Err(_) => { println!("Worker {id} disconnected; shutting down."); break; } } } }); Worker { id, thread } } }
Listing 21-24: Explicitly breaking out of the loop when recv returns an error

要查看此代码的运行情况,让我们修改 main 以仅接受两个请求,然后优雅地关闭服务器,如示例 21-25 所示。

Filename: src/main.rs
use hello::ThreadPool; use std::{ fs, io::{BufReader, prelude::*}, net::{TcpListener, TcpStream}, thread, time::Duration, }; fn main() { let listener = TcpListener::bind("127.0.0.1:7878").unwrap(); let pool = ThreadPool::new(4); for stream in listener.incoming().take(2) { let stream = stream.unwrap(); pool.execute(|| { handle_connection(stream); }); } println!("Shutting down."); } fn handle_connection(mut stream: TcpStream) { let buf_reader = BufReader::new(&stream); let request_line = buf_reader.lines().next().unwrap().unwrap(); let (status_line, filename) = match &request_line[..] { "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"), "GET /sleep HTTP/1.1" => { thread::sleep(Duration::from_secs(5)); ("HTTP/1.1 200 OK", "hello.html") } _ => ("HTTP/1.1 404 NOT FOUND", "404.html"), }; let contents = fs::read_to_string(filename).unwrap(); let length = contents.len(); let response = format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}"); stream.write_all(response.as_bytes()).unwrap(); }
Listing 21-25: Shutting down the server after serving two requests by exiting the loop

你不会希望一个现实世界的Web服务器在仅服务了两个请求后就关闭。这段代码只是证明了优雅的关闭和清理机制是有效的。

take 方法在 Iterator 特性中定义,将迭代限制为最多前两个项目。ThreadPool 将在 main 结束时超出作用域,drop 实现将运行。

使用 cargo run 启动服务器,并发出三个请求。第三个请求应该出错,并且在您的终端中应看到类似的输出:

$ cargo run Compiling hello v0.1.0 (file:///projects/hello) Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.41s Running `target/debug/hello` Worker 0 got a job; executing. Shutting down. Shutting down worker 0 Worker 3 got a job; executing. Worker 1 disconnected; shutting down. Worker 2 disconnected; shutting down. Worker 3 disconnected; shutting down. Worker 0 disconnected; shutting down. Shutting down worker 1 Shutting down worker 2 Shutting down worker 3

你可能会看到不同的 Worker ID 和消息打印顺序。我们可以通过这些消息了解代码的工作原理:Worker 实例 0 和 3 获得了前两个请求。服务器在第二个连接后停止接受连接,而 ThreadPool 上的 Drop 实现甚至在 Worker 3 开始其任务之前就开始执行。释放 sender 会断开所有 Worker 实例的连接并告诉它们关闭。每个 Worker 实例在断开连接时都会打印一条消息,然后线程池调用 join 以等待每个 Worker 线程完成。

注意这个特定执行的一个有趣方面:ThreadPool 丢弃了 sender,并且在任何 Worker 收到错误之前,我们尝试加入 Worker 0。Worker 0 尚未从 recv 收到错误,因此主线程阻塞等待 Worker 0 完成。在此期间,Worker 3 收到了一个任务,然后所有线程都收到了错误。当 Worker 0 完成时,主线程等待其余的 Worker 实例完成。那时,它们都已经退出了循环并停止。

恭喜!我们现在完成了项目;我们有一个使用线程池异步响应的基本Web服务器。我们能够执行服务器的优雅关闭,清理池中的所有线程。

以下是完整的代码供参考:

Filename: src/main.rs
use hello::ThreadPool; use std::{ fs, io::{BufReader, prelude::*}, net::{TcpListener, TcpStream}, thread, time::Duration, }; fn main() { let listener = TcpListener::bind("127.0.0.1:7878").unwrap(); let pool = ThreadPool::new(4); for stream in listener.incoming().take(2) { let stream = stream.unwrap(); pool.execute(|| { handle_connection(stream); }); } println!("Shutting down."); } fn handle_connection(mut stream: TcpStream) { let buf_reader = BufReader::new(&stream); let request_line = buf_reader.lines().next().unwrap().unwrap(); let (status_line, filename) = match &request_line[..] { "GET / HTTP/1.1" => ("HTTP/1.1 200 OK", "hello.html"), "GET /sleep HTTP/1.1" => { thread::sleep(Duration::from_secs(5)); ("HTTP/1.1 200 OK", "hello.html") } _ => ("HTTP/1.1 404 NOT FOUND", "404.html"), }; let contents = fs::read_to_string(filename).unwrap(); let length = contents.len(); let response = format!("{status_line}\r\nContent-Length: {length}\r\n\r\n{contents}"); stream.write_all(response.as_bytes()).unwrap(); }
Filename: src/lib.rs
use std::{ sync::{Arc, Mutex, mpsc}, thread, }; pub struct ThreadPool { workers: Vec<Worker>, sender: Option<mpsc::Sender<Job>>, } type Job = Box<dyn FnOnce() + Send + 'static>; impl ThreadPool { /// Create a new ThreadPool. /// /// The size is the number of threads in the pool. /// /// # Panics /// /// The `new` function will panic if the size is zero. pub fn new(size: usize) -> ThreadPool { assert!(size > 0); let (sender, receiver) = mpsc::channel(); let receiver = Arc::new(Mutex::new(receiver)); let mut workers = Vec::with_capacity(size); for id in 0..size { workers.push(Worker::new(id, Arc::clone(&receiver))); } ThreadPool { workers, sender: Some(sender), } } pub fn execute<F>(&self, f: F) where F: FnOnce() + Send + 'static, { let job = Box::new(f); self.sender.as_ref().unwrap().send(job).unwrap(); } } impl Drop for ThreadPool { fn drop(&mut self) { drop(self.sender.take()); for worker in &mut self.workers { println!("Shutting down worker {}", worker.id); if let Some(thread) = worker.thread.take() { thread.join().unwrap(); } } } } struct Worker { id: usize, thread: Option<thread::JoinHandle<()>>, } impl Worker { fn new(id: usize, receiver: Arc<Mutex<mpsc::Receiver<Job>>>) -> Worker { let thread = thread::spawn(move || { loop { let message = receiver.lock().unwrap().recv(); match message { Ok(job) => { println!("Worker {id} got a job; executing."); job(); } Err(_) => { println!("Worker {id} disconnected; shutting down."); break; } } } }); Worker { id, thread: Some(thread), } } }

我们还可以做得更多!如果您想继续增强这个项目,这里有一些想法:

  • ThreadPool 及其公共方法添加更多文档。
  • 添加测试以检验库的功能。
  • unwrap 的调用更改为更健壮的错误处理。
  • 使用 ThreadPool 执行除处理网页请求之外的某些任务。
  • crates.io上找到一个线程池库,并使用该库实现一个类似的Web服务器。然后将其API和健壮性与我们实现的线程池进行比较。

摘要

干得好!你已经读到了这本书的最后!我们要感谢你加入我们这次的Rust之旅。你现在可以开始实现自己的Rust项目,并帮助其他人的项目。请记住,有一个欢迎你的Rustaceans社区,他们很乐意在你的Rust之旅中遇到的任何挑战时帮助你。