rust/src/librustc/util/common.rs
Alex Crichton 511f0b8a3d std: Stabilize the std::hash module
This commit aims to prepare the `std::hash` module for alpha by formalizing its
current interface while holding off on adding `#[stable]` to the new APIs. The
current usage with the `HashMap` and `HashSet` types is also reconciled by
separating out composable parts of the design. The primary goal of this slight
redesign is to separate the concepts of a hasher's state from a hashing
algorithm itself.

The primary change of this commit is to separate the `Hasher` trait into a
`Hasher` and a `HashState` trait. Conceptually, the old `Hasher` trait was
just a factory for hashing states, but the hashing code had very little control
over how these states were used. Additionally, the old `Hasher` trait was
fairly unrelated to hashing itself.

This commit redesigns the existing `Hasher` trait to match what the term
`Hasher` normally implies, with the following definition:

    trait Hasher {
        type Output;
        fn reset(&mut self);
        fn finish(&self) -> Self::Output;
    }

This `Hasher` trait emphasizes that hashing algorithms may produce outputs other
than a `u64`, so the output type is an associated type. Other than that, however, very
little is assumed about a particular hasher. It is left up to implementors to
provide specific methods or trait implementations to feed data into a hasher.
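As a rough illustration of this design, the sketch below defines a stand-alone trait mirroring the `Hasher` definition above (called `OutputHasher` here, a hypothetical name chosen so it does not clash with today's `std::hash::Hasher`) and implements it for 64-bit FNV-1a, picked only because it is short and well known. As the text describes, the trait covers just `reset` and `finish`; feeding data in is left to an implementor-specific `write` method.

```rust
/// Mirrors the redesigned trait from this commit message (hypothetical name).
trait OutputHasher {
    type Output;
    fn reset(&mut self);
    fn finish(&self) -> Self::Output;
}

const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a, used here only as a small example algorithm.
struct Fnv64 {
    state: u64,
}

impl Fnv64 {
    fn new() -> Fnv64 {
        Fnv64 { state: FNV_OFFSET_BASIS }
    }

    /// Feeding data in is implementor-specific, not part of the trait.
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state ^= b as u64;
            self.state = self.state.wrapping_mul(FNV_PRIME);
        }
    }
}

impl OutputHasher for Fnv64 {
    type Output = u64;

    fn reset(&mut self) {
        self.state = FNV_OFFSET_BASIS;
    }

    fn finish(&self) -> u64 {
        self.state
    }
}

fn main() {
    let mut h = Fnv64::new();
    h.write(b"a");
    println!("{:#x}", h.finish());
    h.reset();
    println!("{:#x}", h.finish());
}
```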

The corresponding `Hash` trait becomes:

    trait Hash<H: Hasher> {
        fn hash(&self, state: &mut H);
    }

The old default of `SipState` was removed from this trait as it's not something
that we're willing to stabilize until the end of time, but the type parameter is
always required to implement `Hasher`. Note that the type parameter `H` remains
on the trait to enable multidispatch for specialization of hashing for
particular hashers.
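A minimal sketch of why keeping `H` as a trait parameter is useful: one type can carry a separate hashing implementation per hasher, and each hasher chooses its own `Output`. `ToyHasher`, `ToyHash`, `ByteSum`, `XorFold` and `hash_with` are hypothetical stand-ins for the definitions above, not the real `std` API.

```rust
/// Stand-in for the redesigned `Hasher` trait (hypothetical name).
trait ToyHasher {
    type Output;
    fn reset(&mut self);
    fn finish(&self) -> Self::Output;
}

/// Stand-in for `Hash<H: Hasher>`: the hasher type is a trait parameter.
trait ToyHash<H: ToyHasher> {
    fn hash(&self, state: &mut H);
}

/// Sums all bytes into a `u64`.
struct ByteSum(u64);

/// XOR-folds all bytes into one `u8`: an output type other than `u64`.
struct XorFold(u8);

impl ToyHasher for ByteSum {
    type Output = u64;
    fn reset(&mut self) { self.0 = 0; }
    fn finish(&self) -> u64 { self.0 }
}

impl ToyHasher for XorFold {
    type Output = u8;
    fn reset(&mut self) { self.0 = 0; }
    fn finish(&self) -> u8 { self.0 }
}

// Multidispatch: `str` has one impl per hasher, each free to feed
// data in the way that suits that particular hasher.
impl ToyHash<ByteSum> for str {
    fn hash(&self, state: &mut ByteSum) {
        for &b in self.as_bytes() {
            state.0 = state.0.wrapping_add(b as u64);
        }
    }
}

impl ToyHash<XorFold> for str {
    fn hash(&self, state: &mut XorFold) {
        for &b in self.as_bytes() {
            state.0 ^= b;
        }
    }
}

/// Hashes `value` with a freshly reset hasher and returns that hasher's output.
fn hash_with<T: ?Sized + ToyHash<H>, H: ToyHasher>(value: &T, mut state: H) -> H::Output {
    state.reset();
    value.hash(&mut state);
    state.finish()
}

fn main() {
    println!("{}", hash_with("ab", ByteSum(0)));
    println!("{}", hash_with("ab", XorFold(0)));
}
```

The selected impl depends on both the value's type and the hasher's type, which is the multidispatch the paragraph refers to.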

Note that `Writer` is not mentioned in either `Hash` or `Hasher`; it is
simply used as part of `derive` and the implementations for all primitive types.

With these definitions, the old `Hasher` trait is realized as a new `HashState`
trait in the `collections::hash_state` module as an unstable addition for
now. The current definition looks like:

    trait HashState {
        type Hasher: Hasher;
        fn hasher(&self) -> Self::Hasher;
    }

The purpose of this trait is to emphasize that the one piece of functionality
required of implementors is the ability to create new instances of `Hasher`.
For SipHash, this state conceptually represents the two keys from which new
instances of `SipHasher` can be created. A `HashState`, not a `Hasher`, is
what's stored in a `HashMap`.

Implementors of custom hash algorithms should implement the `Hasher` trait, and
only hash algorithms intended for use in hash maps need to implement or worry
about the `HashState` trait.
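For comparison, in present-day stable Rust the `HashState` role is played by `std::hash::BuildHasher` (with `build_hasher` in place of `hasher`), while `std::hash::Hasher` has a fixed `u64` output and a required `write` method. The sketch below implements a custom algorithm as a `Hasher` and wraps it in a state type for `HashMap::with_hasher`; `Fnv64`, `FnvState` and `hash_of` are hypothetical names, and FNV-1a is used only because it is short.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// The hashing algorithm itself: 64-bit FNV-1a implementing `Hasher`.
struct Fnv64(u64);

impl Hasher for Fnv64 {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.0 ^= b as u64;
            self.0 = self.0.wrapping_mul(0x0000_0100_0000_01b3);
        }
    }

    fn finish(&self) -> u64 {
        self.0
    }
}

/// The state stored in the map: its only job is creating new hashers.
#[derive(Clone, Copy, Default)]
struct FnvState;

impl BuildHasher for FnvState {
    type Hasher = Fnv64;

    fn build_hasher(&self) -> Fnv64 {
        Fnv64(0xcbf2_9ce4_8422_2325)
    }
}

/// Hashes any `Hash` value with a fresh hasher obtained from `state`.
fn hash_of<T: Hash, S: BuildHasher>(state: &S, value: &T) -> u64 {
    let mut hasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    // The map stores an `FnvState`, not a hasher.
    let mut map: HashMap<&str, u32, FnvState> = HashMap::with_hasher(FnvState);
    map.insert("one", 1);
    map.insert("two", 2);
    println!("{:?}", map.get("two"));
    println!("{}", hash_of(&FnvState, &"one") == hash_of(&FnvState, &"one"));
}
```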

The entire module and `HashState` infrastructure remain `#[unstable]` because
they were recently redesigned, but some other stability decisions made for the
`std::hash` module are:

* The `Writer` trait remains `#[experimental]` as it's intended to be replaced
  with an `io::Writer` (more details soon).
* The top-level `hash` function is `#[unstable]` as it is intended to be generic
  over the hashing algorithm instead of hardwired to `SipHasher`.
* The inner `sip` module is now private as its one export, `SipHasher`, is
  reexported in the `hash` module.

And finally, a few changes were made to the default parameters on `HashMap`.

* The `RandomSipHasher` default type parameter was renamed to `RandomState`.
  This renaming emphasizes that it is not a hasher, but rather just state to
  generate hashers. It also moves away from the name "sip" as it may not always
  be implemented as `SipHasher`. This type lives in the
  `std::collections::hash_map` module as `#[unstable]`.

* The associated `Hasher` type of `RandomState` is creatively called...
  `Hasher`! This concrete structure lives next to `RandomState` as an
  implementation of the "default hashing algorithm" used for a `HashMap`. Under
  the hood this is currently implemented as `SipHasher`, but it presents an
  explicit interface for now, which allows us to modify the implementation over
  time if necessary.
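Both pieces survive in today's standard library: `std::collections::hash_map::RandomState` is the default state parameter of `HashMap`, and its associated hasher type is `std::collections::hash_map::DefaultHasher`. A small sketch of the property the renaming emphasizes: the state never hashes anything itself, and hashers built from the same `RandomState` share its keys and therefore agree. `hash_with_state` is a hypothetical helper.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

/// Hashes `value` with a fresh hasher produced by `state`.
fn hash_with_state<T: Hash>(state: &RandomState, value: &T) -> u64 {
    // `build_hasher` returns `DefaultHasher`, the associated `Hasher` type.
    let mut hasher: DefaultHasher = state.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let state = RandomState::new();
    // Two hashers built from one state use the same random keys, so they agree.
    let a = hash_with_state(&state, &"hello");
    let b = hash_with_state(&state, &"hello");
    println!("{}", a == b);

    // Spelling out the default type parameter explicitly.
    let mut map: HashMap<&str, i32, RandomState> = HashMap::with_hasher(state);
    map.insert("hello", 1);
    println!("{:?}", map.get("hello"));
}
```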

There are many breaking changes outlined above, and as a result this commit is
a:

[breaking-change]
2015-01-07 12:18:08 -08:00


// Copyright 2012-2014 The Rust Project Developers. See the COPYRIGHT
// file at the top-level directory of this distribution and at
// http://rust-lang.org/COPYRIGHT.
//
// Licensed under the Apache License, Version 2.0 <LICENSE-APACHE or
// http://www.apache.org/licenses/LICENSE-2.0> or the MIT license
// <LICENSE-MIT or http://opensource.org/licenses/MIT>, at your
// option. This file may not be copied, modified, or distributed
// except according to those terms.
#![allow(non_camel_case_types)]
use std::cell::{RefCell, Cell};
use std::collections::HashMap;
use std::fmt::Show;
use std::hash::{Hash, Hasher};
use std::iter::repeat;
use std::time::Duration;
use std::collections::hash_state::HashState;
use syntax::ast;
use syntax::visit;
use syntax::visit::Visitor;

// Useful type to use with `Result<>` to indicate that an error has already
// been reported to the user, so there is no need to continue checking.
#[derive(Clone, Copy, Show)]
pub struct ErrorReported;
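A sketch of how such a marker type is typically threaded through `Result`, written in current Rust (`Debug` replaces the older `Show`). `Diagnostics`, `check_positive` and `check_all` are hypothetical helpers standing in for the compiler's real error reporting: the error is recorded once where it is detected, and callers simply propagate `ErrorReported` with `?`.

```rust
/// Marker error: the problem has already been reported to the user,
/// so callers should stop without reporting anything further.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct ErrorReported;

/// Hypothetical diagnostics sink standing in for the compiler's session.
pub struct Diagnostics {
    pub messages: Vec<String>,
}

impl Diagnostics {
    /// Records a message and hands back the marker for callers to return.
    pub fn report(&mut self, msg: &str) -> ErrorReported {
        self.messages.push(msg.to_string());
        ErrorReported
    }
}

/// Hypothetical check that reports its own error exactly once.
fn check_positive(diag: &mut Diagnostics, x: i32) -> Result<i32, ErrorReported> {
    if x > 0 {
        Ok(x)
    } else {
        Err(diag.report("expected a positive number"))
    }
}

/// Callers just propagate the marker with `?`; nothing is reported twice.
fn check_all(diag: &mut Diagnostics, xs: &[i32]) -> Result<i32, ErrorReported> {
    let mut sum = 0;
    for &x in xs {
        sum += check_positive(diag, x)?;
    }
    Ok(sum)
}

fn main() {
    let mut diag = Diagnostics { messages: Vec::new() };
    println!("{:?}", check_all(&mut diag, &[1, -2, -3]));
    println!("{:?}", diag.messages);
}
```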

pub fn time<T, U, F>(do_it: bool, what: &str, u: U, f: F) -> T where
    F: FnOnce(U) -> T,
{
    thread_local!(static DEPTH: Cell<uint> = Cell::new(0));
    if !do_it { return f(u); }

    let old = DEPTH.with(|slot| {
        let r = slot.get();
        slot.set(r + 1);
        r
    });

    let mut u = Some(u);
    let mut rv = None;
    let dur = {
        let ref mut rvp = rv;

        Duration::span(move || {
            *rvp = Some(f(u.take().unwrap()))
        })
    };
    let rv = rv.unwrap();

    println!("{}time: {}.{:03} \t{}", repeat(" ").take(old).collect::<String>(),
             dur.num_seconds(), dur.num_milliseconds() % 1000, what);
    DEPTH.with(|slot| slot.set(old));

    rv
}
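The `time` helper above relies on pre-1.0 APIs (`Duration::span`, `uint`). A rough equivalent in current Rust, assuming the same output format, can measure with `std::time::Instant` and return the closure's result directly, which removes the `Option` juggling. Here the thread-local depth counter is moved to module level so that a hypothetical `current_depth` helper can inspect it.

```rust
use std::cell::Cell;
use std::time::Instant;

// Nesting depth of active `time` calls on this thread.
thread_local!(static DEPTH: Cell<usize> = Cell::new(0));

/// Runs `f(u)`; when `do_it` is true, prints its wall-clock time,
/// indented by how deeply `time` calls are nested.
pub fn time<T, U, F>(do_it: bool, what: &str, u: U, f: F) -> T
where
    F: FnOnce(U) -> T,
{
    if !do_it {
        return f(u);
    }
    let old = DEPTH.with(|slot| {
        let r = slot.get();
        slot.set(r + 1);
        r
    });

    let start = Instant::now();
    let rv = f(u);
    let dur = start.elapsed();

    println!("{}time: {}.{:03} \t{}", " ".repeat(old), dur.as_secs(), dur.subsec_millis(), what);
    DEPTH.with(|slot| slot.set(old));
    rv
}

/// Hypothetical helper: current nesting depth on this thread.
fn current_depth() -> usize {
    DEPTH.with(|slot| slot.get())
}

fn main() {
    let total = time(true, "outer", 10u32, |n| {
        time(true, "inner", n, |m| (1..=m).sum::<u32>())
    });
    println!("{}", total);
}
```

Like the original, this sketch does not restore the depth if `f` panics; a drop guard would be needed for that.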

pub fn indent<R, F>(op: F) -> R where
    R: Show,
    F: FnOnce() -> R,
{
    // Use in conjunction with the log post-processor like `src/etc/indenter`
    // to make debug output more readable.
    debug!(">>");
    let r = op();
    debug!("<< (Result = {:?})", r);
    r
}

pub struct Indenter {
    _cannot_construct_outside_of_this_module: ()
}

impl Drop for Indenter {
    fn drop(&mut self) { debug!("<<"); }
}

pub fn indenter() -> Indenter {
    debug!(">>");
    Indenter { _cannot_construct_outside_of_this_module: () }
}
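`Indenter` is an RAII guard: `indenter()` logs `>>`, and the matching `<<` is logged whenever the returned value is dropped, including on early return. The sketch below assumes the nesting should also be tracked as a thread-local depth (the original relies on the external `src/etc/indenter` post-processor for that) and writes to stderr instead of using the `debug!` macro; `IndentGuard` and `depth` are hypothetical names.

```rust
use std::cell::Cell;

thread_local!(static INDENT: Cell<usize> = Cell::new(0));

/// Logs `>>` when created and `<<` when dropped (hypothetical name).
pub struct IndentGuard {
    // Private field: outside this module the guard can only come from `indenter()`.
    _private: (),
}

pub fn indenter() -> IndentGuard {
    let depth = INDENT.with(|d| {
        let old = d.get();
        d.set(old + 1);
        old
    });
    eprintln!("{}>>", "  ".repeat(depth));
    IndentGuard { _private: () }
}

impl Drop for IndentGuard {
    fn drop(&mut self) {
        let depth = INDENT.with(|d| {
            let new = d.get() - 1;
            d.set(new);
            new
        });
        eprintln!("{}<<", "  ".repeat(depth));
    }
}

/// Current nesting depth on this thread.
pub fn depth() -> usize {
    INDENT.with(|d| d.get())
}

fn main() {
    let _outer = indenter();
    {
        let _inner = indenter();
        println!("depth inside: {}", depth());
    }
    println!("depth after inner: {}", depth());
}
```

Bind the guard to a named variable such as `_guard`: `let _ = indenter();` drops it at the end of that statement, so `<<` is logged immediately.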

struct LoopQueryVisitor<P> where P: FnMut(&ast::Expr_) -> bool {
    p: P,
    flag: bool,
}

impl<'v, P> Visitor<'v> for LoopQueryVisitor<P> where P: FnMut(&ast::Expr_) -> bool {
    fn visit_expr(&mut self, e: &ast::Expr) {
        self.flag |= (self.p)(&e.node);
        match e.node {
            // Skip inner loops, since a break in the inner loop isn't a
            // break inside the outer loop
            ast::ExprLoop(..) | ast::ExprWhile(..) | ast::ExprForLoop(..) => {}
            _ => visit::walk_expr(self, e)
        }
    }
}

// Takes a predicate p, returns true iff p is true for any subexpressions
// of b -- skipping any inner loops (loop, while, for)
pub fn loop_query<P>(b: &ast::Block, p: P) -> bool where P: FnMut(&ast::Expr_) -> bool {
    let mut v = LoopQueryVisitor {
        p: p,
        flag: false,
    };
    visit::walk_block(&mut v, b);
    return v.flag;
}
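To make the skipping rule concrete, here is a self-contained sketch over a toy expression type (`Expr` and its variants are hypothetical stand-ins for `ast::Expr`). As in `LoopQueryVisitor`, the predicate is applied to every visited node, including an inner loop node itself, but the walk does not descend into inner loop bodies, so a `break` there is not counted.

```rust
/// Toy expression tree (hypothetical stand-in for `ast::Expr`).
enum Expr {
    Break,
    Call,
    Loop(Box<Expr>),
    Block(Vec<Expr>),
}

/// Returns true iff `p` holds for some expression in `body`,
/// without descending into nested loops.
fn loop_query<P: FnMut(&Expr) -> bool>(body: &Expr, mut p: P) -> bool {
    fn walk<Q: FnMut(&Expr) -> bool>(e: &Expr, p: &mut Q, flag: &mut bool) {
        *flag |= p(e);
        match e {
            // A `break` inside an inner loop only exits that loop, so skip its body.
            Expr::Loop(_) => {}
            Expr::Block(es) => {
                for x in es {
                    walk(x, p, flag);
                }
            }
            Expr::Break | Expr::Call => {}
        }
    }
    let mut flag = false;
    walk(body, &mut p, &mut flag);
    flag
}

fn main() {
    let body = Expr::Block(vec![Expr::Call, Expr::Loop(Box::new(Expr::Break))]);
    println!("{}", loop_query(&body, |e| matches!(e, Expr::Break)));
}
```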

struct BlockQueryVisitor<P> where P: FnMut(&ast::Expr) -> bool {
    p: P,
    flag: bool,
}

impl<'v, P> Visitor<'v> for BlockQueryVisitor<P> where P: FnMut(&ast::Expr) -> bool {
    fn visit_expr(&mut self, e: &ast::Expr) {
        self.flag |= (self.p)(e);
        visit::walk_expr(self, e)
    }
}

// Takes a predicate p, returns true iff p is true for any subexpressions
// of b, including those inside nested loops
pub fn block_query<P>(b: &ast::Block, p: P) -> bool where P: FnMut(&ast::Expr) -> bool {
    let mut v = BlockQueryVisitor {
        p: p,
        flag: false,
    };
    visit::walk_block(&mut v, &*b);
    return v.flag;
}
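Unlike `loop_query`, `block_query` descends into every nested expression, loops included, and its predicate receives the whole expression rather than just its node kind. The same kind of toy sketch (again with a hypothetical `Expr` type) shows the difference on a body whose only `break` sits inside an inner loop.

```rust
/// Toy expression tree (hypothetical stand-in for `ast::Expr`).
enum Expr {
    Break,
    Call,
    Loop(Box<Expr>),
    Block(Vec<Expr>),
}

/// Returns true iff `p` holds for any expression in `body`, nested loops included.
fn block_query<P: FnMut(&Expr) -> bool>(body: &Expr, mut p: P) -> bool {
    fn walk<Q: FnMut(&Expr) -> bool>(e: &Expr, p: &mut Q, flag: &mut bool) {
        *flag |= p(e);
        match e {
            // No special case for loops: their bodies are visited too.
            Expr::Loop(inner) => walk(inner, p, flag),
            Expr::Block(es) => {
                for x in es {
                    walk(x, p, flag);
                }
            }
            Expr::Break | Expr::Call => {}
        }
    }
    let mut flag = false;
    walk(body, &mut p, &mut flag);
    flag
}

fn main() {
    let body = Expr::Block(vec![Expr::Call, Expr::Loop(Box::new(Expr::Break))]);
    println!("{}", block_query(&body, |e| matches!(e, Expr::Break)));
}
```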

/// Determines whether there exists a path from `source` to `destination`. The graph is defined by
/// the `edges_map`, which maps each node of type `T` to a list of its adjacent nodes; `S` is the
/// map's hash state.
///
/// Efficiency note: This is implemented in an inefficient way because it is typically invoked on
/// very small graphs. If the graphs become larger, a more efficient graph representation and
/// algorithm would probably be advised.
pub fn can_reach<T, S>(edges_map: &HashMap<T, Vec<T>, S>, source: T,
                       destination: T) -> bool
    where S: HashState,
          <S as HashState>::Hasher: Hasher<Output=u64>,
          T: Hash< <S as HashState>::Hasher> + Eq + Clone,
{
    if source == destination {
        return true;
    }

    // Do a little breadth-first search here. The `queue` list
    // doubles as a way to detect if we've seen a particular node
    // before. Note that we expect this graph to be an *extremely
    // shallow* tree.
    let mut queue = vec!(source);
    let mut i = 0;
    while i < queue.len() {
        match edges_map.get(&queue[i]) {
            Some(edges) => {
                for target in edges.iter() {
                    if *target == destination {
                        return true;
                    }

                    if !queue.iter().any(|x| x == target) {
                        queue.push((*target).clone());
                    }
                }
            }
            None => {}
        }
        i += 1;
    }
    return false;
}
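If the graphs did grow, one option in the spirit of the efficiency note is a breadth-first search that records visited nodes in a `HashSet` and uses a `VecDeque` as the queue, which makes the search linear in the size of the graph instead of rescanning the queue for every edge. The sketch below targets current Rust, where `BuildHasher` plays the role of the `HashState` bound; it is an illustration, not the rustc code.

```rust
use std::collections::{HashMap, HashSet, VecDeque};
use std::hash::{BuildHasher, Hash};

/// Breadth-first reachability check with an explicit visited set.
fn can_reach<T, S>(edges_map: &HashMap<T, Vec<T>, S>, source: T, destination: T) -> bool
where
    T: Hash + Eq + Clone,
    S: BuildHasher,
{
    if source == destination {
        return true;
    }
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(source.clone());
    queue.push_back(source);

    while let Some(node) = queue.pop_front() {
        if let Some(edges) = edges_map.get(&node) {
            for target in edges {
                if *target == destination {
                    return true;
                }
                // `insert` returns true only the first time a node is seen.
                if seen.insert(target.clone()) {
                    queue.push_back(target.clone());
                }
            }
        }
    }
    false
}

fn main() {
    let mut g = HashMap::new();
    g.insert(1, vec![2, 3]);
    g.insert(3, vec![4]);
    println!("{} {}", can_reach(&g, 1, 4), can_reach(&g, 4, 1));
}
```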

/// Memoizes a one-argument closure using the given `RefCell` containing
/// a `HashMap` to serve as a cache.
///
/// In the future the signature of this function is expected to be:
/// ```
/// pub fn memoized<T: Clone, U: Clone, M: MutableMap<T, U>>(
///     cache: &RefCell<M>,
///     f: &|&: T| -> U
/// ) -> impl |&: T| -> U {
/// ```
/// but this is not currently possible.
///
/// # Example
/// ```
/// struct Context {
///     cache: RefCell<HashMap<uint, uint>>
/// }
///
/// fn fibonacci(ctxt: &Context, n: uint) -> uint {
///     memoized(&ctxt.cache, n, |n| match n {
///         0 | 1 => n,
///         _ => fibonacci(ctxt, n - 2) + fibonacci(ctxt, n - 1)
///     })
/// }
/// ```
#[inline(always)]
pub fn memoized<T, U, S, F>(cache: &RefCell<HashMap<T, U, S>>, arg: T, f: F) -> U
    where T: Clone + Hash<<S as HashState>::Hasher> + Eq,
          U: Clone,
          S: HashState,
          <S as HashState>::Hasher: Hasher<Output=u64>,
          F: FnOnce(T) -> U,
{
    let key = arg.clone();
    let result = cache.borrow().get(&key).map(|result| result.clone());
    match result {
        Some(result) => result,
        None => {
            let result = f(arg);
            cache.borrow_mut().insert(key, result.clone());
            result
        }
    }
}
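Below is a runnable version of the doc-comment example in current Rust (`u64` for `uint`, `BuildHasher` for `HashState`), assuming the same memoization strategy. The helper keeps the original's key property: the shared borrow of the cache ends before `f` runs, so `f` may call back into `memoized` on the same `RefCell` without triggering a `RefCell` borrow panic.

```rust
use std::cell::RefCell;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Returns the cached result for `arg`, or computes it with `f` and caches it.
fn memoized<T, U, S, F>(cache: &RefCell<HashMap<T, U, S>>, arg: T, f: F) -> U
where
    T: Clone + Hash + Eq,
    U: Clone,
    S: BuildHasher,
    F: FnOnce(T) -> U,
{
    // The `Ref` from `borrow()` is a temporary dropped at the end of this
    // statement, so the cache is no longer borrowed when `f` runs.
    let hit = cache.borrow().get(&arg).cloned();
    match hit {
        Some(v) => v,
        None => {
            let key = arg.clone();
            let v = f(arg);
            cache.borrow_mut().insert(key, v.clone());
            v
        }
    }
}

struct Context {
    cache: RefCell<HashMap<u64, u64>>,
}

/// Memoized Fibonacci: each `n` is computed once and then served from the cache.
fn fibonacci(ctxt: &Context, n: u64) -> u64 {
    memoized(&ctxt.cache, n, |n| match n {
        0 | 1 => n,
        _ => fibonacci(ctxt, n - 2) + fibonacci(ctxt, n - 1),
    })
}

fn main() {
    let ctxt = Context { cache: RefCell::new(HashMap::new()) };
    println!("{}", fibonacci(&ctxt, 30));
}
```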