petgraph is a rust library which allows you to work with graph data structures in rust.
:dep petgraph = "0.4.13"
:dep petgraph-evcxr = "*"
extern crate petgraph;
use petgraph::graph::Graph;
use petgraph::dot::Dot;
use petgraph_evcxr::draw_graph;
I guess I'll start by specifying what kind of graphs we're talking about. We're talking about the datastructure type of graph, not the 'chart' type of graph. For the chart type of graph try the plotters library.
:dep plotters = { git = "https://github.com/38/plotters", default_features = false, features = ["evcxr"] }
extern crate plotters;
use plotters::prelude::*;
let figure = evcxr_figure((640, 480), |root| {
root.fill(&WHITE);
let mut chart = ChartBuilder::on(&root)
.caption("Not this type of graph", ("Arial", 50).into_font())
.margin(5)
.x_label_area_size(30)
.y_label_area_size(30)
.build_ranged(-1f32..1f32, -0.1f32..1f32)?;
chart.configure_mesh().draw()?;
chart.draw_series(LineSeries::new(
(-50..=50).map(|x| x as f32 / 50.0).map(|x| (x, x * x)),
&RED,
)).unwrap()
.label("y = x^2")
.legend(|(x,y)| PathElement::new(vec![(x,y), (x + 20,y)], &RED));
chart.configure_series_labels()
.background_style(&WHITE.mix(0.8))
.border_style(&BLACK)
.draw()?;
Ok(())
});
figure
This is an example of a graph. You can create a graph in petgraph by initializing a Graph
struct and adding some nodes and edges to it.
let mut g : Graph<&str, &str> = Graph::new();
let a = g.add_node("Node (vertex): a");
let b = g.add_node("Node (vertex): b");
g.add_edge(a, b, "edge from a → b");
draw_graph(&g);
Graphs are universal datastructures. Every data structure can be represented as a graph. Here are some examples of data structures that can be represented as graphs.
A variable can be represented, from a theoretical standpoint, as a singleton graph.
let mut singleton : Graph<&str, &str, petgraph::Undirected> = Graph::new_undirected();
let singleton_node = singleton.add_node("only value");
draw_graph(&singleton);
let mut list : Graph<&str, &str, petgraph::Undirected> = Graph::new_undirected();
let item1 = list.add_node("a");
let item2 = list.add_node("b");
let item3 = list.add_node("c");
list.add_edge(item1, item2, "");
list.add_edge(item2, item3, "");
draw_graph(&list);
let mut table : Graph<&str, &str, petgraph::Undirected> = Graph::new_undirected();
let cellA1 = table.add_node("A1");
let cellA2 = table.add_node("A2");
let cellA3 = table.add_node("A3");
let cellB1 = table.add_node("B1");
let cellB2 = table.add_node("B2");
let cellB3 = table.add_node("B3");
let cellC1 = table.add_node("C1");
let cellC2 = table.add_node("C2");
let cellC3 = table.add_node("C3");
// Columns
table.add_edge(cellA1, cellA2, "");
table.add_edge(cellA2, cellA3, "");
table.add_edge(cellB1, cellB2, "");
table.add_edge(cellB2, cellB3, "");
table.add_edge(cellC1, cellC2, "");
table.add_edge(cellC2, cellC3, "");
// Rows
table.add_edge(cellA1, cellB1, "");
table.add_edge(cellB1, cellC1, "");
table.add_edge(cellA2, cellB2, "");
table.add_edge(cellB2, cellC2, "");
table.add_edge(cellA3, cellB3, "");
table.add_edge(cellB3, cellC3, "");
draw_graph(&table);
let mut tree : Graph<&str, &str, petgraph::Directed> = Graph::new();
let tree_item1 = tree.add_node("a");
let tree_item2 = tree.add_node("b");
let tree_item3 = tree.add_node("c");
let tree_item4 = tree.add_node("d");
let tree_item5 = tree.add_node("e");
tree.add_edge(tree_item1, tree_item2, "");
tree.add_edge(tree_item1, tree_item3, "");
tree.add_edge(tree_item2, tree_item4, "");
tree.add_edge(tree_item2, tree_item5, "");
draw_graph(&tree);
let mut ring : Graph<&str, &str> = Graph::new();
let ring_item1 = ring.add_node("a");
let ring_item2 = ring.add_node("b");
let ring_item3 = ring.add_node("c");
let ring_item4 = ring.add_node("d");
ring.add_edge(ring_item1, ring_item2, "");
ring.add_edge(ring_item2, ring_item3, "");
ring.add_edge(ring_item3, ring_item4, "");
ring.add_edge(ring_item4, ring_item1, "");
draw_graph(&ring);
let mut dict : Graph<&str, &str> = Graph::new();
let core = dict.add_node("dict");
let key1 = dict.add_node("key 1");
let key2 = dict.add_node("key 2");
let key3 = dict.add_node("key 3");
let value1 = dict.add_node("value 1");
let value2 = dict.add_node("value 2");
let value3 = dict.add_node("value 3");
dict.add_edge(core, key1, "");
dict.add_edge(core, key2, "");
dict.add_edge(core, key3, "");
dict.add_edge(key1, value1, "");
dict.add_edge(key2, value2, "");
dict.add_edge(key3, value3, "");
draw_graph(&dict);
In petgraph there are various ways of storing a graph. The simplest is to use the Graph
data type. Graph
s can be either Directed or Undirected. (Note: Graphs are stored the same way whether they are Directed or Undirected. All graphs are stored internally as directed graphs. The only difference is the behavior of some algorithms such as the shortest path algorithms. )
You can initialize a directed graph with the statement:
let mut directed_graph : Graph<&str, &str> = Graph::new();
You can initialize an undirected graph with the statement:
let mut undirected_graph : Graph<&str, &str, petgraph::Undirected> = Graph::new_undirected();
You may have noticed that we specify the Graph
's type with two type parameters &str
and &str
.
The first parameter is the Node
weight. This is the data that is associated with the node. Weights are sometimes refered to as labels.
The second parameter is the Edge
weight.
With petgraph you can store any type of data as node or edge weights. Lets create a simple social graph to demonstrate this fact.
use std::fmt;
#[derive(Debug, Copy, Clone)]
struct Person<'a> {
name: &'a str,
age: u8,
}
#[derive(Debug, Copy, Clone)]
enum Relationship {
Friend,
Parent,
Sibling,
Child,
Boss,
}
impl<'a> fmt::Display for Person<'a> {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{}, {}", self.name, self.age)
}
}
impl fmt::Display for Relationship {
fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
write!(f, "{:?}", self)
}
}
let mut social_graph : Graph<Person, Relationship> = Graph::new();
let bob = social_graph.add_node(Person{name: "Bob", age: 37});
let alice = social_graph.add_node(Person{name: "Alice", age: 17});
social_graph.add_edge(bob, alice, Relationship::Parent);
let lilly = social_graph.add_node(Person{name: "Lilly", age: 50});
social_graph.add_edge(lilly, bob, Relationship::Boss);
let george = social_graph.add_node(Person{name: "George", age: 16});
social_graph.add_edge(george, alice, Relationship::Friend);
social_graph.add_edge(lilly, george, Relationship::Parent);
let fred = social_graph.add_node(Person{name: "Fred", age: 16});
social_graph.add_edge(george, fred, Relationship::Friend);
social_graph.add_edge(alice, fred, Relationship::Friend);
draw_graph(&social_graph);
let original_social_graph = social_graph.clone();
The most basic graph operations can be found directly as methods on the Graph
struct.
One of the most common methods is the map
method which applies given functions to each node and edge weight in the graph. This method creates a new graph and the functions can change the type of the weights.
pub fn map<'a, F, G, N2, E2>(
&'a self,
node_map: F,
edge_map: G
) -> Graph<N2, E2, Ty, Ix> where
F: FnMut(NodeIndex<Ix>, &'a N) -> N2,
G: FnMut(EdgeIndex<Ix>, &'a E) -> E2,
Here we increment the age of each person in the social graph.
let mut social_graph_a_year_later = social_graph.map(
|_, p| {
let mut p = p.clone();
p.age += 1;
p
},
|_, x| x.clone()
);
draw_graph(&social_graph_a_year_later);
You can also modify the weights in place using the node_weights_mut
and edge_weights_mut
iterators. This method does not allow you to change the types of the weights.
for person in social_graph_a_year_later.node_weights_mut(){
person.age += 1;
}
draw_graph(&social_graph_a_year_later);
There are a large number of methods provided for Graph
. Too many to list here. The last thing I'll mention about these methods is that methods wich remove nodes, such as filter_map
and remove_node
remove edges which are attached to the removed nodes, rather than leaving them dangling.
social_graph.remove_node(fred);
draw_graph(&social_graph);
Beyond the basic graph operations, petgraph also provides an array of graph algorithms in the module algo
. These operations are well known and you can find their descriptions in most graph textbooks or on wikipedia.
I'll start by describing the functions is_cyclic_directed
and is_cyclcic_undirected
. These functions are very important, because the distinction between cyclic graphs and acyclic graphs plays a big role in determining which other algorithms can be applied to the graph.
use petgraph::algo::*;
println!("Is tree cyclic? {}", is_cyclic_undirected(&tree));
println!("Is social_graph cyclic? {}", is_cyclic_directed(&social_graph_a_year_later));
println!("Is social_graph cyclic if we ignore edge direction? {}", is_cyclic_undirected(&social_graph_a_year_later));
println!("Is ring cyclic? {}", is_cyclic_directed(&ring));
One constant of computer science is that data structures that are simple are used far more frequently than those which are complex. I started this article by claiming that graphs are the universal data structure. I went on to show how a the petgraph library can represent any datastructure from a variable, to a list, to a table, to a tree, to a dictionary/hashmap. All of these data structures, however, exist in far more efficent forms either directly as built in types in rust, or as provided in third party libraries. If you need to represent a list, tree or a table and nothing else, you should never use petgraph to do so. You should always use the most limited possible data structure which is capable of representing your data because doing so allows your compiler and typechecker to help you prevent bugs. If you write a function that takes a tree and does something with it, and it's type signature looks like:
fn cut_tree(tree: Tree) -> Stump {
Then you only need to test your function for various possible trees that could be given to that function. The type checker will ensure that only trees are passed to that function, and you do not need to wory about edge cases where something other than a tree might have been passed to it. But if your type signature looks like:
fn cut_tree(tree: Graph) -> Stump {
You'll need to test for all sorts of data structures that aren't trees in order to be sure that your program will behave correctly and will not crash.
The simplest, and therefore most common type of data structure that you actually need the power of petgraph to represent is called a DAG or Directed Acyclic Graph. DAGs are a lot like trees but their branches can merge.
This is a tree:
draw_graph(&tree);
And this is a DAG
let mut dag = tree.clone();
let dag_item_f = dag.add_node("f");
dag.add_edge(tree_item4, dag_item_f, "");
dag.add_edge(tree_item5, dag_item_f, "");
draw_graph(&dag);
There is a certain hierarchy of data structures in computer science. All singletons are lists, all lists are tables. All lists are also trees. All trees (IFF they're edges are directed) are DAGs.
You can tell if a graph is a tree by running the is_cyclic_undirected
function on it. If this function returns false
, you're working with a tree.
use petgraph::*;
fn is_tree<'a, N: 'a, E: 'a, Ty, Ix>(g: &'a Graph<N, E, Ty, Ix>) -> bool
where
Ty: EdgeType,
Ix: petgraph::graph::IndexType
{
return !is_cyclic_undirected(g);
}
println!("Is tree a tree? {}", is_tree(&tree));
println!("Is dag a tree? {}", is_tree(&dag));
You can tell if a graph is a DAG if it is directed and is_cyclic_directed
returns false
.
fn is_dag<'a, N: 'a, E: 'a, Ty, Ix>(g: &'a Graph<N, E, Ty, Ix>) -> bool
where
Ty: EdgeType,
Ix: petgraph::graph::IndexType
{
return g.is_directed() && !is_cyclic_directed(g);
}
println!("Is tree a DAG? {}", is_dag(&tree));
println!("Is dag a DAG? {}", is_dag(&dag));
println!("Is ring a DAG? {}", is_dag(&ring));
POSIX file trees: At a low level, the POSIX file system is implemented as a DAG. I know, we refer to the directory hierarchy as a file tree, but hard links allow for the merging of branches which makes them not trees in all cases. Since hard links can only be used to link two files, and not directories, cycles cannot be formed. At a higher level, if you take symlinks (which are actually just a special kind of file) into account, file trees are not even DAGs, but generalized graphs.
Dependency graphs: Traditional package managers in which packages depend on one another, end up representing a DAG of packages. You cannot have cycles in such a graph, because the package manager never installs a package before it's dependencies have been installed.
Makefiles: Makefiles are another case of doing steps which depend on eachother.
init system services: In init systems such as sysv init and systemd the same property applies, that you start services who's dependencies have started first.
Rust struct declarations: In rust, the structs which you have declared form a DAG. They cannot form a cycle because rust needs to know how large a struct within a struct is at compile time. This can be worked around by including pointers/references to other structs rather than including them directly.
Git commits: Just the vocabulary of branches
should suggest that there is a tree of git commits, but the existance of merge
commits means that we're working with a DAG.
IPFS nodes: In IPFS, which is a distributed Hashed DAG there are no cycles.
There are two properties at play here which make DAGs extremely usefull in computer science. The first is watershedding
. Watershedding is the idea that some regions of your graph are clearly isolated from others. Remember the origional dag I mentioned?
draw_graph(&dag);
There is no way to get from e
to b
. That's a good thing in some cases, and is the reason why POSIX file DAGs are DAGs and not generalized graphs. You want to be certain that when you type in
$ rm -rf /a/b/e
You'll be deleting only e
and f
and not your entire file system. If there was ever a loop like this:
let mut general_graph = dag.clone();
general_graph.add_edge(dag_item_f, tree_item1, "");
draw_graph(&general_graph);
Then you could end up deleting all your files accidentally, and that would make POSIX harder to use and less predictable.
The second reason why DAGs are so common is more fundamental. With DAGs there is always a clear order of operations. That doesn't just make it easier to work with DAGs than general graphs, it makes a lot of things possible which simply cannot done otherwise.
The algorithm which tells you the order of operations for a DAG is called a topological sort, and is defined in petgraph's algo library as the toposort
function.
draw_graph(&tree);
fn print_topo_sort<'a, Ty, Ix>(g: &'a Graph<&'a str, &'a str, Ty, Ix>)
where
Ty: EdgeType,
Ix: petgraph::graph::IndexType
{
match toposort(g, None){
Ok(order) => {
print!("Sorted: ");
for i in order {
g.node_weight(i).map(|weight| {
print!("{}, ", weight);
weight
});
}
},
Err(err) => {
g.node_weight(err.node_id()).map(|weight|
println!("Error graph has cycle at node {}", weight));
}
}
println!("");
}
print_topo_sort(&tree);
Note that topological sorts are by their very nature non-deterministic functions. The only guarantee is that dependencies come before dependents, but when two nodes are at the same level, they can come in any order. Here, the output is Sorted tree: a, c, b, e, d
, but it could just as easilly have been Sorted tree: a, b, c, d, e
.
draw_graph(&dag);
print_topo_sort(&dag);
During actual execution, topoloical sorting is less common than you might think. The thing is that dependency DAGs differ from TODO lists in that dependency DAGs in that their execution can be easilly parallelized. In the example above, the c
and b
branches can be executed symultaneously, as can the d
and e
tasks. There are a number of strategies for executing DAGs in parallel. I won't be describing them here.
draw_graph(&general_graph);
print_topo_sort(&general_graph);
If you reverse the direction of the edges in a DAG, you end up with a DAG. This is an important thing to understand when working on a package manager, because if you want to uninstall a package and all of it's dependents, then you need to uninstall the dependents first. You can reverse the DAG with the reverse
method, or, if you are working with a topolgical sort, simply by reversing the output of toposort
.
The other algorithms provided by petgraph are applicable to more general graphs. The best known are the shortest path and existant path algorithms. Specifically, the Dijkstra algorithm is widely taught as being a generally usefull and elegant algorithm in schools. I mentioned earlier that the values associated with edges in petgraph are called weights
rather than labels
as is done elsewhere. This is because these values can be used to calculate edge cost
when evaluating the shortest distance algorithms. In the traditional map based questions, this cost is the literal distance (or travel time) between points. This cost can also be something else.
Here are some example graphs that you might use shortest length algorithms on:
All of these can be modeled by petgraph, however, you probably shouldn't use petgraph for geospatial graphs you should use a specialized library or database for this. Postgres, for example, can store geospatial data using the postgis addon and use pgRouting to find the shortest paths. Specialized geospatial routing engines are able to use spatial heuristics such as direction to significantly speed up route finding.
Petgraph provides 3 shortest path algorithms. (Links are to wikipedia)
Here is the type declaration for petgraph's dijstra.
pub fn dijkstra<G, F, K>(
graph: G,
start: G::NodeId,
goal: Option<G::NodeId>,
edge_cost: F
) -> HashMap<G::NodeId, K> where
G: IntoEdges + Visitable,
G::NodeId: Eq + Hash,
F: FnMut(G::EdgeRef) -> K,
K: Measure + Copy,
Notice how the function returns a HashMap? What's up with that? A HashMap is not a path. It should be noted that Dijstra and astar are actually the same algorithm and the Dijstra implementation provided by petgraph is kind of useless, so I'll be ignoring the provided dijstra
function.
pub fn astar<G, F, H, K, IsGoal>(
graph: G,
start: G::NodeId,
is_goal: IsGoal,
edge_cost: F,
estimate_cost: H
) -> Option<(K, Vec<G::NodeId>)> where
G: IntoEdges + Visitable,
IsGoal: FnMut(G::NodeId) -> bool,
G::NodeId: Eq + Hash,
F: FnMut(G::EdgeRef) -> K,
H: FnMut(G::NodeId) -> K,
K: Measure + Copy,
As I said, A* and Dijkstra are the same. The differenece is the estimate_cost
function. This function allows the algorithm to have some heuristics about which direction to start looking in. It is especially usefull when using geospatial data. If you are trying to find the shortest path in terms of distance and you know the x y coordinates of each node in your graph, estimate_cost
should return the distance as a crow flies from the given node to the start node. If you're looking for the shortest path in terms if travel time then you should take the crow flies distance and divide it by the top speed limit for the given mode of transportation. If you are doing network routing and you happen to be lucky enough to know the coordinates of each node in the network, then the speed limit is C, aka the speed of light. If you don't have any way of determining a heuristic, then return 0
and you have the Dijkstra algorithm.
Remember that old social graph example? Let's apply A* to it. Imagine that that kid Fred shares some sensitive private information on an employee of Lilly's. What is the most likely route by which the info got out? Lets say that such information is most likely to be shared between friends (with a virtual cost of 1), followed by sharing between familly members (lets give that a virtual cost of 2), and finally quite unlikely to be shared between collegues because they are accutely aware that sharing such information is illegal and wrong (cost of 5).
Lets start by declaring the cost and heuristic functions for our social graph.
fn rumor_spreading_cost(er : graph::EdgeReference<Relationship>) -> u32 {
match *er.weight() {
Relationship::Friend => 1,
Relationship::Parent => 2,
Relationship::Sibling => 2,
Relationship::Child => 2,
Relationship::Boss => 5,
}
}
fn no_clue_heuristic<N>(nid: N) -> u32 {
0
}
Now we'll get to actually calling the astar function. To find out how the rumor spread.
let (cost, path) = astar(
&original_social_graph,
lilly,
|n| n == fred,
rumor_spreading_cost,
no_clue_heuristic,
).unwrap();
for node_ix in path.iter() {
println!("{:?}", original_social_graph.node_weight(*node_ix).unwrap());
}
draw_graph(&original_social_graph);
So now we can see that the most likely scenario is that Lilly told her son George the private info who then told Fred.
pub fn bellman_ford<G>(
g: G,
source: G::NodeId
) -> Result<(Vec<G::EdgeWeight>, Vec<Option<G::NodeId>>), NegativeCycle> where
G: NodeCount + IntoNodeIdentifiers + IntoEdges + NodeIndexable,
G::EdgeWeight: FloatMeasure,
Bellman Ford differs from A*/Dijkstra in that it supports negative edge weights. This is usefull in cases such as when searching for optimal trade paths, because in some cases crossing a tax border can be a good thing due to the existance of subsidies. Ironically, Bellman Ford doesn't fair perfectly in this case, because it doesn't know how to deal with a situation where there is a negative edge weight cycle. Such a situation would lead to an infintely long shortest path ;). That sometimes happens in real life trade though, a country might grant an outgoing subsidy on a good while neglecting to impose an equal and oposite incomming tarif and we see ships sailing in circles just to collect infinite subsidies. bellman_ford
does, however, return a Result
with a NegativeCycle
outcome if such a cycle does exist. Perhaps legislators should use petgraph's bellman_ford
algorithm to detect such cycles before writing subsidies into law.
One of the first things I noticed when looking at bellman_ford
is that rather than taking an edge_cost
function like astar
does, it just expects edge weights in the graph to be of the type G::EdgeWeight: FloatMeasure
. This isn't that big of a deal, we can use the map
method to map edges of any type to floats, but it does lead to potential memory inefficiency. map
returns an new graph, which means that in order to use bellman_ford
we may need to have the graph in memory twice for the duration of the path finding procedure, and that merely because of a design choice.
Another thing I noticed is that bellman_ford
returns a tuple of two Vecs. One contains the length of each path, the other contains the "predecessor". Here is the exact docstring:
" On success, return one vec with path costs, and another one which points out the predecessor of a node along a shortest path. The vectors are indexed by the graph's node indices. "
For a while I didn't understand how this works. What is happening here, is that you should think of the second Vec
as a map which maps nodes to nodes. If the Option
is None
that means the node is unreachable. If it is Some(x)
that means that there is a path to that node and the previous node in that path is x. If that Vec looks like let predicessors = vec![None,Some(0),Some(1),Some(2)]
that means that:
node 3 can be best reached via node 2
Note that the source node is considered unreachable unless there is an actual path to it.
Lets imagine a situation, again in our social graph, in which Lilly wants to send money to people. There are different tax liabilities depending on her relationship to that person. There is a flat 30% income tax in this theoretical country and no gift tax between friends.
Lets figure out the best way for Lilly to send a christmass bonus to her employee Bob.
The first thing we need to do is realize that our social graph is directed, but it makes sense that friends should be able to transfer funds in either direction. Also, we haven't added Child → Parent relationships to our graph. We're going to need to add new edges based on the existing edges, to make these relationships bidirectional. Note, that we can't just convert the graph to a non-directed graph because in this case, the relationships are NOT symetrical from a taxation standpoint.
let mut bdsg = original_social_graph.clone(); //bidirectional_social_graph
for edge_ix in bdsg.edge_indices() {
let (source, dest) = bdsg.edge_endpoints(edge_ix).unwrap();
match *bdsg.edge_weight(edge_ix).unwrap() {
Relationship::Parent => {
bdsg.add_edge(dest, source, Relationship::Child);
},
Relationship::Friend => {
bdsg.add_edge(dest, source, Relationship::Friend);
},
Relationship::Sibling => {
bdsg.add_edge(dest, source, Relationship::Sibling);
},
_ => (),
}
}
draw_graph(&bdsg);
let social_graph_with_tax_costs = bdsg.map(
|_, nw| nw.clone(),
|_, ew| match ew {
Relationship::Friend => 0.0,
Relationship::Parent => -0.2,
Relationship::Sibling => 0.0,
Relationship::Child => 0.2,
Relationship::Boss => 0.3,
},
);
draw_graph(&social_graph_with_tax_costs);
println!("{:?}", bellman_ford(&social_graph_with_tax_costs, lilly));
So we have this output from the bellman_ford
function, but how do we actually use it? How do we turn those two lists into an actual tax optimal path for sending money to Bob? First we need to understand what the "predicessor's" list actual is. It's a shortest-path tree. We can visualize this tree as follows:
let (sgwtc_costs, sgwtc_predicessors) = bellman_ford(&social_graph_with_tax_costs, lilly).unwrap();
let mut spt_ = bdsg.clone(); //spt = Shortest path tree
spt_.clear_edges();
let mut spt = spt_.map(
|_, nw| nw.clone(),
|_, _| 0.0,
); // Use map to change type of edge weights to float.
let mut i = 0;
for op_node_ix in sgwtc_predicessors.iter() {
match op_node_ix {
Some(node_ix) => {
spt.add_edge(*node_ix, graph::NodeIndex::new(i), sgwtc_costs[i]);
}
None => ()
};
i += 1;
}
draw_graph(&spt);