Documentation
¶
Overview ¶
Package extern provides graph algorithms that operate directly on a Tier 2 (mmap-backed) csrfile.Reader without first materialising the CSR in memory. The algorithms are semi-external: vertex-sized auxiliary structures (visited bitsets, level frontiers) live in RAM while edge data is streamed from the mapped file.
Index ¶
- Variables
- func BFS(r *csrfile.Reader, src graph.NodeID, ...) error
- func BFSCtx(ctx context.Context, r *csrfile.Reader, src graph.NodeID, ...) error
- func PageRank(r *csrfile.Reader, opts PageRankOptions) (ranks []float64, iterations int, err error)
- func PageRankCtx(ctx context.Context, r *csrfile.Reader, opts PageRankOptions) (ranks []float64, iterations int, err error)
- type PageRankOptions
Examples ¶
Constants ¶
This section is empty.
Variables ¶
var ErrInvalidInput = errors.New("extern: input option contains NaN or Inf")
ErrInvalidInput is returned by extern algorithms when their float options carry NaN or +/-Inf.
Functions ¶
func BFS ¶
BFS performs breadth-first traversal of the graph captured by r, starting at src. visit is invoked for every reached node in non-decreasing depth order; returning false aborts the traversal.
The implementation keeps the visited bitset and per-level frontiers in RAM and streams adjacency directly from the mmap region. The current frontier is sorted before expansion so that edge reads stay sequential, maximising the benefit of any MADV_SEQUENTIAL hint configured on the reader.
Example ¶
ExampleBFS runs a semi-external breadth-first traversal directly over an mmap-backed csrfile.Reader, reporting the depth at which each node is reached. Adjacency is streamed from the file; only the visited set and frontiers live in RAM.
package main
import (
"fmt"
"os"
"path/filepath"
"github.com/FlavioCFOliveira/GoGraph/graph"
"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
"github.com/FlavioCFOliveira/GoGraph/graph/csr"
"github.com/FlavioCFOliveira/GoGraph/search/extern"
"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)
// writeDiamond builds the directed diamond 0->{1,2}->3, serialises it
// to a Tier 2 csrfile in a fresh temp directory, and returns an open
// mmap-backed reader, the mapper, and a cleanup func the caller must
// defer. The reader streams adjacency from the mapped file; extern's
// algorithms never materialise the CSR in memory.
func writeDiamond() (*csrfile.Reader, *graph.Mapper[int], func()) {
a := adjlist.New[int, struct{}](adjlist.Config{Directed: true})
for _, e := range [][2]int{{0, 1}, {0, 2}, {1, 3}, {2, 3}} {
_ = a.AddEdge(e[0], e[1], struct{}{})
}
c := csr.BuildFromAdjList(a)
dir, err := os.MkdirTemp("", "extern-example-")
if err != nil {
panic(err)
}
path := filepath.Join(dir, "diamond.csr")
if _, err := csrfile.WriteToFile(path, c); err != nil {
_ = os.RemoveAll(dir)
panic(err)
}
r, err := csrfile.Open(path)
if err != nil {
_ = os.RemoveAll(dir)
panic(err)
}
cleanup := func() {
_ = r.Close()
_ = os.RemoveAll(dir)
}
return r, a.Mapper(), cleanup
}
func main() {
r, m, cleanup := writeDiamond()
defer cleanup()
src, _ := m.Lookup(0)
depth := map[int]int{}
if err := extern.BFS(r, src, func(node graph.NodeID, d int) bool {
v, _ := m.Resolve(node)
depth[v] = d
return true // keep traversing
}); err != nil {
fmt.Println("error:", err)
return
}
for v := 0; v < 4; v++ {
fmt.Printf("node %d at depth %d\n", v, depth[v])
}
}
Output: node 0 at depth 0 node 1 at depth 1 node 2 at depth 1 node 3 at depth 2
func BFSCtx ¶
func BFSCtx(ctx context.Context, r *csrfile.Reader, src graph.NodeID, visit func(node graph.NodeID, depth int) bool) error
BFSCtx is the context-aware variant of BFS. ctx.Err() is checked once per BFS level; on cancellation returns the wrapped ctx.Err.
func PageRank ¶
PageRank runs the power-iteration form of PageRank over the graph captured by an mmap-backed csrfile.Reader. Rank arrays live in RAM (size = nVertices); adjacency is streamed sequentially from the file each iteration.
The returned slice is indexed by NodeID; only NodeIDs that participate in at least one edge (live nodes) carry non-zero rank. The sum over the slice equals 1.0 within numerical tolerance.
Algorithm. Mirrors the in-memory centrality.PageRank: at every step the mass currently held by dangling nodes (out-degree 0) is redistributed uniformly across all live nodes via baseShare = (1-d)/live + d * danglingMass / live, before mass from non-dangling sources is forwarded along outgoing edges. This guarantees total-mass conservation.
Concurrency: safe to invoke from any number of goroutines on a shared csrfile.Reader.
Example ¶
ExamplePageRank runs PageRank over the same mmap-backed reader. The rank vector lives in RAM while edges stream from the file each iteration; the ranks sum to 1.0, and the sink (node 3, the only destination with no out-edges) carries more mass than the source.
package main
import (
"fmt"
"os"
"path/filepath"
"github.com/FlavioCFOliveira/GoGraph/graph"
"github.com/FlavioCFOliveira/GoGraph/graph/adjlist"
"github.com/FlavioCFOliveira/GoGraph/graph/csr"
"github.com/FlavioCFOliveira/GoGraph/search/extern"
"github.com/FlavioCFOliveira/GoGraph/store/csrfile"
)
// writeDiamond builds the directed diamond 0->{1,2}->3, serialises it
// to a Tier 2 csrfile in a fresh temp directory, and returns an open
// mmap-backed reader, the mapper, and a cleanup func the caller must
// defer. The reader streams adjacency from the mapped file; extern's
// algorithms never materialise the CSR in memory.
func writeDiamond() (*csrfile.Reader, *graph.Mapper[int], func()) {
a := adjlist.New[int, struct{}](adjlist.Config{Directed: true})
for _, e := range [][2]int{{0, 1}, {0, 2}, {1, 3}, {2, 3}} {
_ = a.AddEdge(e[0], e[1], struct{}{})
}
c := csr.BuildFromAdjList(a)
dir, err := os.MkdirTemp("", "extern-example-")
if err != nil {
panic(err)
}
path := filepath.Join(dir, "diamond.csr")
if _, err := csrfile.WriteToFile(path, c); err != nil {
_ = os.RemoveAll(dir)
panic(err)
}
r, err := csrfile.Open(path)
if err != nil {
_ = os.RemoveAll(dir)
panic(err)
}
cleanup := func() {
_ = r.Close()
_ = os.RemoveAll(dir)
}
return r, a.Mapper(), cleanup
}
func main() {
r, m, cleanup := writeDiamond()
defer cleanup()
ranks, _, err := extern.PageRank(r, extern.DefaultPageRankOptions())
if err != nil {
fmt.Println("error:", err)
return
}
var total float64
for _, x := range ranks {
total += x
}
src, _ := m.Lookup(0)
sink, _ := m.Lookup(3)
fmt.Printf("ranks sum to 1.0: %v\n", total > 0.999 && total < 1.001)
fmt.Printf("sink outranks source: %v\n", ranks[sink] > ranks[src])
}
Output: ranks sum to 1.0: true sink outranks source: true
Types ¶
type PageRankOptions ¶
type PageRankOptions struct {
// Damping is the random-jump probability (typical 0.85).
Damping float64
// MaxIterations bounds the iteration count.
MaxIterations int
// Tolerance is the convergence threshold on the L1 norm of the
// rank delta.
Tolerance float64
}
PageRankOptions configures PageRank.
func DefaultPageRankOptions ¶
func DefaultPageRankOptions() PageRankOptions
DefaultPageRankOptions returns the classic Brin-Page configuration.