Extract

Code extraction via tree-sitter queries


The Extract system pulls code snippets from source files at build time using tree-sitter queries. Extracted snippets become EXTRACTIONS constants in generated Rust libraries, enabling documentation pages to embed live source code that stays in sync with the codebase.


Extract

The extract rule runs a tree-sitter query against a source file and produces a rust_library with a pub static EXTRACTIONS array.

def extract(name, source, language, query = "", visibility = None): """ Extract a code snippet from a source file via tree-sitter query. Produces a rust_library with pub static EXTRACTIONS: &[Extraction]. When query is empty, embeds the entire file. Args: name: Target name (becomes the crate name) source: Source file to query language: Source language (rust, starlark, bash, json, molten) query: Tree-sitter query with @capture pattern (empty for whole file) visibility: Bazel visibility """ kwargs = {} if visibility != None: kwargs["visibility"] = visibility _extract( name = name + ".generate", source = source, query = query, language = language, ) crate_name = name.replace(".", "_").replace("-", "_") rust_library( name = name, srcs = [":" + name + ".generate"], crate_name = crate_name, deps = [ "//component/web:extraction", "//component/web:language", ], **kwargs ) _source( name = name + ".source", source = source, **kwargs )
Parameter Description
name Target name (becomes the crate name)
source Source file to extract from
language Source language for parsing
query Tree-sitter query with @capture (empty for whole file)
visibility Bazel visibility

When query is empty, the entire file is embedded. Each extract target also produces a .source target for linking to the original file in rendered documentation.


Query

The query rule validates a command binary at build time and produces an extractable shell snippet. It chains into extract to generate a rust_library with the validated command string.

def query(name, binary, label, arguments = [], visibility = None): """ Validate a command binary with arguments and produce an extractable shell snippet. Runs the binary with ? at build time to validate arguments, then chains the output into extract to produce a rust_library with EXTRACTIONS. Args: name: Target name (becomes the crate name) binary: Binary target to validate label: Bazel label for the binary (used in the generated command string) arguments: CLI arguments to validate visibility: Bazel visibility """ _query( name = name + ".query", binary = binary, label = label, arguments = arguments, ) extract( name = name, source = ":" + name + ".query", language = "bash", visibility = visibility, )
Parameter Description
name Target name (becomes the crate name)
binary Binary target to validate
label Bazel label for the generated command string
arguments CLI arguments to validate
visibility Bazel visibility

The binary is invoked with ? at build time to validate arguments without executing the full command. This ensures all documented commands are syntactically valid against their binaries.


Language

Extraction supports multiple source languages via tree-sitter grammars:

Language Extensions Grammar
Rust .rs tree-sitter-rust
Starlark .bzl, .bazel, .star tree-sitter-python
Bash .sh tree-sitter-bash
JSON .json tree-sitter-json
Molten .magma, .lava tree-sitter-molten

Pattern

Tree-sitter queries use S-expression syntax. The @capture name marks which nodes to extract:

  • (function_item name: (identifier) @name (#eq? @name "target")) @capture — extract a Rust function by name
  • (call function: (identifier) @fn (#eq? @fn "rule")) @capture — extract a Starlark rule invocation
  • (attribute_item) @capture . (function_item) @capture — extract a function with its preceding attribute

All queries must include at least one @capture binding. Multiple @capture nodes within the same match are merged into a single extraction.