Context and Indexing (ConnectSoft)

This document defines how OpenClaw assistants obtain high-quality ConnectSoft context by building and querying a semantic index over selected repositories.

Why a semantic index

ConnectSoft repos are large and fast-moving. A semantic index enables:

high recall across many repos
targeted retrieval (only the most relevant chunks)
stable, repeatable context injection into assistants

Scope (start small, expand)

Index only a curated repo allowlist:

ConnectSoft.CompanyDocumentation (MkDocs docs)
ConnectSoft.LibraryTemplate (library scaffolding patterns)
other repos as needed (templates, core extensions, standards)
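Enforcing the allowlist can be as simple as filtering the locally available repos against a fixed list before indexing starts. The sketch below is a minimal illustration that assumes the allowlist is a plain tuple of repo names. `ALLOWLIST` and `select_repos` are hypothetical names, not part of any existing ConnectSoft tooling.

```python
# Hypothetical allowlist; in practice this would likely live in a manifest file.
ALLOWLIST: tuple[str, ...] = (
    "ConnectSoft.CompanyDocumentation",  # MkDocs docs
    "ConnectSoft.LibraryTemplate",       # library scaffolding patterns
)


def select_repos(available: list[str], allowlist: tuple[str, ...] = ALLOWLIST) -> list[str]:
    """Return the allowlisted repos that are available, in allowlist order.

    Repos not on the allowlist are ignored, even if they are cloned locally.
    """
    present = set(available)
    return [name for name in allowlist if name in present]
```

Returning results in allowlist order, rather than discovery order, keeps indexing runs repeatable.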

Important

The index must exclude generated/output folders to avoid noise and accidental leakage.

Exclusions (minimum)

Exclude at least:

.git/
bin/, obj/
site/ (MkDocs build output)
node_modules/
**/*.png, **/*.jpg, large binaries
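A path filter applying these exclusions might look like the following minimal sketch. It assumes the indexer sees repo-relative POSIX paths and the file size. `is_excluded` is a hypothetical helper, and `MAX_FILE_BYTES` is an assumed threshold, since this document does not define what counts as a "large binary".

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Directory names that are never indexed, at any depth.
EXCLUDED_DIRS = {".git", "bin", "obj", "site", "node_modules"}

# File patterns that are never indexed; extend with other binary types as needed.
EXCLUDED_FILE_PATTERNS = ("*.png", "*.jpg")

# Hypothetical size ceiling standing in for "large binaries".
MAX_FILE_BYTES = 1_000_000


def is_excluded(rel_path: str, size_bytes: int = 0) -> bool:
    """Return True if a repo-relative path should be skipped by the indexer."""
    path = PurePosixPath(rel_path)
    # Skip anything located under an excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return True
    # Skip excluded file types; lowercasing makes "Logo.PNG" match "*.png".
    name = path.name.lower()
    if any(fnmatchcase(name, pattern) for pattern in EXCLUDED_FILE_PATTERNS):
        return True
    # Skip oversized files, which are usually binaries or generated output.
    return size_bytes > MAX_FILE_BYTES
```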

Build and refresh strategy

Nightly refresh: re-sync repos and rebuild the index incrementally.
On-demand refresh: before major assistant runs, rebuild only the repos that changed.
Pinned run context: each run records the repo commit SHAs used for indexing and retrieval.
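Because each run pins commit SHAs, an incremental rebuild can compare the SHAs from the last indexed run with the current ones and reindex only repos that are new or have moved. The sketch below assumes both sides are available as repo-name-to-SHA maps. `head_sha` and `repos_to_rebuild` are hypothetical helpers; `head_sha` simply shells out to `git rev-parse HEAD`.

```python
import subprocess


def head_sha(repo_path: str) -> str:
    """Return the commit SHA currently checked out in a local clone."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def repos_to_rebuild(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return repos whose SHA changed, or that are new, since the last indexed run.

    Both arguments map repo name -> commit SHA. Repos missing from `current`
    are not returned; removing their chunks is left to a separate cleanup step.
    """
    return sorted(name for name, sha in current.items() if previous.get(name) != sha)
```

For example, a previous run pinned at `{"A": "111", "B": "222"}` and a current state of `{"A": "111", "B": "333", "C": "444"}` would rebuild only `B` and `C`.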

Cost and quality controls

Prefer these defaults:

Docs-first retrieval, then code only if needed
Chunk caps per question (start with 5–10 chunks)
Bounded chunk size (avoid full-file dumps unless required)
Cache common answers (template commands, repo structure, standard runbooks)
Exclude noisy directories (bin/, obj/, site/, .git/, node_modules/)
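The docs-first, chunk-cap, and bounded-size defaults can be combined in one selection step applied to the vector-search results. This is a minimal sketch under stated assumptions: each candidate carries a similarity score and a docs/code flag, and "code only if needed" is approximated by a hypothetical `min_docs` threshold. `Chunk`, `select_context`, `max_chars`, and `min_docs` are illustrative names and values, not an existing API.

```python
from dataclasses import dataclass, replace


@dataclass
class Chunk:
    repo: str
    path: str
    text: str
    score: float   # similarity score from the vector search; higher is better
    is_docs: bool  # True for documentation, False for code


def select_context(
    candidates: list[Chunk],
    max_chunks: int = 8,     # within the suggested 5-10 range
    max_chars: int = 2_000,  # hypothetical per-chunk size bound
    min_docs: int = 3,       # hypothetical "docs are enough" threshold
) -> list[Chunk]:
    """Pick context chunks: docs first, code only if docs are insufficient."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    docs = [c for c in ranked if c.is_docs]
    code = [c for c in ranked if not c.is_docs]

    selected = docs[:max_chunks]
    # Fall back to code only when docs alone did not yield enough chunks.
    if len(selected) < min_docs:
        selected += code[: max_chunks - len(selected)]

    # Truncate each chunk so no single file dominates the context.
    return [replace(c, text=c.text[:max_chars]) for c in selected]
```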

Tip

The best cost control is good scoping: fewer repos indexed, better manifests, and tight retrieval limits.
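For the "cache common answers" control, one option is to key cached answers on the normalized question plus the pinned repo SHAs, so a cached answer is reused only while the underlying repos are unchanged. The sketch below is an in-memory illustration of that assumption. `AnswerCache` is a hypothetical class, and SHA-keyed invalidation is a design choice this document does not prescribe.

```python
import hashlib
import json
from typing import Optional


class AnswerCache:
    """In-memory cache for common answers, keyed by question and repo SHAs."""

    def __init__(self) -> None:
        self._store: dict[str, str] = {}

    @staticmethod
    def _key(question: str, repo_shas: dict[str, str]) -> str:
        # Normalize whitespace and case so trivially different phrasings match.
        normalized = " ".join(question.lower().split())
        payload = json.dumps({"q": normalized, "shas": repo_shas}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, question: str, repo_shas: dict[str, str]) -> Optional[str]:
        """Return the cached answer, or None if absent or the repos have moved."""
        return self._store.get(self._key(question, repo_shas))

    def put(self, question: str, repo_shas: dict[str, str], answer: str) -> None:
        self._store[self._key(question, repo_shas)] = answer
```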

Trust model (what is authoritative)

When sources disagree, use this precedence:

ConnectSoft docs (ConnectSoft.CompanyDocumentation)
ConnectSoft templates/libraries docs (e.g., ConnectSoft.LibraryTemplate)
Code in canonical repos (pinned to a specific commit SHA)
External sources (official docs/blogs) for platform-level facts
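The precedence list can be applied mechanically when retrieved sources conflict. The sketch below assumes each retrieved claim is tagged with one of four hypothetical tier labels mirroring the list above. It drops code claims that lack a commit SHA, because only pinned code is ranked. `Claim`, `PRECEDENCE`, and `most_authoritative` are illustrative names.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical tier labels in precedence order (index 0 is most authoritative).
PRECEDENCE = ("company_docs", "template_docs", "canonical_code", "external")


@dataclass
class Claim:
    tier: str                         # one of PRECEDENCE
    text: str
    commit_sha: Optional[str] = None  # required for canonical_code claims


def most_authoritative(claims: list[Claim]) -> Claim:
    """Return the claim from the highest-precedence source.

    Code claims without a pinned commit SHA are ignored.
    """
    eligible = [c for c in claims if c.tier != "canonical_code" or c.commit_sha]
    if not eligible:
        raise ValueError("no eligible claims")
    return min(eligible, key=lambda c: PRECEDENCE.index(c.tier))
```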

Run artifacts

Each assistant run should write a short “context report” to the run folder:

indexed repos and their commit SHAs
retrieval queries executed
top chunks used
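A context report with these three items could be written as a small JSON file in the run folder. In this minimal sketch, the file name `context-report.json`, the field names, and the `repo:path` chunk identifiers are all assumptions, not a defined format.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


@dataclass
class ContextReport:
    repo_shas: dict[str, str] = field(default_factory=dict)  # indexed repo -> commit SHA
    queries: list[str] = field(default_factory=list)         # retrieval queries executed
    top_chunks: list[str] = field(default_factory=list)      # e.g. "repo:path" of chunks used


def write_context_report(report: ContextReport, run_dir: Path) -> Path:
    """Write the report as JSON into the run folder and return the file path."""
    run_dir.mkdir(parents=True, exist_ok=True)
    out = run_dir / "context-report.json"  # hypothetical file name
    out.write_text(json.dumps(asdict(report), indent=2, sort_keys=True), encoding="utf-8")
    return out
```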