Context and Indexing (ConnectSoft)
This document defines how OpenClaw assistants obtain high-quality ConnectSoft context by building and querying a semantic index over selected repositories.
Why a semantic index
ConnectSoft repos are large and fast-moving. A semantic index enables:
- high recall across many repos
- targeted retrieval (only the most relevant chunks)
- stable, repeatable context injection into assistants
Scope (start small, expand)
Index only a curated repo allowlist:
- ConnectSoft.CompanyDocumentation (MkDocs docs)
- ConnectSoft.LibraryTemplate (library scaffolding patterns)
- other repos as needed (templates, core extensions, standards)
Important
The index must exclude generated/output folders to avoid noise and accidental leakage.
Exclusions (minimum)
Exclude at least:
- .git/
- bin/, obj/
- site/ (MkDocs build output)
- node_modules/
- **/*.png, **/*.jpg, large binaries
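The exclusions above can be sketched as a small path filter. This is a minimal illustration, not the indexer's actual implementation: the function name `is_indexable` and the `MAX_FILE_BYTES` threshold for "large binaries" are assumptions, since the document does not define a size limit.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Directory names excluded wherever they appear in a repo-relative path.
EXCLUDED_DIRS = {".git", "bin", "obj", "site", "node_modules"}
# Image patterns excluded by file name.
EXCLUDED_FILE_PATTERNS = ("*.png", "*.jpg")
# Hypothetical cap standing in for "large binaries"; tune per repo.
MAX_FILE_BYTES = 1_000_000


def is_indexable(rel_path: str, size_bytes: int = 0) -> bool:
    """Return True if a repo-relative file path should be indexed."""
    path = PurePosixPath(rel_path)
    # Reject files that live under any excluded directory.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    # Reject excluded file types (case-insensitive on the file name).
    if any(fnmatch(path.name.lower(), pat) for pat in EXCLUDED_FILE_PATTERNS):
        return False
    # Reject anything above the assumed size cap.
    return size_bytes <= MAX_FILE_BYTES
```

Matching on directory names rather than full paths means nested build output such as `src/bin/Debug/` is also skipped.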
Build and refresh strategy
- Nightly refresh: re-sync repos, rebuild index incrementally.
- On-demand refresh: before major assistant runs, rebuild only impacted repos.
- Pinned run context: each run records the repo commit SHAs used for indexing and retrieval.
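One way to decide which repos an incremental or on-demand refresh must rebuild is to compare the commit SHAs recorded at the last indexing pass with the SHAs currently checked out. The sketch below assumes both are available as plain dictionaries; the function name and the SHA values in the usage example are hypothetical.

```python
def repos_to_rebuild(indexed_shas: dict[str, str], current_shas: dict[str, str]) -> list[str]:
    """Return repos whose current commit differs from the indexed commit.

    Repos that were never indexed (missing from indexed_shas) are included.
    """
    return sorted(
        repo
        for repo, sha in current_shas.items()
        if indexed_shas.get(repo) != sha
    )


# Usage with made-up SHAs: only the repo that moved is rebuilt.
indexed = {"ConnectSoft.CompanyDocumentation": "a1b2c3", "ConnectSoft.LibraryTemplate": "d4e5f6"}
current = {"ConnectSoft.CompanyDocumentation": "a1b2c3", "ConnectSoft.LibraryTemplate": "0f9e8d"}
impacted = repos_to_rebuild(indexed, current)
```

The `current` mapping doubles as the pinned run context: storing it with the run records exactly which commits retrieval was served from.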
Cost and quality controls
Prefer these defaults:
- Docs-first retrieval, then code only if needed
- Chunk caps per question (start with 5–10 chunks)
- Bounded chunk size (avoid full-file dumps unless required)
- Cache common answers (template commands, repo structure, standard runbooks)
- Exclude noisy dirs (bin/, obj/, site/, .git/, node_modules/)
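The retrieval defaults above (docs first, a chunk cap, bounded chunk size) can be combined in a single selection step. The following is a sketch under stated assumptions: the `Chunk` type, the parameter names, and the rule that code is "needed" when fewer than `min_doc_hits` doc chunks are found are all illustrative choices, not part of the documented design.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Chunk:
    repo: str
    path: str
    text: str
    score: float  # similarity score; higher means more relevant
    is_doc: bool  # True for documentation, False for code


def select_chunks(candidates, max_chunks=8, max_chars=1500, min_doc_hits=3):
    """Pick doc chunks first, fall back to code only if docs are too sparse."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    docs = [c for c in ranked if c.is_doc]
    code = [c for c in ranked if not c.is_doc]

    chosen = docs[:max_chunks]
    # Assumed policy: code is only "needed" when docs return too few hits.
    if len(chosen) < min_doc_hits:
        chosen += code[: max_chunks - len(chosen)]

    # Bound chunk size so no single chunk turns into a full-file dump.
    return [replace(c, text=c.text[:max_chars]) for c in chosen]
```

The default `max_chunks=8` sits inside the suggested 5–10 range; `max_chars` is an arbitrary placeholder that should be tuned to the assistant's context budget.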
Tip
The best cost control is good scoping: fewer repos indexed, better manifests, and tight retrieval limits.
Trust model (what is authoritative)
When sources disagree, use this precedence (highest first):
- ConnectSoft docs (ConnectSoft.CompanyDocumentation)
- ConnectSoft templates/libraries docs (e.g., ConnectSoft.LibraryTemplate)
- Code in canonical repos (pinned to a specific commit SHA)
- External sources (official docs/blogs) for platform-level facts
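The precedence list can be encoded as an ordered enum so that conflicting claims are resolved mechanically. This is a minimal sketch; the `SourceTier` names and the `most_authoritative` helper are hypothetical and only mirror the ordering given above.

```python
from enum import IntEnum


class SourceTier(IntEnum):
    """Lower value means higher authority, following the list above."""
    COMPANY_DOCS = 0    # ConnectSoft.CompanyDocumentation
    TEMPLATE_DOCS = 1   # e.g. ConnectSoft.LibraryTemplate docs
    CANONICAL_CODE = 2  # code pinned to a specific commit SHA
    EXTERNAL = 3        # official docs/blogs, platform-level facts only


def most_authoritative(claims: list[tuple[SourceTier, str]]) -> str:
    """Return the claim text from the highest-precedence source."""
    if not claims:
        raise ValueError("no claims to compare")
    # min() keeps the first claim when two share the same tier.
    return min(claims, key=lambda claim: claim[0])[1]
```

For example, when an external blog and the company docs disagree, passing both claims to `most_authoritative` returns the company docs' version.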
Run artifacts
Each assistant run should write a short “context report” to the run folder:
- indexed repos list + SHAs
- retrieval queries executed
- top chunks used
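A context report with these three parts could be written as a small JSON file. The sketch below is illustrative only: the file name `context_report.json`, the JSON keys, and the shape of each chunk entry are assumptions, since the document does not specify a report format.

```python
import json
from pathlib import Path


def write_context_report(run_dir, repo_shas, queries, top_chunks) -> Path:
    """Write the run's context report and return its path.

    repo_shas:  mapping of repo name to the commit SHA used for indexing
    queries:    retrieval queries executed, in order
    top_chunks: JSON-serializable descriptions of the chunks actually used
    """
    report = {
        "indexed_repos": repo_shas,
        "retrieval_queries": list(queries),
        "top_chunks": list(top_chunks),
    }
    path = Path(run_dir) / "context_report.json"  # hypothetical file name
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

Keeping the report as plain JSON makes it easy to diff two runs and see whether a changed answer came from a different commit, a different query, or different chunks.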