
Context and Indexing (ConnectSoft)

This document defines how OpenClaw assistants obtain high-quality ConnectSoft context by building and querying a semantic index over selected repositories.

Why a semantic index

ConnectSoft repos are large and fast-moving. A semantic index enables:

  • high recall across many repos
  • targeted retrieval (only the most relevant chunks)
  • stable, repeatable context injection into assistants

Scope (start small, expand)

Index only a curated repo allowlist:

  • ConnectSoft.CompanyDocumentation (MkDocs docs)
  • ConnectSoft.LibraryTemplate (library scaffolding patterns)
  • other repos as needed (templates, core extensions, standards)

Important

The index must exclude generated and build-output folders to avoid retrieval noise and accidental leakage of content that was never meant to reach assistants.

Exclusions (minimum)

Exclude at least:

  • .git/
  • bin/, obj/
  • site/ (MkDocs build output)
  • node_modules/
  • **/*.png, **/*.jpg, large binaries
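The exclusion list above could be applied as a path filter before files reach the indexer. This is a minimal sketch, not the actual ConnectSoft indexer: `is_indexable`, `EXCLUDED_DIRS`, and the 1 MiB cap standing in for "large binaries" are all assumptions made for illustration.

```python
from fnmatch import fnmatch
from pathlib import PurePosixPath

# Directory names and file patterns taken from the exclusion list above.
EXCLUDED_DIRS = {".git", "bin", "obj", "site", "node_modules"}
EXCLUDED_FILE_PATTERNS = ("*.png", "*.jpg")
# Assumption: the document does not define "large"; 1 MiB is a placeholder.
MAX_FILE_BYTES = 1_048_576


def is_indexable(relative_path: str, size_bytes: int) -> bool:
    """Return True if a repo-relative file should be sent to the indexer."""
    path = PurePosixPath(relative_path)
    # Reject the file if any parent directory is on the exclusion list.
    if any(part in EXCLUDED_DIRS for part in path.parts[:-1]):
        return False
    # Reject image files by name, case-insensitively.
    name = path.name.lower()
    if any(fnmatch(name, pattern) for pattern in EXCLUDED_FILE_PATTERNS):
        return False
    # Reject anything over the size cap as a likely binary or dump.
    return size_bytes <= MAX_FILE_BYTES
```

Matching on directory names at any depth means a nested `src/bin/` is excluded as well as a top-level `bin/`, which is usually the desired behavior for build output.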

Build and refresh strategy

  • Nightly refresh: re-sync repos and rebuild the index incrementally.
  • On-demand refresh: before major assistant runs, rebuild only the impacted repos.
  • Pinned run context: each run records the repo commit SHAs used for indexing and retrieval.
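One way to decide which repos are "impacted" and to pin a run is to compare the commit SHA last indexed for each repo with its current HEAD. The sketch below assumes the SHAs have already been collected (for example from `git rev-parse HEAD`); `RunContext`, `impacted_repos`, and `pin_run` are hypothetical names, and the SHA strings in the usage note are placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RunContext:
    """Pinned context for one assistant run (hypothetical structure)."""
    run_id: str
    repo_shas: Dict[str, str] = field(default_factory=dict)


def impacted_repos(indexed: Dict[str, str], current: Dict[str, str]) -> List[str]:
    """Return repos whose current HEAD differs from the SHA last indexed.

    Repos that have never been indexed are also reported as impacted.
    """
    return sorted(
        repo for repo, sha in current.items() if indexed.get(repo) != sha
    )


def pin_run(run_id: str, current: Dict[str, str]) -> RunContext:
    """Record the exact SHAs a run will index and retrieve against."""
    # Copy the mapping so later re-syncs cannot change the pinned record.
    return RunContext(run_id=run_id, repo_shas=dict(current))
```

For example, if `ConnectSoft.CompanyDocumentation` was indexed at placeholder SHA `aaa111` and its HEAD is now `bbb222`, `impacted_repos` returns that repo and an on-demand refresh rebuilds only it.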

Cost and quality controls

Prefer these defaults:

  • Docs-first retrieval, then code only if needed
  • Chunk caps per question (start with 5–10 chunks)
  • Bounded chunk size (avoid full-file dumps unless required)
  • Cache common answers (template commands, repo structure, standard runbooks)
  • Exclude noisy dirs (bin/, obj/, site/, .git/, node_modules/)
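The first three defaults can be combined in a single selection step. The sketch below is one possible reading, not a prescribed algorithm: it treats "code only if needed" as "docs did not fill the chunk cap", and the `Chunk` type, the `"docs"`/`"code"` labels, and the default thresholds are assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Chunk:
    source_kind: str  # "docs" or "code" (hypothetical labels)
    text: str
    score: float      # similarity score from the index; higher is better


def select_chunks(
    candidates: List[Chunk],
    max_chunks: int = 8,         # within the suggested 5-10 starting range
    max_chars: int = 1500,       # assumed bound on chunk size
    min_doc_score: float = 0.5,  # assumed relevance floor for docs
) -> List[Chunk]:
    """Pick docs chunks first, then fill remaining slots with code chunks."""
    ranked = sorted(candidates, key=lambda c: c.score, reverse=True)
    docs = [
        c for c in ranked
        if c.source_kind == "docs" and c.score >= min_doc_score
    ]
    selected = docs[:max_chunks]
    # Fall back to code only when docs did not fill the cap.
    if len(selected) < max_chunks:
        code = [c for c in ranked if c.source_kind == "code"]
        selected += code[: max_chunks - len(selected)]
    # Truncate every chunk so no single result becomes a full-file dump.
    return [Chunk(c.source_kind, c.text[:max_chars], c.score) for c in selected]
```

Keeping the cap and size bound as parameters makes it easy to tighten them per question type without touching the index.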

Tip

The best cost control is good scoping: fewer repos indexed, better manifests, and tight retrieval limits.

Trust model (what is authoritative)

When sources disagree, use this precedence:

  1. ConnectSoft docs (ConnectSoft.CompanyDocumentation)
  2. ConnectSoft template and library docs (e.g., ConnectSoft.LibraryTemplate)
  3. Code in canonical repos (pinned to a specific commit SHA)
  4. External sources (official docs/blogs) for platform-level facts
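The precedence above maps naturally onto an ordered enum, so that resolving a conflict becomes a simple minimum over the competing claims. This is an illustrative sketch; `Trust`, `Claim`, and `most_authoritative` are hypothetical names rather than part of any existing ConnectSoft or OpenClaw API.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Trust(IntEnum):
    """Lower value means higher authority, mirroring the list above."""
    COMPANY_DOCS = 1    # ConnectSoft.CompanyDocumentation
    TEMPLATE_DOCS = 2   # template and library docs
    CANONICAL_CODE = 3  # code pinned to a specific commit SHA
    EXTERNAL = 4        # official docs/blogs for platform-level facts


class Claim(NamedTuple):
    trust: Trust
    source: str
    statement: str


def most_authoritative(claims: List[Claim]) -> Claim:
    """Return the claim from the highest-precedence source.

    Ties keep the first claim in the input order.
    """
    if not claims:
        raise ValueError("no claims to compare")
    return min(claims, key=lambda claim: claim.trust)
```

Carrying the `source` alongside the statement lets the assistant cite which tier won when it reports an answer.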

Run artifacts

Each assistant run should write a short “context report” to the run folder:

  • indexed repos list + SHAs
  • retrieval queries executed
  • top chunks used
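A context report with these three parts could be written as a small JSON file. The sketch below assumes a file name of `context-report.json` and a particular key layout; neither is specified by this document, and `write_context_report` is a hypothetical helper.

```python
import json
from pathlib import Path
from typing import Dict, List


def write_context_report(
    run_dir: Path,
    repo_shas: Dict[str, str],
    queries: List[str],
    top_chunks: List[Dict[str, str]],
) -> Path:
    """Write the run's context report as JSON and return its path."""
    report = {
        # Indexed repos with the commit SHA each was pinned to.
        "indexed_repos": [
            {"repo": repo, "sha": sha} for repo, sha in sorted(repo_shas.items())
        ],
        # Retrieval queries in the order they were executed.
        "retrieval_queries": list(queries),
        # Identifiers of the chunks that were injected into context.
        "top_chunks": list(top_chunks),
    }
    run_dir.mkdir(parents=True, exist_ok=True)
    path = run_dir / "context-report.json"  # assumed file name
    path.write_text(json.dumps(report, indent=2), encoding="utf-8")
    return path
```

Sorting repos by name keeps the report stable across runs, which makes two reports easy to diff when an answer changes.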