
Knowledge Discovery

Most search tools match exact words. Obsilo goes further: it understands meaning. A search for "improving focus" can find a note titled "Deep Work Techniques" even though the words do not overlap.

Traditional keyword search looks for exact text matches. Semantic search converts your notes into numerical vectors (called embeddings) that represent their meaning. Your query gets the same treatment, and the system finds the notes whose vectors are closest to the query's vector.

This means:

  • "recipes for pasta" finds notes about Italian cooking, even if they never say "pasta"
  • "how to sleep better" finds your note titled "Evening Wind-Down Routine"
  • "budget planning" finds notes about financial forecasting and expense tracking
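To make "closest vectors" concrete, here is a minimal sketch that uses cosine similarity, a common closeness measure for embeddings. The three-dimensional vectors are made up for illustration (real embeddings have hundreds of dimensions), the function names are hypothetical, and the source does not state which distance measure Obsilo uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def nearest_notes(query_vec, note_vecs, top_k=3):
    """Rank notes by similarity to the query vector, best first."""
    scored = [(title, cosine_similarity(query_vec, vec))
              for title, vec in note_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Toy embeddings, invented for this example.
notes = {
    "Deep Work Techniques": [0.9, 0.1, 0.0],
    "Pasta Recipes": [0.0, 0.2, 0.9],
    "Evening Wind-Down Routine": [0.6, 0.7, 0.1],
}
query = [1.0, 0.2, 0.0]  # stands in for the embedding of "improving focus"

results = nearest_notes(query, notes, top_k=2)
for title, score in results:
    print(f"{score:.2f}  {title}")
```

The query shares no words with any title; the ranking comes only from how the vectors point.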

Setup

Semantic search requires an embedding model to convert text into embeddings. You set this up once; Obsilo handles the rest.

  1. Open Settings > Obsilo Agent > Embeddings
  2. Choose an embedding model from the dropdown
  3. Click Build Index to process your vault

Which embedding model?

Any configured provider that supports embeddings will work. If you are using OpenAI or a compatible API, the default embedding model is a good starting point. Local models via Ollama work well if you want everything to stay on your machine.

Building the index

The first build processes every note in your vault. This can take a few minutes for large vaults (1000+ notes). After that, the index updates automatically:

  • On startup: new or changed files are re-indexed
  • On file changes: edits trigger re-indexing after a short delay
  • Manually: use the Rebuild Index button in settings at any time
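One way to decide which files need re-indexing is to compare a hash of each file's current content with the hash recorded at the last index run. The sketch below assumes that approach; the source does not say how Obsilo detects changes, and the function names and paths are hypothetical.

```python
import hashlib

def content_hash(text):
    """Stable fingerprint of a note's content."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def files_to_reindex(vault_files, indexed_hashes):
    """Return paths that are new or whose content differs from the indexed version."""
    return sorted(
        path for path, text in vault_files.items()
        if indexed_hashes.get(path) != content_hash(text)
    )

# Hashes stored during the previous index run (toy data).
indexed = {
    "notes/a.md": content_hash("unchanged text"),
    "notes/b.md": content_hash("old text"),
}
# Current state of the vault.
vault = {
    "notes/a.md": "unchanged text",
    "notes/b.md": "edited text",
    "notes/c.md": "brand new note",
}

stale = files_to_reindex(vault, indexed)
print(stale)  # ['notes/b.md', 'notes/c.md']
```

Only the edited and the new file are embedded again, which is why incremental updates are much faster than the first build.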

Your notes stay local

Embeddings are stored in a local database inside your vault. If you use a cloud embedding model, note content is sent to the provider for processing, but the resulting embeddings live only on your machine. With a local model, nothing leaves your device.

How search works under the hood

When you or the agent runs a semantic search, Obsilo combines two retrieval strategies and then merges their results:

1. BM25 (keyword matching)

A fast, traditional ranking algorithm. It finds notes that contain your search terms and ranks them by relevance. Good for specific terms like names, dates, or technical jargon.
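For readers who want to see the scoring idea, here is a minimal BM25 sketch. It assumes whitespace tokenization and the commonly used parameters k1 = 1.5 and b = 0.75; Obsilo's actual tokenizer and parameters are not stated in the source, and the note texts are made up.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with Okapi BM25."""
    if not docs:
        return {}
    tokenized = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs or 1.0

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores = {}
    for name, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in term_freq:
                continue
            # Rare terms get a higher weight than common ones.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            tf = term_freq[term]
            length_norm = k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / (tf + length_norm)
        scores[name] = score
    return scores

docs = {
    "Neural Networks": "neural networks learn weights with gradient descent",
    "Training Data": "clean training data matters more than model size",
    "Grocery List": "eggs milk bread",
}
scores = bm25_scores("gradient descent", docs)
best = max(scores, key=scores.get)
print(best)
```

Notes that contain none of the query terms score zero, which is exactly the gap that semantic similarity fills.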

2. Semantic similarity (embedding matching)

Compares the meaning of your query against the embeddings of every chunk in your vault. Finds conceptually related notes even without keyword overlap.

3. Reciprocal rank fusion (RRF)

Combines the results from BM25 and semantic search into a single ranked list. Notes that rank well in both methods rise to the top. Hybrid ranking of this kind typically returns better results than either method used alone.
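Reciprocal rank fusion is simple enough to sketch in a few lines: each note earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant k = 60 is the value commonly used with RRF; whether Obsilo uses the same constant is an assumption, and the rankings below are invented.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists (best first) into one ranked list."""
    scores = {}
    for ranking in rankings:
        for rank, note in enumerate(ranking, start=1):
            scores[note] = scores.get(note, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda note: scores[note], reverse=True)

# Toy rankings from the two retrieval strategies.
bm25_ranking = ["Neural Networks", "Model Evaluation", "Grocery List"]
semantic_ranking = ["Training Data", "Neural Networks", "Model Evaluation"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
print(fused)
# ['Neural Networks', 'Model Evaluation', 'Training Data', 'Grocery List']
```

Notes that appear in both lists outrank a note that tops only one of them. Because RRF uses ranks rather than raw scores, it needs no calibration between BM25 scores and similarity scores.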

The knowledge graph

Beyond search, Obsilo builds a knowledge graph from the structure already in your vault:

  • Wikilinks: [[note]] connections between your notes
  • Tags: shared tags create implicit groupings
  • MOC properties: Maps of Content link related topics

When the agent searches, it can expand results through the graph. If a search finds Note A, and Note A links to Note B, the agent can follow that link to pull in related content. You configure how many hops the graph expansion follows in settings.

Example: Searching for "machine learning" finds your note on Neural Networks. Graph expansion then follows its wikilinks to your notes on Training Data and Model Evaluation, things you might not have found with search alone.
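Graph expansion with a hop limit can be sketched as a breadth-first traversal over the link graph. The data structure (a dict mapping each note to the notes it links to) and the function name are assumptions made for this example, not Obsilo's internals; "Data Labeling" is an invented fourth note.

```python
from collections import deque

def expand_through_graph(seeds, links, max_hops=1):
    """Return every note reachable within max_hops, mapped to its hop distance."""
    hops = {note: 0 for note in seeds}
    queue = deque(hops)
    while queue:
        note = queue.popleft()
        if hops[note] >= max_hops:
            continue  # hop budget used up on this path
        for neighbor in links.get(note, []):
            if neighbor not in hops:  # keep the shortest distance only
                hops[neighbor] = hops[note] + 1
                queue.append(neighbor)
    return hops

# Toy wikilink graph.
links = {
    "Neural Networks": ["Training Data", "Model Evaluation"],
    "Training Data": ["Data Labeling"],
}

one_hop = expand_through_graph(["Neural Networks"], links, max_hops=1)
two_hops = expand_through_graph(["Neural Networks"], links, max_hops=2)
print(sorted(one_hop))
print(sorted(two_hops))
```

With one hop the search result pulls in Training Data and Model Evaluation; a second hop also reaches Data Labeling, which shows how more hops widen the net.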

Implicit connections

Obsilo can find notes that are semantically similar but not linked to each other: for example, two notes about closely related topics, written months apart, that you never connected.

When it finds them, a suggestion banner appears in the sidebar offering to show you the discovered relationships. In large vaults, this regularly surfaces connections you would not have found manually.
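The idea can be sketched as "similar but unlinked": compare every pair of notes, skip pairs that already share a link, and keep those above a similarity threshold. The threshold of 0.8, the toy vectors, and the function names are assumptions for illustration; a real implementation would likely avoid comparing every pair in a large vault.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms if norms else 0.0

def suggest_links(vectors, linked_pairs, threshold=0.8):
    """Return (note_a, note_b, similarity) for similar notes that are not yet linked."""
    linked = {frozenset(pair) for pair in linked_pairs}
    suggestions = []
    for a, b in combinations(sorted(vectors), 2):
        if frozenset((a, b)) in linked:
            continue  # already connected, nothing to suggest
        similarity = cosine(vectors[a], vectors[b])
        if similarity >= threshold:
            suggestions.append((a, b, similarity))
    suggestions.sort(key=lambda item: item[2], reverse=True)
    return suggestions

# Toy embeddings and existing links.
vectors = {
    "Sleep Hygiene": [0.9, 0.4, 0.0],
    "Evening Wind-Down Routine": [0.8, 0.5, 0.1],
    "Caffeine Notes": [0.7, 0.1, 0.6],
    "Budget 2025": [0.0, 0.1, 1.0],
}
existing_links = [("Sleep Hygiene", "Caffeine Notes")]

suggestions = suggest_links(vectors, existing_links)
for a, b, similarity in suggestions:
    print(f"{similarity:.2f}  {a} <-> {b}")
```

Here only the pair "Evening Wind-Down Routine" and "Sleep Hygiene" is suggested: the two notes are close in meaning and have no link between them.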

Scales with vault size

The larger your vault, the more useful implicit connections get.

Local reranking

After the initial search returns candidates, Obsilo can run a second pass using a cross-encoder model to improve result quality. This model runs entirely on your device via WebAssembly, so no data is sent anywhere.

The reranker (based on ms-marco-MiniLM) reads each candidate alongside your query and produces a more accurate relevance score. False positives get pushed down; actually relevant results move up.

Toggle it in Settings > Obsilo Agent > Embeddings > Local Reranking.
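The two-stage shape of reranking can be sketched without a real model. In the example below, `toy_overlap_score` is a deliberately crude stand-in for the cross-encoder: it only counts shared words, whereas an actual cross-encoder reads the query and the candidate together and predicts relevance. All names and texts are hypothetical.

```python
def rerank(query, candidates, score_pair, top_k=None):
    """Re-order (title, text) candidates by a pairwise relevance score, best first."""
    scored = [(score_pair(query, text), title) for title, text in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    ranked = [title for _, title in scored]
    return ranked if top_k is None else ranked[:top_k]

def toy_overlap_score(query, text):
    """Stand-in scorer (NOT a cross-encoder): share of query words found in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0

# Candidates as returned by the first search pass, in their original order.
candidates = [
    ("Morning Pages", "journaling every morning clears my head"),
    ("Evening Wind-Down Routine", "how to sleep better with a fixed evening routine"),
    ("Coffee Log", "better coffee brewing notes"),
]

reranked = rerank("how to sleep better", candidates, toy_overlap_score)
print(reranked)
# ['Evening Wind-Down Routine', 'Coffee Log', 'Morning Pages']
```

The first pass only has to produce a reasonable candidate set; the second pass spends more effort per candidate to get the order right.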

Contextual retrieval

When enabled, Obsilo enriches each chunk with surrounding context before creating its embedding. The agent reads the note around a chunk and adds a brief description of what that chunk is about. This improves search accuracy for short or ambiguous passages.

For example, a chunk containing just a table of numbers becomes much more findable when the system adds context like "quarterly revenue figures from the 2025 financial review."
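In code, the enrichment step amounts to building the text that is actually embedded. The sketch below assumes the description has already been written (in Obsilo, by the agent) and is simply placed in front of the chunk; the function name and the figures in the chunk are made up.

```python
def contextualize_chunk(chunk, context_description):
    """Prefix a chunk with a short description before it is embedded."""
    description = context_description.strip()
    if not description:
        return chunk  # nothing to add, embed the chunk as-is
    return f"{description}\n\n{chunk}"

# A chunk that is hard to find on its own: just numbers (invented values).
chunk = "Q1  4.2\nQ2  4.8\nQ3  5.1\nQ4  5.6"
description = "Quarterly revenue figures from the 2025 financial review."

text_to_embed = contextualize_chunk(chunk, description)
print(text_to_embed)
```

The embedding is then computed from `text_to_embed`, so a query about revenue can match a chunk that never mentions the word.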

Configuration

Setting | Where | Recommendation
Embedding model | Settings > Embeddings | Choose based on your privacy needs and provider
Chunk size | Settings > Embeddings > Advanced | Default works well for most vaults. Smaller chunks (256 tokens) for short notes, larger (1024) for long-form writing
Excluded folders | Settings > Embeddings > Excluded | Exclude templates, archive, or attachment folders to keep the index focused
Auto-index | Settings > Embeddings | Keep enabled for automatic updates on file changes
Graph hops | Settings > Embeddings > Graph | 1-2 hops is usually enough. More hops find broader connections but may include noise
Local reranking | Settings > Embeddings | Enable for better result quality at minimal performance cost
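To see what the chunk size setting changes, here is a rough sketch that splits a note into fixed-size chunks. It approximates tokens with whitespace-separated words, which is an assumption: real tokenizers count differently, and Obsilo may also split on headings or paragraphs.

```python
def chunk_text(text, chunk_size=256):
    """Split text into consecutive chunks of at most chunk_size words."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    words = text.split()
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), chunk_size)
    ]

# A toy note of 600 words.
note = " ".join(f"word{i}" for i in range(600))

small_chunks = chunk_text(note, chunk_size=256)
large_chunks = chunk_text(note, chunk_size=1024)
print(len(small_chunks), len(large_chunks))  # 3 1
```

Smaller chunks give more, narrower embeddings per note; larger chunks give fewer embeddings that each cover more context.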

Large vaults

For vaults with 5000+ notes, the initial index build may take 10-20 minutes depending on your embedding model. After that, incremental updates are fast. Consider excluding attachment folders or archives you rarely search.

Examples

  • "Find notes related to my goals for this year" (semantic search finds notes about resolutions, plans, and objectives)
  • "What do I know about distributed systems?" (searches by meaning across your entire vault)
  • "Show me notes similar to @architecture-decisions" (finds thematically related notes)
  • "Are there any notes I should link together?" (triggers implicit connection discovery)
