Knowledge Universe API — populate Weaviate with scored, multi-source knowledge in one call

Hey Weaviate community,

I built Knowledge Universe API and wanted to share a pattern that might be useful for anyone building RAG pipelines on Weaviate.

The problem it solves: getting fresh, structured knowledge into your Weaviate collection without writing individual crawlers for every source.

One API call retrieves from arXiv, GitHub, Wikipedia, StackOverflow, HuggingFace, Semantic Scholar and 8 more official sources simultaneously. Every result is scored across 5 dimensions (content quality, freshness, pedagogical fit, trust, social proof) before it reaches you.

The Weaviate integration:

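import requests
import weaviate  # weaviate-client v3 API (weaviate.Client, client.batch)
from weaviate.util import generate_uuid5

WEAVIATE_URL = 'http://localhost:8080'  # placeholder: point this at your instance

# 1. Get scored + embedded knowledge
response = requests.post(
    'YOUR_API_URL/v1/discover',
    json={
        'topic': 'vector search optimization',
        'output_format': 'embeddings'  # returns 384-dim vectors
    },
    timeout=30,
)
response.raise_for_status()  # fail early on HTTP errors

# 2. Upsert directly into Weaviate, using the precomputed vectors
client = weaviate.Client(WEAVIATE_URL)
with client.batch as batch:
    for item in response.json()['embeddings']:
        batch.add_data_object(
            data_object={
                'title': item['title'],
                'url': item['url'],
                'platform': item['source_platform'],
                'quality_score': item.get('quality_score', 0),
            },
            class_name='KnowledgeSource',
            # Deterministic ID from the URL, so re-runs overwrite instead of duplicating
            uuid=generate_uuid5(item['url']),
            vector=item['vector']
        )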

The result is pre-scored, multi-source knowledge in your vector store. The quality score is stored as an object property, so you can filter on it at query time.
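
To make the query-time filtering concrete, here is a minimal sketch using the same v3 Python client as the snippet above. It assumes `quality_score` ends up stored as a Weaviate `number` property; the `quality_filter` and `search` helpers and the 0.7 threshold are hypothetical and only there for illustration.

```python
def quality_filter(min_score: float) -> dict:
    """Build a Weaviate `where` filter that keeps objects scoring at least min_score."""
    return {
        "path": ["quality_score"],
        "operator": "GreaterThanEqual",
        "valueNumber": min_score,  # assumes the property type is `number`
    }


def search(client, query_vector, min_score=0.7, limit=5):
    """Vector search over KnowledgeSource, restricted to high-quality results.

    `client` is a weaviate.Client (v3 API); `query_vector` must have the same
    dimensionality as the stored vectors.
    """
    return (
        client.query
        .get("KnowledgeSource", ["title", "url", "platform", "quality_score"])
        .with_near_vector({"vector": query_vector})
        .with_where(quality_filter(min_score))
        .with_limit(limit)
        .do()
    )
```

The query vector needs to come from the same embedding model as the stored vectors, otherwise the distances are meaningless.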

Open source, MIT licensed: VLSiddarth/Knowledge-Universe on GitHub ("Find the best knowledge sources across the internet. For learning, research, and AI. 🌌")
Free tier: 100 calls/month, no credit card.

Two questions for the community:

  1. Does the embedding schema above work well for your Weaviate setup, or would a different metadata structure be more useful?
  2. Are there knowledge sources you wish were covered that aren’t in the list above?
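
To give question 1 something concrete to react to, the class implied by the snippet could be spelled out explicitly rather than left to auto-schema. This is only a sketch: the property types are my assumptions, and `knowledge_source_class` is a hypothetical helper name. `vectorizer` is set to `none` because the vectors are supplied by the API rather than computed by Weaviate.

```python
def knowledge_source_class() -> dict:
    """Class definition for the objects written by the batch import above."""
    return {
        "class": "KnowledgeSource",
        "vectorizer": "none",  # vectors are brought in precomputed
        "properties": [
            {"name": "title", "dataType": ["text"]},
            {"name": "url", "dataType": ["text"]},
            {"name": "platform", "dataType": ["text"]},
            {"name": "quality_score", "dataType": ["number"]},
        ],
    }

# With a v3 client, create it once before the first import:
#   client.schema.create_class(knowledge_source_class())
```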

Happy to adjust the output format based on what actually fits Weaviate workflows.