Title: Knowledge Universe API — populate Weaviate with scored, multi-source knowledge in one call
Hey Weaviate community,
I built Knowledge Universe API and wanted to share a pattern that might be useful for anyone building RAG pipelines on Weaviate.
The problem it solves: getting fresh, structured knowledge into your Weaviate collection without writing individual crawlers for every source.
One API call retrieves results from arXiv, GitHub, Wikipedia, StackOverflow, HuggingFace, Semantic Scholar, and 8 more official sources simultaneously. Every result is scored across 5 dimensions (content quality, freshness, pedagogical fit, trust, social proof) before it reaches you.
The Weaviate integration:
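import requests
import weaviate  # uses the weaviate-client v3 API (weaviate.Client)
from weaviate.util import generate_uuid5

WEAVIATE_URL = 'http://localhost:8080'  # example value; replace with your instance URL

# 1. Get scored + embedded knowledge
response = requests.post(
    'YOUR_API_URL/v1/discover',
    json={
        'topic': 'vector search optimization',
        'output_format': 'embeddings',  # returns 384-dim vectors
    },
)
response.raise_for_status()  # stop early if the API call failed

# 2. Upsert directly into Weaviate
client = weaviate.Client(WEAVIATE_URL)
with client.batch as batch:
    for item in response.json()['embeddings']:
        batch.add_data_object(
            data_object={
                'title': item['title'],
                'url': item['url'],
                'platform': item['source_platform'],
                'quality_score': item.get('quality_score', 0),
            },
            class_name='KnowledgeSource',
            # Deterministic ID from the URL, so re-runs overwrite instead of duplicating
            uuid=generate_uuid5(item['url']),
            vector=item['vector'],
        )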
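If you would rather not rely on auto-schema, the class can be defined explicitly before the import. Here is a minimal sketch, assuming the weaviate-client v3 API and the property names from the import snippet above. The data types, the constant name, and the helper `ensure_knowledge_source_class` are my own illustrative choices, not something the API prescribes.

```python
# Hypothetical explicit schema for the 'KnowledgeSource' class used above.
# Property names mirror the import snippet; the data types are assumptions.
KNOWLEDGE_SOURCE_CLASS = {
    "class": "KnowledgeSource",
    "vectorizer": "none",  # vectors are supplied by the caller, not computed by Weaviate
    "properties": [
        {"name": "title", "dataType": ["text"]},
        {"name": "url", "dataType": ["text"]},
        {"name": "platform", "dataType": ["text"]},
        {"name": "quality_score", "dataType": ["number"]},
    ],
}


def ensure_knowledge_source_class(client):
    """Create the class only if it is not already in the schema (v3 client)."""
    existing = {c["class"] for c in client.schema.get().get("classes", [])}
    if KNOWLEDGE_SOURCE_CLASS["class"] not in existing:
        client.schema.create_class(KNOWLEDGE_SOURCE_CLASS)
```

Calling `ensure_knowledge_source_class(client)` once before the batch import keeps `quality_score` typed as a number, which matters for range filters later.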
You get pre-scored, multi-source knowledge in your vector store. The quality score is stored as a property on each object, so you can filter by it at query time.
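To make the query-time filtering concrete, here is a rough sketch of a vector search restricted by quality score, again assuming the weaviate-client v3 API. The helper names and the 0.7 threshold are hypothetical; I am assuming `quality_score` is stored as a number, and the right cutoff depends on the score range the API actually returns.

```python
def quality_filter(min_score):
    """Build a Weaviate `where` filter keeping objects with quality_score >= min_score."""
    return {
        "path": ["quality_score"],
        "operator": "GreaterThanEqual",
        "valueNumber": float(min_score),  # assumes the property is a number, not an int
    }


def search_high_quality(client, query_vector, min_score=0.7, limit=5):
    """Vector search over KnowledgeSource, restricted by quality score (v3 client)."""
    return (
        client.query
        .get("KnowledgeSource", ["title", "url", "platform", "quality_score"])
        .with_near_vector({"vector": query_vector})
        .with_where(quality_filter(min_score))
        .with_limit(limit)
        .do()
    )
```

The `query_vector` needs to come from the same 384-dim embedding model as the stored vectors, otherwise the distances are not meaningful.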
Open source, MIT licensed: VLSiddarth/Knowledge-Universe on GitHub ("Find the best knowledge sources across the internet. For learning, research, and AI. 🌌")
Free tier: 100 calls/month, no credit card.
Two questions for the community:
- Does the embedding schema above work well for your Weaviate setup, or would a different metadata structure be more useful?
- Are there knowledge sources you wish were covered that aren’t in the list above?
Happy to adjust the output format based on what actually fits Weaviate workflows.