El Housseine Jaafari

Posted on Apr 1 • Originally published at clawship.app

Building an Engineering & Security News Aggregator (10 Sources, No Paid APIs)

#security #webdev #javascript #opensource

We built a curated engineering and security news aggregator that pulls from 10 high-signal sources, deduplicates content, and updates every 6 hours.

No paid APIs. No scraping. No login. Just clean, structured news for developers.

This post breaks down exactly how it works.

What This Is

A lightweight news wire combining:

Hacker News
Lobsters
InfoQ
Cloudflare Blog
Krebs on Security
The Hacker News (Security)
NIST NVD (vulnerabilities)
GitHub Blog
OpenAI Blog
Anthropic Research

The goal: high-quality signal, zero noise, zero cost.

Why Build This?

Most engineering news aggregators fail in at least one of these ways:

Too noisy (no curation)
Too expensive (paid APIs)
Too slow (manual updates)
Too fragmented (you check 10 sites anyway)

We wanted:

A single feed
Fresh updates (without chasing real-time)
No operational cost
No lock-in (no accounts, no tracking)

Stack

Hono (API layer)
Drizzle ORM
Postgres
Next.js (frontend)
RSS feeds + Hacker News Firebase API

High-Level Architecture

┌───────────────┐   ┌───────────────┐
│   RSS Feeds   │   │  HN Firebase  │
│ (9 sources)   │   │      API      │
└───────┬───────┘   └───────┬───────┘
        └─────────┬─────────┘
                  ▼
           ┌───────────────┐
           │ Fetch Workers │
           │ (every 6 hrs) │
           └──────┬────────┘
                  │
                  ▼
        ┌──────────────────────┐
        │ Normalize Articles   │
        │ title, url, date     │
        └─────────┬────────────┘
                  │
                  ▼
        ┌──────────────────────┐
        │ SHA-256 Deduplication│
        │ (based on URL)       │
        └─────────┬────────────┘
                  │
                  ▼
           ┌───────────────┐
           │   Postgres    │
           └──────┬────────┘
                  │
                  ▼
           ┌───────────────┐
           │   Hono API    │
           └──────┬────────┘
                  │
                  ▼
           ┌───────────────┐
           │   Next.js UI  │
           └───────────────┘

Data Sources

We deliberately chose sources with:

High editorial quality
Little overlap with one another
Stable RSS feeds or APIs

Breakdown

Source	Type	Why It Matters
Hacker News	API	Real-time dev signal
Lobsters	RSS	More technical discussions
InfoQ	RSS	Deep engineering content
Cloudflare Blog	RSS	Infra + performance insights
Krebs on Security	RSS	Trusted security reporting
The Hacker News	RSS	Security news (broader)
NIST NVD	RSS/API	Verified vulnerabilities
GitHub Blog	RSS	Platform + ecosystem updates
OpenAI Blog	RSS	AI developments
Anthropic Research	RSS	AI + safety research

Fetching Strategy

We run a simple scheduled job:

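import cron from "node-cron"; // assumes the node-cron package

// Runs at minute 0 of every 6th hour (00:00, 06:00, 12:00, 18:00, server time)
cron.schedule("0 */6 * * *", async () => {
  await fetchAllSources(); // fetch, normalize, dedupe, and store all sources
});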

Why every 6 hours?

Keeps content fresh
Avoids unnecessary load
Works well with RSS update frequencies
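The post doesn't show fetchAllSources() itself. Below is a minimal sketch of one way it could work, assuming each source is wrapped in a hypothetical Source object whose async fetch() returns already-normalized articles. Unlike the post's zero-argument call, this version takes the source list as a parameter so it can be exercised with fake sources. The core idea is Promise.allSettled: every feed is fetched concurrently, and a failing feed is reported instead of aborting the whole run.

```typescript
// Hypothetical shapes: the real project's types may differ.
type Article = {
  title: string;
  url: string;
  source: string;
  publishedAt: Date;
};

type Source = {
  name: string;
  fetch: () => Promise<Article[]>; // returns normalized articles
};

type FetchReport = {
  articles: Article[];
  failed: string[]; // names of sources that threw or rejected
};

// Fetch every source concurrently. Promise.allSettled never rejects, so one
// broken feed (timeout, malformed XML, 5xx) does not sink the other nine.
async function fetchAllSources(sources: Source[]): Promise<FetchReport> {
  // The async wrapper turns a synchronous throw inside fetch() into a rejection.
  const results = await Promise.allSettled(sources.map(async (s) => s.fetch()));

  const articles: Article[] = [];
  const failed: string[] = [];

  results.forEach((result, i) => {
    if (result.status === "fulfilled") {
      articles.push(...result.value);
    } else {
      failed.push(sources[i].name);
    }
  });

  return { articles, failed };
}
```

In this sketch the cron callback would call fetchAllSources(sources) and hand report.articles on to deduplication and storage. Logging report.failed on each run is a cheap way to notice a feed going stale, which matters given the dependence on RSS availability noted under Limitations.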

Deduplication (Key Part)

Different sources often link to the same story (a popular post can show up on both Hacker News and Lobsters, for example).

We solve this using SHA-256 hashing of URLs.

import { createHash } from "crypto";

// Hex-encoded SHA-256 of the URL string, used as the dedup key
function hashUrl(url: string): string {
  return createHash("sha256").update(url).digest("hex");
}

Why URL hashing?

Fast
Deterministic
No fuzzy matching complexity
Works across sources

Tradeoff

Won’t catch rewritten articles with different URLs
But avoids false positives (important for trust)
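A related gap: hashing the raw string means http vs https, a trailing slash, or an added utm_source parameter all produce different hashes for the same page. Here is a minimal sketch of canonicalizing before hashing. The rules (forcing HTTPS, dropping "www.", fragments, and a few common tracking parameters, sorting the remaining query parameters) are illustrative assumptions, not something the post describes, and canonicalizeUrl and TRACKING_PARAMS are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Assumption: these query parameters never change which page is served.
const TRACKING_PARAMS = /^(utm_[a-z]+|fbclid|gclid)$/i;

// Hypothetical canonicalization applied before hashing.
function canonicalizeUrl(raw: string): string {
  const u = new URL(raw); // throws on malformed input
  u.protocol = "https:"; // assume http and https serve the same resource
  u.hash = ""; // fragments never reach the server
  u.hostname = u.hostname.replace(/^www\./, ""); // assume www is an alias

  // Drop tracking parameters, then sort the rest so ordering doesn't matter.
  const kept = [...u.searchParams.entries()]
    .filter(([key]) => !TRACKING_PARAMS.test(key))
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  u.search = new URLSearchParams(kept).toString();

  // Strip a trailing slash from non-root paths.
  if (u.pathname.length > 1 && u.pathname.endsWith("/")) {
    u.pathname = u.pathname.slice(0, -1);
  }
  return u.toString();
}

// Same SHA-256 dedup key as before, computed on the canonical form.
function hashUrl(url: string): string {
  return createHash("sha256").update(canonicalizeUrl(url)).digest("hex");
}
```

Each rule merges URLs that could, in principle, serve different content, so given the emphasis on avoiding false positives it is worth keeping the list short and checking it against the actual sources.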

Normalization

Each source has its own format. We normalize into a single shape:

type Article = {
  title: string;
  url: string;
  source: string;
  publishedAt: Date;
};

This keeps the frontend simple and predictable.
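To illustrate the normalization step, here is a sketch of two hypothetical adapters: one for Hacker News items from the Firebase API (where time is Unix seconds and text posts such as Ask HN have no url), and one for an RSS item after XML parsing. The RssItem field names, and the fallback to the current time when a date is missing or unparseable, are assumptions; real feeds (Atom in particular) vary.

```typescript
type Article = {
  title: string;
  url: string;
  source: string;
  publishedAt: Date;
};

// Subset of fields in a Hacker News Firebase API item.
type HnItem = {
  id: number;
  title?: string;
  url?: string; // absent for Ask HN and other text posts
  time: number; // Unix time in seconds
};

// Hypothetical shape of a parsed RSS <item>; depends on the feed parser used.
type RssItem = {
  title?: string;
  link?: string;
  pubDate?: string; // RFC 822, e.g. "Mon, 01 Apr 2024 12:00:00 GMT"
};

function fromHackerNews(item: HnItem): Article {
  return {
    title: (item.title ?? "").trim(),
    // Text posts have no external URL, so link to the discussion page.
    url: item.url ?? `https://news.ycombinator.com/item?id=${item.id}`,
    source: "Hacker News",
    publishedAt: new Date(item.time * 1000), // seconds -> milliseconds
  };
}

function fromRss(item: RssItem, source: string): Article | null {
  if (!item.title || !item.link) return null; // skip unusable entries
  const published = item.pubDate ? new Date(item.pubDate) : new Date();
  return {
    title: item.title.trim(),
    url: item.link.trim(),
    source,
    // Assumption: fall back to "now" if the date can't be parsed.
    publishedAt: Number.isNaN(published.getTime()) ? new Date() : published,
  };
}
```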

API Layer (Hono)

Example endpoint:

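import { Hono } from "hono";
import { db } from "./db"; // hypothetical module exporting the Drizzle client (created with the schema)

const app = new Hono();

// Return the 100 most recent articles, newest first
app.get("/articles", async (c) => {
  const articles = await db.query.articles.findMany({
    orderBy: (a, { desc }) => [desc(a.publishedAt)],
    limit: 100,
  });

  return c.json(articles);
});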

Minimal, fast, no overengineering.

Frontend (Next.js)

Server-rendered list
No login required
No personalization
Just chronological, deduplicated news

Limitations

Not real-time (by design)
No personalization
Deduplication is URL-based only
Dependent on RSS availability

What We’d Improve

Smarter clustering (same story, different URLs)
Tagging (infra, AI, security, etc.)
Optional filters (without accounts)
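For the clustering item, one possible starting point (not something the post implements) is comparing titles by the Jaccard similarity of their word sets. This is a hypothetical sketch: the stopword list, the 0.6 threshold, and the greedy first-match strategy are all assumptions that would need tuning against real headlines.

```typescript
// Hypothetical stopword list; common words that carry no topical signal.
const STOPWORDS = new Set(["a", "an", "the", "of", "to", "in", "for", "and", "on", "with", "is"]);

// Lowercase, split on non-alphanumerics, drop empties and stopwords.
function titleTokens(title: string): Set<string> {
  return new Set(
    title
      .toLowerCase()
      .split(/[^a-z0-9]+/)
      .filter((w) => w.length > 0 && !STOPWORDS.has(w)),
  );
}

// |A ∩ B| / |A ∪ B|; two empty sets count as unrelated.
function jaccard(a: Set<string>, b: Set<string>): number {
  if (a.size === 0 && b.size === 0) return 0;
  let shared = 0;
  for (const w of a) if (b.has(w)) shared++;
  return shared / (a.size + b.size - shared);
}

// Greedy single pass: each title joins the first cluster whose representative
// (its first member) is similar enough, otherwise it starts a new cluster.
function clusterTitles(titles: string[], threshold = 0.6): string[][] {
  const clusters: { rep: Set<string>; members: string[] }[] = [];
  for (const title of titles) {
    const tokens = titleTokens(title);
    const match = clusters.find((c) => jaccard(c.rep, tokens) >= threshold);
    if (match) match.members.push(title);
    else clusters.push({ rep: tokens, members: [title] });
  }
  return clusters.map((c) => c.members);
}
```

Any fuzzy matching like this gives up some of the false-positive protection that URL hashing provides, so showing clusters as "related" rather than merging them may fit better with the emphasis on trust.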

Try It

The news wire is open to everyone:

👉 https://clawship.app/blog/engineering-security-news-wire

Connect with Us

Discord: https://discord.gg/
Twitter: https://twitter.com/
News Wire: https://clawship.app/blog/engineering-security-news-wire
