
A linear-time alternative for Dimensionality Reduction and fast visualisation

https://medium.com/@roman.f/a-linear-time-alternative-to-t-sne-for-dimensionality-reduction-and-fast-visualisation-5cd1a7219d6f
61•romanfll•4h ago

Comments

romanfll•3h ago
Author here. I built this because I needed to run dimensionality reduction entirely in the browser (client-side) for an interactive tool. The standard options (UMAP, t-SNE) were either too heavy for JS/WASM or required a GPU backend to run at acceptable speeds for interactive use.

This approach ("Sine Landmark Reduction") uses linearised trilateration—similar to GPS positioning—against a synthetic "sine skeleton" of landmarks.

The main trade-offs:

It is O(N) and deterministic (solves Ax=b instead of iterative gradient descent).

It forces the topology onto a loop structure, so it is less accurate than UMAP for complex manifolds (like Swiss Rolls), but it guarantees a clean layout for user interfaces.

It can project ~9k points (50 dims) to 3D in about 2 seconds on a laptop CPU. Python implementation and math details are in the post. Happy to answer questions!
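
To make the "linearised trilateration" idea concrete, here is a minimal sketch in plain Python. It is not the implementation from the post: it assumes a 2D target space and exact distances, and the function name `trilaterate_2d` and the example landmarks are made up for illustration. Subtracting the first landmark's distance equation from each of the others cancels the quadratic term, which leaves a linear system Ax=b that can be solved directly.

```python
import math


def trilaterate_2d(landmarks, dists):
    """Estimate a 2D point from its distances to known 2D landmarks.

    Each landmark i gives |p - L_i|^2 = d_i^2. Subtracting the equation for
    landmark 0 from the others removes |p|^2 and leaves the linear system
        2 (L_i - L_0) . p = d_0^2 - d_i^2 + |L_i|^2 - |L_0|^2,
    solved here in the least-squares sense via the 2x2 normal equations.
    """
    (x0, y0), d0 = landmarks[0], dists[0]
    rows, rhs = [], []
    for (xi, yi), di in zip(landmarks[1:], dists[1:]):
        rows.append((2.0 * (xi - x0), 2.0 * (yi - y0)))
        rhs.append(d0**2 - di**2 + (xi**2 + yi**2) - (x0**2 + y0**2))

    # Normal equations: (A^T A) p = A^T b
    s11 = sum(a * a for a, _ in rows)
    s12 = sum(a * c for a, c in rows)
    s22 = sum(c * c for _, c in rows)
    t1 = sum(a * r for (a, _), r in zip(rows, rhs))
    t2 = sum(c * r for (_, c), r in zip(rows, rhs))

    det = s11 * s22 - s12 * s12
    if abs(det) < 1e-12:
        raise ValueError("landmarks are collinear; position is not determined")
    # Cramer's rule for the symmetric 2x2 system
    return ((s22 * t1 - s12 * t2) / det, (s11 * t2 - s12 * t1) / det)


# Illustrative data: four landmarks and the distances from a hidden point.
landmarks = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0), (5.0, 5.0)]
hidden = (1.0, 2.0)
dists = [math.hypot(hidden[0] - lx, hidden[1] - ly) for lx, ly in landmarks]

estimate = trilaterate_2d(landmarks, dists)
print(estimate)
```

With exact distances the solve recovers the hidden point up to floating-point error. When the distances come from a higher-dimensional space they are generally not consistent with any single low-dimensional point, and the least-squares solution is then only a best fit.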

aoeusnth1•3h ago
This is really cool! Are you considering publishing a paper on it? This seems conceptually similar to landmark MDS / Isomap, except using PCA on the landmark matrix instead of MDS. (https://cannoodt.dev/2019/11/lmds-landmark-multi-dimensional...)
romanfll•2h ago
Thanks! You nailed the intuition! Yes, it shares DNA with Landmark MDS, but we needed something strictly deterministic for the UI. Re: Publishing: We don't have a paper planned for this specific visualisation technique yet. I just wanted to open-source it because it solved a major bottleneck for our dashboard. However, our main research focus at Thingbook is DriftMind (a cold-start streaming forecaster and anomaly detector, preprint here: https://www.researchgate.net/publication/398142288_DriftMind...). That paper is currently under peer review! It shares the same 'efficiency-first' philosophy as this visualisation tool.
lmeyerov•2h ago
Fwiw, we are heavy UMAP users (pygraphistry), and find UMAP CPU fine for interactive use at up to 30K rows and GPU at 100K rows, then generally switch to a trained mode when > 100K rows. Our use case is often highly visual - see correlations, and link together similar entities into explorable & interactive network diagrams. For headless, like in daily anomaly detection, we will do this to much larger scales.

We see a lot of wide social, log, and cyber data where this works, anywhere from 5-200 dim. Our bio users are trickier, as we can have 1K+ dimensions pretty fast. We find success there too, and mostly get into preconditioning tricks for those.

At the same time, I'm increasingly thinking of learning neural embeddings in general for these instead of traditional clustering algorithms. As scales go up, the performance argument here goes up too.

abhgh•32m ago
I was not aware this existed and it looks cool! I am definitely going to take out some time to explore it further.

I have a couple of questions for now: (1) I am confused by your last sentence. It seems you're saying embeddings are a substitute for clustering. My understanding is that you usually apply a clustering algorithm over embeddings - good embeddings just ensure that the grouping produced by the clustering algo "makes sense".

(2) Have you tried PaCMAP [1]? I found it to produce high quality and quick results when I tried it. Haven't tried it in a while though - and I vaguely remember that it wouldn't install properly on my machine (a Mac) the last time I reached for it. Their group has some new stuff coming out too (on the linked page).

[1] https://github.com/YingfanWang/PaCMAP

threeducks•1h ago
Without looking at the code, O(N * k) with N = 9000 points and k = 50 dimensions should take on the order of milliseconds, not seconds. Did you profile your code to see whether there is perhaps something that takes an unexpected amount of time?
donkeybeer•1h ago
If he wrote the for loop in Python instead of NumPy or C or whatever, it could be a plausible runtime.
yorwba•52m ago
Each of the N data points is processed through several expensive linear algebra operations. O(N * k) just expresses that if you double N, the runtime also at most doubles. It doesn't mean it has to be fast in an absolute sense for any particular value of N and k.
akoboldfrying•28m ago
Didn't read TFA, but it's hard to think of a linear algebra operation that is both that slow and takes time independent of n and k.
memming•3h ago
first subsample a fixed number of random landmark points from data, then...
romanfll•2h ago
Thanks for your comment. You are spot on, that is effectively the standard Nyström/Landmark MDS approach.

The technique actually supports both modes in the implementation (synthetic skeleton or random subsampling). However, for this browser visualisation, we default to the synthetic sine skeleton for two reasons:

1. Determinism: Random landmarks produce a different layout every time you calculate the projection. For a user interface, we needed the layout to be identical every time the user loads the data, without needing to cache a random seed.

2. Topology Forcing: By using a fixed sine/loop skeleton, we implicitly 'unroll' the high-dimensional data onto a clean reduced structure. We found this easier for users to visually navigate compared to the unpredictable geometry that comes from a random subset.

HelloNurse•1h ago
You don't need a "proper" random selection: if your points are sorted deterministically and not too adversarially, any reasonably unbiased selection (e.g. every Nth point) is pseudorandom.
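
A minimal sketch of the "every Nth point" selection, assuming the rows are already in some stable, deterministic order. The helper name `stride_landmarks` is hypothetical and is not part of the post's implementation.

```python
def stride_landmarks(n_points, n_landmarks):
    """Return n_landmarks evenly spaced row indices in range(n_points).

    No random number generator is involved, so the same input order always
    yields the same landmarks (and therefore the same layout).
    """
    if not 0 < n_landmarks <= n_points:
        raise ValueError("need 0 < n_landmarks <= n_points")
    # Integer arithmetic keeps the indices distinct and strictly increasing.
    return [(i * n_points) // n_landmarks for i in range(n_landmarks)]


indices = stride_landmarks(9000, 50)
print(indices[:5], "...", indices[-1])
```

The selection is only as unbiased as the row order: if the data is sorted by something correlated with its structure (for example by class label or by time), evenly spaced rows can over- or under-sample regions.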
jmpeax•3h ago
> They typically need to compare many or all points to each other, leading to O(N²) complexity.

UMAP is not O(n^2) it is O(n log n).

romanfll•2h ago
Thanks for your comment! You are right, the Barnes-Hut implementation brings UMAP down to O(N log N). I should have been more precise in the document. The main point is that even O(N log N) could be too much if you run this in a browser. Thanks for clarifying!
emil-lp•1h ago
If k=50, then I'm pretty sure O(n log n) beats O(nk).
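
A quick back-of-the-envelope check of that comparison, as a sketch only: it compares the bare growth terms n*log2(n) and n*k and ignores every constant factor, which in practice can dominate. The base-2 logarithm and the helper names are assumptions made for illustration.

```python
import math


def cost_estimates(n, k):
    """Return (n * log2(n), n * k), ignoring all constant factors."""
    return n * math.log2(n), n * k


def crossover_n(k):
    """Value of n at which log2(n) equals k, i.e. n = 2**k."""
    return 2 ** k


k = 50
for n in (9_000, 100_000, 1_000_000):
    nlogn, nk = cost_estimates(n, k)
    print(f"n={n:>9,}  n*log2(n)={nlogn:,.0f}  n*k={nk:,}")

print(f"log2(n) reaches k={k} only at n = {crossover_n(k):,}")
```

Under these assumptions the n*log2(n) term stays below n*k for any dataset size that fits on a real machine, since log2(n) only reaches 50 at n = 2**50.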
benob•1h ago
Is there a pip installable version?
aw123•1h ago
I asked an LLM to test it on the digits dataset and [here](https://imgur.com/a/XAt0VRU) are the results.

```
import numpy as np
import time
import matplotlib.pyplot as plt
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.datasets import load_digits
from sklearn.metrics import pairwise_distances
from sklearn.manifold import TSNE

try:
    import umap
    HAS_UMAP = True
except ImportError:
    HAS_UMAP = False
    print("Warning: 'umap-learn' not installed. Comparison will be skipped.")

class SineLandmarkReduction(BaseEstimator, TransformerMixin):
    def __init__(self,
                 n_components=2,
                 n_landmarks=50,
                 mode='data_derived',  # 'sine' or 'data_derived'
                 distance_warping=1.0,
                 random_state=42):
        self.n_components = n_components
        self.n_landmarks = n_landmarks
        self.mode = mode
        self.p = distance_warping
        self.random_state = random_state
        self.rng = np.random.RandomState(random_state)

    def _generate_sine_landmarks(self, n_features):
        """Generates the high-dim 'sine skeleton'."""
        a = self.rng.uniform(0.5, 2.0, n_features)
        omega = self.rng.uniform(0.5, 1.5, n_features)
        phi = self.rng.uniform(0, 2 * np.pi, n_features)
        
        t = np.linspace(0, 2 * np.pi, self.n_landmarks)
        
        L_high = (a[:, None] * np.sin(omega[:, None] * t + phi[:, None])).T
        return L_high

    def fit(self, X, y=None):
        self.scaler = StandardScaler()
        X_scaled = self.scaler.fit_transform(X)
        n_samples, n_features = X_scaled.shape
        
        if self.mode == 'sine':
            self.L_high = self._generate_sine_landmarks(n_features)
            l_min, l_max = self.L_high.min(), self.L_high.max()
            x_min, x_max = X_scaled.min(), X_scaled.max()
            self.L_high = (self.L_high - l_min) / (l_max - l_min) * (x_max - x_min) + x_min
            
        else: # 'data_derived'
            indices = self.rng.choice(n_samples, self.n_landmarks, replace=False)
            self.L_high = X_scaled[indices].copy()

        self.pca_landmarks = PCA(n_components=self.n_components)
        self.L_low = self.pca_landmarks.fit_transform(self.L_high)
        
        self.L0_low = self.L_low[0]
        self.L_others_low = self.L_low[1:]
        
        self.A = 2 * (self.L_others_low - self.L0_low)
        self.A_pinv = np.linalg.pinv(self.A)
        
        self.L0_sq_norm = np.sum(self.L0_low**2)
        self.Li_sq_norms = np.sum(self.L_others_low**2, axis=1)
        
        return self

    def transform(self, X):
        X_scaled = self.scaler.transform(X)
        
        D = pairwise_distances(X_scaled, self.L_high, metric='euclidean')
        
        if self.p != 1.0:
            D = np.power(D, self.p)
            
        D_sq = D**2
        
        d0_sq = D_sq[:, 0:1]
        di_sq = D_sq[:, 1:]
        
        term_dist = d0_sq - di_sq
        term_geom = self.Li_sq_norms - self.L0_sq_norm
        
        B = term_dist + term_geom
        
        Y = np.dot(self.A_pinv, B.T).T
        
        return Y

if HAS_UMAP:
    digits = load_digits()
    X = digits.data
    y = digits.target

    print(f"Dataset: Digits (N={X.shape[0]}, D={X.shape[1]})")
    print("-" * 40)

    start = time.time()
    umap_reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
    X_umap = umap_reducer.fit_transform(X)
    umap_time = time.time() - start
    print(f"UMAP Time:  {umap_time:.4f} s")

    start = time.time()
    tsne_reducer = TSNE(n_components=2, perplexity=30, random_state=42)
    X_tsne = tsne_reducer.fit_transform(X)
    tsne_time = time.time() - start
    print(f"t-SNE Time: {tsne_time:.4f} s")

    start = time.time()
    slr = SineLandmarkReduction(n_landmarks=50, mode='data_derived', distance_warping=0.5)
    X_slr = slr.fit_transform(X)
    slr_time = time.time() - start
    print(f"SLR Time:   {slr_time:.4f} s")
    print("-" * 40)
    print(f"SLR vs UMAP Speedup:  {umap_time / slr_time:.1f}x")
    print(f"SLR vs t-SNE Speedup: {tsne_time / slr_time:.1f}x")

    fig, axes = plt.subplots(1, 3, figsize=(15, 5))
    
    sc1 = axes[0].scatter(X_umap[:, 0], X_umap[:, 1], c=y, cmap='Spectral', s=5, alpha=0.7)
    axes[0].set_title(f"UMAP\nTime: {umap_time:.3f}s")
    axes[0].axis('off')
    
    sc2 = axes[1].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, cmap='Spectral', s=5, alpha=0.7)
    axes[1].set_title(f"t-SNE\nTime: {tsne_time:.3f}s")
    axes[1].axis('off')
    
    sc3 = axes[2].scatter(X_slr[:, 0], X_slr[:, 1], c=y, cmap='Spectral', s=5, alpha=0.7)
    axes[2].set_title(f"Sine Landmark Reduction (SLR)\nTime: {slr_time:.3f}s")
    axes[2].axis('off')
    
    plt.tight_layout()
    plt.savefig('comparison_plot.png', dpi=150, bbox_inches='tight')
    print("\nPlot saved to comparison_plot.png")
    plt.show()
else:
    print("Please install umap-learn to run the comparison: pip install umap-learn")
```