If you work with event sequences (clickstreams, shopping baskets, logs, user journeys), you’ve probably noticed that most embeddings capture co-occurrence, but not how sequences evolve.
I built Event2Vec, a small Python library that learns additive, interpretable embeddings for discrete event sequences.
Core idea:
- Each event has a vector
- A sequence is the sum of its events
- This makes transitions explicit and composable
So you can do vector arithmetic on sequences, not just similarity.
The model learns behavioral shifts (e.g. lighter choices, habitual consumption) and applies them across categories.
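To make the additive idea concrete, here is a minimal NumPy sketch. It is not the library's code: the toy vocabulary, the random vectors, and the `embed_sequence` helper are all made up for illustration, and a trained model would supply the real event vectors.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy vocabulary; the vectors are random stand-ins, not trained.
vocab = {"view": 0, "add_to_cart": 1, "checkout": 2}
E = rng.normal(size=(len(vocab), 4))  # one row per event type

def embed_sequence(events):
    """Additive sequence embedding: the sum of the event vectors."""
    ids = [vocab[e] for e in events]
    return E[ids].sum(axis=0)

browse = embed_sequence(["view", "view"])
purchase = embed_sequence(["view", "view", "add_to_cart", "checkout"])

# Because embeddings add, the difference between a sequence and its prefix
# is exactly the sum of the events that were appended.
transition = purchase - browse
assert np.allclose(transition, E[vocab["add_to_cart"]] + E[vocab["checkout"]])
```

A plain sum like this ignores event order by construction. The post does not say how the library's training accounts for ordering, so treat this only as an illustration of the geometry.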
The API is intentionally simple and scikit-learn-style:
    from event2vector import Event2Vec

    model = Event2Vec(
        num_event_types=len(vocab),
        embedding_dim=128,
        geometry="euclidean",  # or "hyperbolic"
        pad_sequences=True,
    )
    model.fit(train_sequences)
    embeddings = model.transform(train_sequences)
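The snippet above leaves `vocab` and `train_sequences` undefined. One plausible way to prepare them in plain Python is sketched below, assuming the library accepts sequences of integer event ids. The post does not state the expected input type (lists, arrays, or tensors), so check the repository before relying on this. `raw_sequences` and its event names are hypothetical.

```python
# Hypothetical raw data: one list of event names per user session.
raw_sequences = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
]

# Assign each distinct event name an integer id in order of first appearance.
vocab = {}
for seq in raw_sequences:
    for event in seq:
        vocab.setdefault(event, len(vocab))

# Encode every sequence as a list of integer ids.
train_sequences = [[vocab[event] for event in seq] for seq in raw_sequences]
```

With this encoding, `len(vocab)` is the number of distinct event types, which matches how `num_event_types` is set above.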
Use cases:
- Clickstream / funnel analysis
- Basket & customer modeling
- User lifecycle modeling
- Log / trace sequences
- Any ordered categorical data
It’s not meant to replace transformers or RNNs. It’s for cases where:
- You want interpretability
- You care about sequence geometry
- You want something simple and debuggable
sulcan•3h ago
Example (shopping data):

Δ = E(water_seltzer_sparkling) − E(soft_drinks)
E(?) ≈ Δ + E(chips_pretzels) → fresh_dips_tapenades, bread, packaged_cheese

Another:

Δ = E(coffee) − E(instant_foods)
E(?) ≈ Δ + E(cereal) → water_seltzer_sparkling, juice_nectars
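The queries above can be reproduced with a few lines of NumPy once you have an embedding matrix. The sketch below is an assumption-heavy illustration: `analogy` is a hypothetical helper, cosine similarity is one reasonable choice of metric (the post does not say which one was used), and the 2-D vectors are hand-picked toy values, not output from the library.

```python
import numpy as np

def analogy(E, names, a, b, c, k=3):
    """Return the k events closest (by cosine similarity) to E(a) - E(b) + E(c)."""
    idx = {n: i for i, n in enumerate(names)}
    query = E[idx[a]] - E[idx[b]] + E[idx[c]]
    # Cosine similarity between the query and every event vector.
    sims = (E @ query) / (np.linalg.norm(E, axis=1) * np.linalg.norm(query) + 1e-12)
    # Exclude the three input events so they cannot be returned as answers.
    for n in (a, b, c):
        sims[idx[n]] = -np.inf
    top = np.argsort(-sims)[:k]
    return [names[i] for i in top]

# Toy vectors chosen by hand: the second axis plays the role of a "lighter" shift.
names = ["soft_drinks", "water_seltzer_sparkling", "chips_pretzels",
         "fresh_dips_tapenades", "candy"]
E = np.array([
    [1.0, 0.0],   # soft_drinks
    [1.0, 1.0],   # water_seltzer_sparkling = soft_drinks + shift
    [3.0, 0.0],   # chips_pretzels
    [3.0, 1.0],   # fresh_dips_tapenades = chips_pretzels + shift
    [0.0, -1.0],  # candy
])

print(analogy(E, names, "water_seltzer_sparkling", "soft_drinks", "chips_pretzels", k=1))
# → ['fresh_dips_tapenades']
```

With real embeddings you would pass the learned event matrix and the vocabulary's event names instead of the toy arrays.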
Code (MIT):
https://github.com/sulcantonin/event2vec_public
or
pip install event2vector
Example notebooks:
- Shopping baskets: https://colab.research.google.com/drive/118CVDADXs0XWRbai4rs...
- Movies: https://colab.research.google.com/drive/1BL5KFAnAJom9gIzwRiS...
Very interested in feedback, real-world use cases, and where this breaks down.