Wallet Clustering: Finding Connected Polymarket Traders
On the blockchain, one person can be many wallets. Wallet clustering pierces this veil, revealing the true network of connected traders behind Polymarket's on-chain activity.
The Problem of Multiple Wallets
Creating a new Ethereum/Polygon wallet takes seconds and costs nothing. This means any Polymarket trader can operate dozens or hundreds of wallets simultaneously, each appearing as an independent participant. For anyone analyzing on-chain data, this creates a fundamental challenge: you can't take wallet-level data at face value.
Consider a scenario where you're tracking smart money convergence — multiple profitable wallets buying the same outcome. If those five "independent" wallets are actually controlled by one person, the convergence signal is meaningless. You're not seeing five informed opinions agreeing; you're seeing one opinion split across five addresses.
Wallet clustering solves this problem by identifying which wallets belong to the same entity, allowing you to aggregate their activity and assess the true independence of any signal. It's the foundation of reliable on-chain intelligence.
Clustering Techniques
Funding Chain Analysis
The most reliable clustering technique traces where wallets received their initial funding. To operate on Polymarket, a wallet needs trading capital (USDC) and, if it submits its own transactions, gas tokens (MATIC on Polygon, since renamed POL). These funds have to come from somewhere, and the funding chain often reveals connections.
A simple funding chain analysis traces each wallet's first MATIC and USDC transactions backward. If wallet A and wallet B both received their initial USDC from wallet C (a "hub" wallet), they're likely related. More sophisticated analysis traces multiple hops — wallet A was funded by wallet C, which was funded by wallet D, which also funded wallet B through wallet E.
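The backward trace described above can be sketched in a few lines of Python. This is a minimal illustration, not a production tracer: the `transfers` tuples, the wallet labels, and the `max_hops` default are all assumptions made for the example, and a real pipeline would read transfers from indexed chain data.

```python
# Hypothetical input: (sender, receiver, block_number) for USDC or gas-token transfers.
transfers = [
    ("0xHub", "0xA", 100),
    ("0xHub", "0xB", 105),
    ("0xD", "0xHub", 90),
    ("0xExchange", "0xC", 110),
    ("0xA", "0xB", 200),  # later transfer, so not 0xB's initial funding
]

def first_funders(transfers):
    """Map each wallet to the sender of its earliest inbound transfer."""
    earliest = {}
    for sender, receiver, block in transfers:
        if receiver not in earliest or block < earliest[receiver][1]:
            earliest[receiver] = (sender, block)
    return {wallet: sender for wallet, (sender, _) in earliest.items()}

def funding_path(wallet, funders, max_hops=3):
    """Walk backward through first funders, stopping at max_hops or on a loop."""
    path, seen, current = [], {wallet}, wallet
    for _ in range(max_hops):
        source = funders.get(current)
        if source is None or source in seen:
            break
        path.append(source)
        seen.add(source)
        current = source
    return path

def shared_funders(wallet_a, wallet_b, funders, max_hops=3):
    """Upstream wallets that appear in both wallets' funding paths."""
    path_a = funding_path(wallet_a, funders, max_hops)
    path_b = funding_path(wallet_b, funders, max_hops)
    return set(path_a) & set(path_b)

funders = first_funders(transfers)
print(shared_funders("0xA", "0xB", funders))
print(shared_funders("0xA", "0xC", funders))
```

In practice, known exchange and bridge addresses would need to be excluded from the result, since they fund large numbers of unrelated wallets and would otherwise look like hubs.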
Limitations: Sophisticated operators use mixers, exchanges, or bridge contracts to break funding chains. A wallet funded through a centralized exchange deposit/withdrawal is harder to link than one funded through a direct transfer. However, even exchange-based funding can sometimes be linked through timing analysis — deposits and withdrawals that occur within minutes of each other from wallets that later show correlated behavior.
Temporal Correlation Analysis
Wallets controlled by the same entity often trade at the same times — because they're operated by the same person or the same automated system. Temporal correlation analysis measures the statistical relationship between trading timestamps across wallet pairs.
The technique involves computing the time difference between trades from two wallets across many instances. If wallet A and wallet B consistently trade within 5 seconds of each other across dozens of different markets, coincidence becomes a very unlikely explanation, provided the matches are not all bunched around the same news events, which prompt many unrelated traders to act at once. The wallets are most likely controlled by the same entity or automated system, or one is automatically copying the other.
More advanced temporal analysis looks at session patterns: when wallets become active and inactive. If two wallets always start trading within minutes of each other and stop within minutes of each other, they likely share an operator even if their individual trades aren't synchronized.
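As a rough sketch of the 5-second comparison, the function below measures what fraction of one wallet's trades have a counterpart from another wallet inside a time window. The function name, the Unix-second timestamps, and the sample values are assumptions for illustration only.

```python
from bisect import bisect_left

def sync_rate(times_a, times_b, window=5):
    """Fraction of wallet A's trades with a wallet B trade within `window` seconds."""
    if not times_a or not times_b:
        return 0.0
    sorted_b = sorted(times_b)
    hits = 0
    for t in times_a:
        # First B trade at or after the start of the window around t.
        i = bisect_left(sorted_b, t - window)
        if i < len(sorted_b) and sorted_b[i] <= t + window:
            hits += 1
    return hits / len(times_a)

# Hypothetical trade timestamps (Unix seconds) for two wallets.
wallet_a = [1000, 2000, 3000, 4000]
wallet_b = [1003, 1998, 3500, 4004]
print(sync_rate(wallet_a, wallet_b))
```

A high rate is only meaningful relative to a baseline, so it helps to compute the same figure for pairs of wallets that are known to be unrelated and trade the same markets.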
Behavioral Fingerprinting
Every trader has behavioral patterns that are difficult to disguise across multiple wallets. These patterns form a "fingerprint" that can link wallets even when funding chains are obscured and timing is deliberately randomized.
Behavioral features for fingerprinting include:
- Position sizing patterns — Does the wallet consistently use round numbers ($1,000, $5,000) or specific fractional amounts?
- Market category preferences — Does the wallet only trade politics, or politics and crypto but never sports?
- Order type preferences — Does the wallet use limit orders or market orders? At what price increments?
- Hold duration patterns — Average time between entry and exit, and the distribution of hold times
- Reaction patterns — How quickly does the wallet trade after news events? What types of news trigger activity?
Machine learning models (particularly clustering algorithms like DBSCAN or hierarchical clustering) can process these multi-dimensional behavioral features to identify wallets with statistically similar fingerprints.
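Before reaching for DBSCAN or hierarchical clustering, it can help to see the underlying idea in its simplest form. The sketch below standardizes a few behavioral features and flags wallet pairs whose fingerprints are close. It is a plain distance threshold, not DBSCAN itself, and the feature names, values, and `max_distance` cutoff are invented for the example.

```python
import math
from statistics import mean, pstdev

# Hypothetical feature vectors per wallet:
# [median_position_usd, share_limit_orders, median_hold_hours, share_politics]
features = {
    "0xA": [1000.0, 0.90, 48.0, 0.80],
    "0xB": [1050.0, 0.88, 50.0, 0.82],
    "0xC": [200.0, 0.10, 2.0, 0.10],
    "0xD": [5000.0, 0.50, 400.0, 0.30],
}

def standardize(features):
    """Z-score each feature column so no single unit dominates the distance."""
    wallets = list(features)
    columns = list(zip(*(features[w] for w in wallets)))
    # Fall back to 1.0 when a column is constant to avoid dividing by zero.
    stats = [(mean(col), pstdev(col) or 1.0) for col in columns]
    return {
        w: [(x - mu) / sigma for x, (mu, sigma) in zip(features[w], stats)]
        for w in wallets
    }

def similar_pairs(features, max_distance=0.5):
    """Wallet pairs whose standardized fingerprints lie within max_distance."""
    scaled = standardize(features)
    wallets = sorted(scaled)
    pairs = []
    for i, a in enumerate(wallets):
        for b in wallets[i + 1:]:
            d = math.dist(scaled[a], scaled[b])
            if d <= max_distance:
                pairs.append((a, b, d))
    return pairs

for a, b, d in similar_pairs(features):
    print(a, b, round(d, 3))
```

The same standardized vectors could be passed to a density-based or hierarchical clustering library once the wallet set grows beyond what pairwise comparison handles comfortably.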
Gas Price and Infrastructure Analysis
Technical details of how transactions are submitted can reveal shared infrastructure. Wallets operated by the same entity often use the same gas price oracle and the same priority fee settings, and they typically submit through the same RPC endpoint. The endpoint itself is not recorded on-chain, but fee settings, relayers, and transaction construction are. These technical fingerprints are invisible to casual observers but detectable through careful analysis.
Specifically, look for: identical gas price settings across wallets (suggesting the same bot configuration), transactions submitted through the same relayer or bundler, identical nonce patterns suggesting sequential submission from the same system, and similar transaction construction patterns (function call encoding, parameter ordering).
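One way to check for identical gas settings is to find each wallet's dominant fee configuration and group wallets that share it. The sketch below assumes a simplified record of (wallet, priority fee in wei, gas limit) per transaction; the field choice, the sample values, and the `min_share` cutoff are illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical per-transaction records: (wallet, max_priority_fee_wei, gas_limit)
txs = [
    ("0xA", 31_337_000_000, 250_000),
    ("0xA", 31_337_000_000, 250_000),
    ("0xA", 31_337_000_000, 250_000),
    ("0xB", 31_337_000_000, 250_000),
    ("0xB", 31_337_000_000, 250_000),
    ("0xC", 30_000_000_000, 300_000),
    ("0xC", 45_000_000_000, 300_000),
]

def dominant_setting(txs):
    """Most common (fee, gas limit) pair per wallet and its share of that wallet's transactions."""
    per_wallet = defaultdict(Counter)
    for wallet, fee, gas_limit in txs:
        per_wallet[wallet][(fee, gas_limit)] += 1
    result = {}
    for wallet, counts in per_wallet.items():
        setting, n = counts.most_common(1)[0]
        result[wallet] = (setting, n / sum(counts.values()))
    return result

def shared_settings(txs, min_share=0.8):
    """Group wallets whose dominant setting is identical and used consistently."""
    groups = defaultdict(list)
    for wallet, (setting, share) in dominant_setting(txs).items():
        if share >= min_share:
            groups[setting].append(wallet)
    return {s: sorted(ws) for s, ws in groups.items() if len(ws) > 1}

print(shared_settings(txs))
```

A match on an unusual value is much stronger evidence than a match on a common wallet default, so matches are best weighted by how rare the setting is across the whole dataset.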
Building a Clustering Pipeline
Data Collection
Extract all Polymarket-related transactions for your target wallet set from the Polygon blockchain. Include trade executions, token transfers, USDC movements, and MATIC transactions. Store this data in a queryable format (PostgreSQL or Dune tables) for efficient analysis.
Pairwise Similarity Scoring
For each pair of wallets, compute similarity scores across multiple dimensions: funding chain overlap, temporal correlation, behavioral similarity, and infrastructure fingerprint match. Weight each dimension based on its reliability — funding chain evidence is stronger than behavioral similarity alone.
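A weighted combination of the four dimensions might look like the sketch below. The weights are purely illustrative (chosen only so that funding evidence counts most), and the renormalization over missing dimensions is one possible design choice rather than a recommendation.

```python
# Illustrative weights only; funding evidence is weighted highest.
WEIGHTS = {
    "funding": 0.4,
    "temporal": 0.3,
    "behavioral": 0.2,
    "infrastructure": 0.1,
}

def combined_score(scores, weights=WEIGHTS):
    """Weighted average of per-dimension scores in [0, 1].

    Dimensions that are missing or None are skipped and the remaining
    weights are renormalized, so a pair is not penalized for absent data.
    """
    available = {k: v for k, v in scores.items() if k in weights and v is not None}
    total_weight = sum(weights[k] for k in available)
    if total_weight == 0:
        return 0.0
    return sum(weights[k] * v for k, v in available.items()) / total_weight

# Hypothetical scores for one wallet pair.
pair = {"funding": 1.0, "temporal": 0.5, "behavioral": 0.5, "infrastructure": 0.0}
print(round(combined_score(pair), 2))
```

Skipping missing dimensions keeps pairs comparable, but it also means a score built from a single weak dimension can look high, so it is worth recording how many dimensions contributed to each score.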
Graph Construction
Build a graph where wallets are nodes and edges represent similarity scores above a threshold. Apply community detection algorithms (Louvain, Label Propagation, or Infomap) to identify clusters of related wallets. Visualize the graph to identify hub wallets that connect multiple clusters.
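As a simple starting point before applying Louvain, Label Propagation, or Infomap, the thresholded graph can be split into connected components with a union-find structure. This sketch is a stand-in for community detection, not an implementation of it, and the pair scores and the 0.7 threshold are made up for the example.

```python
from collections import defaultdict

def find(parent, x):
    """Return the root of x, compressing the path along the way."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def cluster_wallets(pair_scores, threshold=0.7):
    """Connected components over edges whose score is at least `threshold`."""
    parent = {}
    for (a, b), score in pair_scores.items():
        parent.setdefault(a, a)
        parent.setdefault(b, b)
        if score >= threshold:
            root_a, root_b = find(parent, a), find(parent, b)
            if root_a != root_b:
                parent[root_b] = root_a
    groups = defaultdict(set)
    for wallet in parent:
        groups[find(parent, wallet)].add(wallet)
    return sorted(sorted(members) for members in groups.values())

# Hypothetical pairwise similarity scores.
pair_scores = {
    ("0xA", "0xB"): 0.9,
    ("0xB", "0xC"): 0.8,
    ("0xC", "0xD"): 0.2,
    ("0xD", "0xE"): 0.75,
}
print(cluster_wallets(pair_scores))
```

Connected components merge wallets through chains of links (A to B, B to C), so a single spurious edge can join two unrelated groups. Community detection algorithms are less sensitive to that, which is one reason to move to them as the graph grows.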
Cluster Validation
Validate identified clusters by checking whether the clustered wallets' combined activity makes sense as a single entity. Do their combined positions exceed reasonable individual limits? Do they ever trade against each other (which a single entity wouldn't do unless wash trading)? Does their combined PnL tell a coherent story?
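The "do they trade against each other" check can be approximated by measuring how much of a cluster's volume has cluster members on both sides. The fill format (maker, taker, USDC amount) and the sample data are assumptions for this sketch.

```python
def internal_trade_share(cluster, fills):
    """Share of the cluster's traded volume where both sides are cluster members."""
    members = set(cluster)
    internal = 0.0
    total = 0.0
    for maker, taker, amount in fills:
        if maker in members or taker in members:
            total += amount
            if maker in members and taker in members:
                internal += amount
    return internal / total if total else 0.0

# Hypothetical fills: (maker_wallet, taker_wallet, usdc_amount)
fills = [
    ("0xA", "0xB", 100.0),  # both sides inside the cluster
    ("0xA", "0xX", 300.0),
    ("0xY", "0xB", 100.0),
    ("0xX", "0xY", 500.0),  # unrelated to the cluster, ignored
]
print(internal_trade_share({"0xA", "0xB"}, fills))
```

A share near zero is what a single genuine trader would produce. A high share points either to wash trading or to a cluster that wrongly merges independent counterparties, and both cases deserve a closer look.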
Continuous Monitoring
Wallet clustering isn't a one-time analysis. Traders create new wallets, abandon old ones, and change their operational patterns. Build a system that continuously updates cluster assignments as new on-chain data arrives, flagging when new wallets appear that match existing cluster fingerprints.
Sybil Detection Applications
Sybil attacks — where one entity creates many fake identities — are a specific concern on Polymarket for several reasons:
- False convergence signals — A Sybil operator can make it appear that many independent wallets agree on an outcome, tricking convergence-based trading systems
- Volume inflation — Wash trading across Sybil wallets inflates market volume metrics, making markets appear more liquid and active than they are
- Reward gaming — Any system that rewards unique wallet participation (airdrops, liquidity incentives) is vulnerable to Sybil exploitation
- Market manipulation — Coordinated trading across many wallets can move prices more effectively than a single large trade, which might trigger whale-watching alerts
Effective Sybil detection combines all the clustering techniques described above with additional heuristics: wallets created within the same block or transaction batch, wallets with identical token approval patterns, and wallets that interact with the same set of DeFi protocols in the same order.
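Two of these heuristics, same creation block and identical approval patterns, can be combined into a single grouping key. The record layout and the contract labels below are placeholders invented for the example, and requiring both conditions at once is a deliberately strict choice.

```python
from collections import defaultdict

# Hypothetical per-wallet records: block of first activity and the ordered
# list of contracts the wallet approved. Contract labels are placeholders.
wallets = {
    "0xA": {"first_block": 500, "approvals": ["exchange_contract", "token_contract"]},
    "0xB": {"first_block": 500, "approvals": ["exchange_contract", "token_contract"]},
    "0xC": {"first_block": 500, "approvals": ["token_contract"]},
    "0xD": {"first_block": 900, "approvals": ["exchange_contract", "token_contract"]},
}

def sybil_candidates(wallets):
    """Groups of wallets first seen in the same block with the same approval sequence."""
    groups = defaultdict(list)
    for address, info in wallets.items():
        key = (info["first_block"], tuple(info["approvals"]))
        groups[key].append(address)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)

print(sybil_candidates(wallets))
```

Because many legitimate users follow the same onboarding flow, a shared approval sequence alone is weak evidence. Pairing it with the creation block narrows the candidates, which can then be scored with the other clustering techniques.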
Practical Applications for Traders
Wallet clustering isn't just an academic exercise. Practical applications include:
- Signal quality improvement — Filter your convergence signals to only count truly independent wallets, so one trader's opinion is not counted several times
- True whale sizing — Aggregate a whale's activity across all their wallets to understand their real position size and conviction level
- Manipulation avoidance — Identify markets where volume is driven by Sybil clusters rather than genuine diverse participation
- Competitive intelligence — Understand the true scale and strategy of competing traders by mapping their complete wallet networks
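Once cluster assignments exist, the first two applications reduce to a simple lookup. The sketch below counts independent entities behind a convergence signal and sums positions per entity. The `wallet_to_cluster` mapping, the cluster label, and the position sizes are hypothetical.

```python
from collections import defaultdict

def independent_entities(wallets, wallet_to_cluster):
    """Distinct entities behind a set of wallets; unclustered wallets count as themselves."""
    return {wallet_to_cluster.get(w, w) for w in wallets}

def positions_by_entity(positions, wallet_to_cluster):
    """Sum position sizes across all wallets that belong to the same entity."""
    totals = defaultdict(float)
    for wallet, usd in positions.items():
        totals[wallet_to_cluster.get(wallet, wallet)] += usd
    return dict(totals)

# Hypothetical data: five wallets buying the same outcome, four of them clustered.
wallet_to_cluster = {"0xA": "cluster-1", "0xB": "cluster-1", "0xC": "cluster-1", "0xD": "cluster-1"}
positions = {"0xA": 1000.0, "0xB": 2000.0, "0xC": 500.0, "0xD": 1500.0, "0xE": 4000.0}

print(len(independent_entities(positions, wallet_to_cluster)))
print(positions_by_entity(positions, wallet_to_cluster))
```

Here five apparently independent buyers collapse to two entities, which is the difference between a strong convergence signal and a weak one.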
For the broader on-chain analysis toolkit that supports clustering work, see our on-chain analysis guide. To understand how manipulation networks use Sybil wallets, explore our market manipulation detection article.