A Data-Cleaning Case Study From Blackhole DEX

How We Found 2.7M Useless Liquidity Events In a Blackhole DEX Pool

In this article, I want to walk through a real data issue we ran into at lpscanner.xyz: how we discovered it, how we analyzed it, and how we solved it.

Some parts of this article are written in italics. Those sections are more technical and mostly include SQL queries or database details. If you are not comfortable with SQL, you can safely skip them and still follow the main story.

Discovering the Problem

At lpscanner.xyz, we process concentrated liquidity pool data with a pipeline built around several event tables. For Blackhole DEX, the main tables are:

Blackhole DEX BigQuery silver tables: fee_changes, liquidity_movements, pool_creations, swaps
The four BigQuery silver tables that power the Blackhole DEX pipeline

After fetching raw data from Avalanche C-Chain, we transform it into this structure and begin processing it.

pool_creations — one row per pool: chain, protocol, pool_address, token0/token1, fee, tickSpacing, initial tick and sqrtPriceX96, block metadata.
fee_changes — Blackhole pools can change their fee dynamically. Each fee change event is one row: pool_address, new fee value, block metadata.
liquidity_movements — every Mint/Burn: pool_address, salt (position ID), owner, tick range (tickLower, tickUpper), liquidityDelta, token amounts, block metadata.
swaps — every swap: sender, recipient, token amounts, sqrtPriceX96, liquidity at ending tick, tick, block metadata.

Our pipeline works like this:

First, for each pool, we reconstruct the initial liquidity state using liquidity movements from before the investigation window. In our current setup, that window is the last 30 days. That gives us a starting snapshot of the pool, which looks roughly like this:

Initial liquidity snapshot (example) State
tickBoundary    →  liquidityAmount
[60, 120)       →  10,910e6
...

Then we merge all relevant events from the last 30 days into one ordered timeline: fee changes, liquidity movements, and swaps. We process them in on-chain execution order.

If the event is a liquidity movement, we update the liquidity state. If it is a fee change, we apply the new fee to upcoming swaps. If it is a swap, we identify which ticks it crossed so we can distribute volume and fees correctly across tick ranges and LP positions.

For heavy pools, this can already be expensive. Some pools have around 500,000 events, and roughly 20% of them are liquidity movements. Processing a pool like that takes about 1 hour in our current setup.

The Pool That Broke the Pipeline

About 10 days ago, we started processing Blackhole DEX pools and got stuck on this pool:

0xa02ec3ba8d17887567672b2cdcaf525534636ea0
CL1-WAVAX/USDC

At first, it looked like a normal "heavy pool" problem. But when we checked the numbers, something was clearly off:

333,158
Swap events
2,959,785
Liquidity movements

That ratio did not make sense. Even with an active LP bot, you would expect liquidity management to be much more efficient. For example, if a bot adds liquidity before swaps and removes it afterward, you might expect something closer to 2× liquidity movements per swap — not nearly 9×.

Analyzing the Anomaly

We graphed swap counts and liquidity movement counts for the pool over the last 60 days. Blue bars represent liquidity movements, green bars represent swaps.

Daily swapsCount vs liquidityMovementsCount for pool 0xa02ec3ba
Blue: liquidity movements  ·  Green: swaps  ·  The spike begins around March 18

The chart showed a very clear pattern: after March 18, liquidity movements jumped sharply while swaps did not rise in the same way. That immediately suggested that the issue was not organic trading activity — something else was generating a huge amount of liquidity events.

The graph above was created from the output of this query:

Swap vs. liquidity movement count by day SQL · BigQuery
WITH
swaps_counted AS (
  SELECT
    TIMESTAMP_TRUNC(block_timestamp, DAY) AS day_
    , COUNT(*) AS swapsCount
  FROM `lpscanner.avalanche_blackhole_swaps`
  WHERE block_timestamp > CURRENT_TIMESTAMP - INTERVAL 60 DAY
    AND pool_address = '0xa02ec3ba8d17887567672b2cdcaf525534636ea0'
  GROUP BY 1
)
, liquidity_movements_counted AS (
  SELECT
    TIMESTAMP_TRUNC(block_timestamp, DAY) AS day_
    , COUNT(*) AS liquidityMovementsCount
  FROM `lpscanner.avalanche_blackhole_liquidity_movements`
  WHERE block_timestamp > CURRENT_TIMESTAMP - INTERVAL 60 DAY
    AND pool_address = '0xa02ec3ba8d17887567672b2cdcaf525534636ea0'
  GROUP BY 1
)
SELECT
  sc.day_
  , sc.swapsCount
  , lmc.liquidityMovementsCount
FROM swaps_counted sc
LEFT JOIN liquidity_movements_counted lmc ON sc.day_ = lmc.day_

So we looked deeper.

We built a query to identify what we called unutilized liquidity movements: liquidity movements from positions that were opened and closed without being active for even a single swap.

To do that, we:

  • merged swaps and liquidity movements into one unified timeline,
  • tracked the latest swap seen before each event,
  • reconstructed each position's liquidity cycle,
  • and labeled a cycle as unutilized if the position opened and closed before any swap happened after its opening.

We use generatedIndex = block_number × 1,000,000 + log_index to preserve exact on-chain event ordering across merged event types.

What We Found

Out of 2,959,785 liquidity movements, 2,719,998 were labeled as unutilized. The vast majority of those liquidity events came from positions that were never active for even one swap — our processor was spending most of its time handling liquidity that had zero impact on fee distribution or executed trading volume.

Before the query — here is a sample of its output. Some positions continuously add and remove liquidity without any swap event between them:

BigQuery output: positions repeatedly open and close with utilization_label = unutilized
Rows 1130–1133: same position, same tick range [-254776, -254775], cycling open→close→open within seconds — zero swaps between them

Here is the query that produces this table:

Unutilized liquidity cycle detection SQL · BigQuery
WITH
-- Positions active before the 30-day window that contribute to the initial liquidity state
active_positions_feed_before_the_term AS (
  SELECT generatedIndex, liquidityDelta, ownerAddress, salt, tickLower, tickUpper, block_timestamp
  FROM (
    SELECT
      SUM(liquidityDelta) OVER(PARTITION BY ownerAddress, salt, tickLower, tickUpper) AS cumulativeLiquidityByPosition
      , generatedIndex, liquidityDelta, ownerAddress, salt, tickLower, tickUpper, block_timestamp
    FROM lpscanner.avalanche_blackhole_liquidity_movements
    WHERE pool_address = '0xa02ec3ba8d17887567672b2cdcaf525534636ea0'
      AND generatedIndex < 80439933000022  -- limit of investigation period (block_number * 1e6 + log_index)
      AND block_timestamp < CURRENT_TIMESTAMP() - INTERVAL 29 DAY  -- partition pruning
  )
  WHERE cumulativeLiquidityByPosition > 0  -- only positions that were still active before the window
)

, the_list AS (
  SELECT * FROM (
    SELECT
      'lm' AS table_
      , generatedIndex, liquidityDelta
      , SUM(liquidityDelta) OVER(
          PARTITION BY salt, tickLower, tickUpper
          ORDER BY generatedIndex
        ) AS accumulativeL  -- cumulative liquidity for this position at this point in time
      , salt, tickLower, tickUpper, block_timestamp
    FROM (
      SELECT generatedIndex, liquidityDelta, ownerAddress, salt, tickLower, tickUpper, block_timestamp
      FROM lpscanner.avalanche_blackhole_liquidity_movements
      WHERE pool_address = '0xa02ec3ba8d17887567672b2cdcaf525534636ea0'
        AND generatedIndex >= 80439933000022
        AND block_timestamp > CURRENT_TIMESTAMP() - INTERVAL 31 DAY
        AND liquidityDelta != 0  -- liquidityDelta = 0 means fee-collection only, skip
      UNION ALL
      SELECT * FROM active_positions_feed_before_the_term
    )

    UNION ALL

    -- Merge swaps into the unified timeline (NULL for lm-only columns)
    SELECT
      'swa' AS table_
      , generatedIndex
      , NULL AS liquidityDelta
      , NULL AS accumulativeL
      , NULL AS salt, NULL AS tickLower, NULL AS tickUpper
      , block_timestamp
    FROM lpscanner.avalanche_blackhole_swaps
    WHERE pool_address = '0xa02ec3ba8d17887567672b2cdcaf525534636ea0'
      AND generatedIndex >= 80439933000022
      AND block_timestamp > CURRENT_TIMESTAMP() - INTERVAL 31 DAY
  )
)

-- Attach the generatedIndex of the most recent swap seen so far to every event
, with_last_swap AS (
  SELECT *,
    MAX(IF(table_ = 'swa', generatedIndex, NULL))
      OVER (ORDER BY generatedIndex ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
    AS last_swap_gi
  FROM the_list
)

, lm_events AS (
  SELECT *,
    LAG(accumulativeL, 1, 0)
      OVER (PARTITION BY salt, tickLower, tickUpper ORDER BY generatedIndex)
    AS prev_accumulativeL  -- previous accumulated liquidity for this position
  FROM with_last_swap
  WHERE table_ = 'lm'
)

-- Carry forward the generatedIndex of the most recent open event per position.
-- An "open" is: prev_accumulativeL = 0 AND accumulativeL > 0.
, lm_with_cycle AS (
  SELECT *,
    LAST_VALUE(
      IF(prev_accumulativeL = 0 AND accumulativeL > 0, generatedIndex, NULL)
      IGNORE NULLS
    ) OVER (
      PARTITION BY salt, tickLower, tickUpper
      ORDER BY generatedIndex
      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS cycle_open_gi
  FROM lm_events
)

-- A cycle is unutilized if it closed before any swap occurred after its open
, unutilized_cycles AS (
  SELECT DISTINCT salt, tickLower, tickUpper, cycle_open_gi
  FROM lm_with_cycle
  WHERE accumulativeL = 0
    AND cycle_open_gi IS NOT NULL
    AND cycle_open_gi >= 80439933000022
    AND COALESCE(last_swap_gi, 0) < cycle_open_gi
)

-- Label every lm event — 'unutilized' if its cycle matches, 'utilized' otherwise
, lm_labeled AS (
  SELECT l.*,
    IF(u.cycle_open_gi IS NOT NULL, 'unutilized', 'utilized') AS utilization_label
  FROM lm_with_cycle l
  LEFT JOIN unutilized_cycles u USING (salt, tickLower, tickUpper, cycle_open_gi)
)

-- Final output: labeled lm events merged back with swap events
SELECT
  table_, generatedIndex, liquidityDelta, accumulativeL,
  salt, tickLower, tickUpper, block_timestamp,
  utilization_label, last_swap_gi
FROM lm_labeled
UNION ALL
SELECT
  table_, generatedIndex, NULL, NULL,
  NULL, NULL, NULL, block_timestamp,
  NULL AS utilization_label, last_swap_gi
FROM with_last_swap
WHERE table_ = 'swa'
ORDER BY generatedIndex

Solving the Problem

Once we confirmed that, the fix was simple: we dropped the matching unutilized liquidity movement rows from the last 30 days before running the heavy processing step.

2,959,785
Liquidity movements before
239,787
After removing unutilized  ·  ~12× smaller

After that cleaning step, we were able to process the pool successfully without overwhelming RAM or CPU resources.

Why This Matters

After cleaning the data, we successfully processed the pool and published its page on lpscanner: lpscanner.xyz → avalanche / blackhole / 0xa02ec3ba

One notable detail: the largest position in this pool holds 66.9% of total liquidity, and over the last 15 days it earned about $35.3K in swap fees.

LPScanner pool page: largest position holds 66.9% of liquidity, 90.6% APR, $35.3K in fees
Position #1 holds 66.9% of pool liquidity — $2.03M — earning $35.3K in fees over 15 days at 90.6% APR

But the investigation also raised a bigger question.

A large share of these suspicious liquidity movements appears to come from this address:

Contract: 0x07880d914bad719175099a362a88606eb975ffdc
Created ~20 days ago. Generally called by:
0xebd55519e589f6daa0a5284942266b0badf81e91
0x59630d884572199c5f7602cfa36005cef5d68787

That becomes especially interesting in the context of Retro9000 Round 2.

Effect on Retro9000?

At the time of writing, Blackhole DEX is leading the Retro9000 scoreboard with roughly ~1,040 AVAX burned across registered contracts. The pool we investigated is only 1 of 63 registered contracts.

And just from the never-utilized liquidity movements tied to this one contract, roughly 407 AVAX appears to have been burned.

Here is a simple query that anyone can run on a free Dune account and use as a starting point for their investigation:

Cumulative gas burned by contract SQL · Dune / Avalanche C
SELECT
  *
  , SUM(total_gas_cost_avax) OVER (ORDER BY day_)
FROM (
  SELECT
    DATE_TRUNC('day', block_date)                              AS day_
    , "from"
    , COUNT(*)                                                 AS transaction_count
    , SUM(CAST(gas_price * gas_used AS DECIMAL)) / POW(10, 18) AS total_gas_cost_avax
  FROM avalanche_c.transactions
  WHERE "to" = 0x07880d914bad719175099a362a88606eb975ffdc
  GROUP BY 1, 2
)

Query and full output: dune.com/queries/7020324/10941117

That does not prove intent on its own. But it does create a legitimate question: were these transactions meaningful market-making activity, irrelevant external contract activity, or an attempt to farm Retro9000 through artificial on-chain load?

That part is still open.