Overview
The Solana blockchain stores data in a sequential, append-only ledger. This is great for data integrity and transaction throughput, but it comes at a significant cost: querying historical data is inefficient and prohibitively slow. Complex operations often involve filtering, aggregation, or joining data from multiple sources. In these cases, making direct queries to Solana is impractical for most real-world applications. To solve this, most businesses build private indexes of Solana's historical data.
What does indexing Solana data mean?
Indexing is the process of querying data from the Solana blockchain and storing it in a backend database (e.g., PostgreSQL, ClickHouse) that can then be used to readily serve customer requests without needing to directly query the blockchain using Solana RPC calls. An indexer typically does four things:
- Backfill historical data: use archival RPC methods to query all historical data
- Stream new data: process new blocks when they are confirmed by the network
- Parse and transform data: extract relevant data from the confirmed blocks (e.g., transactions, state changes, etc.)
- Organize data into a database: update the index with the new data
Why do most companies build Solana indexes?
Companies build Solana indexes because their business depends on providing fast, real-time access to purpose-specific blockchain data that native RPCs don't offer (e.g., NFT sale history). Companies also leverage custom indexes to combine off-chain data (e.g., centralized exchange prices, KYC information, etc.) with their on-chain data.
Wallet Example
For example, if a Solana wallet needs to quickly return a user's token accounts and balances, querying Solana directly with getTokenAccountsByOwner and getTokenAccountBalance is too slow and could make their product unusable. Instead, wallets will typically maintain their own indexes of customer addresses, tokens, and account balances.
Trading Example
Similarly, a crypto trading firm may want to log all trading activity that happens on a specific trading pair (e.g., SOL-USDC) or specific market to backtest their trading algorithms. Directly querying the blockchain for this data would be far too slow for any practical trading analysis. Instead, quant traders may elect to build indexes for the SOL-USDC market and keep them updated with the latest trades using real-time streaming products like LaserStream.
Filtering Example
Imagine a user wants to filter transactions by specific criteria in their frontend application (e.g., by token type, transfer amount, date, or wallet address). Without an indexer, your app would need to scan through millions of transactions across hundreds of thousands of blocks, checking each one against the filter criteria. This process is far too slow for modern product user experiences.
PnL Example
To calculate a trader's profit and loss (PnL), you would need to complete the following steps (a rough aggregation sketch follows the list):
- Find every transaction associated with their wallet in a given timeframe
- Filter for the swap transactions and label them as buys or sells
- Determine how much the user paid in fees during each swap
- Get the historical price data for each token at the time of each trade
- Aggregate the PnL of each transaction to calculate the trader’s total PnL
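As a rough illustration, once the swaps are indexed, the final aggregation step could look like the sketch below. The row shape and helper name here are hypothetical, not part of any Helius API; they just show why having the data pre-labeled in a database makes this calculation trivial.

```typescript
// Hypothetical shape of an indexed, labeled swap row
interface IndexedSwap {
  side: "buy" | "sell";
  tokenAmount: number; // tokens bought or sold
  priceUsd: number;    // historical token price at trade time
  feeUsd: number;      // fees paid for this swap, in USD
}

// Realized PnL = sale proceeds - purchase cost - fees, summed over all indexed swaps
function totalPnlUsd(swaps: IndexedSwap[]): number {
  return swaps.reduce((pnl, s) => {
    const notional = s.tokenAmount * s.priceUsd;
    return pnl + (s.side === "sell" ? notional : -notional) - s.feeUsd;
  }, 0);
}
```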
Step 1: Get the historical data
The first step to building a Solana index is getting all the historical data that you care about. There are three main ways to do this:
- getTransactionsForAddress (recommended)
- getSignaturesForAddress and getTransaction
- getBlock
Method 1: getTransactionsForAddress (recommended)
The getTransactionsForAddress RPC method allows you to fetch the full transaction details for an arbitrary segment of blockchain data. Due to its powerful filtering abilities, you won't waste time retrieving data that is not needed for your index, and because of its reverse search functionality you can get transactions in chronological order.
Steps to use this method
- Determine the timeframe that you need data from and set the filter accordingly
- Set transactionDetails to full to get all transaction details
- Configure the tokenAccounts filter to include associated token account transactions if needed
- Paginate through the results using paginationToken
- On each iteration, extract the data you need and store it in your database (see the sketch below)
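The following is a minimal sketch of that loop in TypeScript. getTransactionsForAddress is a Helius-specific method, and the request fields here (transactionDetails, tokenAccounts, paginationToken) follow the parameters listed above; treat the exact parameter shapes, response fields, and the storeTransactions helper as assumptions to verify against the gTFA documentation.

```typescript
// Sketch: backfill with getTransactionsForAddress via plain JSON-RPC over HTTP.
// Request/response field names are assumptions; verify them against the gTFA docs.
const HELIUS_RPC = "https://mainnet.helius-rpc.com/?api-key=<YOUR_API_KEY>";

async function backfillAddress(address: string) {
  let paginationToken: string | undefined;

  do {
    const response = await fetch(HELIUS_RPC, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({
        jsonrpc: "2.0",
        id: 1,
        method: "getTransactionsForAddress",
        params: [
          address,
          {
            transactionDetails: "full", // get all transaction details
            tokenAccounts: "include",   // include associated token accounts (assumed value)
            paginationToken,            // undefined on the first page
          },
        ],
      }),
    });

    const { result } = await response.json();
    await storeTransactions(result.transactions); // your own DB insert logic (Step 2)
    paginationToken = result.paginationToken;     // loop until no token is returned
  } while (paginationToken);
}

// Placeholder for the database write described in Step 2.
async function storeTransactions(transactions: unknown[]): Promise<void> {
  console.log(`storing ${transactions.length} transactions`);
}
```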
Benefits of using getTransactionsForAddress
The main advantages of using the gTFA endpoint are speed and simplicity. With slot and time-based filters, token account support, reverse search, and pagination, you can get any data you want, from any time in Solana's history, all with a single call and without complex looping or retry logic. Unlike getSignaturesForAddress, it can also include transactions involving associated token accounts owned by the address.
Method 2: getSignaturesForAddress and getTransaction
Before the release of gTFA, the standard approach for querying historical data was to loop over signatures using getSignaturesForAddress (paginating from newest to oldest) and then call getTransaction to extract the full details of each transaction.
Steps to use this method
Here are the basic steps to use this method (a sketch follows the list):
- Call getSignaturesForAddress
- Store the signature of the last received transaction of this call
- For the next call to getSignaturesForAddress, set the before parameter to this signature
- Repeat this in a loop for as long as needed
- For each transaction signature retrieved this way, call getTransaction to get its full transaction details
- Insert the relevant data into your database
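Here is a minimal sketch of this backfill loop using @solana/web3.js. The saveToDatabase helper is a placeholder for your own storage logic; a production version would add batching, concurrency limits, and the retry/backoff handling noted below.

```typescript
import { Connection, PublicKey } from "@solana/web3.js";

const connection = new Connection("https://mainnet.helius-rpc.com/?api-key=<YOUR_API_KEY>");

// Walk backwards through an address's history: page signatures, then fetch each transaction.
async function backfill(address: string) {
  const pubkey = new PublicKey(address);
  let before: string | undefined;

  while (true) {
    // Newest-to-oldest page of up to 1,000 signatures
    const signatures = await connection.getSignaturesForAddress(pubkey, { before, limit: 1000 });
    if (signatures.length === 0) break;

    for (const { signature } of signatures) {
      const tx = await connection.getTransaction(signature, {
        maxSupportedTransactionVersion: 0,
      });
      if (tx) await saveToDatabase(signature, tx); // your own DB insert logic (Step 2)
    }

    // The next page starts before the oldest signature we just processed
    before = signatures[signatures.length - 1].signature;
  }
}

// Placeholder for the database write described in Step 2.
async function saveToDatabase(signature: string, tx: unknown): Promise<void> {
  console.log(`indexed ${signature}`);
}
```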
Downsides of this method
Unfortunately, to use this method you need to:
- Start at the newest transaction and work backwards
- Make one additional RPC call for each transaction
- Build a thread-safe queue to handle concurrent processing
- Build logic for retries and backoffs to prevent missed data and getting rate limited
- Accept that the results do not include transactions involving associated token accounts owned by the address
For most use cases, we recommend using getTransactionsForAddress instead.
Method 3: Use getBlock
The getBlock method is most effective when a high percentage of transactions in your target blocks are relevant to your analysis, such as indexing the transactions of frequently used Solana programs like DFlow's Aggregator, the Pump.fun program, or Solana's Token program.
Steps to use this method
The basic process for querying historical data with getBlock includes the following steps (a sketch follows the list):
- Decide on a time range to query
- Convert this time range to slot numbers
- Fetch the corresponding blocks sequentially (forward or backward)
- For each block, filter the transactions that are relevant to your index
- Store the relevant information from them in your index
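A minimal sketch of that flow with @solana/web3.js is shown below. The slot-estimation step assumes Solana's ~400 ms slot time to convert a time range into an approximate slot range; a production indexer would refine the bounds, for example with getBlockTime. The isRelevant and storeTransaction helpers are placeholders for your own filtering and storage logic.

```typescript
import { Connection } from "@solana/web3.js";

const connection = new Connection("https://mainnet.helius-rpc.com/?api-key=<YOUR_API_KEY>");
const SLOT_TIME_MS = 400; // approximate Solana slot duration

// Roughly convert a past timestamp to a slot by counting backwards from the current slot.
async function estimateSlotAt(date: Date): Promise<number> {
  const currentSlot = await connection.getSlot("finalized");
  const msAgo = Date.now() - date.getTime();
  return currentSlot - Math.floor(msAgo / SLOT_TIME_MS);
}

async function indexRange(from: Date, to: Date) {
  const startSlot = await estimateSlotAt(from);
  const endSlot = await estimateSlotAt(to);

  // getBlocks returns only the slots in the range that actually contain a block
  const slots = await connection.getBlocks(startSlot, endSlot);

  for (const slot of slots) {
    const block = await connection.getBlock(slot, {
      maxSupportedTransactionVersion: 0,
      transactionDetails: "full",
      rewards: false,
    });
    if (!block) continue;

    for (const tx of block.transactions) {
      if (isRelevant(tx)) await storeTransaction(slot, tx); // your own filter + DB insert
    }
  }
}

// Placeholders for program-specific filtering and the database write from Step 2.
function isRelevant(tx: unknown): boolean { return true; }
async function storeTransaction(slot: number, tx: unknown): Promise<void> {}
```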
Step 2: Sync Solana data with your database
After fetching historical data, you need to transform it and store it efficiently in a database. Your storage choice should be tailored to your specific use case; there's no one-size-fits-all solution. The right database depends on the size of your dataset, latency requirements, query patterns, and team expertise.
Option 1: SQL Databases
Storing Solana data in relational databases like PostgreSQL is recommended for most use cases. SQL is flexible, ubiquitous, and easy to learn. Modern relational databases can scale well beyond 100 million rows while still offering the benefits of ACID compliance, complex joins, and powerful secondary indices. Use SQLite for prototyping, local development, or when you want zero configuration with a single-file database; it's ideal when your dataset stays under a few gigabytes. Use PostgreSQL for production applications that need data replication, concurrent access from multiple clients, or advanced features like full-text search and JSON operators. For most production-level Solana indexers, PostgreSQL is our recommended choice.
Implementation example:
As an example, we will show how to store token transfers in a PostgreSQL database. First, create a table; a sketch of a possible schema is shown below.
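This is a minimal sketch of what such a token_transfers table could look like; the column names and sizes are assumptions to adapt to the fields you actually extract.

```sql
-- Sketch of a token_transfers table; adjust columns to the data you extract.
CREATE TABLE token_transfers (
    id           BIGSERIAL PRIMARY KEY,
    signature    VARCHAR(88)    NOT NULL,  -- base58 transaction signature
    slot         BIGINT         NOT NULL,
    block_time   TIMESTAMPTZ    NOT NULL,
    token_mint   VARCHAR(44)    NOT NULL,  -- base58 mint address
    from_address VARCHAR(44)    NOT NULL,
    to_address   VARCHAR(44)    NOT NULL,
    amount       NUMERIC(39, 0) NOT NULL   -- raw token amount (u64-safe)
);

-- Secondary indexes for common lookups
CREATE INDEX idx_transfers_mint_time ON token_transfers (token_mint, block_time);
CREATE INDEX idx_transfers_from      ON token_transfers (from_address);
CREATE INDEX idx_transfers_to        ON token_transfers (to_address);
```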
Option 2: Columnar Databases
Columnar databases are optimized for analytical queries, aggregations, and high-volume time-series data. If you need to index several billion transactions, columnar databases like ClickHouse or Cassandra are your best option. Use ClickHouse when you need real-time analytical queries on large datasets; it's optimized for fast reads, aggregations, and time-series analysis. Use Cassandra when you need extremely high write throughput, effortless horizontal scaling, and high fault tolerance, which makes it ideal for continuously ingesting massive volumes of Solana data.
Implementation example:
We will show how to store token transfers in a ClickHouse database. For this purpose, create a table that uses the MergeTree table engine, which is designed for high ingest rates and is therefore ideal for indexing. A sketch of such a table is shown below.
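The column names and FixedString sizes in this sketch are assumptions, but it keeps the (token_mint, date) ordering discussed next.

```sql
-- Sketch of a ClickHouse token_transfers table using the MergeTree engine.
CREATE TABLE token_transfers
(
    signature    FixedString(88),  -- base58 transaction signature
    slot         UInt64,
    date         DateTime,         -- block time of the transfer
    token_mint   FixedString(44),  -- base58 mint address
    from_address FixedString(44),
    to_address   FixedString(44),
    amount       UInt64            -- raw token amount
)
ENGINE = MergeTree
PRIMARY KEY (token_mint, date)
ORDER BY (token_mint, date);
```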
(token_mint, date) is set as both the primary key and the sorting key. ClickHouse will order the data on disk according to your sort key, which is optimal for querying a single token mint and narrowing down the response by date ranges. Here's an example query:
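The following sketch aggregates one mint over a date range against the table assumed above; the mint address is just a placeholder.

```sql
-- Daily transfer count and volume for one token mint over a month.
-- toFixedString pads the literal with null bytes to match the FixedString(44) column.
SELECT
    toDate(date) AS day,
    count()      AS transfers,
    sum(amount)  AS volume
FROM token_transfers
WHERE token_mint = toFixedString('So11111111111111111111111111111111111111112', 44)
  AND date >= toDateTime('2024-01-01 00:00:00')
  AND date <  toDateTime('2024-02-01 00:00:00')
GROUP BY day
ORDER BY day;
```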
Note that the address and signature columns use the FixedString(N) data type, which stores exactly N bytes. ClickHouse automatically compresses data, which reduces storage costs by 10-20x and improves query performance.
To optimize query performance, use Materialized Views to pre-compute common aggregations.
For example, you could pre-compute the daily transfer volume of tokens to be used by volume-related charts on a dashboard.
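A sketch of such a materialized view, building on the table assumed above, might look like this:

```sql
-- Pre-compute daily transfer volume per token as new rows are ingested.
CREATE MATERIALIZED VIEW daily_token_volume
ENGINE = SummingMergeTree
ORDER BY (token_mint, day)
AS
SELECT
    token_mint,
    toDate(date) AS day,
    sum(amount)  AS volume
FROM token_transfers
GROUP BY token_mint, day;
```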
Option 3: Data Lakes
Data lakes are ideal for storing massive amounts of raw and processed blockchain data for long-term archival and analytical queries. A simple implementation uses the Parquet data format with Amazon Athena. Parquet is a column-oriented data file format designed for efficient data storage and retrieval. Amazon Athena is an interactive query service that allows you to analyze data stored in Amazon S3 using standard SQL, without the need to set up infrastructure or load data into a separate database.
Implementation example:
We want to create an archive of token transfers and query them. First, we need to store them in S3: create a bucket named solana_index and partition your token transfer data by time using this key structure:
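The exact layout is up to you; a hypothetical date-partitioned scheme along these lines lets Athena prune partitions when you filter by time:

```
s3://solana_index/token_transfers/year=2024/month=01/day=15/transfers_000.parquet
s3://solana_index/token_transfers/year=2024/month=01/day=16/transfers_000.parquet
```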
Use Indexing Frameworks
Use Carbon and similar frameworks to avoid writing boilerplate code and set up your indexer in hours rather than days.
Key features:
- Pre-built decoders for popular programs (Token program, DeFi protocols, Metaplex)
- Configurable data sources (RPC, LaserStream, Enhanced WebSockets)
- Built-in support for both backfill and real-time streaming
- Outputs to multiple storage backends (Postgres comes out of the box)
- Fully customizable: you can set up your own data sources, decoders and data sinks
Step 3: Keep your index up to date
After backfilling historical data, you need a real-time streaming solution to keep your index up to date with new blockchain activity. Without this, your index becomes stale.
Method 1: LaserStream (recommended)
We recommend LaserStream gRPC as your default choice for all production indexing use cases. It's purpose-built for reliable, ultra-low-latency, and fault-tolerant data streaming. Some benefits of using LaserStream include:
- 24-hour historical replay: if your indexer disconnects, LaserStream automatically replays all missed transactions from where you left off
- Automatic reconnection: our LaserStream SDKs (Rust, Go, JS/TS) seamlessly handle network interruptions for you
- Node failover: your LaserStream connection aggregates data from multiple nodes simultaneously, ensuring maximum uptime
How to Use LaserStream for Indexing
Use the subscribe method to subscribe to blockchain events; a minimal example follows the best practices below.
Here are some best practices:
- Narrow your filter as much as possible: only subscribe to the data you actually need to index, to minimize bandwidth consumption and processing overhead
- Use the confirmed commitment level: this balances latency and finality. The processed level may be too unreliable, while finalized adds ~13 seconds of latency
- Set failed: false unless you specifically need to track failed transactions
- Exclude vote transactions (vote: false) as they are not relevant for indexing
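Here is a minimal TypeScript sketch of such a subscription. It assumes the helius-laserstream SDK's subscribe helper and Yellowstone-style transaction filters (accountInclude, vote, failed); check the LaserStream documentation for the exact request shape and your region's endpoint URL.

```typescript
// Sketch: stream confirmed, non-vote, successful transactions touching the SPL Token program.
// The subscribe signature, endpoint, and filter field names are assumptions to verify.
import { subscribe, CommitmentLevel } from "helius-laserstream";

async function streamTokenActivity() {
  await subscribe(
    {
      apiKey: "<YOUR_API_KEY>",
      endpoint: "https://laserstream-mainnet-ewr.helius-rpc.com", // example regional endpoint
    },
    {
      transactions: {
        tokenTxns: {
          accountInclude: ["TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"], // narrow filter
          vote: false,   // exclude vote transactions
          failed: false, // exclude failed transactions
        },
      },
      commitment: CommitmentLevel.CONFIRMED, // balance latency and finality
      accounts: {},
      slots: {},
      blocks: {},
      blocksMeta: {},
      entry: {},
      accountsDataSlice: [],
    },
    async (update) => {
      // Parse the update and write the relevant rows to your database here
      console.log("received update", update.filters);
    },
    async (error) => {
      console.error("stream error", error);
    }
  );
}

streamTokenActivity();
```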
Method 2: Use Enhanced WebSockets
Enhanced WebSockets are powered by the same infrastructure as LaserStream and are a cost-effective real-time streaming alternative to LaserStream gRPC. You should use Enhanced WebSockets when:
- Your application can tolerate occasional data gaps
- Real-time updates are important, but not mission-critical
- You have existing infrastructure to detect and backfill missing data
- Budget constraints are significant and you need to minimize streaming costs
- You’re prototyping or testing before committing to LaserStream
Keep these tradeoffs in mind:
- Speed: Enhanced WebSockets are fast, but still slower than LaserStream
- Reliability: No historical replay guarantee. If your WebSocket disconnects, you’ll need to manually detect and backfill gaps using RPC methods
- Complexity: Requires additional monitoring infrastructure to ensure data completeness
How to Use Enhanced WebSockets for Indexing
To update an index that stores all token transfers, you would subscribe to transactionSubscribe like this:
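Below is a minimal sketch using the ws package. It assumes Helius's Enhanced WebSocket endpoint and the transactionSubscribe parameters (accountInclude, vote, failed, commitment); verify the exact parameter names and notification shape against the Enhanced WebSockets documentation.

```typescript
// Sketch: subscribe to SPL Token program transactions over Helius Enhanced WebSockets.
// The endpoint and parameter names are assumptions; check the Enhanced WebSockets docs.
import WebSocket from "ws";

const ws = new WebSocket("wss://atlas-mainnet.helius-rpc.com/?api-key=<YOUR_API_KEY>");

ws.on("open", () => {
  ws.send(
    JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "transactionSubscribe",
      params: [
        {
          accountInclude: ["TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"], // SPL Token program
          vote: false,
          failed: false,
        },
        {
          commitment: "confirmed",
          encoding: "jsonParsed",
          transactionDetails: "full",
          maxSupportedTransactionVersion: 0,
        },
      ],
    })
  );
});

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.method === "transactionNotification") {
    // Extract token transfers from the notification and insert them into your database
    console.log("new transaction notification", msg.params?.result?.signature);
  }
});

ws.on("close", () => {
  // No replay guarantee: detect the gap and backfill it with RPC methods (see Step 1)
  console.warn("websocket closed; backfill any missed data before resubscribing");
});
```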
Related Methods
- getTransactionsForAddress: Fetch full transaction history for an address with filtering (recommended)
- getSignaturesForAddress: Get transaction signatures for an address
- getTransaction: Retrieve full transaction details by signature
- getBlock: Fetch all transactions in a specific block
- getTokenAccountsByOwner: Get all token accounts owned by a wallet
- getTokenAccountBalance: Get the token balance for a specific account
Get started
Building a robust Solana index and backfilling data requires solving three core challenges:
- Efficiently fetching historical data
- Transforming and storing data for quick retrievals
- Keeping indexed Solana data updated in real time
Next steps:
- Sign up for a free Helius account to get API access
- Read the gTFA documentation for information about backfilling
- Explore the LaserStream quickstart guide for real-time streaming