Estimated reading time: 5 minutes
What is Blockchain Data Indexing?
Blockchain data indexing is the process of extracting data from the blockchain, organizing it, and storing it so that it can be queried quickly and efficiently. A blockchain indexer monitors blocks, transactions, and smart contract events, decodes them, and structures them in databases optimized to answer complex queries.
The indexing process usually includes extracting new data from blockchain nodes, processing and transforming it into structured formats with additional metadata, storing it in databases, creating indexes for efficient searches, and finally exposing that data through APIs for querying from decentralized applications (DApps).
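To make these stages more concrete, here is a minimal TypeScript sketch of the store, index, and query steps, kept entirely in memory. The TransferEvent shape, the InMemoryIndexer class, and its method names are hypothetical, invented for illustration; a real indexer would read from a node and persist to a database.

```typescript
// Hypothetical shape of an already decoded smart contract event.
// Amounts are plain numbers for brevity; real token amounts usually need bigint.
interface TransferEvent {
  blockNumber: number;
  txHash: string;
  from: string;
  to: string;
  amount: number;
}

// Illustrative in-memory indexer: keeps every event and a secondary index by recipient.
class InMemoryIndexer {
  private readonly events: TransferEvent[] = [];
  private readonly byRecipient = new Map<string, TransferEvent[]>();

  // Store step: keep the event and update the index (addresses normalized to lowercase).
  ingest(event: TransferEvent): void {
    this.events.push(event);
    const key = event.to.toLowerCase();
    const bucket = this.byRecipient.get(key) ?? [];
    bucket.push(event);
    this.byRecipient.set(key, bucket);
  }

  // Query step: the index answers this without scanning all stored events.
  transfersTo(address: string): TransferEvent[] {
    return this.byRecipient.get(address.toLowerCase()) ?? [];
  }

  // Aggregated query built on top of the index.
  totalReceived(address: string): number {
    return this.transfersTo(address).reduce((sum, e) => sum + e.amount, 0);
  }
}
```

The point of the sketch is the secondary index: answering "which transfers went to this address" becomes a lookup instead of a replay of the whole chain history.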
A typical use case could be the systematic querying of events emitted by a specific smart contract. In my article on events in Solidity, I explore the common uses of this powerful blockchain feature for recording information.
Since RPC access to the event history is usually restricted to a limited block range per request (it is a call that can consume many node resources), indexing presents itself as an efficient and practical alternative to querying the raw data available on the blockchain.
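To illustrate why raw querying becomes cumbersome, the following sketch splits a long block interval into windows small enough to be requested one at a time, for example with one eth_getLogs call per window. The function name splitBlockRange and the maxSpan parameter are my own assumptions; the actual limit depends on each RPC provider.

```typescript
// One window of blocks to request in a single RPC call.
interface BlockRange {
  fromBlock: number;
  toBlock: number;
}

// Splits the inclusive interval [fromBlock, toBlock] into consecutive windows
// of at most maxSpan blocks each. maxSpan is a hypothetical provider limit.
function splitBlockRange(fromBlock: number, toBlock: number, maxSpan: number): BlockRange[] {
  if (maxSpan < 1) {
    throw new Error("maxSpan must be at least 1");
  }
  const ranges: BlockRange[] = [];
  for (let start = fromBlock; start <= toBlock; start += maxSpan) {
    ranges.push({ fromBlock: start, toBlock: Math.min(start + maxSpan - 1, toBlock) });
  }
  return ranges;
}
```

For a contract deployed long ago, a client would have to walk through every one of these windows on each cold start, which is precisely the work an indexer does once and keeps up to date.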
How to index data with The Graph?
The Graph is one of the first indexing protocols, launched around 2018 as a solution to the difficulties developers faced in efficiently accessing and querying data on the blockchain, especially on Ethereum.
Given its long track record in the Web3 ecosystem, it is one of the most interesting options when choosing a provider of indexed data. It stands out especially for its versatility, as it allows any developer to configure and deploy their own indexer (called a subgraph), tailored to their specific needs, on a decentralized network.
The basic process for indexing data with The Graph consists of:
- Initializing and building the subgraph in the local development environment.
- Deploying the subgraph in Subgraph Studio, The Graph's testing environment.
- Testing and debugging the subgraph.
- Integrating the subgraph into the DApp.
- Publishing the subgraph on The Graph's decentralized network.
Subgraph Initialization
The first requirement is to install the command-line tool, the Graph CLI. It is recommended to install it globally:
npm install -g @graphprotocol/graph-cli@latest
Next, you must initialize the subgraph. I recommend running the following command:
graph init $subgraph-package-name --skip-git
Here, "subgraph-package-name" is the name you will give, in your code repository, to the package that will model the subgraph.
The "--skip-git" option prevents the CLI from initializing a new Git repository in the generated directory, which is useful when the subgraph lives inside an existing code repository.
Next, you will need to enter relevant data for the subgraph initialization via the command line: the blockchain where the smart contract to be indexed is located, the subgraph "slug", the smart contract address, and some other options.
The CLI then generates the source files that you will use to model the subgraph with all its features.
Subgraph Construction
Usually, the source files generated after initializing the subgraph serve as a template to adjust to the needs of each case.
There are three particularly important files that should be customized:
- subgraph.yaml: the subgraph manifest, which contains its formal description. Among other things, for each data source (smart contract address) it describes the entities to be indexed, the events that are the source of the data, and the handlers in the mappings file that process them.
- schema.graphql: the schema of the subgraph entities. It declares each entity with its fields and their data types.
- Mappings file in AssemblyScript: contains the event handlers with the logic to index the entities defined in the subgraph.
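The logic inside an event handler usually follows a load, create if missing, update, save pattern. The sketch below reproduces that pattern in plain TypeScript so that it can run on its own: the Account class is only a stand-in for an entity class generated from schema.graphql, a Map plays the role of the store, and the DepositEvent shape and handleDeposit name are hypothetical. It is not the actual AssemblyScript API, only an illustration of the flow.

```typescript
// Stand-in for a generated entity class; a static Map replaces the real store.
class Account {
  private static readonly store = new Map<string, Account>();

  balance = 0;

  constructor(public readonly id: string) {}

  // Returns the stored entity, or null if it has never been saved.
  static load(id: string): Account | null {
    return Account.store.get(id) ?? null;
  }

  // Persists the current state of the entity under its id.
  save(): void {
    Account.store.set(this.id, this);
  }
}

// Hypothetical decoded event; a real handler receives a generated event type.
interface DepositEvent {
  account: string;
  amount: number;
}

// Handler: load the entity, create it on first sight, update it, and save it.
function handleDeposit(event: DepositEvent): void {
  let account = Account.load(event.account);
  if (account === null) {
    account = new Account(event.account);
  }
  account.balance += event.amount;
  account.save();
}
```

Each handler invocation processes a single event, so any aggregate (here, a running balance) has to be accumulated on the entity itself.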
Once the first two files are ready, run the following command to generate the AssemblyScript types that the mappings will use:
graph codegen
After this step, when the mappings file is also ready, build the subgraph:
graph build
With this command, the subgraph sources are compiled to WebAssembly and written to the "build" directory.
Subgraph Deployment
This step is not essential but is highly recommended to validate the subgraph before publishing it on the decentralized network.
It consists of deploying a working version of the subgraph to Subgraph Studio, the development environment of The Graph platform. The main advantage is that it is completely free, with no need to define an API key or subscribe to a paid plan.
Once deployment is complete, you will have an operational endpoint to run queries on the subgraph from your DApp. There are two main limitations: the subgraph is not listed in the public search engine, the Graph Explorer, and its use is limited to 3,000 requests per day.
The prerequisite for deployment is to create the subgraph in the Subgraph Studio control panel. To do this, you need to connect a wallet.
The main data to provide are the subgraph name, from which the "slug" that uniquely identifies it is derived, and a description, which is later displayed in the Graph Explorer.
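Once the subgraph is created (in draft status), you will be able to view the "deploy-key". The CLI needs it to authenticate against Subgraph Studio (for example, with the graph auth command) before running the following deployment command: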
graph deploy $subgraph-slug
The interactive console will ask for a version label for the subgraph to be deployed (for example, v0.0.1). In general, it is advisable to archive obsolete versions of subgraphs deployed from Subgraph Studio, although the platform itself manages this automatically to avoid keeping many active versions of the same subgraph.
After deployment, the subgraph moves to "deployed" status in Subgraph Studio. From the control panel, you can see the synchronization status (how many blocks have been processed since the initial deployment block of the smart contract that serves as the data source) and the test endpoint URL. You can also view the emitted logs to check if any errors need to be fixed.
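As an illustration of how a DApp could use the test endpoint, the following sketch builds a GraphQL request that reads the latest indexed block through the _meta field together with some entity data. The deposits entity, its fields, and the function names are hypothetical and must be replaced with the entities of your own schema.graphql; the transport is passed in as a parameter so the sketch does not depend on any particular HTTP library.

```typescript
// Request shape handed to the transport (compatible with the standard fetch API).
interface GraphQLRequest {
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Any function able to POST the request and return a JSON-capable response.
type PostFn = (url: string, init: GraphQLRequest) => Promise<{ json(): Promise<unknown> }>;

// "deposits", "amount", and "blockNumber" are hypothetical schema names.
// "_meta" reports the last block the subgraph has indexed and whether errors occurred.
const LATEST_DEPOSITS_QUERY = `
  query LatestDeposits($first: Int!) {
    _meta { block { number } hasIndexingErrors }
    deposits(first: $first, orderBy: blockNumber, orderDirection: desc) {
      id
      amount
    }
  }
`;

// Builds the HTTP request carrying the GraphQL query and its variables.
function buildRequest(first: number): GraphQLRequest {
  return {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ query: LATEST_DEPOSITS_QUERY, variables: { first } }),
  };
}

// Sends the query to the subgraph endpoint and returns the parsed JSON response.
async function queryLatestDeposits(endpoint: string, post: PostFn, first: number): Promise<unknown> {
  const response = await post(endpoint, buildRequest(first));
  return response.json();
}
```

In a browser or in Node 18+, the global fetch function could be passed as the post argument, with the test endpoint URL shown in Subgraph Studio as the endpoint. Comparing the block number returned under _meta with the current chain head gives a rough idea of how far synchronization has progressed.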
Conclusion: first steps for blockchain data indexing
Indexing data is a fundamental step in making exactly the blockchain information a DApp needs available to it.
Since on-chain data storage in EVM networks is limited by design (every write carries a transaction cost), obtaining aggregated or structured data by consuming raw smart contract data is, in most cases, unfeasible.
To address this shortcoming, various platforms and protocols have emerged in recent years to facilitate the task through data indexing.
One of the most powerful and well-established is The Graph. In this article, we have seen how to create and deploy a subgraph that indexes the precise data for consumption by a particular DApp.
In upcoming articles, I will continue describing the next steps until finishing with the publication of the subgraph on The Graph's decentralized network.
Have you ever needed to access blockchain data that requires prior indexing? Do you use any existing platform or protocol, or do you prefer to develop your own ad-hoc solution? Share your own experiences.
If you want advice on which strategy to follow or need help developing blockchain data indexing, I can help you. Contact me and tell me about your case. Thank you very much!