Hive Punks Update
What a day! I want to apologize for the issues today; I can't say sorry enough for what happened.
I will take a moment to explain what happened, what went wrong, what didn't, and how things are going.
As I said in our talk, Punks on Hive is built on Hive Engine using Hive Engine NFT functionality along with IPFS. IPFS, if you are not familiar, is a decentralized storage system, but it can be slow. That being said, IPFS is the de facto standard for NFTs and for storing media assets and metadata.
Knowing about these issues, I planned ahead and used IPFS to stamp the media asset (aka image), metadata (attributes), and transactional data on each NFT for posterity. This data is available on every NFT and can be viewed using Hive Engine if you know how.
The problem is IPFS is slow, so I moved this data to a third party system (Postgres database) on Supabase. This is an open source Firebase clone. I did a ton of testing prior to launch and even pushed beyond the max distribution of Hive Punks with Real-Time Rarity enabled and performance was amazing. Despite how complicated it is to calculate rarity on the fly, it was working great.
When Punks on Hive launched, the first 1,500 Punks went really well and everything happened within seconds of them being minted. Then Supabase started to act up: it kept freezing, and would allow a few inserts and then go offline for 5-15 minutes. This happened shortly before I went on stage at HiveFest.
The problem got worse as time went on: not only were the delays longer, it was also falling further behind on the incoming minting. The minting of Punks didn't slow down, and 25% of the total supply was minted within just a couple of hours of launch, before I even got on stage, but the problems just kept snowballing.
After some digging, I found the problem was related to uploading image assets. The process Punks on Hive uses is to immediately generate the attributes, image, and metadata and store them on IPFS, then take that IPFS folder and stamp it onto a freshly minted NFT on Hive Engine. Another process monitors Hive Engine for new PUNK NFT tokens, then pulls the image from the IPFS data and stores it in an image bucket for fast access, and takes the metadata to populate a database. This process ensures the IPFS data only has to be accessed once, which eliminates the IPFS performance issues.
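To make the second half of that pipeline concrete, here is a minimal sketch of the indexing step. It is not the actual Punks on Hive code: the `MintEvent` shape, the `fetch_ipfs` callable, and the plain dictionaries standing in for the image bucket and the database are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class MintEvent:
    """A newly issued NFT seen in a Hive Engine block (hypothetical shape)."""
    symbol: str      # NFT symbol, e.g. "PUNK"
    nft_id: int      # id of the freshly minted NFT
    ipfs_path: str   # IPFS folder stamped on the NFT


def index_mints(
    events: Iterable[MintEvent],
    fetch_ipfs: Callable[[str], dict],
    image_bucket: Dict[int, bytes],
    metadata_db: Dict[int, dict],
) -> List[int]:
    """Copy image and metadata for each new PUNK out of IPFS exactly once."""
    indexed = []
    for event in events:
        if event.symbol != "PUNK":
            continue  # ignore other NFT collections
        if event.nft_id in metadata_db:
            continue  # already indexed, so IPFS is not read again
        data = fetch_ipfs(event.ipfs_path)  # the single slow IPFS read
        image_bucket[event.nft_id] = data["image"]
        metadata_db[event.nft_id] = data["metadata"]
        indexed.append(event.nft_id)
    return indexed
```

After this step the front end only ever reads from the bucket and the database, so the slow IPFS lookup happens once per Punk.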
The problem is that Supabase kept going down during this process. Even though these image files are only 2KB and the metadata and database inserts are equally tiny, the entire system would go down after 2-10 inserts. That is about 15KB-30KB of data, around the size of a typical email message without images. After repeatedly trying to find a solution and prevent these outages, I realized that if I stopped using them for image storage, the problem was less extreme. It was still bad and unreliable, with no solution currently available, but it allowed the process of extracting and updating metadata to move forward.
A few hours ago I gave up trying to upload images to them and started using Backblaze B2 to store the images, and had it process all images generated so far in parallel with the metadata catching up. The images uploaded and got caught up in a fairly short time (they are tiny after all), but the metadata still lags behind by quite a bit. During this process, every single Hive Engine block has to be downloaded and analyzed, which is a very slow process when you fall behind but something that can be done in real time when you are caught up.
Every time I would start catching up, Supabase would crash or become non-responsive.
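A rough sketch of what catching up block by block looks like when the database can drop out at any moment. The checkpointing shown here is an assumption for illustration, not a description of the real indexer, and `process_block` and `save_checkpoint` are hypothetical callables.

```python
from typing import Callable


def catch_up(
    last_indexed: int,
    head_block: int,
    process_block: Callable[[int], None],
    save_checkpoint: Callable[[int], None],
) -> int:
    """Process blocks in order, recording progress after each one.

    Returns the last block that was fully processed. If process_block
    raises ConnectionError (for example because the database went away),
    the loop stops and the caller can retry later from the returned block.
    """
    current = last_indexed
    for block_num in range(last_indexed + 1, head_block + 1):
        try:
            process_block(block_num)
        except ConnectionError:
            break  # database unavailable: stop here and resume later
        current = block_num
        save_checkpoint(current)
    return current
```

Calling `catch_up` again with the returned block number resumes where the last run stopped, so a crash costs the retry time rather than the blocks already processed.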
At this point, it is still catching up and is 8,400 blocks behind. Everyone who minted an NFT received it almost instantly; the Hive and Hive Engine side of things worked well and was quite successful.
As you can see here on the right side, it is caught up, processing incoming transactions and quickly turning them into NFTs, even storing the IPFS data and stamping the NFTs only seconds after they are minted. On the left side you can see that the process of indexing Hive Engine blocks and extracting the NFT data into the Postgres database and S3 image buckets is far behind.
Throughout the day I tried all sorts of things to eliminate the issues with Supabase. Even before the problem, I had artificial delays of around 1 second per mint to ensure nothing gets rate limited or hammers any resource too hard, regardless of how many transactions happen. I fully expected that any rush would cause it to fall behind, but that it would quickly catch up during the dips in between. There were no dips, but even so it would have had no problem keeping up if it didn't keep crashing. It was only when I stopped using their image hosting that it was even usable.
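The artificial delay is nothing fancy. A minimal sketch of the idea, with a hypothetical `mint` callable and an injectable `sleep` so it can be exercised without actually waiting:

```python
import time
from typing import Callable, Iterable, List


def mint_with_delay(
    requests: Iterable[str],
    mint: Callable[[str], None],
    delay_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Handle mint requests one at a time, pausing after each."""
    done = []
    for request in requests:
        mint(request)
        sleep(delay_seconds)  # fixed pause so no downstream service is hammered
        done.append(request)
    return done
```

With a fixed pause the queue grows during a rush and drains during quiet periods, which is why a day with no dips means it never gets the chance to catch up on its own.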
To put things in perspective, 3,415 images take up only 7MB. This shouldn't remotely be a problem. I even upgraded to Supabase's Pro tier despite the fact that I wasn't even remotely near their usage limits.
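A quick back-of-the-envelope check of that figure, using the roughly 2KB per image mentioned earlier. Decimal megabytes (1MB = 1000KB) are an assumption here, and real files vary a little in size.

```python
IMAGE_SIZE_KB = 2   # approximate per-image size stated in the post
KB_PER_MB = 1000    # assumption: decimal megabytes


def images_total_mb(count: int, size_kb: float = IMAGE_SIZE_KB) -> float:
    """Estimated storage for `count` images, in megabytes."""
    return count * size_kb / KB_PER_MB


print(images_total_mb(3_415))    # 6.83
print(images_total_mb(10_000))   # 20.0
```

So the images minted so far come to just under 7MB, and even a fully minted collection would be around 20MB.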
Right now I have a process that pulls down the entire database of every Punk, scans for any that don't have an image stored, and uploads the image to B2. The Punks front end has been updated to serve images from B2. This allowed it to come back to life, but the metadata is still stored on Supabase, and anytime it has a hiccup, it causes the site to act up as well. Right now I'm trying to let it catch up so all the minted NFTs can be entered into the database. The entire database is currently around 1.5MB, small enough to nearly fit on a floppy disk.
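The backfill boils down to a set difference between the Punks in the database and the images already stored. A minimal sketch, where `load_image` and `upload` are hypothetical callables standing in for reading the image and writing it to B2 (no real B2 API is shown):

```python
from typing import Callable, Iterable, List, Set


def backfill_images(
    punk_ids: Iterable[int],
    stored_image_ids: Set[int],
    load_image: Callable[[int], bytes],
    upload: Callable[[int, bytes], None],
) -> List[int]:
    """Upload an image for every Punk that does not have one stored yet."""
    uploaded = []
    for punk_id in punk_ids:
        if punk_id in stored_image_ids:
            continue  # image already in the bucket, nothing to do
        upload(punk_id, load_image(punk_id))
        stored_image_ids.add(punk_id)  # remember it so a rerun skips it
        uploaded.append(punk_id)
    return uploaded
```

Because it only touches Punks that are missing an image, the same routine can be run repeatedly while the database is still catching up.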
Once the process is done and everything is caught up, I will look into other options. Unfortunately a large portion of the site was built around Supabase, and I can't just rip it out in five seconds. Also, a large share of the Hive Punks have already been minted: we are currently at 4,208 out of 10,000 Punks, about 42%. Once all 10,000 Punks are minted, a database isn't even needed; all the metadata can be stored in a JSON or CSV file like most NFT projects do. All the images still need to be hosted somewhere so no one has to deal with IPFS problems.
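For a sense of how simple the static option is, here is a minimal sketch that writes the same records out as JSON and as CSV using only the Python standard library. The field names in the usage example (`id`, `type`, `hat`) are made up for illustration and are not the real Punk attributes.

```python
import csv
import io
import json
from typing import Dict, List


def export_metadata(punks: List[Dict[str, str]]) -> Dict[str, str]:
    """Return the same records serialized as JSON text and as CSV text."""
    json_text = json.dumps(punks, indent=2)
    # Punks can have different attributes, so collect every key as a column.
    fieldnames = sorted({key for punk in punks for key in punk})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(punks)  # missing attributes become empty cells
    return {"json": json_text, "csv": buffer.getvalue()}


out = export_metadata([
    {"id": "1", "type": "Human"},
    {"id": "2", "type": "Ape", "hat": "Cap"},
])
```

Either string can be written to a file and served as a static asset, with no database in the request path.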
One observation throughout this: the Hive blockchain and even Hive Engine performed well and saw virtually no impact despite the massive number of transactions done today. Most blockchains would have been on their knees trying to process 4,200 NFTs in half a day.
I can't express how sorry I am about how this transpired. I've been up for a couple of days now preparing a great experience for HiveFest and trying to do something no other NFT project has attempted, and it was all thwarted, not by the blockchain, but by a third party service. What really kicks me in the gut is how well everything worked during the 2-3 weeks of development, with not a single problem showing up even with 10K+ Punks on testnet, and how pathetically small this dataset is to even be a problem. I could literally buy a box of floppies and store all the images and metadata for the entire project if every single one was minted. I know part of the problem is so many people viewing so many of these tiny images at once. I had plans to limit how many you can view at one time, but this broke the ability to see Punks update in real time.
As of right now, 4,234 Punks have been requested, and 4,234 Punks have been minted.
The metadata extraction is about 7,800 blocks behind and trying its best to catch up. Things have been fairly stable for the last hour, but it looks like the database needs another reboot. Still, 1 hour is far better than the 10-20 seconds it managed for most of the day.
I am able to keep up pushing the images as fast as they get entered into the database. If the database doesn't keep crashing, it will catch up, but every time it crashes it sets it back a bit.
I hate that this has put a dark cloud over what I think is a really cool twist on something that was done very differently on other chains. Most of them couldn't even handle doing it this way, and it wasn't the chain that failed us.
UPDATE: The more I think about it, the more I think I know the problem.
The database is silly small; even with 10K Punks it is likely only around 5MB. The problem, I believe, lies in the fact that I took advantage of Supabase's real-time features, which really can't handle hundreds or thousands of users updating frequently, even when the data is tiny. I believe that if I had never tried to do real-time updates, it would never have been a problem. The problem also wouldn't exist if I had generated all 10K ahead of time like all the NFT projects I know of. You would think a floppy disk's worth of data wouldn't be a major problem for a service promoting "real-time updates" with millions in funding.
With this in mind, I turned on infinite scrolling and disabled real-time updates; you will have to hit F5 now to see updates. If my theory is right, this should resolve the issue and allow things to catch up.
UPDATE 2: Been on the phone with Supabase and looks like someone has been attacking the instance trying to brute force it. Research ongo