{"id":270,"date":"2025-12-18T19:55:52","date_gmt":"2025-12-18T19:55:52","guid":{"rendered":"https:\/\/avluz.com\/blog\/?p=270"},"modified":"2025-12-18T20:12:50","modified_gmt":"2025-12-18T20:12:50","slug":"we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale","status":"publish","type":"post","link":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/","title":{"rendered":"We Hit 6 Billion MongoDB Documents (And Lived to Tell the Tale)"},"content":{"rendered":"\n<p><\/p>\n\n\n\n<p>So here&#8217;s the thing about running a database at scale &#8211; nobody tells you about the weird stuff until you&#8217;re already knee-deep in it. At <a href=\"https:\/\/avluz.com\" target=\"_blank\" rel=\"noreferrer noopener\">Avluz.com<\/a>, we crossed 6 billion documents in our <a href=\"https:\/\/www.mongodb.com\" target=\"_blank\" rel=\"noreferrer noopener\">MongoDB<\/a> cluster this year, and honestly? It was equal parts terrifying and fascinating.<\/p>\n\n\n\n<p>We started on AWS like everyone else. Three years in, our monthly bill hit $7,500 and our CFO was giving me <em>that look<\/em> in every meeting. We moved everything to <a href=\"https:\/\/www.ovhcloud.com\" target=\"_blank\" rel=\"noreferrer noopener\">OVH<\/a>, spent four months optimizing with help from <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark AI<\/a>, and now we&#8217;re paying $2,180\/month for better performance.<\/p>\n\n\n\n<p>Here&#8217;s what actually happened.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The &#8220;Oh Crap&#8221; Moment<\/h2>\n\n\n\n<p>Picture this: It&#8217;s 2AM, I&#8217;m getting Slack alerts that queries are timing out, and our main collection just hit 4.8 billion documents. Our carefully-tuned indexes? Useless. Our query times went from &#8220;pretty good&#8221; to &#8220;are you even trying?&#8221; in about two weeks.<\/p>\n\n\n\n<p>That&#8217;s when we knew we had to do something drastic. The AWS bills were bad enough, but watching our p95 query times climb to 8 seconds? That was the real pain.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Why We Ditched AWS for OVH<\/h2>\n\n\n\n<p>Look, AWS is great. Their managed services are top-notch. 
But when you&#8217;re running dedicated <a href=\"https:\/\/www.mongodb.com\" target=\"_blank\" rel=\"noreferrer noopener\">MongoDB<\/a> instances and you know what you&#8217;re doing, the cost difference is just&#8230; brutal.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.genspark.ai\/api\/files\/s\/jGwJCBzA?cache_control=3600\" alt=\"Cost Comparison\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-1.png\" alt=\"\" class=\"wp-image-272\" srcset=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-1.png 1024w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-1-300x169.png 300w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-1-768x432.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Check out these numbers:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>AWS:<\/strong> $7,500\/month for our replica set<\/li>\n\n\n\n<li><strong>GCP:<\/strong> $6,800\/month (we checked)<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/www.ovhcloud.com\" target=\"_blank\" rel=\"noreferrer noopener\">OVH<\/a>:<\/strong> $2,180\/month for the same specs<\/li>\n<\/ul>\n\n\n\n<p>That&#8217;s not a typo. Same hardware &#8211; 32 cores, 256GB RAM per node, NVMe storage. The catch? <a href=\"https:\/\/www.ovhcloud.com\" target=\"_blank\" rel=\"noreferrer noopener\">OVH<\/a> doesn&#8217;t hold your hand as much. But that&#8217;s fine because we were already managing everything manually anyway.<\/p>\n\n\n\n<p>The real kicker? <a href=\"https:\/\/www.ovhcloud.com\/en\/network\/vrack\/\" target=\"_blank\" rel=\"noreferrer noopener\">OVH&#8217;s vRack private network<\/a> between servers is completely free. AWS was charging us $459\/month just for replication traffic between nodes. That&#8217;s $5,500 a year on network transfers alone. For data that never even leaves the datacenter!<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Our Sharding Disaster (And Recovery)<\/h2>\n\n\n\n<p>When you hit billions of documents, <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/sharding\/\" target=\"_blank\" rel=\"noreferrer noopener\">sharding<\/a> isn&#8217;t optional. But man, can you screw it up.<\/p>\n\n\n\n<p>Our first attempt:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sh.shardCollection(\"avluz.events\", { user_id: 1 })\n<\/code><\/pre>\n\n\n\n<p>Seemed logical, right? Every query filters by user_id, so let&#8217;s shard on that. Except we have power users who generate 10x more data than normal users.<\/p>
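\n\n\n\n<p>In hindsight, the skew was easy to see coming &#8211; MongoDB will show you per-shard data and chunk counts if you ask. A quick check, as a generic sketch (assuming a mongosh session against the cluster; <code>events<\/code> is our collection name):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Data size, document count, and chunk count per shard\ndb.events.getShardDistribution()\n\n\/\/ Chunk placement and balancer state for the whole cluster\nsh.status()\n<\/code><\/pre>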
\n\n\n\n<p>Within a week, some shards were at 900GB while others were chilling at 120GB.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.genspark.ai\/api\/files\/s\/rpXqtDyC?cache_control=3600\" alt=\"Sharding Distribution\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-2.png\" alt=\"\" class=\"wp-image-273\" srcset=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-2.png 1024w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-2-300x169.png 300w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-2-768x432.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Queries on the hot shards were dying. The <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/sharding-balancer-administration\/\" target=\"_blank\" rel=\"noreferrer noopener\">MongoDB balancer<\/a> was moving chunks around constantly, making everything worse. It was a mess.<\/p>\n\n\n\n<p>I spent three days reading documentation, blog posts, and MongoDB&#8217;s forums. Then I just asked <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a>:<\/p>\n\n\n\n<blockquote class=\"wp-block-quote is-layout-flow wp-block-quote-is-layout-flow\">\n<p>&#8220;I have 6B MongoDB documents with user_id and timestamp. Queries filter by user_id and date range. My shards are super unbalanced. What do I do?&#8221;<\/p>\n<\/blockquote>\n\n\n\n<p>It came back with a <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/sharding-shard-key\/#compound-shard-keys\" target=\"_blank\" rel=\"noreferrer noopener\">compound shard key<\/a> strategy I hadn&#8217;t even considered. Since the collection was already sharded, the key change goes through <code>reshardCollection<\/code> (MongoDB 5.0+) rather than <code>shardCollection<\/code>:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>sh.reshardCollection(\"avluz.events\", {\n  user_id: \"hashed\",\n  timestamp: 1\n})\n<\/code><\/pre>\n\n\n\n<p><a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/hashed-sharding\/\" target=\"_blank\" rel=\"noreferrer noopener\">Hashing the user_id<\/a> distributes power users evenly across shards. The timestamp as secondary key helps with our range queries.<\/p>\n\n\n\n<p>Resharding 6 billion documents took 72 hours of nail-biting, but the results were immediate:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Shard sizes went from all over the place to within 85GB of each other<\/li>\n\n\n\n<li>Query latency dropped from 2.4 seconds to 78ms<\/li>\n\n\n\n<li>The balancer finally calmed down<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">The Index Audit From Hell<\/h2>\n\n\n\n<p>We had 47 indexes across our collections. I knew some were probably useless, but which ones? Going through <a href=\"https:\/\/www.mongodb.com\" target=\"_blank\" rel=\"noreferrer noopener\">MongoDB<\/a> logs manually would&#8217;ve taken weeks.<\/p>\n\n\n\n<p>So I dumped our <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/reference\/operator\/aggregation\/indexStats\/\" target=\"_blank\" rel=\"noreferrer noopener\">index stats<\/a> and slow query logs into a file and asked <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a> to analyze it.<\/p>
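\n\n\n\n<p>The dump itself is cheap. Roughly this, as a sketch (assuming mongosh, our <code>events<\/code> collection, and that the profiler is already enabled for the slow-query half):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Usage counters for every index on the collection (since last restart)\ndb.events.aggregate([ { $indexStats: {} } ]).forEach(printjson)\n\n\/\/ Most recent operations the profiler flagged as slower than 100ms\ndb.system.profile.find({ millis: { $gt: 100 } }).sort({ ts: -1 }).limit(50)\n<\/code><\/pre>\n\n\n\n<p>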
Twenty minutes later, it told me:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>12 indexes had literally zero accesses in 30 days<\/li>\n\n\n\n<li>8 were redundant (covered by other <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/indexes\/index-types\/index-compound\/\" target=\"_blank\" rel=\"noreferrer noopener\">compound indexes<\/a>)<\/li>\n\n\n\n<li>6 had fields in the wrong order for our query patterns<\/li>\n<\/ul>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.genspark.ai\/api\/files\/s\/EBsdCljZ?cache_control=3600\" alt=\"Index Optimization\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"576\" src=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-3.png\" alt=\"\" class=\"wp-image-274\" srcset=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-3.png 1024w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-3-300x169.png 300w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-3-768x432.png 768w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>I&#8217;ll be honest &#8211; I felt pretty dumb. Some of these were obvious in hindsight. But when you&#8217;re managing dozens of indexes across multiple collections, obvious things stop being obvious.<\/p>\n\n\n\n<p>We dropped the useless indexes and reordered the problematic ones. Results:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Index storage: 840GB \u2192 380GB (55% reduction!)<\/li>\n\n\n\n<li>Write performance: +28% faster<\/li>\n\n\n\n<li>My stress level: way down<\/li>\n<\/ul>\n\n\n\n<p>The whole process took maybe four hours of actual work. Doing this manually would&#8217;ve been a multi-week project involving spreadsheets, meetings, and probably a few arguments.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">That Time We Melted Our Cache<\/h2>\n\n\n\n<p>MongoDB&#8217;s <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/wiredtiger\/\" target=\"_blank\" rel=\"noreferrer noopener\">WiredTiger storage engine<\/a> cache is supposed to be magical. By default it uses 50% of your RAM. We have 256GB per server, so that&#8217;s 128GB of cache. Should be plenty, right?<\/p>\n\n\n\n<p>Wrong.<\/p>\n\n\n\n<p>Our <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/administration\/analyzing-mongodb-performance\/#cache-hit-ratio\" target=\"_blank\" rel=\"noreferrer noopener\">cache hit ratio<\/a> was stuck at 72%. That means 28% of reads were going to disk. With billions of documents and NVMe storage, it wasn&#8217;t <em>terrible<\/em>, but it wasn&#8217;t great either. Queries were averaging 145ms when we knew they could be faster.<\/p>\n\n\n\n<p>I mentioned this to <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a> while troubleshooting something else, and it suggested bumping the cache to 80% of RAM. I was skeptical &#8211; doesn&#8217;t the OS need memory? 
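<\/p>\n\n\n\n<p>(The 72% figure isn&#8217;t guesswork, by the way &#8211; it falls out of <code>serverStatus<\/code>. A rough sketch of the arithmetic, assuming mongosh; the counter names are WiredTiger&#8217;s, and the ratio is only an approximation:)<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ Approximate cache hit ratio from WiredTiger counters\nconst cache = db.serverStatus().wiredTiger.cache\nconst requested = cache[\"pages requested from the cache\"]\nconst readIn = cache[\"pages read into cache\"]\nprint(\"cache hit ratio: \" + (100 * (1 - readIn \/ requested)).toFixed(1) + \"%\")\n<\/code><\/pre>\n\n\n\n<p>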
Skepticism aside, the logic made sense: we&#8217;re not running anything else heavy on these boxes, and Linux is pretty good at managing tight memory.<\/p>\n\n\n\n<p>Changed one config line:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>storage:\n  wiredTiger:\n    engineConfig:\n      cacheSizeGB: 205\n<\/code><\/pre>\n\n\n\n<p>Restarted the nodes one by one (because downtime is scary), and watched the metrics:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Cache hit ratio: 72% \u2192 94%<\/li>\n\n\n\n<li>Disk IOPS: dropped by 77%<\/li>\n\n\n\n<li>Query latency: 145ms \u2192 78ms<\/li>\n<\/ul>\n\n\n\n<p>Sometimes the simple fixes are the best ones. The <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/reference\/configuration-options\/#mongodb-setting-storage.wiredTiger.engineConfig.cacheSizeGB\" target=\"_blank\" rel=\"noreferrer noopener\">MongoDB documentation<\/a> recommends 50% but notes you can go higher if your workload benefits from it. Ours definitely did.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Connection Pool Shenanigans<\/h2>\n\n\n\n<p>This one almost took us down completely. We have 200 application servers connecting to MongoDB. Each one had a <a href=\"https:\/\/www.mongodb.com\/docs\/drivers\/node\/current\/fundamentals\/connection\/connection-options\/#connection-pool-options\" target=\"_blank\" rel=\"noreferrer noopener\">connection pool<\/a> of 100 connections because&#8230; honestly? That&#8217;s what some blog post recommended three years ago and nobody questioned it.<\/p>\n\n\n\n<p>Do the math: 200 servers \u00d7 100 connections = 20,000 connections trying to hit MongoDB.<\/p>\n\n\n\n<p>MongoDB started refusing connections around 15,000. Things got weird. Queries would randomly fail. Connections would hang. Our on-call person (me) was having a bad week.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a> suggested dropping the pool size way down:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>const client = new MongoClient(uri, {\n  maxPoolSize: 25,      \/\/ was 100\n  minPoolSize: 5,\n  maxIdleTimeMS: 30000\n})\n<\/code><\/pre>\n\n\n\n<p>And setting an explicit server-side cap via MongoDB&#8217;s <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/reference\/configuration-options\/#mongodb-setting-net.maxIncomingConnections\" target=\"_blank\" rel=\"noreferrer noopener\">connection limit<\/a>, so a misconfigured client can never pile connections up like that again:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>net:\n  maxIncomingConnections: 5000\n<\/code><\/pre>\n\n\n\n<p>Now we&#8217;re at about 5,000 total connections across all apps. MongoDB&#8217;s CPU usage dropped 40%. Connection errors disappeared. Query latency improved 35%.<\/p>\n\n\n\n<p>Turns out you don&#8217;t need a massive connection pool per instance. You just need enough to handle your concurrent queries. Who knew? (Everyone who actually read the documentation properly, probably.)<\/p>
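\n\n\n\n<p>These days we watch the server-side count instead of guessing. A trivial check, as a sketch (assuming mongosh):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ How many connections the server is holding vs. what it has left\nconst conns = db.serverStatus().connections\nprintjson({ current: conns.current, available: conns.available, totalCreated: conns.totalCreated })\n<\/code><\/pre>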
\n\n\n\n<h2 class=\"wp-block-heading\">The Dashboard That Actually Matters<\/h2>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/www.genspark.ai\/api\/files\/s\/ThGqcIkO?cache_control=3600\" alt=\"Performance Dashboard\"\/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large\"><img loading=\"lazy\" decoding=\"async\" width=\"1024\" height=\"572\" src=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-4-1024x572.png\" alt=\"\" class=\"wp-image-275\" srcset=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-4-1024x572.png 1024w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-4-300x167.png 300w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-4-768x429.png 768w, https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/image-4.png 1376w\" sizes=\"auto, (max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>After dealing with all this, I realized we were monitoring way too much noise. Most metrics don&#8217;t matter until they&#8217;re already broken. We rebuilt our primary dashboard to show just five things:<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Query time (p95):<\/strong> Currently 78ms. If it hits 200ms, something&#8217;s wrong<\/li>\n\n\n\n<li><strong>Cache hit ratio:<\/strong> Sitting at 94%. Below 90% means we&#8217;re thrashing disk<\/li>\n\n\n\n<li><strong>Active connections:<\/strong> 3,847 right now. Over 4,500 and we start investigating<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/www.mongodb.com\/docs\/manual\/replication\/#replication-lag\" target=\"_blank\" rel=\"noreferrer noopener\">Replication lag<\/a>:<\/strong> 2.1 seconds. Over 10 seconds means a node is struggling<\/li>\n\n\n\n<li><strong>Disk space per shard:<\/strong> Alert at 15% free (learned this one the hard way)<\/li>\n<\/ol>\n\n\n\n<p>Everything else is in a secondary dashboard that we check during incidents. But this five-metric view? It tells us 95% of what we need to know at a glance.<\/p>\n\n\n\n<p><a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a> helped us design this after I fed it six months of incident logs and asked &#8220;which metrics actually predicted our outages?&#8221; Turns out most of them didn&#8217;t. These five did.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">How AI Actually Saved Us Weeks<\/h2>\n\n\n\n<p>Let me be real about the <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark<\/a> thing. It didn&#8217;t write our code. It didn&#8217;t magically fix our database. But here&#8217;s what it did:<\/p>\n\n\n\n<p><strong>Index optimization:<\/strong> Would&#8217;ve taken me two weeks of analyzing logs, testing changes, measuring results. With GenSpark analyzing patterns? Four hours.<\/p>\n\n\n\n<p><strong>Schema redesign:<\/strong> We were restructuring our product catalog. Normally this is weeks of research, testing different approaches, measuring performance. GenSpark gave us three solid approaches with pros\/cons in minutes. We tested the best one, it worked, done in three days.<\/p>\n\n\n\n<p><strong>Query optimization:<\/strong> Our analytics queries were slow. I&#8217;d spend a day staring at <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/reference\/method\/db.collection.explain\/\" target=\"_blank\" rel=\"noreferrer noopener\">EXPLAIN output<\/a> trying to figure out why.<\/p>
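\n\n\n\n<p>Generating the plan is the easy part. Something like this, as a sketch (assuming mongosh; the filter values are made up):<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ executionStats shows docs examined vs. returned, and which index (if any) was used\ndb.events.find({\n  user_id: 12345,\n  timestamp: { $gte: ISODate(\"2025-11-01\") }\n}).explain(\"executionStats\")\n<\/code><\/pre>\n\n\n\n<p>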
Now I paste the EXPLAIN into GenSpark, it tells me &#8220;you&#8217;re doing a collection scan on 2M documents, add this index,&#8221; and I&#8217;m done in an hour.<\/p>\n\n\n\n<p><strong>Connection tuning:<\/strong> This would&#8217;ve been pure trial and error. Test a pool size, monitor for a day, try another. GenSpark gave us a sensible starting point based on our query patterns and we only had to tweak it once.<\/p>\n\n\n\n<p>Total time saved on this project? About 7-8 weeks of work compressed into a week and a half. That&#8217;s not hype &#8211; that&#8217;s me being able to ship this whole migration in four months instead of six.<\/p>\n\n\n\n<p>The key is knowing what to ask and when. GenSpark isn&#8217;t magic. But it&#8217;s like having a senior database engineer available 24\/7 to sanity-check your ideas and point out things you missed.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">The Weird Stuff Nobody Warns You About<\/h2>\n\n\n\n<p><strong>Aggregation queries that blow up:<\/strong> MongoDB will happily let an <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/aggregation-pipeline\/\" target=\"_blank\" rel=\"noreferrer noopener\">aggregation<\/a> chew through billions of documents right up until a stage hits the <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/aggregation-pipeline-limits\/#aggregation-pipeline-memory-restrictions\" target=\"_blank\" rel=\"noreferrer noopener\">100MB memory limit<\/a>, and then the whole pipeline dies with an error &#8211; usually hours into the job. Always use <code>{ allowDiskUse: true }<\/code> on big aggregations. Always.<\/p>\n\n\n\n<p><strong>The balancer is chaos:<\/strong> When resharding, the <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/sharding-balancer-administration\/\" target=\"_blank\" rel=\"noreferrer noopener\">balancer<\/a> moves chunks between shards automatically. Sounds great! Except it hammers your cluster and makes everything slow. Schedule <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/tutorial\/manage-sharded-cluster-balancer\/#schedule-the-balancing-window\" target=\"_blank\" rel=\"noreferrer noopener\">balancing windows<\/a> or you&#8217;ll get surprise performance hits at random times.<\/p>\n\n\n\n<p><strong>Backups at this scale are weird:<\/strong> Our full backup is 14.5TB. Restoring from backup takes 8 hours. We test this quarterly because the one time you don&#8217;t test is the one time you&#8217;ll need it and discover it&#8217;s broken.<\/p>\n\n\n\n<p><strong><a href=\"https:\/\/www.ovhcloud.com\/en\/support-levels\/\" target=\"_blank\" rel=\"noreferrer noopener\">OVH support<\/a> is&#8230; different:<\/strong> They&#8217;re helpful, but you need to know what you&#8217;re doing. If you&#8217;re used to AWS holding your hand, OVH will make you Google some stuff. That&#8217;s the tradeoff for paying a third of the price.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What We&#8217;d Do Differently<\/h2>\n\n\n\n<p>If I could go back and redo this whole thing?<\/p>\n\n\n\n<ol class=\"wp-block-list\">\n<li><strong>Start with <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/hashed-sharding\/\" target=\"_blank\" rel=\"noreferrer noopener\">hashed compound shard keys<\/a>.<\/strong> Don&#8217;t wait until you have problems.<\/li>\n\n\n\n<li><strong>Audit indexes every quarter.<\/strong> They accumulate like junk in a garage. 
Use <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/reference\/operator\/aggregation\/indexStats\/\" target=\"_blank\" rel=\"noreferrer noopener\">$indexStats<\/a> regularly.<\/li>\n\n\n\n<li><strong>Set up <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">GenSpark AI<\/a> queries earlier.<\/strong> Would&#8217;ve saved us from at least two mistakes.<\/li>\n\n\n\n<li><strong><a href=\"https:\/\/www.ovhcloud.com\" target=\"_blank\" rel=\"noreferrer noopener\">OVH<\/a> from day one?<\/strong> Maybe. AWS was nice for the early days when we didn&#8217;t know what we were doing. But once we hit any real scale, the cost difference is just too big to ignore.<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">The Numbers That Matter<\/h2>\n\n\n\n<p>Just to close this out with some actual data:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Total documents:<\/strong> 6.2 billion (and growing 85M\/day)<\/li>\n\n\n\n<li><strong>Storage:<\/strong> 14.5 TB compressed with <a href=\"https:\/\/www.mongodb.com\/docs\/manual\/core\/wiredtiger\/\" target=\"_blank\" rel=\"noreferrer noopener\">WiredTiger<\/a><\/li>\n\n\n\n<li><strong>Queries per day:<\/strong> 1.2 million<\/li>\n\n\n\n<li><strong>p95 query time:<\/strong> 78ms<\/li>\n\n\n\n<li><strong>Monthly cost:<\/strong> $2,180 on <a href=\"https:\/\/www.ovhcloud.com\" target=\"_blank\" rel=\"noreferrer noopener\">OVH<\/a> vs $7,500 on AWS<\/li>\n\n\n\n<li><strong>Time to migrate:<\/strong> 4 months<\/li>\n\n\n\n<li><strong>Times we thought it would never work:<\/strong> at least 6<\/li>\n\n\n\n<li><strong>Times we almost gave up and just paid AWS:<\/strong> 2<\/li>\n\n\n\n<li><strong>Current stress level:<\/strong> manageable<\/li>\n<\/ul>\n\n\n\n<p>Running 6 billion documents at <a href=\"https:\/\/avluz.com\" target=\"_blank\" rel=\"noreferrer noopener\">Avluz.com<\/a> taught us that scaling isn&#8217;t about perfect architecture or having infinite budget. It&#8217;s about making smart tradeoffs, knowing when to ask for help (even from <a href=\"https:\/\/www.genspark.ai\" target=\"_blank\" rel=\"noreferrer noopener\">AI<\/a>), and being willing to spend a weekend resharding your database when you have to.<\/p>\n\n\n\n<p>Also, monitor your cache hit ratio. Seriously.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>So here&#8217;s the thing about running a database at scale &#8211; nobody tells you about the weird stuff until you&#8217;re already knee-deep in it. At Avluz.com, we crossed 6 billion&#8230;<\/p>\n","protected":false},"author":1,"featured_media":277,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_yoast_wpseo_title":"Scaling MongoDB to 6 Billion Documents: Our Proven Tips","_yoast_wpseo_metadesc":"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. Read our journey and learn actionable strategies today!","_yoast_wpseo_canonical":"","footnotes":""},"categories":[128],"tags":[],"class_list":["post-270","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-technology"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v26.4 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\n<title>Scaling MongoDB to 6 Billion Documents: Our Proven Tips<\/title>\n<meta name=\"description\" content=\"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. 
Read our journey and learn actionable strategies today!\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Scaling MongoDB to 6 Billion Documents: Our Proven Tips\" \/>\n<meta property=\"og:description\" content=\"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. Read our journey and learn actionable strategies today!\" \/>\n<meta property=\"og:url\" content=\"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/\" \/>\n<meta property=\"og:site_name\" content=\"Avluz Blog\" \/>\n<meta property=\"article:publisher\" content=\"https:\/\/www.facebook.com\/trueleafarts\/\" \/>\n<meta property=\"article:published_time\" content=\"2025-12-18T19:55:52+00:00\" \/>\n<meta property=\"article:modified_time\" content=\"2025-12-18T20:12:50+00:00\" \/>\n<meta property=\"og:image\" content=\"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg\" \/>\n\t<meta property=\"og:image:width\" content=\"1376\" \/>\n\t<meta property=\"og:image:height\" content=\"768\" \/>\n\t<meta property=\"og:image:type\" content=\"image\/jpeg\" \/>\n<meta name=\"author\" content=\"avluz_admin\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"avluz_admin\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"10 minutes\" \/>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Scaling MongoDB to 6 Billion Documents: Our Proven Tips","description":"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. Read our journey and learn actionable strategies today!","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/","og_locale":"en_US","og_type":"article","og_title":"Scaling MongoDB to 6 Billion Documents: Our Proven Tips","og_description":"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. Read our journey and learn actionable strategies today!","og_url":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/","og_site_name":"Avluz Blog","article_publisher":"https:\/\/www.facebook.com\/trueleafarts\/","article_published_time":"2025-12-18T19:55:52+00:00","article_modified_time":"2025-12-18T20:12:50+00:00","og_image":[{"width":1376,"height":768,"url":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg","type":"image\/jpeg"}],"author":"avluz_admin","twitter_card":"summary_large_image","twitter_misc":{"Written by":"avluz_admin","Est. 
reading time":"10 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["Article","BlogPosting"],"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#article","isPartOf":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/"},"author":{"name":"avluz_admin","@id":"https:\/\/avluz.com\/blog\/#\/schema\/person\/f1e2b9512283bef82394d290e0e3307c"},"headline":"We Hit 6 Billion MongoDB Documents (And Lived to Tell the Tale)","datePublished":"2025-12-18T19:55:52+00:00","dateModified":"2025-12-18T20:12:50+00:00","mainEntityOfPage":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/"},"wordCount":1808,"commentCount":0,"publisher":{"@id":"https:\/\/avluz.com\/blog\/#organization"},"image":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#primaryimage"},"thumbnailUrl":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg","articleSection":["Technology"],"inLanguage":"en-US","potentialAction":[{"@type":"CommentAction","name":"Comment","target":["https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#respond"]}]},{"@type":"WebPage","@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/","url":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/","name":"Scaling MongoDB to 6 Billion Documents: Our Proven Tips","isPartOf":{"@id":"https:\/\/avluz.com\/blog\/#website"},"primaryImageOfPage":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#primaryimage"},"image":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#primaryimage"},"thumbnailUrl":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg","datePublished":"2025-12-18T19:55:52+00:00","dateModified":"2025-12-18T20:12:50+00:00","description":"Discover how we managed 6 billion MongoDB documents, optimized costs, and scaled efficiently. 
Read our journey and learn actionable strategies today!","breadcrumb":{"@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/"]}]},{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#primaryimage","url":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg","contentUrl":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/12\/UzeWNAWF.jpg","width":1376,"height":768,"caption":"Scaling MongoDB from AWS to OVH: How we managed 6 billion documents, reduced costs by 71%, and achieved 78ms p95 query times"},{"@type":"BreadcrumbList","@id":"https:\/\/avluz.com\/blog\/we-hit-6-billion-mongodb-documents-and-lived-to-tell-the-tale\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/avluz.com\/blog\/"},{"@type":"ListItem","position":2,"name":"We Hit 6 Billion MongoDB Documents (And Lived to Tell the Tale)"}]},{"@type":"WebSite","@id":"https:\/\/avluz.com\/blog\/#website","url":"https:\/\/avluz.com\/blog\/","name":"Avluz Blog","description":"","publisher":{"@id":"https:\/\/avluz.com\/blog\/#organization"},"potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/avluz.com\/blog\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"},{"@type":"Organization","@id":"https:\/\/avluz.com\/blog\/#organization","name":"Avluz Blog","url":"https:\/\/avluz.com\/blog\/","logo":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/avluz.com\/blog\/#\/schema\/logo\/image\/","url":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/11\/logo-icon.png","contentUrl":"https:\/\/avluz.com\/blog\/wp-content\/uploads\/2025\/11\/logo-icon.png","width":640,"height":163,"caption":"Avluz 
Blog"},"image":{"@id":"https:\/\/avluz.com\/blog\/#\/schema\/logo\/image\/"},"sameAs":["https:\/\/www.facebook.com\/trueleafarts\/"]},{"@type":"Person","@id":"https:\/\/avluz.com\/blog\/#\/schema\/person\/f1e2b9512283bef82394d290e0e3307c","name":"avluz_admin","image":{"@type":"ImageObject","inLanguage":"en-US","@id":"https:\/\/avluz.com\/blog\/#\/schema\/person\/image\/","url":"https:\/\/secure.gravatar.com\/avatar\/5343b3513501d0dbd126f2396b1db6f91a8bd540bbd087428e52c660b9c5c74b?s=96&d=mm&r=g","contentUrl":"https:\/\/secure.gravatar.com\/avatar\/5343b3513501d0dbd126f2396b1db6f91a8bd540bbd087428e52c660b9c5c74b?s=96&d=mm&r=g","caption":"avluz_admin"},"sameAs":["https:\/\/avluz.com\/blog"],"url":"https:\/\/avluz.com\/blog\/author\/avluz_admin\/"}]}},"_links":{"self":[{"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/posts\/270","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/comments?post=270"}],"version-history":[{"count":1,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/posts\/270\/revisions"}],"predecessor-version":[{"id":276,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/posts\/270\/revisions\/276"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/media\/277"}],"wp:attachment":[{"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/media?parent=270"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/categories?post=270"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/avluz.com\/blog\/wp-json\/wp\/v2\/tags?post=270"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}