Google Index Updates
Indexing pages to be included in search results
Google uses the map-reduce pattern: the original data set (the index) is split into an arbitrarily large number of partitions (shards). One way to partition it is by a hash of the hostname. A search query can then run not on 1 host but on 1,000 hosts at once. Each host searches its own shard, and the final answer is simply the top n most relevant results taken from the sorted union of their replies. Google can also cache responses, which reduces the load from repeated searches.
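The flow described above (hash-partition by hostname, query every shard, merge the per-shard top n, cache repeated queries) can be sketched in a single process. This is a toy illustration under stated assumptions, not Google's implementation. The shard count, the `index_page`/`search_shard`/`search` helpers and the term-count "relevance" score are all hypothetical. The shards are plain in-memory dicts standing in for separate hosts.

```python
import hashlib
import heapq
from functools import lru_cache

NUM_SHARDS = 4  # hypothetical; a real cluster would have far more hosts

# Each shard maps url -> lowercased text; in reality each would live on its own host.
shards: list[dict[str, str]] = [{} for _ in range(NUM_SHARDS)]

def shard_for(hostname: str) -> int:
    """Route a host to a shard via a stable hash (unlike hash(), md5 is stable across runs)."""
    return int(hashlib.md5(hostname.encode("utf-8")).hexdigest(), 16) % NUM_SHARDS

def index_page(hostname: str, url: str, text: str) -> None:
    shards[shard_for(hostname)][url] = text.lower()

def score(text: str, terms: tuple[str, ...]) -> int:
    # Toy relevance (assumption): total number of query-term occurrences.
    return sum(text.count(t) for t in terms)

def search_shard(shard_id: int, terms: tuple[str, ...], n: int) -> list[tuple[int, str]]:
    # "Map" step: each shard returns only its own local top n (score, url) pairs.
    hits = ((score(text, terms), url) for url, text in shards[shard_id].items())
    return heapq.nlargest(n, (h for h in hits if h[0] > 0))

@lru_cache(maxsize=1024)  # cache answers to repeated queries
def search(query: str, n: int = 3) -> tuple[tuple[int, str], ...]:
    terms = tuple(query.lower().split())
    # Scatter: in production these calls would run in parallel on separate hosts.
    partials = [search_shard(i, terms, n) for i in range(NUM_SHARDS)]
    # Gather ("reduce"): merge the per-shard lists and keep the global top n.
    return tuple(heapq.nlargest(n, (hit for part in partials for hit in part)))

index_page("example.com", "https://example.com/a", "Google index update notes")
index_page("example.org", "https://example.org/b", "index index index")
index_page("example.net", "https://example.net/c", "unrelated page")
print(search("index"))  # ((3, 'https://example.org/b'), (1, 'https://example.com/a'))
```

Because each shard returns at most n hits, the merge step handles at most n × NUM_SHARDS items no matter how large the index grows. Note that the cache can return stale answers after new pages are indexed. Calling `search.cache_clear()` after an index update is one simple way to handle that.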
This pattern is well known. Google was simply the first to set out to abandon super-expensive, resource-hungry servers in favor of many cheap servers connected into a search grid. In addition, distributed file systems such as HDFS make it possible to build a practically unlimited file system out of ordinary hard drives. Such a file system certainly has disadvantages; in particular, it may not be strongly consistent. But for a text index that is updated periodically this is acceptable: eventual consistency is good enough.
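To make the eventual-consistency point concrete, here is a toy model of a periodically updated index served by read-only replicas. It is a generic illustration, not a description of how HDFS itself replicates data. The `Primary`/`Replica` classes and the explicit `sync()` step are hypothetical. Writes go to the primary, and replicas only see them after the next batch sync. Until then, a query to a replica can miss fresh pages, but all replicas converge once the sync runs.

```python
class Replica:
    """Read-only copy of the index; may lag behind the primary until the next sync."""

    def __init__(self) -> None:
        self.snapshot: dict[str, str] = {}
        self.version = 0

    def lookup(self, term: str) -> list[str]:
        # Return URLs whose text contains the term as a whole word.
        return sorted(url for url, text in self.snapshot.items() if term in text.split())


class Primary:
    """Accepts new pages; propagates them to replicas only on a periodic sync."""

    def __init__(self, replicas: list[Replica]) -> None:
        self.pages: dict[str, str] = {}
        self.version = 0
        self.replicas = replicas

    def add_page(self, url: str, text: str) -> None:
        self.pages[url] = text.lower()
        self.version += 1

    def sync(self) -> None:
        # Batch propagation, e.g. after a periodic index rebuild.
        for r in self.replicas:
            r.snapshot = dict(self.pages)  # values are immutable strings, so a shallow copy suffices
            r.version = self.version


replicas = [Replica(), Replica()]
primary = Primary(replicas)
primary.add_page("https://example.com/a", "fresh news")
stale = replicas[0].lookup("fresh")  # replica has not synced yet, so it sees nothing
primary.sync()
fresh = replicas[0].lookup("fresh")  # after the sync the new page is visible
print(stale, fresh)
```

For a search index, briefly serving slightly outdated results is usually harmless, which is why this weaker guarantee is an acceptable trade for cheap, horizontally scalable storage.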