manybook : fix line break.

Dmitry Voronin 2023-09-30 07:31:19 +03:00
parent e429e0de0e
commit 5ec0c51494


@@ -6,7 +6,9 @@ So I got about half a million books. They all came in a non-searchable format -
## Challenges.
The first one, like I said before, is working with hundreds of thousands of small files. Sure, you can just throw more hardware at any issue. But what if one day I need to use this enormous library on something like a laptop or an RPi? Sounds like a nightmare. The typical go-to solution for many files is to split them into chunks (directories). This worked out well - they already came to me as a set of \~2 GB archives. Keeping them in the same groups makes it easy to add new archives - I just add another library as the same kind of "chunk". I'd lose track of book movements if I ever decided to merge them.
The second one is, obviously, search. Each archive ends up being a single Calibre library. I tried adding all 500k books to a single library, and it didn't end well. Calibre doesn't allow you to search across multiple libraries at once. And even if it did, I'd still compress each group back, since that makes it easier to store and also enables checksumming in my case. The solution is dead simple - store each Calibre database in a text format. It is super easy to search through text! Calibre allows exporting its database as a CSV file, which lets me use something like grep (or ripgrep in my case) to easily look up which group the book I need is located in. Neat.
The third one is optional, but storing one big file is much easier than storing thousands of smaller ones. This is where archival tools come in handy.
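To make the lookup concrete, here is a minimal Python sketch of the same idea grep covers: scan every exported catalog and report which group a match lives in. It assumes one CSV per group sitting in a single directory and named after the group; the `find_book` name and the directory layout are hypothetical, not something Calibre dictates.

```python
import csv
from pathlib import Path


def find_book(catalog_dir, query):
    """Yield (group, row) for every catalog row that mentions the query.

    Assumes one CSV per group, named after the group (hypothetical layout).
    """
    needle = query.casefold()
    for catalog in sorted(Path(catalog_dir).glob("*.csv")):
        # utf-8-sig reads plain UTF-8 and also tolerates a leading BOM.
        with catalog.open(newline="", encoding="utf-8-sig") as handle:
            for row in csv.DictReader(handle):
                # Case-insensitive substring match against every text column.
                if any(
                    needle in value.casefold()
                    for value in row.values()
                    if isinstance(value, str)
                ):
                    yield catalog.stem, row
```

Something like `for group, row in find_book("catalogs", "dune"): print(group)` would then list the groups to unpack. For a quick one-off search, grep or ripgrep over the same files is still less typing.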
## Solutions.
As for grouping, I've tried to keep things looking the same as they came to me. The archives I got were named in ranges, like `000000-134445.zip`. So I kept this numbering with a little change. I was horrified by the thought that one day they will release books past 1 million, and it will create an extra digit. It'd ruin alphabetical sorting!! So I just added prefixes, like `000001_000000-134445`. I can process and store a million books, but there is no way I can do this for a million such groups! Even this collection is about 400 GB compressed. This is *"good enough"* for the foreseeable future.
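Here is a small Python sketch of why the prefix helps. Only `000000-134445` comes from the real collection; the other range names and the `prefixed_name` helper are made up for illustration.

```python
def prefixed_name(index, archive_name, width=6):
    """Prepend a zero-padded group counter, e.g. 000001_000000-134445."""
    return f"{index:0{width}d}_{archive_name}"


# Archive names in the order they were released; the last two are hypothetical.
released = ["000000-134445", "134446-250000", "1000000-1050000"]

# Plain string sorting puts the 7-digit range in the middle of the list.
plain_sorted = sorted(released)

# With a fixed-width counter in front, string order matches release order.
prefixed = [prefixed_name(i, name) for i, name in enumerate(released, start=1)]
prefixed_sorted = sorted(prefixed)
```

The counter only has to stay fixed-width. Six digits leaves room for 999,999 groups, far more than I will ever need.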