HACKER Q&A
📣 deanebarker

How do I easily catalog a couple thousand physical books?


I own a couple thousand books. I'd like to catalog them all. I have a child who is a broke college student, so I was thinking of paying them to do it over break.

What's the most efficient way to do this from an INTAKE standpoint? I need to get all the ISBNs into a database of some kind.

(The only other info I need is whether it's a hardcover or softcover; that's something only the physical copy can tell me. Everything else I should be able to get from the ISBN.)

I don't want my daughter to have to find and key all the ISBNs in. Can they be scanned in some way? Is the ISBN in the UPC code? Could I buy a cheap bar code scanner and just have her scan away?


  👤 kgodey Accepted Answer ✓
I have 3,476 books cataloged at the moment on https://www.librarything.com/. I bought one of their barcode scanners (https://www.librarything.com/more/store/cuecat) to do my initial cataloging but you could also use the scan feature on their mobile app.

I prefer LibraryThing to Goodreads because LibraryThing focuses more on cataloging than social features. Their team also builds software for actual libraries. They source book data from almost 5,000 external sources, so it's easy to match ISBN information to the correct edition and cover. You can also get your data out pretty easily; they offer exports in multiple formats.

EDIT: For most books, you can scan the barcode on the back to get the ISBN. Mass market paperbacks seem to usually have separate UPCs. The ISBN barcode is often located on the reverse side of the front cover, so you want to scan that one instead of the one on the back.


👤 ravila4
Barcode Scanner Phone app + Calibre

I've used this (Android) barcode scanner before: https://play.google.com/store/apps/details?id=com.sukronmoh....

In Calibre, simply go to "Add Books" > "Add books by ISBN" and paste a list of ISBNs. It will automatically download metadata and cover images for them.
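If the scanner output ends up in a text file first, a quick cleanup before pasting it into Calibre's "Add books by ISBN" box can catch double scans. A minimal Python sketch, assuming one scanned code per line; `clean_isbn_list` is a hypothetical helper, not part of Calibre, and it drops exact repeats, so skip that step if you own duplicate copies:

```python
def clean_isbn_list(raw_text: str) -> str:
    """Normalize scanned codes (drop hyphens/spaces, uppercase a trailing x)
    and remove exact duplicates while keeping scan order.
    Returns one code per line, ready to paste."""
    seen = set()
    cleaned = []
    for line in raw_text.splitlines():
        # Keep only letters and digits, so "0 306 40615 2" becomes "0306406152"
        code = "".join(ch for ch in line if ch.isalnum()).upper()
        if not code or code in seen:
            continue  # blank line or accidental double scan
        seen.add(code)
        cleaned.append(code)
    return "\n".join(cleaned)


if __name__ == "__main__":
    scans = "978-0-306-40615-7\n9780306406157\n0 306 40615 2\n\n"
    print(clean_isbn_list(scans))
```

Note that an ISBN-10 and the ISBN-13 of the same book are not recognized as duplicates by this version.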


👤 PaulHoule
I sampled books from my personal collection and found many didn't have an ISBN. Officially, ISBN started in 1970; I just found one on a book that was printed in 1972, and I'm not sure what the adoption curve was like.

ISBN is also not guaranteed to be unique, so it makes a shaky primary key. It's designed to serve the needs of new-book sellers. Officially an ISBN is never supposed to be reused, but in practice a publisher can reissue the ISBN of an out-of-print book for a new book. It's unusual, but South End Press was notably resentful about paying for new blocks of ISBNs and recycled ISBNs to "stick it to the man".
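If you build your own database, one way to cope with this is to give every physical copy its own surrogate key and keep the ISBN as an ordinary indexed column that may repeat or be empty. A minimal sketch using Python's built-in `sqlite3`; the `copies` table and its columns are illustrative assumptions, with `binding` holding the hardcover/softcover note the original question asks for:

```python
import sqlite3
from typing import Optional

SCHEMA = """
CREATE TABLE IF NOT EXISTS copies (
    copy_id  INTEGER PRIMARY KEY,   -- one row per physical book, never reused
    isbn     TEXT,                  -- may be NULL (pre-ISBN books) or repeated
    title    TEXT,
    binding  TEXT CHECK (binding IN ('hardcover', 'softcover')),
    location TEXT
);
-- Plain (non-UNIQUE) index: fast lookups without assuming ISBNs are unique
CREATE INDEX IF NOT EXISTS copies_isbn ON copies (isbn);
"""


def open_catalog(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def add_copy(conn: sqlite3.Connection, isbn: Optional[str], title: str,
             binding: str, location: Optional[str] = None) -> int:
    """Insert one physical copy and return its surrogate key."""
    cur = conn.execute(
        "INSERT INTO copies (isbn, title, binding, location) VALUES (?, ?, ?, ?)",
        (isbn, title, binding, location),
    )
    conn.commit()
    return cur.lastrowid


if __name__ == "__main__":
    db = open_catalog()
    # Same ISBN twice: two copies of one edition, or a recycled number
    add_copy(db, "9780306406157", "Example title", "softcover")
    add_copy(db, "9780306406157", "Example title", "softcover")
    add_copy(db, None, "Old book without an ISBN", "hardcover")
    print(db.execute("SELECT COUNT(*) FROM copies").fetchone()[0])
```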

Some books have an ISBN barcode on them

https://en.wikipedia.org/wiki/International_Standard_Book_Nu...

But you don't want to waste time with a "cheap" barcode reader made in China and sold on eBay. I have played around with those and found that they read barcodes when they feel like it; it is quicker to type the codes in.


👤 dhosek
I'll second the recommendation for LibraryThing. I wanted to be able to shelve books by LoC call number and this has a pretty good (although not always complete) lookup for most call numbers (I have learned how to generate a call number for books that don't have them and also discovered that the University of Chicago Library doesn't use the same cutter numbers that the LoC and most other libraries use).

LibraryThing's mobile app will scan barcodes just fine.

The gotcha is that mass-market paperbacks from before sometime in the '80s (I probably have the date wrong) do not have the ISBN in them. These will need to be entered manually (not too bad with the mobile app, which has a dedicated ISBN keyboard). It can also look up by the LoC control number, or LCCN (which is not the call number but rather a consecutively assigned number that can be found on the copyright page of books published starting some time in the 1960s).
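LCCNs are printed in several styles (for example `85-2` for what is normally written `85000002`), so it can help to normalize them before looking them up. A minimal Python sketch of the Library of Congress normalization rules as summarized here (remove blanks, drop everything from the first `/` on, and replace a hyphen by zero-padding the serial after it to six digits); `normalize_lccn` is a hypothetical helper name:

```python
def normalize_lccn(raw: str) -> str:
    """Normalize an LCCN, e.g. '85-2' -> '85000002', 'n78-89035' -> 'n78089035'."""
    s = "".join(raw.split())      # 1. remove all blanks
    s = s.split("/", 1)[0]        # 2. drop the first '/' and everything after it
    if "-" in s:                  # 3. remove the hyphen, zero-pad the serial to 6 digits
        prefix, serial = s.split("-", 1)
        s = prefix + serial.zfill(6)
    return s


if __name__ == "__main__":
    for raw in ["85-2 ", "n78-89035", " 79139101 /AC/r932"]:
        print(repr(raw), "->", normalize_lccn(raw))
```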

The ISBN, by the way, can tell you the format of the book, since paperback and hardcover editions have separate ISBNs.


👤 anfractuosity
I used this Android app to scan the ISBNs of my books - https://play.google.com/store/apps/details?id=com.eleybourn....

It's OSS: https://github.com/eleybourn/Book-Catalogue

From what I recall it also pulls additional info about the book from online.


👤 undoware
I had a gig doing this once in grad school. Here's my method. It worked great:

First, close the library (or library section) until your work is complete. It's critical that the books not go wandering or get rearranged during this process.

Next, grab an SLR (or equivalent mirrorless) camera with video mode. Set it to video mode. In good lighting, play it over the shelves, one by one, from left to right. Slowly.

Make sure the spines are all legible. This is your set-of-books.

Set yourself or someone else up transcribing the titles from the recording, in the order shelved. Check it a couple of times. If you missed a book, or couldn't read the spine from the recording, add it here.

Once you are certain your list is accurate and complete, print (or put on your phone) the list of books. (Still in the order shelved.)

Now, again, working top-down, left-to-right, take books out in sets of eight. (I like eight because it's a nice round number, it's near Miller's magic number, and it's also a number of books I can typically carry.)

For each 'byte' of eight books, take your SLR and, in photo mode, take a pic of the frontmatter page of that book -- the one containing the date of publication, and, most critically, the ISBN.

Put the eight books back on the shelf and take another eight. Repeat until complete. Be sure not to miss a book.

Now you have a list of books and a set of pics. Guess what? They are the same length and in the same order. So, book 1 on your list is the first pic on your SLR. And so on.
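The pairing step relies only on both lists being in shelf order. A minimal Python sketch, assuming the camera names files with an increasing counter (e.g. `IMG_0001.JPG`) so that sorting the names reproduces shooting order; that breaks if the counter rolls over mid-project. `pair_titles_with_photos` is a hypothetical helper:

```python
from typing import List, Sequence, Tuple


def pair_titles_with_photos(titles: Sequence[str],
                            photo_files: Sequence[str]) -> List[Tuple[str, str]]:
    """Match the i-th transcribed title with the i-th photo (one photo per book)."""
    photos = sorted(photo_files)  # assumes file names sort in shooting order
    if len(photos) != len(titles):
        # A mismatch means a book was skipped or shot twice: find it before OCR
        raise ValueError(f"{len(titles)} titles but {len(photos)} photos")
    return list(zip(titles, photos))


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: python pair.py titles.txt photo_dir   (adjust the glob to your camera)
    lines = Path(sys.argv[1]).read_text(encoding="utf-8").splitlines()
    titles = [t for t in lines if t.strip()]
    photos = [p.name for p in Path(sys.argv[2]).glob("*.JPG")]
    for title, photo in pair_titles_with_photos(titles, photos):
        print(f"{photo}\t{title}")
```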

Now, you can OCR those pics for the ISBN. As backup/redundancy you can grab other info as well, e.g. publisher, etc. to sanity-check the results of your ISBN lookup.
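For the OCR step, the check digit is a cheap way to reject misreads. A minimal Python sketch, assuming the OCR text of each front-matter page is already available as a string: it finds ISBN-10/ISBN-13 candidates with a regular expression, keeps only those whose check digit is valid, and converts ISBN-10s to ISBN-13 so the list is uniform. The function names are hypothetical, and the regex can still pick up an occasional number that just happens to pass the checksum:

```python
import re
from typing import List

# Optional 978/979 prefix, nine digits, then a check character (digit or X);
# hyphens or spaces may appear between digits, as printed on copyright pages.
ISBN_PATTERN = re.compile(r"\b(?:97[89][-\s]?)?(?:\d[-\s]?){9}[\dX]\b",
                          re.IGNORECASE | re.ASCII)


def is_valid_isbn10(s: str) -> bool:
    if len(s) != 10 or not s[:9].isdigit() or not (s[9].isdigit() or s[9] == "X"):
        return False
    total = sum((10 - i) * int(c) for i, c in enumerate(s[:9]))  # weights 10..2
    total += 10 if s[9] == "X" else int(s[9])                     # weight 1, X = 10
    return total % 11 == 0


def is_valid_isbn13(s: str) -> bool:
    if len(s) != 13 or not s.isdigit():
        return False
    # EAN-13: alternating weights 1, 3, 1, 3, ...; total must be a multiple of 10
    return sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(s)) % 10 == 0


def isbn10_to_isbn13(isbn10: str) -> str:
    core = "978" + isbn10[:9]  # drop the old check character
    weighted = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(core))
    return core + str((10 - weighted % 10) % 10)


def isbns_from_ocr(text: str) -> List[str]:
    """Return checksum-valid ISBNs found in OCR text, normalized to ISBN-13."""
    found: List[str] = []
    for match in ISBN_PATTERN.finditer(text):
        code = re.sub(r"[-\s]", "", match.group()).upper()
        if len(code) == 13 and is_valid_isbn13(code):
            isbn13 = code
        elif len(code) == 10 and is_valid_isbn10(code):
            isbn13 = isbn10_to_isbn13(code)
        else:
            continue  # OCR misread or some unrelated number
        if isbn13 not in found:
            found.append(isbn13)
    return found


if __name__ == "__main__":
    page = "First published 1972. ISBN 0-306-40615-2 Printed in USA"
    print(isbns_from_ocr(page))
```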

Congratulations, you now have enough information -- a title and an ISBN -- for e.g. Google Books to pull up the rest of the info, which you can sanity check against the other deets you OCRed out of the frontmatter page.

Final tip: Calibre has a book information lookup thingy; it wasn't what I used back in the day, but AFAIK it should work great. It may be possible for you to simply populate the Calibre book list with titles and ISBNs, and have it just magically whisk other details -- date of publication etc. -- into the appropriate fields. Again, you can cross-check these (either exhaustively or spot-check) against the OCRed contents of the frontmatter pages, which (again) you associated with a book title in the initial step.

Happy librarianing!


👤 AndrewLiptak
There was a great bit of free software that I had downloaded a bunch of years ago, LibraryDB, which allowed you to set up your own software. If memory serves, you could hook up a laser scanner and scan the barcode, which would make things pretty easy for you, but I can't seem to find the software online anymore.

But to be honest, I've always found cataloging and data entry to be a lot of fun, and there's something meditative about entering a book's title, date, author, ISBN, etc. into a system. I found that it helped me figure out what I have and think about why I'm keeping some books. It also led to some neat discoveries about certain books. I found a couple that had been signed that I'd never realized had been signed!


👤 mystixx
I use the free version of Libib.com [1]. Both the Android and iOS apps work just great. The app has an integrated barcode scanner and automatically looks up the book's info. You can even export the catalog as CSV.

[1] https://www.libib.com


👤 cratermoon
Instead of spending money on hardware and software that will be flaky, time-consuming, and error-prone, put the money to good use.

Call up the nearest college or university with a library science program. Ask them for the names of students specializing in curation and cataloging. Contact the students, tell them how much money you have and how many books you want to catalog, and arrange for them to take care of it.


👤 xyzzy21
Honestly?

I have about 5,000 books. I just sat down with a spreadsheet and entered the particulars for every one like any librarian in existence over the last 2000 years would do. I did this over a few months with 1-8 hours of effort in a burst.

A lot of things still have no quick and automated solution.

When I'm doing accounting and financial analysis, it's still primarily a numbers grind of data entry, cross checking and analysis.


👤 tomjen3
I did that for about 70 books recently. I bought a scanner for very little (about 200 DKK, no idea what that is in USD), plugged it into a USB port on my laptop and opened a spreadsheet. The handheld scanner automatically "enters" a newline once it has typed the ISBN, so the only thing you have to do is move on to the next book. You can scan books faster than you can pull them out and put them back.
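Because a keyboard-mode scanner just "types" the code and presses Enter, any program that reads lines can act as the capture tool, not only a spreadsheet. A minimal Python sketch that also records the hardcover/softcover detail from the original question; the convention of typing `h` or `s` to switch the binding for the following scans, and the `books.csv` file name, are assumptions for illustration:

```python
import csv
import sys
from typing import Iterable, List, Tuple

# Hypothetical convention: typing "h" or "s" (then Enter) switches the binding
# recorded for every following scan, so a shelf of hardcovers needs one keystroke.
BINDINGS = {"h": "hardcover", "s": "softcover"}


def parse_scans(lines: Iterable[str], binding: str = "softcover") -> List[Tuple[str, str]]:
    """Turn scanner/keyboard lines into (code, binding) rows."""
    rows = []
    for line in lines:
        code = line.strip()
        if not code:
            continue  # ignore stray Enter presses
        if code.lower() in BINDINGS:
            binding = BINDINGS[code.lower()]
        else:
            rows.append((code, binding))
    return rows


if __name__ == "__main__":
    # Each scan arrives as one line on stdin; finish with Ctrl-D (Ctrl-Z, Enter on Windows)
    rows = parse_scans(sys.stdin)
    with open("books.csv", "a", newline="") as f:
        csv.writer(f).writerows(rows)
    print(f"saved {len(rows)} scans", file=sys.stderr)
```

Everything is written when input ends, so for a long session you may prefer to write each row as it arrives.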

Actually getting the book info from the ISBN was very time-consuming; I didn't know about LibraryThing.

You can almost certainly use a smartphone, but that will be a lot slower, comes with the risk of dropping it, and leaves you without a barcode scanner for other projects.


👤 pronoiac
As I put some books into storage, the LibraryThing app on my phone could use the camera to scan the ISBN barcodes, and I could tag the books with "box 6" and the like. Searching for books without ISBNs wasn't hard.

👤 spaztastical
I have been searching around off and on. I found an HN thread from years ago too!

It seems that people have favourites but there is nothing leading the charge. I know someone that has a barcode reader and wrote ee's own code to parse it for a database, even printing personal barcodes to handle repeats (sentimental value, different format).

I think ISBNs are sometimes different for soft- vs. hardcover, BUT ISBNs are relatively new and sometimes get reused, plus the barcode might be for a category or a "book/zine of series" rather than a particular copy.

I know Amazon, and thus Goodreads, can now search by cover; it is Amazon, but it does avoid a lot of issues. Goodreads might be best for you just for phone and cover scanning. It is, however, slow. You can use shelves/tags for locations, but it is ... slow.

I think the best system would be a combo of cover scanning, storing your own photo in the entry, and notes on signatures/written notes. BUT for location, you could tie it to a barcode reader that scans the shelf's barcode and then the book's; that requires personally barcoding books (please use removable stickers!) and is a bit over-involved.

Personally, the weird case I really want and am not sure how to piece together is a way to pull good spine photos/digitals. I know my books by spine, so I'd love to see if a book catalog could have a spine database and be used to make a virtual library.

edit: Found the thread. Last I read it did not have anything particularly good, but a few workable options. https://news.ycombinator.com/item?id=19817219


👤 mattowen_uk
I don't have the code to hand at the moment, but when my partner and I catalogued all our fiction books (over 2,000) I discovered that you can search Amazon via ISBN, so the workflow was:

1. Scan book ISBN via a standard barcode reader (in keyboard emulation mode) [1]

2. The barcode reader 'types' in the ISBN into the GUI of my homebrew app along with a new-line character.

3. The app takes this ISBN and searches Amazon via its API, then parses the results. If there's more than one result it just picks the first one (which is correct 99% of the time); if it's not correct, there's an option to override or enter details manually.

4. The app then queries the amazon API again for the book details, and places them all into a record in a SQL DB.

5. For reasons I can't quite remember, the app also goes to the actual product page of the book on Amazon and grabs the high-resolution picture of the book cover (which only sometimes matches the edition of the book we actually own) and stores that locally.

6. There's a web app for searching the DB that we use when we can't remember which books we actually own (first world problems!)

7. Any new book that enters the house is catalogued BEFORE it is even allowed in the Library.

It's a basic set-up, but other than the Amazon API, which is abstracted via my own functions, it's not dependent on any other (closed or FOSS) book management software.

---

[1] A cheap one like this will do it: https://amzn.to/3CyqaAC


👤 ripperdoc
I'm not sure of the value for me to catalog all books just for the titles. What I would like is whenever I search Google, it would also search my books and tell me which book to open. It feels like a lot of knowledge sits on my shelves that I'm not really using. Not sure if that exists, but obviously the hard part is getting access to full-text contents of books - scanning pages by myself would be extremely time consuming.

👤 quercusa
My family has used ReaderWare for years. It's got smart cataloging across multiple sources (including LoC for old books w/o an ISBN) and runs on a number of platforms. We use an old USB CueCat.

https://www.readerware.com/index.php/products/details/books_...


👤 Poiesis
While I haven't used it (and it's a Mac-only app with a companion iOS app for scanning), I've heard many people rave about Delicious Library from Delicious Monster: http://www.delicious-monster.com

👤 nsm
I cobbled together something similar using a couple of JavaScript libraries. It is a really simple web app that you can open on any phone, and it uses the phone camera to scan the barcode. It saves the results to a Google Sheet (what I wanted). The code is not public, mostly because I couldn't bring myself to clean it up. If you are interested, I could make it public. I wrote about it at https://nikhilism.com/post/2021/tracking-books-i-read-using-...

👤 chubot
Honest question: what’s the benefit of cataloguing books? I’m surprised so many people do it.

I have 3 tall bookshelves of books but don’t really feel the need to catalogue them. I sort them by topic and sometimes physical size and it seems fine.


👤 poxwole
I have a few thousand books myself, so I wrote this: https://github.com/konsbn/xlibris (it's almost exactly what you want).

👤 pahool
The "Handy Library" Android app includes an ISBN scanner and allows you to import and export collection data.

You could also check out some of the tooling and APIs around openlibrary.org. Unfortunately, I think it's basically a moribund project, but may have sufficient tools for your needs. I know they have a list feature, but I don't think ingestion is particularly easy; nor am I sure of the import/export functionality around their lists.

edit: I'd forgotten about LibraryThing (mentioned in another comment). They have better tooling than Open Library.


👤 lawrenceyan
As an interesting anecdote, the history of book digitization and its implications in fair use / copyright in regards to what you're trying to do is actually pretty storied (primarily litigated between Google and the Authors Guild over the course of the past decade).[0]

[0] - https://cdlib.org/services/pad/massdig/mass-digitization-his...


👤 rekabis
I’ve been using BookCrawler (iOS) for the last decade-plus to keep track of what I get (to avoid accidentally purchasing multiple copies), but I’ve discovered that at around 3,000 entries both the search functionality as well as all internal stats suddenly took a vicious and permanent dirt nap, and never recovered.

This threshold was reached a few years ago, so who knows how many books I have now. Everything else in that app works spectacularly fine, so… ¯\_(ツ)_/¯


👤 yldreader
Another vote for LibraryThing.com and a barcode reader or smartphone. I scanned in ~1100 books, and only ran into 15-20 without ISBNs. All but 3-4 of those were easily input by title. But I like it because it's very easy to keep up, and isn't a moneygrubbing advertising/tracking site like Goodreads. The phone app is very handy when wandering in bookstores-- easy to check if you own such and such a title.

👤 mindcrime
Using the Goodreads app for the initial data loading might be the easiest way. You don't even have to specifically scan the barcode, it can often identify your book from the cover alone. I'll defer to what @Jtsummers said about getting your data out of their database though, as I have not tried that part myself (yet).

👤 Jtsummers
If you're willing to use it, Goodreads lets you scan books to add them to your collection. I've not used it in a while, but I believe there are easy enough ways to get your data out once entered into it. I did that once a long time ago; I imagine it hasn't gotten too much worse since then.

👤 BigBalli
https://MyBookList.club Scan the ISBN and it adds it to your library. Then you can also export to CSV if you want.

Disclaimer: I made the app. Happy to give free promo codes to anyone interested in trying and give feedback!


👤 metaloha
You could put together an OCR app for their phone that could scan the title and author from the spine of the book (or cover) and do a lookup against something like the Google Books API or Open Library to get the ISBN (or store the work in your account on that service).
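The lookup half of that idea could use Open Library's search endpoint. A minimal Python sketch, assuming `https://openlibrary.org/search.json` accepts `title`, `author`, `fields` and `limit` parameters and returns matching works under `docs`, each possibly with an `isbn` list; those ISBNs cover every edition of the work, so they still need checking against the copy in hand. The helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, List

SEARCH_URL = "https://openlibrary.org/search.json"


def search_url(title: str, author: str, limit: int = 5) -> str:
    """Build a title/author search URL (parameter names assumed from the public docs)."""
    params = {"title": title, "author": author,
              "fields": "title,author_name,isbn", "limit": limit}
    return SEARCH_URL + "?" + urllib.parse.urlencode(params)


def candidate_isbns(response: Dict) -> List[str]:
    """Collect ISBNs from the 'docs' of a search response, best match first, no repeats."""
    isbns: List[str] = []
    for doc in response.get("docs", []):
        for isbn in doc.get("isbn", []):
            if isbn not in isbns:
                isbns.append(isbn)
    return isbns


if __name__ == "__main__":
    # Title and author would come from the OCR step; these are placeholders
    with urllib.request.urlopen(search_url("The Hobbit", "Tolkien")) as resp:
        data = json.load(resp)
    print(candidate_isbns(data)[:10])
```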

👤 spaztastical
This might have some ideas, but it looked like no particularly good solutions: https://news.ycombinator.com/item?id=19817219

👤 marzetti
'My Library' by Julien Keith is great for Android users... Over 1M downloads and a 4.6-star average... My own library is only about 600-odd books, but it was easy to use, interesting, and great now it's done!

👤 reilly3000
I have a lovely $5 USB barcode scanner that was pulled from an old time clock system. It’s fast, simple, and acts as a keyboard. That plus Calibre would be good, but a spreadsheet would work just fine too.

👤 rbobby
Buy a cheap handheld, battery-operated wireless barcode scanner (cheap on AliExpress). These work really well for scanning stacks of books... pick the book up, zap, put the book down. You have to configure the scanner to operate in "keyboard" mode or some such... basically what you scan gets typed as if from a keyboard.

I used a simple Excel macro for data capture and lookup. Basically, when a cell changed (a book was scanned), it would request the book data from outpan.com. If Outpan didn't know the UPC, it would beep and return to the cell; otherwise it would decode the JSON response and populate the spreadsheet row.

Here's the excel macro (why I used the B column instead of the A column is a longer story):

    Private Sub Worksheet_Change(ByVal Target As Range)
        ' Only react when a single cell in the scan column (B) changes
        If Target.Cells.Count <> 1 Then
            Exit Sub
        End If
        
        If Application.Intersect(Range("B2:B99999"), Target) Is Nothing Then
            Exit Sub
        End If
        
        Dim Ean As String
        Ean = CStr(Target.Value)
        
        ' Synchronous GET against the outpan.com product API
        Dim Url As String
        Url = "https://api.outpan.com/v2/products/" & Ean & "?apikey=[haha get your own key haha]"
    
        Dim HttpRequest As Object
        Set HttpRequest = CreateObject("MSXML2.XMLHTTP")
        HttpRequest.Open "GET", Url, False
        HttpRequest.Send
            
        ' Decode the JSON reply with VbsJson (see edit below)
        Set json = New VbsJson
        Set o = json.Decode(HttpRequest.ResponseText)
        If Not IsEmpty(o("error")) Then
            ' Lookup failed: beep and go back to the scanned cell
            Beep
            ActiveCell.Offset(-1, 0).Select
        Else
            booktitle = o("name")
            If IsNull(booktitle) Then
                ' No product name returned: treat it as a miss
                Beep
                ActiveCell.Offset(-1, 0).Select
            Else
                ' "attributes" may come back as an empty JSON array instead of an object
                If IsVarArrayEmpty(o("attributes")) Then
                    Author = ""
                    PublishedOn = ""
                Else
                    If IsEmpty(o("attributes")("Author(s)")) Then
                        Author = ""
                    Else
                        Author = o("attributes")("Author(s)")
                    End If
                    If IsEmpty(o("attributes")("Publication Date")) Then
                        PublishedOn = ""
                    Else
                        PublishedOn = o("attributes")("Publication Date")
                    End If
                End If
                
                ' Column A: copy the value from the row above; columns C-E: title, author, date
                Cells(Target.Row, Target.Column - 1).Value = Cells(Target.Row - 1, Target.Column - 1).Value
                Cells(Target.Row, Target.Column + 1).Value = booktitle
                Cells(Target.Row, Target.Column + 2).Value = Author
                Cells(Target.Row, Target.Column + 3).Value = PublishedOn
            End If
        End If
    End Sub
    
    
    ' True when anArray is an empty array (e.g. a decoded JSON "[]") or not an array at all;
    ' False for objects such as a decoded JSON dictionary
    Function IsVarArrayEmpty(anArray As Variant) As Boolean
        Dim i As Long
        If IsObject(anArray) Then
            IsVarArrayEmpty = False
        Else
            On Error Resume Next
            i = UBound(anArray, 1)
            If Err.Number = 0 Then
                IsVarArrayEmpty = (i < 0)
            Else
                IsVarArrayEmpty = True
            End If
            On Error GoTo 0
        End If
    End Function

edit: you will need VbsJson from http://demon.tw/my-work/vbs-json.html (why that's a Chinese page I don't know; all I know is that it was a single-file JSON parser that was easy to work with for this).

edit2: I used this solution to scan and log 750 books in a couple of hours? Maybe 3? It went pretty quick.


👤 Dopameaner
I am awed that you have a couple of thousand physical books.

1) How long did it take for you to collect all of them?

2) What are the books mostly centered on? Tech? Politics? Fiction?

3) Do you happen to know your read vs. to-be-read stats?


👤 NotAnOtter
You definitely want to pick up a handheld barcode scanner and dump all the data into a CSV.

From there it would take a few hours of playing with the data to get it into whichever form you prefer.


👤 tom-thistime
The bar code scanner idea (already mentioned by several people) is even better than it sounds. Almost magically good. Much, much better than manual entry.

👤 101008
ISBNs are different for paperbacks and hardcovers, even for the same title, so you should be able to get that info from the ISBN :)

👤 throwaway2214
You can write something with QuaggaJS (https://serratus.github.io/quaggaJS/). I've used it in the past; with a few tweaks it's very accurate.

Then you can query Goodreads or Amazon to find the actual book.


👤 tantalor
Donate them to a library. Then come back later after the librarian has cataloged them.

👤 _moof
How about this:

1. Make a snapshot of a secondhand book store's online inventory.

2. Sell all the books to them.

3. Diff the inventory.

4. Buy them all back.

Voila!


👤 silicon2401
Similar question: is there a good way to do this for video games? I probably have a couple hundred physical games and would love not to have to build a spreadsheet of them manually.

👤 deknos
ISBN scanner and Tellico; I did this. You can add data sources to Tellico to look up your book, and it will add the info to your local database.

👤 mongol
Could be worth it to buy a USB scanner.

👤 adamnemecek
I'm interested in using RFID tags to help me locate my books. Does anyone do this?

👤 winsbe01
Apps like QRBot [1] have the ability to scan ISBNs (and barcodes generally), and have a "history" feature that keeps track of what you've scanned and lets you export it (to CSV, among others). The app is free on both iPhone and Android (there is a paid version; I don't know what extras it has or if it's just ad-free), but you may want to verify how much history gets stored before you go scan-crazy.

From a US perspective (may apply elsewhere), for books published relatively recently (within the last ~20 years or so), the ISBN is often part of the barcode on the back of the book (ISBN-13s, the updated standard, start with 978, or 979 for some newer ones, so a barcode starting with those digits is a good clue that it's an ISBN). For a period of time prior to that (and perhaps still for mass-market paperbacks), the barcode on the back is NOT an ISBN, but there is an ISBN barcode on the inside front cover. I haven't found any systematic way to pull an ISBN out of a non-ISBN barcode (though I haven't dug too far; my collection hasn't reached 4 digits yet and I've been happy to type when scanning wasn't an option).
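That prefix clue can be turned into a quick check on each scan. A minimal Python sketch, assuming the scanner sends plain digits: a 13-digit code starting with 978 or 979 (but not 979-0, which is used for ISMNs, the printed-music counterpart) whose EAN-13 check digit is valid is treated as an ISBN, and a 12-digit code as a UPC-A. `classify_barcode` is a hypothetical helper; some scanners report UPC-A as 13 digits with a leading zero, which this version would call "unknown":

```python
def classify_barcode(code: str) -> str:
    """Classify a scanned barcode as 'isbn', 'upc', or 'unknown'."""
    code = code.strip()
    if not (code.isascii() and code.isdigit()):
        return "unknown"
    if len(code) == 13 and code[:3] in ("978", "979") and not code.startswith("9790"):
        # EAN-13 check: weights 1, 3, 1, 3, ... over all 13 digits sum to a multiple of 10
        total = sum(int(c) * (3 if i % 2 else 1) for i, c in enumerate(code))
        return "isbn" if total % 10 == 0 else "unknown"
    if len(code) == 12:
        return "upc"  # UPC-A, e.g. the price barcode on many mass-market paperbacks
    return "unknown"


if __name__ == "__main__":
    for scan in ["9780306406157", "036000291452", "9780306406158"]:
        print(scan, classify_barcode(scan))
```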

Once you have the ISBNs, I like to query against the Open Library API [2], which is a part of the Internet Archive. The information in there is fairly robust, if inconsistent (the capitalization of titles is sometimes as printed on the title page, sometimes Library of Congress format, other minor things). They have a lot of data points available, such as cross-referenced IDs with Goodreads and LibraryThing, but again, this is community-supported data, so YMMV as to completeness or accuracy.
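A minimal Python sketch of a batch lookup against the Open Library Books API, assuming the `https://openlibrary.org/api/books` endpoint with `bibkeys=ISBN:...`, `format=json` and `jscmd=data`, which returns an object keyed by the requested bibkeys; ISBNs it doesn't know are assumed to be simply absent from the reply. The helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request
from typing import Dict, Iterable

API_URL = "https://openlibrary.org/api/books"


def books_api_url(isbns: Iterable[str]) -> str:
    """One request for several ISBNs: bibkeys=ISBN:a,ISBN:b,..."""
    bibkeys = ",".join(f"ISBN:{isbn}" for isbn in isbns)
    query = urllib.parse.urlencode({"bibkeys": bibkeys, "format": "json", "jscmd": "data"})
    return API_URL + "?" + query


def summarize(response: Dict) -> Dict[str, Dict[str, str]]:
    """Reduce a jscmd=data reply to {isbn: {title, authors, publish_date}}."""
    out = {}
    for key, book in response.items():
        isbn = key.split(":", 1)[1]  # keys echo the bibkeys, e.g. "ISBN:9780306406157"
        out[isbn] = {
            "title": book.get("title", ""),
            "authors": ", ".join(a.get("name", "") for a in book.get("authors", [])),
            "publish_date": book.get("publish_date", ""),
        }
    return out


if __name__ == "__main__":
    with urllib.request.urlopen(books_api_url(["9780306406157"])) as resp:
        print(summarize(json.load(resp)))
```

Given the inconsistent capitalization mentioned above, you may want to normalize titles before comparing them.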

Another note -- many books have separate ISBNs for hardcover editions, trade paperback editions, mass market editions, eBooks, etc (and sometimes don't have an ISBN at all for things like Book of the Month Club editions). I don't know if this is a requirement, or a luxury that big publishers have, but it is something I've noticed (you'll sometimes see multiple ISBNs listed on the copyright page, along with their formats -- also you may see related editions on Indiebound [3], along with their ISBNs). A cursory glance at Open Library doesn't seem to have a data point distinction for this (which is unfortunate), so you may still have to note this, but theoretically it may be possible to get this information from the ISBN directly at some point.

Source for ^^: I read a lot, have a lot of books, briefly ran a (failed) specialized online bookstore, and wrote a CLI tool [4] for myself to solve this very issue.

[1]: https://qrbot.net/locale/en/
[2]: https://openlibrary.org/dev/docs/api/books
[3]: https://www.indiebound.org/
[4]: https://github.com/winsbe01/booki


👤 pixel_tracing
You should ask Ancestry.com how they catalog hundreds of thousands of records…

Hint: it’s not manual and it’s completely automated.