HACKER Q&A
📣 spike021

How can I back up an old vBulletin forum without admin access?


I'm part of a car community where our vBulletin forum contains a wealth of information.

What we've run into:

1. The admin/owner is nearly, if not completely, unreachable, which causes a variety of issues. Mainly, we cannot even request a traditional backup of the database underneath the forum software.

2. As with many forums, ours doesn't get much active engagement other than from older regulars. However, Google searches easily find useful posts for things like DIY maintenance, modification installs, test data from driving with ECU tunes, track day experiences, etc.

3. It's easy to point people on other social networks at posts by their URL, but due to neglect, the website constantly has problems, making access increasingly complicated and inconsistent.

Ideally, it'd be nice to find a way to scrape everything as closely as possible into a manageable database.

Even more ideally, if we could convert said scraped data into a format that is easily publishable to a new platform, that would be handy. Even if the new platform is static and simply renders the old threads.

I can't imagine we are the only forum experiencing problems like this, given that most forums have probably been dying off over the last decade.

Has anyone gone through this kind of archival process with vBulletin before?

Thanks.


  👤 ComputerGuru Accepted Answer ✓
Backing up to WARC, HTML, whatever is great for posterity but not much more than that.

Assuming you're a member of the organization and therefore licensed to use the content (but merely unable to access it): Purely hypothetically speaking, if an admin is this MIA and obviously not on top of the job, the odds are probably high that they've neglected maintenance. Old PHP server running out-of-date PHP applications... not the most secure combo in the world. I wouldn't be surprised if there were some magic strings you could send to the server to get it to regurgitate the contents of the database in a more developer-friendly, strongly-typed fashion, which you could import to MyBB or XenForo and continue chugging along.


👤 TedDoesntTalk
The format you want is WARC. Even the Library of Congress uses it. There are many, many WARC scrapers. I'd look at what the Internet Archive recommends. A quick search turned up this from the Archive Team and Jason Scott https://github.com/ArchiveTeam/grab-site (https://wiki.archiveteam.org/index.php/Who_We_Are) but I found that in less than 15 seconds of searching, so do your own due diligence.

👤 h_o_o_t
One liner:

wget --mirror --convert-links --adjust-extension --page-requisites --no-parent --execute robots=off --wait=0.2 --domains example.com https://example.com


👤 fullspectrumdev
Two options:

1. Scrape it with wget, httrack, or a similar tool.

2. If the owner's not really around, it’s probably behind on its security patches, and there are some relatively recent-ish vB exploits that would let you gain code execution and take a backup of the entire database, site, etc. the “extremely illegal way”.

I recommend 1, but 2 is amusing to ponder briefly over a coffee ;)


👤 parasti
We (community for a video game) lost over a decade of accumulated community content due to an unreachable owner. This happened as I was considering scraping, but did not get a chance to implement it. Internet Archive has been a godsend - a lot of public content that was served in a text-based format is available from there. Due to a PHP misconfiguration, even a bunch of binary files were archived because they were being served with PHP errors.

👤 quesera
I would make a read-only archive first, using `wget --mirror`.

This will fix relative paths, download assets, etc., and the result can be published as-is on a new site. I'm ignoring copyright questions in the interest of archiving fragile data.

Then I'd use an HTML parser against the local archive to extract the individual posts, if the additional work was justified.
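
A minimal sketch of that extraction step, using only Python's standard `html.parser`. It assumes each post body sits in a `<div id="post_message_N">` element, which is what stock vBulletin 3.x/4.x templates tend to emit; a customized theme may differ, so check the mirrored HTML first. The names `PostExtractor` and `extract_posts` are made up for illustration.

```python
from html.parser import HTMLParser


class PostExtractor(HTMLParser):
    """Collect the text of every <div id="post_message_N"> element.

    Assumption: the forum theme wraps each post body in such a div.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.posts = {}        # post id (str) -> extracted plain text
        self._current = None   # id of the post currently being read
        self._depth = 0        # <div> nesting depth inside that post
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if self._current is None:
            element_id = attrs.get("id") or ""
            if tag == "div" and element_id.startswith("post_message_"):
                self._current = element_id[len("post_message_"):]
                self._depth = 1
                self._chunks = []
        elif tag == "div":
            self._depth += 1          # nested div, e.g. a quote box
        elif tag == "br":
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if self._current is not None and tag == "div":
            self._depth -= 1
            if self._depth == 0:      # closing tag of the post div itself
                self.posts[self._current] = "".join(self._chunks).strip()
                self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self._chunks.append(data)


def extract_posts(html):
    """Return {post_id: text} for one saved thread page."""
    parser = PostExtractor()
    parser.feed(html)
    parser.close()
    return parser.posts
```

Running `extract_posts()` over every saved `showthread` page in the mirror would give a dictionary per page that can then be written to whatever store you prefer. Author names, dates, and attachments would need their own selectors.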


👤 mikolajw
You can try forum-dl, a forum scraping tool I've been writing for this purpose: https://github.com/mikwielgus/forum-dl

It's single-threaded, alpha-quality software, and still isn't compatible with many forums and themes. But it can export WARCs and may just happen to work for you.


👤 metalforever
The vBulletin software stores the database credentials inside the web root as part of a settings file. You should be able to gain access to the web root from the web host with the correct validation or access to a company email address. From there, find the PHP settings file containing the database credentials and read them out. Use them to log into the database. I don't know why the other comments aren't suggesting this as a first step. You should not need a scraper to read the content back out. I've performed this procedure many times with vBulletin and other similar software like phpBB and SMF, admittedly a decade back at this point.

👤 merelysounds
> This is a gem to help extract data from vBulletin Forums, specifically those which you have no control over.

https://github.com/lloydpick/vbulletin

This is a very old tool, so it’s hard to say if it will work. Then again, it seems very relevant, so worst case it could provide some inspiration.


👤 jasongill
You think the forum for your car make/model is bad? The one for mine is the same, but it also has an expired SSL certificate and broken outgoing email, so it's slowly getting deindexed from Google and you can't sign up for an account (can't get the verification email), so there's no way to access the attachments, etc. It's sad, honestly.

I ran a car forum (sold to VerticalScope 15+ years ago) and it's still chugging along on the same version of PunBB that I had it on when I left, so it seems that even the "experts" haven't found a simple way to migrate between forum software.


👤 nneonneo
Over a decade ago I helped quite a few people migrate their forums off places like Proboards, ActiveBoards, and many other "free" forum hosts to their own hosts using phpBB/SimpleMachinesForum etc.; many such hosts had highly customized forum software and no ability to download the database in any usable format. Copies of my converters might still be floating around on the Internet. At least one of these free hosts used something fairly similar to vBulletin, IIRC.

The process is in principle not difficult: scrape the site (I recommend a dedicated scraper for that), then go through and extract everything relevant into a SQL database formatted the way your target forum software expects. The hardest part was recovering BBCode formatting in a usable fashion. Unfortunately my converters were written back when I didn't understand HTML parsing terribly well, so they're a hodgepodge of ugly regexes and handrolled string parsing.
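
As a rough illustration of the BBCode-recovery step, here is a sketch that uses Python's standard `html.parser` rather than regexes. It is not the converter described above; the tag mapping and the names `BBCodeConverter` and `html_to_bbcode` are assumptions for the example, and real forum HTML (quotes, smilies, font tags, attachments) would need many more cases.

```python
from html.parser import HTMLParser

# HTML tags that map one-to-one onto a BBCode tag (illustrative subset).
SIMPLE_TAGS = {"b": "b", "strong": "b", "i": "i", "em": "i", "u": "u"}


class BBCodeConverter(HTMLParser):
    """Turn a small subset of rendered post HTML back into BBCode."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._links = []  # one bool per open <a>: was [url=...] emitted?

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag in SIMPLE_TAGS:
            self.out.append(f"[{SIMPLE_TAGS[tag]}]")
        elif tag == "a":
            href = attrs.get("href")
            self._links.append(bool(href))
            if href:
                self.out.append(f"[url={href}]")
        elif tag == "img" and attrs.get("src"):
            self.out.append(f"[img]{attrs['src']}[/img]")
        elif tag == "br":
            self.out.append("\n")
        # Any other tag is dropped; its text content is still kept.

    def handle_endtag(self, tag):
        if tag in SIMPLE_TAGS:
            self.out.append(f"[/{SIMPLE_TAGS[tag]}]")
        elif tag == "a" and self._links and self._links.pop():
            self.out.append("[/url]")

    def handle_data(self, data):
        self.out.append(data)


def html_to_bbcode(html):
    """Convert one post's HTML fragment to BBCode text."""
    converter = BBCodeConverter()
    converter.feed(html)
    converter.close()
    return "".join(converter.out)
```

A parser-based approach like this copes with nesting and odd attribute order, which is where handrolled string parsing usually falls over.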


👤 drekipus
This reminds me of one of my first programming gigs: the owner of a shop lost the password to his online storefront and wanted to move to Shopify, so I had to write a Python scraper to save everything for him and upload it to Shopify.

👤 mikeInAlaska
When some of my favorite Google Groups forums were going away, I wrote a Perl scraper that started grabbing materials from my groups. Eventually Google perceived it as unwanted or suspicious contact, and shut off access to Google for the entire company of 2000 I worked for at the time. Fortunately this was on a timer, but I was sweating bullets.

👤 thexa4
I've backed up a forum[1] by crawling it using wget and creating a WARC file.

I hosted it again by writing a Python script[2] that serves responses from that WARC file, and put it behind nginx with caching enabled.

[1] https://forums.empiresmod.com/index.php

[2] https://gitlab.com/thexa4/warc-server

[2, deb package] https://gitlab.com/thexa4/warc-server/-/jobs/5213679726/arti...


👤 mobilemidget
In the spirit of the name of this website, there have been plenty of RCE reports describing how to hack/crack a vBulletin. If the owner is not there, I guess he also doesn't run any software updates?

Though this suggestion might not be acceptable in the eyes of many.


👤 rglullis
If you are looking for a place to host this data, I can gladly help you bring it to https://gearhead.town (a Lemmy instance that I set up to migrate the Reddit car communities to the Fediverse).

👤 shrubble
You should be able to determine who the hosting company is, and offer to pay to keep the site up.

If the hosting company is paid, they will make and keep a backup for you, but under the permissions/access of the original owner.

If the hosting company gets permission from the site owner to add you as an admin (the owner may not be in touch with you but may respond to the hosting company), then you are home free; since you are paying the hosting company, they will be happy to keep you around.


👤 timtamboy63
I wrote something 10 years ago that scrapes a vBulletin forum into a Rails app and exposes a UI so the data is accessible. Happy to share the code with you if you'd like

👤 toyg

   1. Scrape it. Plenty of options here.
   2. Set up a new forum.
      I think the current state of the art is Discourse, but I could be wrong.
   3. Automate the recreation of posts and threads with some backend script
      (will depend on which software you picked).
      On each post, add a link to the original.
   4. Tell everyone about your superduper clone, move the old-timers over.
   5. ...
   6. Profit.
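
One possible shape for step 3, sketched with Python's built-in `sqlite3`: stage the scraped posts in a small table, then build the text to import with a link back to the original appended. The schema, the function names, and the `showthread.php?p=...` permalink form are assumptions for illustration; the actual import call depends entirely on the target forum software and is left out.

```python
import sqlite3


def open_staging_db(path=":memory:"):
    """Open (or create) a staging database for scraped posts."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS posts (
               post_id      INTEGER PRIMARY KEY,
               thread_id    INTEGER NOT NULL,
               author       TEXT NOT NULL,
               body         TEXT NOT NULL,
               original_url TEXT NOT NULL
           )"""
    )
    return conn


def stage_post(conn, post_id, thread_id, author, body, original_url):
    """Insert one scraped post; re-running a scrape overwrites the row."""
    conn.execute(
        "INSERT OR REPLACE INTO posts VALUES (?, ?, ?, ?, ?)",
        (post_id, thread_id, author, body, original_url),
    )
    conn.commit()


def bodies_for_import(conn, thread_id):
    """Return post texts for one thread, oldest first, each ending
    with an attribution line that links to the original post."""
    rows = conn.execute(
        "SELECT author, body, original_url FROM posts "
        "WHERE thread_id = ? ORDER BY post_id",
        (thread_id,),
    )
    return [
        f"{body}\n\nOriginally posted by {author}: {url}"
        for author, body, url in rows
    ]
```

Keeping the staging table separate from the import means the scrape can be repeated or resumed without touching the new forum.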

👤 al_borland
There is only one admin? I was an admin on about a half dozen vB sites back in the day (none of them mine) and we had a lot of redundancy there, and they were mostly just to BS with people. I find it surprising a forum of any notable size would have an admin running it as the single point of failure. That’s disappointing.

It sounds like you have some options here. Best of luck.


👤 mminer237
I wonder if it wouldn't be best legally and practically to just have archive.org scrape it for you and link people to that.

👤 marginalia_nu
Maybe reach out to Archive Team, their mission as far as I understand it is to try to preserve stuff like this.

👤 asdefghyk
Is it on archive.org (the Wayback Machine)? That would help in case the online site suddenly disappears. The owner could have passed away, or the email address may no longer be checked, etc.
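
One way to check this for many thread URLs at once is the Wayback Machine's availability endpoint. Below is a small sketch using only the Python standard library. The JSON shape handled in `closest_snapshot` reflects my understanding of that API's response and should be verified against a live reply; the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

AVAILABILITY_ENDPOINT = "https://archive.org/wayback/available"


def availability_url(page_url, timestamp=None):
    """Build the query URL; timestamp is an optional YYYYMMDD-style hint."""
    params = {"url": page_url}
    if timestamp:
        params["timestamp"] = timestamp
    return f"{AVAILABILITY_ENDPOINT}?{urlencode(params)}"


def closest_snapshot(payload):
    """Pull the snapshot URL out of a decoded response, or None.

    Assumed response shape:
    {"archived_snapshots": {"closest": {"available": true, "url": "..."}}}
    """
    closest = payload.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest["url"]
    return None


def lookup(page_url):
    """Ask the Wayback Machine for the closest snapshot of page_url."""
    with urlopen(availability_url(page_url), timeout=30) as response:
        return closest_snapshot(json.load(response))
```

Calling `lookup()` on a handful of well-known thread URLs gives a quick sense of how much of the forum is already preserved; add a delay between calls if you loop over many pages.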

👤 dethos
I guess your best chance is to use something like https://archivebox.io/.

👤 kissgyorgy
Download the whole website as it is and host it as a static site somewhere else.

    wget --mirror

👤 gsich
I think vBulletin has an archive view, similar to how mailing list archives look; I'm not sure what the subpath is.

👤 aspectmin
Where was the vBulletin board hosted, and do you have any kind of shell access to the server?

👤 qprofyeh
Think before trying out unlawful exploits like others suggested, please. In a liberal world, offering money might do the trick. Not always, but worth a mention.

👤 tomschwiha
HTTrack may be useful; it can copy a full website into a folder, including resources: https://www.httrack.com/

👤 bluedino
Would Internet Brands buy it?

👤 ppiwo
Zilvia?

👤 _u0u9
I don't know about vBulletin, only Invision Power Board, sorry dude.

👤 fexelein
It seems to me like you don’t own this data. If you want to preserve this data, try again with the owner?