The categories need some refreshing. Most of them go back to the original site, and while the vast majority are still relevant, there will be some renaming and some scrubbing, as well as (and this is the point of this post) some new ones.
I just added a new parent category called “Hardware.” There is a vibrant and active DIY culture in machines, robotics, transportation, etc. that captures the very essence of do-it-yourself. Hell, I live right by the MIT robotics museum in Cambridge, and there is an amazing amount of cool things people should know and learn about with regard to not just robotics, but machines and vehicles. I think it’s cool. Humans are pretty clever, and it’s neat when they do clever things.
The question has come up more than a few times: what is up with all the dead links?
Well, I went so far as to add an FAQ entry, but let me go into more detail on what is up, and more importantly, what is being done.
The data in the database is about 10 years old. I actually think of that as a bit of an achievement on my part (especially after all of the problems I’ve had over the years), but that is also a problem. Sites move, projects die, owners move on, things change. Because diysearch does not spider the web, the only way to keep links current is to put trust in each individual link owner to keep the data current. I know this hasn’t been terribly easy in the past, but that has all changed now. Keeping your data current is as easy as logging in.
So, that’s now; what about then? The old data is going to be pruned. I wrote a script that will validate each and every URL in the database (nearly 20,000 as of this writing). The validation is two-tiered. The first tier just checks that the URL is a valid URL (that it follows the proper syntax). The second tier is a bit more complicated. This is where the indexing engine tries to connect to the target URL and inspects the HTTP return code. If it’s a good return code, the URL is marked “approved” in the database and is included in the index. If the HTTP return code is a bad one (e.g. 404 or 501), the URL is flagged as “not approved.”
The URL is not deleted from the database. It is simply flagged and is not included in the index. The owner can come back at any time and make corrections; when the index job runs again, and assuming the owner fixed the mistake, the URL will then be marked as “approved.”
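For the curious, here is a rough sketch of what that two-tier check could look like. To be clear, this is not the actual script running on the site; it is a small Python illustration, and the function names (`is_well_formed`, `fetch_status`, `validate`) and the choice to treat any 2xx or 3xx code as “good” are my assumptions for the example. Note that nothing is deleted; a URL only ever gets one of the two flags.

```python
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import Request, urlopen


def is_well_formed(url):
    """Tier 1: syntax check only, no network traffic."""
    try:
        parts = urlparse(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_status(url, timeout=10):
    """Tier 2 helper: return the HTTP status code, or None if unreachable."""
    request = Request(url, method="HEAD")
    try:
        with urlopen(request, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        # 4xx and 5xx responses (404, 501, ...) arrive as exceptions.
        return err.code
    except (URLError, OSError, ValueError):
        return None


def validate(url, fetch=fetch_status):
    """Return the flag to store for this URL; the row itself is kept."""
    if not is_well_formed(url):
        return "not approved"
    status = fetch(url)
    # Assumption: anything in the 2xx/3xx range counts as a good code.
    if status is not None and 200 <= status < 400:
        return "approved"
    return "not approved"
```

The `fetch` argument is there so the logic can be tried without touching the network, for example `validate("http://example.com", fetch=lambda u: 404)` comes back as “not approved.”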
Well, that went well.
Really, the code push is complete. The new site is up and running. Yes, I’ve discovered some bugs, some display problems, I believe, in IE 6, but nothing too terribly wrong. There are some features not implemented yet. They include:
- Newsletter Opt-in/Opt-out (there isn’t a newsletter feature yet, so no one will receive anything anyway)
- Self-service banner insertion program (this is coming)
- Updated content (for merch [i don’t need you], FAQ)
- Misc admin features (that only I see, but eventually will matter when I start sharing the admin functions)
- Bug reporting page to give folks access to mantis so they can report problems/issues
There are other things, I am sure, all minor in scope. I need to design the t-shirts (yes, there will be BLACK t-shirts, finally) and a few other tidbits to keep me busy. I urge people to get in touch with me if they spot problems; I’ll give you access to the bug tracking system so you can submit bugs.
Well, as anyone who cares to pay any attention knows, the technical redeployment (aka launch) of the new application (aka web site) is tonight. So far it is going well. I had to fix a few problems with the messaging system (the site uses messaging, via JMS queues, to handle IPN responses), which are now fixed in the development environment. A quick deploy and re-configuration of the UAT environment, a test or two, and we’ll be all set to start the code migration.
But first, I will be running the data scrub process, which will migrate the existing backed-up legacy data into the new production database. This shouldn’t take too much time, since it’s just a few SQL scripts to write and execute.
It’s currently 8:05 p.m. (EDT) and the site should be up in a few hours. If anything changes that, I will post.
Update: it’s 1:21 a.m. (EDT) and all of the code and data have been pushed into production. The search engine is being indexed right now (so search is not functional until that is complete). Everything else is up and running. There’s content that is still not entirely complete. What else? I think that is it. I am sure other things will crop up. It’s the nature of new applications like this.
I took a few minutes and created a Diysearch Google group. I am thinking that this might be a good tool to coordinate testing, future development, volunteer efforts, stuff like that. I am not sure just how useful it will be, but I’m willing to give it a go.
Well, check it out. I will be more than happy to send out invitations so you can post. I figured this was a better idea than setting up some kind of bloated web forum system. Yeah, just what I need, another piece of shoddy open source web application nonsense. *sigh* I know it’s not all crap. In fact, I’ve used some really great quality stuff… and I’m a HUGE supporter of open source software (hell, I use it every day), but lately I’ve just gotten bitter. I know, a few rotten apples do not spoil the bunch. Sorry, all better now. Wow, that was really off topic.
Anyway, groups! Maybe it’ll be useful. Maybe not.