Someone Searching For Something

Today I've been researching the world of search engines, and it's been interesting. Although I haven't gotten to the real meat of how it all works, I have found some interesting stuff, most of it while reading through technical papers on Google's technology. Here's what I found:

  • Directory structure is very important. When Google is looking for related pages, if it doesn't find any at a given URL, it will next look for pages related to that URL's parent directory, and keep going up as necessary. (see Finding Related Pages on the World Wide Web)
  • Google and other companies have gone to great pains to detect when you're just copying somebody else's content, so copying doesn't pay. Of course, they're probably getting even better at it, so that they don't have to index the gigabytes of copies on the 'net. (see Finding Replicated Web Collections)
  • If two sites are hosted on the same IP address, they're treated as closely related. If they're on the same server, they may be related. That's a good reason to have your own server. (see A Comparison of Techniques to Find Mirrored Hosts on the WWW)
  • Oh, and some things are too darn confusing (like this paper on finding good sites)
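
To make the "keep going up" idea from the first bullet concrete, here's a rough Python sketch of the order in which URLs might be tried. This is only my reading of the idea, not Google's actual code: the function name lookup_order and the example.com address are made up for illustration, and the sketch simply drops any query string.

```python
import posixpath
from urllib.parse import urlsplit, urlunsplit


def lookup_order(url):
    """Yield the URL itself, then each parent directory up to the site root.

    Hypothetical helper: it only illustrates walking up the directory
    structure. Query strings and fragments are dropped.
    """
    parts = urlsplit(url)
    path = parts.path or "/"
    while True:
        yield urlunsplit((parts.scheme, parts.netloc, path, "", ""))
        if path == "/":
            return  # reached the site root, nothing left to try
        # Step up one level: "/a/b/page.html" -> "/a/b/", "/a/b/" -> "/a/"
        parent = posixpath.dirname(path.rstrip("/"))
        path = parent if parent.endswith("/") else parent + "/"


if __name__ == "__main__":
    for candidate in lookup_order("http://example.com/a/b/page.html"):
        print(candidate)
```

If that's roughly how it works, then where a page sits in the directory tree determines which neighbors it gets grouped with when the page itself has no related pages.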

The interesting part is that all of the material I was reading today was a couple of years old, if not downright ancient (as in, 1999 or 2000). But hey, if you understand the very basics of the search engines, you'll more easily get the new technologies built on top of them, I'd wager. All of a sudden, those SEO books written in 2004 that are on sale don't look so bad. But still, so much has changed since then.

I should be including a link for a page about the Search Engine Optimization that is built into Drupal, but I can't find one immediately.



Dano, why are you researching this? For your work?

Yeah, work. Well, it'll help with too, for sure. might give you better results.
