Technical Details for IungamBot
Statuses of our web crawlers:
IungamBot (PERL) is currently running (since 2024-05-10 01:02:34 PDT).
IungamBot (PHP) is currently sleeping (since 2024-05-10 01:02:34 PDT).
IungamBotJr is currently exclusive (since 2024-05-10 01:02:34 PDT).
IungamBot-Photos is currently running (since 2024-05-10 01:02:34 PDT).
IungamBot-Update is currently idling (since 2024-05-10 01:02:34 PDT).
Progress of our web crawlers:
4846 docs have been crawled, of which 4653 docs have been cached in our temporary queue.
The remaining 193 docs either returned a 3xx, 4xx, or 5xx HTTP status or were excluded from our search.
Document Index
- What exactly is IungamBot?
- How to get IungamBot not to index part or any of your website
- How to ask IungamBot to exclude certain pages
- Technical information for the IungamBot web crawler
- How to get your website indexed by the Iungam Search engine
- What if IungamBot isn't obeying your robots.txt file?
- Find out what IungamBot looks for when it indexes your website
- A list of protocols for the Iungam search engine
- What to do if IungamBot has already added pages you don't want on our search engine
- Find out what the statuses of the IungamBots mean
- Find out more about the different web crawlers that make up IungamBot
- Do you have any questions that aren't answered here?
What exactly is IungamBot?
IungamBot is the web crawler that scours the internet for (primarily) aviation documents to add to the Global Aircraft ("GAC")
Iungam Search Engine. Iungam is Latin for "I shall unite", which describes the purpose of this engine: to unite
the pages of the internet in one place for easy access. Iungam is composed mostly of member websites, but IungamBot may
sometimes follow links to outside websites and could begin adding those sites to our queue. This is a good thing! We are currently
trying to break free of the members-only definition so that we can offer an even better service to all of our visitors.
Many people wonder how to pronounce "Iungam". Classically, Iungam is pronounced "yungum", but the Italians later invented the
'J' as a consonantal 'i', so it can also be pronounced "jungum". Iungere (Jungere) is ultimately the source of the English words join, subjugate, and juxtapose.
How to get IungamBot not to index part or any of your website
If you have access to the root of your server (i.e. you aren't hosted on a service such as geocities, where your top level is www.geocities.com/usrname/), you can create a file
called 'robots.txt' and put it in the root of your website. For instance,
http://www.foo.com/bar/robots.txt would not work, but
http://www.foo.com/robots.txt is correct.
Inside the file, you should include the following syntax:
#this will exclude all robots from the entire server
User-agent: *
Disallow: /
#this will allow all robots access to the entire server
User-agent: *
Disallow:
#this will disallow all robots from accessing anything in /cgi-bin/
User-agent: *
Disallow: /cgi-bin/
#this will disallow IungamBot from accessing the file /logbook.doc
User-agent: IungamBot
Disallow: /logbook.doc
#this will exclude IungamBot from the entire server
User-agent: IungamBot
Disallow: /
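Before uploading a robots.txt file, it can help to check locally that the rules do what you expect. The following is a minimal sketch using Python's standard urllib.robotparser module. It is not IungamBot's own parser, so IungamBot's matching may differ in edge cases; the helper name is_allowed and the www.foo.com URLs are only illustrative.

```python
from urllib.robotparser import RobotFileParser

# Two of the example rule sets shown above, as they would appear in robots.txt.
BLOCK_ONE_FILE = """\
User-agent: IungamBot
Disallow: /logbook.doc
"""

BLOCK_EVERYTHING = """\
User-agent: IungamBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text lets user_agent fetch url.

    Uses Python's standard parser, which is an assumption about how a
    well-behaved robot reads the file, not IungamBot's actual code.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for path in ("/logbook.doc", "/index.html"):
        url = "http://www.foo.com" + path
        print(path, is_allowed(BLOCK_ONE_FILE, "IungamBot", url))
```

Running the script prints one line per path showing whether the first rule set would let IungamBot fetch it. Swap in the contents of your own robots.txt to test other rules.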
How to ask IungamBot to exclude certain pages
IungamBot follows the noindex, nofollow, and noarchive instructions in the standard robots meta tag. If you put these meta tags in the head of your document, you can tell IungamBot not to index, archive, or follow links
on that particular page. IungamBot looks for the following syntax:
#IungamBot can index, follow, and archive the document
<meta name="robots" content="all" />
#IungamBot will retrieve the document but will not add it to our database
<meta name="robots" content="noindex" />
#IungamBot will not take a cache snapshot of the current document.
# Keep in mind that this will significantly lower this document's
# appearance in the results on our search engine.
<meta name="robots" content="noarchive" />
#IungamBot will not follow any links on the current page
<meta name="robots" content="nofollow" />
The robots meta tag applies to every robot that visits your page. If you want more options, or if you want to set restrictions
for IungamBot only, use the iungambot meta tag instead of the robots tag. For example:
#IungamBot will not add the document to our database or follow links on
# it, but other robots may
<meta name="iungambot" content="noindex,nofollow" />
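To see which directives a page actually exposes, you can read its meta tags yourself. The sketch below uses Python's standard html.parser module. The class and function names are hypothetical, and the precedence rule (an iungambot tag, when present, replaces the robots tag) is an assumption for illustration; this page does not say how IungamBot resolves a page that carries both tags.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from robots and iungambot meta tags."""

    def __init__(self):
        super().__init__()
        self.directives = {}  # meta name -> set of lowercase directives

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = (attr.get("name") or "").lower()
        if name in ("robots", "iungambot"):
            content = attr.get("content") or ""
            values = {v.strip().lower() for v in content.split(",") if v.strip()}
            self.directives.setdefault(name, set()).update(values)


def effective_directives(html: str) -> set:
    """Return the directives that would apply to IungamBot.

    Assumption: the iungambot tag, if present, overrides the robots tag.
    """
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    if "iungambot" in parser.directives:
        return parser.directives["iungambot"]
    return parser.directives.get("robots", set())


if __name__ == "__main__":
    page = (
        '<head><meta name="robots" content="all" />'
        '<meta name="iungambot" content="noindex,nofollow" /></head>'
    )
    print(sorted(effective_directives(page)))
```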
Technical information for the IungamBot web crawler
robot-id: iungambot
robot-name: IungamBot
robot-cover-url: http://search.globalaircraft.org/
robot-details-url: http://search.globalaircraft.org/iungambot/about.pl
robot-owner-name: Charles Munson
robot-owner-url: http://www.globalaircraft.org/
robot-owner-email: server@globalaircraft.org
robot-status: active
robot-purpose: indexing
robot-type: browser
robot-platform: linux
robot-availability: data
robot-exclusion: yes
robot-exclusion-useragent: iungambot
robot-noindex: yes
robot-host: *.globalaircraft.org
robot-from: yes
robot-useragent: GAC IungamBot/1.x
robot-language: php, perl
robot-description: used to build databases of aviation docs for the Iungam search engine
robot-history: Developed by the Global Aircraft Organization
robot-environment: research
modified-date: Wed Jul 02 10:50:38 EST 2003
modified-by: server@globalaircraft.org
How to get your website indexed by the Iungam Search engine
Currently only GAC Members can submit a request to add their website to our search engine. If you would like to add your site, please
see the Add Website form.
What if IungamBot isn't obeying your robots.txt file?
There can be a few reasons why IungamBot is being disobedient. One reason is that either the syntax of your robots.txt file is
incorrect or the file isn't placed in the top directory of your server. Another reason could be that IungamBot chose not to look for this
file in the first place, or perhaps it got lost while trying to find it. A third reason could be that your website was crawled before
you put this file into place. In this case, please wait a few weeks for IungamBot to realize that you have put these new restrictions into effect.
Find out what IungamBot looks for when it indexes your website
Currently IungamBot searches for and follows all HREF and SRC links. It then strips out all HTML and JScript and saves the result as the cache (that is,
unless you specified noarchive). If some links aren't being picked up or followed, then IungamBot might have gone over its maximum follow
allowance, or your HTML had too many errors in it for these links to be extracted. Broken tags will usually cause many problems
for IungamBot when it tries to follow links. An example of a broken tag:
#A broken tag spans across two lines
<body text="#000000" bgcolor="#FFFFFF"
>
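To check a page before IungamBot visits it, you can list its HREF and SRC links and look for tags that span more than one line. The following is a minimal sketch using Python's standard library. It is not IungamBot's parser, and the names collect_links and find_multiline_tags are hypothetical. The regular expression is a simple heuristic that also flags multi-line comments.

```python
import re
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the values of all href and src attributes in document order."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases attribute names, so HREF and href both match.
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def collect_links(html: str) -> list:
    """Return every href/src value found in the HTML text."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Heuristic: a "<" whose matching ">" is on a later line.
MULTILINE_TAG = re.compile(r"<[^<>]*\n[^<>]*>")


def find_multiline_tags(html: str) -> list:
    """Return the tags that span across two or more lines."""
    return MULTILINE_TAG.findall(html)


if __name__ == "__main__":
    page = '<body text="#000000" bgcolor="#FFFFFF"\n><a HREF="/planes/">Planes</a></body>'
    print(collect_links(page))
    print(find_multiline_tags(page))
```

Any tag reported by find_multiline_tags is a candidate to rewrite onto a single line.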
A list of protocols for the Iungam search engine
The current list of Iungam Search protocols is as follows:
This option will give you the cache of a certain document in our search engine
cache:URL of page [search terms[ search terms]]
ex) cache:www.globalaircraft.org/planes/yf-17_cobra.pl f-17 cobra
This option will return all documents under the given domain;
it is sub-domain sensitive
site:domain root [search terms[ search terms]]
ex) site:www.globalaircraft.org f-15 eagle
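The query format above can be read as an optional protocol prefix followed by search terms. The sketch below shows one way such a query string could be split. It is an illustration only: the function name parse_query and the returned dictionary layout are hypothetical and do not describe how the Iungam search engine handles queries internally.

```python
def parse_query(query: str) -> dict:
    """Split a query into its protocol, target, and search terms.

    Recognizes the two prefixes listed above, cache: and site:. Any other
    query is treated as plain search terms.
    """
    parts = query.split()
    if parts:
        prefix, sep, target = parts[0].partition(":")
        if sep and prefix.lower() in ("cache", "site") and target:
            return {
                "protocol": prefix.lower(),
                "target": target,
                "terms": parts[1:],
            }
    return {"protocol": None, "target": None, "terms": parts}


if __name__ == "__main__":
    print(parse_query("cache:www.globalaircraft.org/planes/yf-17_cobra.pl f-17 cobra"))
    print(parse_query("site:www.globalaircraft.org f-15 eagle"))
    print(parse_query("f-15 eagle"))
```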
What to do if IungamBot has already added pages you don't want on our search engine
The only option currently available is to simply wait for IungamBot to figure out that you don't want this page to be indexed any longer. This
may take a few weeks. If IungamBot hasn't removed this document within a month please contact the bot's administrator at
server@globalaircraft.org.
Find out what the statuses of the IungamBots mean
Status | Description
Running | This status means that the IungamBot is currently running, or it was running the last time our script checked the bot.
Idling | This status means that either the bot was running a while ago and is currently not doing anything, or it is currently awaiting instructions from the bot master.
Offline (sleeping) | This status means that the bot is not currently running, and is not accepting any commands from the bot master.
Retired | This means that the bot is no longer in standard use - however, the bot may be tweaked to run as any of the IungamBots to help out if one is bogged down with work at the time.
Not Responding | This would imply that the bot isn't responding to our requests for a status. Either the bot is retaliating or it has run into an internal error and needs to be checked by the bot master.
Find out more about the different web crawlers that make up IungamBot
The IungamBot crawler is currently composed of five systems. All of
the following bots will go under the user-agent of "GAC IungamBot/1.x", with 'x' representing the particular
bot's ID number. For a more in-depth view of each system please refer to the chart below:
ID | Name | Language | Description
1 | IungamBot | PERL | This web crawler is responsible for scouring the internet for new documents that we don't currently list and sending these links to other IungamBots, which will complete the appropriate routine for adding them to our database. It also has the job of actually grabbing the pages and creating a cache of them. It adds each page to our database along with any information needed for our server.
2 | IungamBot | PHP | This crawler is used as a mirror of any IungamBot to help relieve the duties of a bot that is bogged down with work.
3 | IungamBotJr | PERL | IungamBotJr is reserved for administrative use only. This means that only certain pages will be searched with this bot, but when run it will create a cache of a page and add it to our database with the full stats needed.
4 | IungamBot-Photos | PHP | This web crawler is responsible for retrieving images from the internet using the queue created by IungamBot and adding them to our database. Currently this bot is restricted to grabbing images from globalaircraft.org and other trusted websites only.
5 | IungamBot-Update | PERL | The job of this crawler is to go through our current database of documents and update them. This will update the cache for the documents, update the age of the documents, and could remove documents if they are no longer active.
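If you want to know which of these crawlers visited your site, the ID can be read from the user-agent string in your server logs. The following is a minimal sketch that assumes the 'x' in "GAC IungamBot/1.x" is the plain decimal ID from the chart above. The function name identify_bot and the log format are hypothetical.

```python
import re

# Bot IDs and names taken from the chart above.
BOT_NAMES = {
    1: "IungamBot (PERL)",
    2: "IungamBot (PHP)",
    3: "IungamBotJr",
    4: "IungamBot-Photos",
    5: "IungamBot-Update",
}

# Assumption: the digits after "1." are the bot's ID number.
UA_PATTERN = re.compile(r"GAC IungamBot/1\.(\d+)")


def identify_bot(user_agent: str):
    """Return the crawler name for a user-agent string, or None if it
    does not look like an IungamBot user agent."""
    match = UA_PATTERN.search(user_agent)
    if match is None:
        return None
    return BOT_NAMES.get(int(match.group(1)), "unknown IungamBot")


if __name__ == "__main__":
    for agent in ("GAC IungamBot/1.4", "GAC IungamBot/1.1", "Mozilla/5.0"):
        print(agent, "->", identify_bot(agent))
```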
Do you have any questions that aren't answered here?
Copyright © 2021 Global Aircraft.