What does double slash (//) directory mean in robots.txt?

You get the following output with:

curl https://www.ibm.com/robots.txt

I have deleted many lines, keeping only part of it:



User-agent: *
Disallow: //
Disallow: /account/registration
Disallow: /account/mypro
Disallow: /account/myint

# Added to block site mirroring
User-agent: HTTrack
Disallow: /
#


I understand that / means the root directory, but what does the double slash // mean here in robots.txt?

linux home-directory

asked Nov 27 at 1:44 by scrapy, edited Nov 27 at 1:52 by JakeGould

  • It could be a typo, I can't find a single reference to a double slash in any of the official Robot Exclusion documents.
    – Michael Frank
    Nov 27 at 1:51

  • @MichaelFrank Typo or a coding fluke made by an automated system generating a robots.txt on demand.
    – JakeGould
    Nov 27 at 2:00
1 Answer

(accepted answer, answered Nov 27 at 1:58 by JakeGould)

This seems like a mistake:



Disallow: //


The thing is that the robots.txt spec—as outlined here—clearly states:




Note also that globbing and regular expression are not supported in either the User-agent or Disallow lines. The '*' in the User-agent field is a special value meaning "any robot". Specifically, you cannot have lines like "User-agent: bot", "Disallow: /tmp/*" or "Disallow: *.gif".




But some people claim that is not the case, such as this site, which states that Google can handle pattern matching:




Pattern matching: At this time, pattern matching appears to be usable by the three majors: Google, Yahoo, and Live Search. The value of pattern matching is considerable. Let’s look first at the most basic of pattern matching, using the asterisk wildcard character.




But regardless of that, // would mean a literal directory with an empty name, since there is no wildcard (*) globbing or anything else there. And // just seems odd.



My guess is it’s a mistake of some sort. Yes, an IBM webmaster can make mistakes! But I would also guess that this robots.txt is generated automatically by some system, and that a path such as /*/ was somehow converted to // when the file was generated.
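
For what it’s worth, here is what a strict, spec-style reading implies: the original exclusion standard treats a Disallow value as a plain path prefix, so // could only ever match URLs whose path literally begins with two slashes. The snippet below is my own toy illustration of that prefix rule in Python (the test URLs are made up), not any particular crawler’s implementation:


from urllib.parse import urlsplit

# Toy prefix check in the spirit of the original robots.txt standard:
# a URL is disallowed if its path starts with any Disallow value.
# Illustration only, not how any real crawler is implemented.
disallow_prefixes = ["//", "/account/registration"]

def is_disallowed(url):
    path = urlsplit(url).path or "/"
    return any(path.startswith(prefix) for prefix in disallow_prefixes)

for url in (
    "https://www.ibm.com/",                      # root: allowed
    "https://www.ibm.com/products",              # made-up path: allowed
    "https://www.ibm.com//products",             # double slash: blocked by //
    "https://www.ibm.com/account/registration",  # blocked by its own rule
):
    print(is_disallowed(url), url)


Real parsers may normalize a rule like // differently (some may collapse repeated slashes, others might drop the rule as malformed), which is part of why the entry reads as a mistake rather than something deliberate.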
  • Either that, or the entry is there specifically to prevent mistake URLs with a redundant slash from being indexed.
    – grawity
    Nov 27 at 5:51










  • @grawity Fair enough but I am not too sure what the benefit would be to have a URL that is example.com//thing as some odd method of obscuring data from crawlers.
    – JakeGould
    Nov 27 at 16:28
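
In connection with grawity’s point above: URLs with a redundant slash are easy to produce by accident, for example by naively concatenating a base URL that already ends in / with a path that already starts with / (the base and path below are made up):


from urllib.parse import urljoin

base = "https://www.ibm.com/"    # hypothetical base URL from a config
path = "/products/widget"        # hypothetical path pulled from a database

# Naive string concatenation puts a double slash in the path:
print(base + path)               # https://www.ibm.com//products/widget

# urljoin resolves the absolute path against the base without the extra slash:
print(urljoin(base, path))       # https://www.ibm.com/products/widget


If double-slash URLs like that ever get published and crawled, a literal Disallow: // rule would be one way to keep the duplicate form out of an index, which fits grawity’s reading, although, as noted in the reply, it is not obvious IBM would do that deliberately.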










