
where to serve robots.txt

Added by Mark Petryk over 6 years ago

I have a question about serving up a robots.txt file. I see that my server gets hit fairly regularly for

http://us2.fundandgrow.com/robots.txt

How do I configure the server to cough up this file on request?

I have placed a robots.txt file in my docroot folder, but the server doesn't seem to serve it when requested, whereas if I try to access some other file in the docroot folder, it works. For example:

http://us2.fundandgrow.com/images/fundandgrow-header-logo.png

There is no special handler for that image file. My docroot directive looks like this:

--docroot="docroot;.,/bootstrap,/css,/resources,/images,/xls" \


Replies (8)

RE: where to serve robots.txt - Added by Stefan Arndt over 6 years ago

I think your docroot should be

--docroot="docroot;/,/bootstrap...

('/' instead of '.')

BUT

I would not do that, since this might leak your entire application (maybe also your binary and configs). So be careful (I don't know if you can adjust the paths so that there is no harm)!

In case you are using nginx or any other proxy:

I am using nginx to serve the letsencrypt challenge files while the server is running. You might use the same method:

http {
  server {
    listen 80;
    server_name your.domain;

    location /.well-known/acme-challenge/ {
      alias /srv/http/challenges/;
      try_files $uri =404;
    }
    location / {
      return 301 https://your.domain$request_uri;
    }
  }
}
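The same alias approach could serve robots.txt straight from the proxy, bypassing Wt entirely. A sketch, assuming the file is copied somewhere like /srv/http/static/ (that path is a placeholder):

```nginx
    # Hypothetical location block; /srv/http/static/robots.txt is a
    # placeholder path -- point it at wherever your robots.txt lives.
    location = /robots.txt {
      alias /srv/http/static/robots.txt;
      default_type text/plain;
    }
```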

RE: where to serve robots.txt - Added by Mark Petryk over 6 years ago

Thank you, Stefan.

I think we're cool on binaries and such. We've moved everything out of docroot, and now it only contains things related to web content. So the binaries and all configs are at . and all website content is at ./docroot, meaning we shouldn't suffer any leaks. I will try your suggestion.

What's also got me a little thrown off is that we are getting requests for robots.txt on our sub-pages, and I'm not quite sure how to serve up that file for those sub-pages. Perhaps this is where nginx can help?

Thanks for the tip on nginx. I really want to make use of that application, but haven't had a chance to get it going yet.

~mark

RE: where to serve robots.txt - Added by Mark Petryk almost 6 years ago

I'm really not having any luck with this. If I set my docroot to:

setting docroot to /

--docroot="docroot;/,/bootstrap...

('/' instead of '.')

Then the site doesn't run at all. If I leave docroot as '.' then the robots.txt file doesn't get returned.

So, back to my original question: how do I get Wt to serve up static files like 'robots.txt' and the like?

RE: where to serve robots.txt - Added by Stefan Arndt almost 6 years ago

Well, not sure how, but I thought '--docroot=".;/,/bootstrap' had worked for me. But it does NOT right now.

Nevertheless, I just tried --docroot=".;/robots.txt,/resources" and this serves the robots.txt just fine.
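For reference, the docroot option takes the real docroot before the ';' and a comma-separated list of paths — directories or, as above, individual files — that the built-in httpd is allowed to serve statically. A hedged sketch of a full invocation; the binary name, address, and port are placeholders:

```
# Placeholder invocation -- ./app.wt, the address, and the port are
# assumptions; only the --docroot syntax is the point here.
./app.wt --http-address 0.0.0.0 --http-port 8080 \
  --docroot=".;/robots.txt,/resources,/css,/images"
```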

RE: where to serve robots.txt - Added by Mark Petryk almost 6 years ago

Ah, tricky dicky... I didn't know I could list individual files there. That's helpful!

RE: where to serve robots.txt - Added by Mark Petryk almost 6 years ago

Something seems off, still.

If I go to my site here:

https://fundandgrow.com/robots.txt

It asks me to 'download' the file, whereas if I navigate to any other site, such as:

http://support.lorimarksolutions.com/robots.txt

the file simply displays in the browser (as I expect it would). What's the difference here?

RE: where to serve robots.txt - Added by Mark Petryk almost 6 years ago

Based on some of my searching through Redmine, it seems like Wt isn't designed to serve up static files; other servers are better suited to the task. So, through haproxy (which is what I have Wt running behind), I am able to marshal requests for various static files over to an Apache server.

haproxy.cfg

frontend www-http
  bind *:80
  mode http
  option httplog

  acl robots_txt path_beg /robots.txt
  use_backend apache8080 if robots_txt

backend apache8080
  mode http
  server ap-8080 localhost:8080

RE: where to serve robots.txt - Added by Stefan Arndt almost 6 years ago

The difference should be the Content-Type header. Some insight: http://stackoverflow.com/questions/6293893/how-to-force-files-to-open-in-browser-instead-of-download-pdf

I would try using Wt::WFileResource next and add it as a global resource (WServer::addResource()). You can specify a MIME type for WFileResource (which would be text/plain).
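A minimal sketch of that suggestion, written against the Wt 4 API (the thread may predate Wt 4, where headers and enum spellings differ); the entry point and the on-disk location of robots.txt are placeholders:

```cpp
#include <iostream>
#include <memory>

#include <Wt/WApplication.h>
#include <Wt/WEnvironment.h>
#include <Wt/WFileResource.h>
#include <Wt/WServer.h>

int main(int argc, char** argv) {
  try {
    Wt::WServer server(argc, argv, WTHTTP_CONFIGURATION);

    // Serve robots.txt as a global static resource with an explicit
    // text/plain MIME type, so browsers display it inline instead of
    // offering a download. The resource must outlive the server.
    // "robots.txt" is a placeholder path relative to the working dir.
    Wt::WFileResource robots("text/plain", "robots.txt");
    server.addResource(&robots, "/robots.txt");

    // Placeholder entry point -- a real application would return its
    // own WApplication subclass here.
    server.addEntryPoint(Wt::EntryPointType::Application,
                         [](const Wt::WEnvironment& env) {
                           return std::make_unique<Wt::WApplication>(env);
                         });

    server.run();
  } catch (const Wt::WServer::Exception& e) {
    std::cerr << e.what() << std::endl;
  }
  return 0;
}
```

The same addResource() call can register other static files crawlers ask for (favicon.ico, sitemap.xml) at their own paths.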
