Troubleshooting Slow SRS External Reponse Time 2010-10-11

Background

  • Attempts to load the SRS application externally are painfully slow. Whereas internal requests take one second or less, the screenshot below shows that every page element seems to be taking 4+ seconds to complete:

    ScreenShot315.jpg
  • Requests are currently following this pathway: Internet → RESC3WEB → RESC3LINUX01 → RESRICDEV02
  • A sample problematic URL is http://www.c3big.org/srs_test/sign_in

Troubleshooting

  • Interesting that the duration every request seems to be a quantum multiple of approximately 4 seconds. I've seen this in the past with connetion timeout-related issues (e.g., for every requests, some layer also tries to do thing X but in a synchronous manner (so that when thing X doesn't work, the side effect is that every request has the added latency of attempting/failing thing X).
  • Whatever thing X is,it appears to support some parallelism (requests #4 and #5 above appear to complete in parallel)
  • Remote desktoping to RESC3WEB and attempting the internal URL http://10.70.1.132/srs_test/sign_in also shows latency (sporadically??)
  • Next troubleshooting approach is to capture traffic inbound/outbound to RESC3LINUX01 to determine if the latency is upstream or downstream
    • On RESC3LINUX01, ran tcpdump to capture all TCP/IP traffic:
      /usr/sbin/tcpdump -s 0 -p -i any -w /home/kramere/slow-srs-01.cap
    • Downloaded capture file and opened for dissection in WireShark
    • Configured Wireshark to decode TCP source and destination port 8080 traffic as HTTP
    • Observation: RESC3LINUX01 has no trouble resolving DNS for resricdev02.crii.org
    • Using the following filter expression to (1) eliminate Zabbix agent traffic, (2) filter down to just RESC3WEB and RESRICDEV02:
      dns or ((tcp.dstport != 10050 and tcp.srcport != 10050) and (ip.addr eq 192.168.41.21 or ip.addr eq 10.70.251.42))
    • IO Flow Graph shows inbound HTTP request flowing from REC3WEB to RESC3LINUX01 and then to RESRICDEV02 in 0.002 seconds (e.g., Frames #130, #133, #137, #140)
    • Frames #137, #138 - Upon initial TCP connect, initial ACK from RESRICDEV02 is instantaneous so likely not a purely network-level issue
    • HTTP request delay between RESC3LINUX01 and RESRICDEV02 is quite evident (GET request in Frame #140; First bytes back (including 200 response code) in Frame #1358, over 8 seconds later)
    • Conclusion: Request is "stalled" on RESRICDEV02 (since our packet capture shows no network-evident cause/event) so we now want to capture some traffic on RESRICDEV02 to see its side of the story.
  • Captured traffic inbound/outbound to RESRICDEV02 via WireShark
    • Saved capture file to \\resricdev02\c$\ric-install\slow-srs-02.cap
    • Filter expression: dns or ((tcp.dstport = 10050 and tcp.srcport = 10050) and (ip.addr == 10.70.1.132))
    • Request inbound in Frame #8 doesn't stream out first bytes of response until Frame #72/75, 4.4 seconds later. This is much too high.
    • No TCP/IP traffic is evident that would explain the delay. I therefore conclude this must be something happening above the network layer (or interactions with other hosts)
  • Created additional reverse proxy config on RESC3LINUX01 that points to a static page in Glassfish, http://10.70.1.132/glassfish-test, and found that this page loads consistently fast. Concluding the issue may be within the Rails stack and/or application and/or config.
  • Examining Rails app config on RESRICDEV02. Not finding content in the log (\\resricdev02\c$\RIC-Install\srs\log\test.log) as I would expect.
  • Examining Glassfish logs located in \\resricdev02\c$\glassfishv3\glassfish\domains\domain1\logs
    • Verified that there doesn't appear to be overlap/conflict due to other deployed apps (e.g., CHTN ETL, PhaST automation)
    • Filtered server.log to only show Rails/Rack log output (lines containing "com.sun.grizzly.jruby.rack.JRubyRuntime")
    • Individual requests are completing within expected timeframe (16ms, 0ms, 16ms, 0ms, 0ms, 141ms, 0ms) but requests are "stacked out" 4+ seconds (2010-10-12T07:33:59.904, 7:34:04.436, 7:34:09.374, 8:07:02.594, 8:37:59.038, 8:47:56.408, 8:48:00.955) as seen browser-side. This is good confirmation that the problem/issue resides inside (Glassfish + JRuby + Rails + Rails app + Configuration)
  • On a hunch, examining Host headers as they pass between hosts
    • RESC3EWB to RESC3LINUX01 host-related headers:
      X-Forwarded-For: 10.70.2.87
      X-Forwarded-Host: www.c3big.org
      X-Forwarded-Server: 192.168.41.21
      Host: 10.70.1.132
    • RESC3LINUX01 to RESRICDEV02 host-related headers:
      Host: resricdev02.crii.org:8080
      X-Forwarded-For: 10.70.2.87, 192.168.41.21
      X-Forwarded-Host: www.c3big.org, 10.70.1.132
      X-Forwarded-Server: 192.168.41.21, 127.0.1.1
  • Disable "Clearrance" before filter(s) in the off chance the problem is related to something that library does...
    • Removed the following lines from application_controller.rb:before_filter :update_activity_time before_filter :authenticate
    • Nope, didn't fix anything
  • Noticing that this version of Rails uses E-Tag, If-None-Match, etc. headers for better caching. This is probably why the issue appears to be sporadic only.
  • This posting juggles a few neurons about the possibility that the X-Forwarded-... headers could be impacting/altering Rails's request handling.
  • Actually, first, on another hunch (based on seeing some forwarding header references to RESC3LINUX01), I added the following entry to \\resricdev02\c$\WINDOWS\system32\drivers\etc\hosts (on RESRICDEV02):
    10.70.1.132     resc3linux01
  • Success! Fast response times (consistently) yay!

    ScreenShot321.jpg
Topic attachments
I Attachment Action Size Date Who Comment
jpgjpg ScreenShot315.jpg manage 64.6 K 11 Oct 2010 - 17:48 EricKramer  
jpgjpg ScreenShot321.jpg manage 71.2 K 12 Oct 2010 - 15:21 EricKramer  
Topic revision: r2 - 12 Oct 2010 - 15:21:57 - EricKramer
 
This site is powered by the TWiki collaboration platformCopyright © by the contributing authors. All material on this collaboration platform is the property of the contributing authors.
Ideas, requests, problems regarding TWiki? Send feedback