
Current Trouble

【Supercomputer】Slow response of the job scheduler on systems A/B/C and provisional measures

Publication date: Jun. 12, 2017


The job scheduler of systems A, B, and C has a problem in which the response time of the qsub and qstat commands gradually slows down, depending on the elapsed time and on job usage conditions. As a provisional measure, we reboot the job scheduler whenever the response time exceeds a certain threshold. We apologize for any inconvenience this may cause.

If qsub or qstat is executed while the job scheduler is rebooting, the error message shown below is displayed. If you see this message, please wait a while and run the command again. A reboot normally completes in about 1 to 5 minutes, but it may take longer when data such as memory data is collected for failure investigation. This measure does not affect running jobs.

Connection refused
qsub: cannot connect to server jb.kudpc.kyoto-u.ac.jp (errno=111)
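
The "wait and re-execute" advice above can be automated in a job-submission script. The following is only a minimal sketch under stated assumptions, not an official tool of the center: the helper names (`run_with_retry`, `is_scheduler_unreachable`), the 60-second wait, and the limit of 10 attempts are all hypothetical choices, and the error detection simply looks for the two strings from the message shown above.

```python
import subprocess
import time

# Substrings copied from the error message in this notice.
RETRYABLE_MARKERS = ("Connection refused", "cannot connect to server")


def is_scheduler_unreachable(output_text):
    """Return True if the text looks like the reboot-time connection error."""
    return any(marker in output_text for marker in RETRYABLE_MARKERS)


def run_with_retry(command, max_attempts=10, wait_seconds=60,
                   sleep=time.sleep, run=subprocess.run):
    """Run a scheduler command, retrying while the scheduler is unreachable.

    max_attempts and wait_seconds are assumed defaults, chosen only to
    cover a reboot of roughly 1 to 5 minutes with some margin.
    `sleep` and `run` are injectable so the logic can be tested offline.
    """
    result = None
    for attempt in range(1, max_attempts + 1):
        result = run(command, capture_output=True, text=True)
        if result.returncode == 0:
            return result  # command succeeded; do not run it again
        if not is_scheduler_unreachable(result.stderr + result.stdout):
            return result  # unrelated failure; retrying would not help
        if attempt < max_attempts:
            sleep(wait_seconds)  # scheduler is probably rebooting
    return result  # still unreachable after the last attempt


# Example call (the script name is a placeholder):
# result = run_with_retry(["qsub", "job.sh"])
# print(result.stdout or result.stderr)
```

The sketch retries only on the connection error, so a failure caused by something else (for example an invalid job script) is returned immediately instead of being resubmitted.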

The job scheduler has been rebooted at the following times.

Date of rebooting                       System
Sun. April 23, 5:55 p.m. - 5:58 p.m.    A (Camphor 2)
Tue. May 2, 11:26 a.m. - 11:32 a.m.     B (Laurel 2)
Tue. May 2, 11:35 a.m. - 11:35 a.m.     C (Cinnamon 2)
Tue. May 9, 9:52 p.m. - 9:57 p.m.       A (Camphor 2)
Thu. May 11, 1:37 p.m. - 1:41 p.m.      B (Laurel 2)
Mon. May 15, 11:55 a.m. - 11:57 a.m.    A (Camphor 2)
Mon. May 15, 3:51 p.m. - 3:55 p.m.      B (Laurel 2)
Fri. May 19, 9:09 a.m. - 9:10 a.m.      C (Cinnamon 2)
Tue. May 23, 11:57 a.m. - 11:58 a.m.    A (Camphor 2)
Thu. June 1, 8:33 a.m. - 8:36 a.m.      A (Camphor 2)
Fri. June 2, 8:35 a.m. - 8:36 a.m.      C (Cinnamon 2)
Sun. June 11, 6:00 a.m. - 9:50 a.m.     B (Laurel 2)
Tue. June 13, 8:51 a.m. - 8:52 a.m.     A (Camphor 2)
Wed. June 21, 3:43 a.m. - 3:47 a.m.     B (Laurel 2)
Fri. June 23, 12:10 p.m. - 12:12 p.m.   A (Camphor 2)
Fri. June 23, 12:25 p.m. - 12:26 p.m.   C (Cinnamon 2)
The system has been restored. We apologize for any inconvenience this has caused.

* A permanent fix for this problem was implemented during the August 2017 maintenance.
Date of occurrence: 2017/04/05 09:00 ~ 2017/08/09 09:30
Inquiry: Supercomputing Section, IT Services Division, Information Management Department, Kyoto University
E-mail:consultkudpc.kyoto-u.ac.jp
Inquiry Form


Copyright © Institute for Information Management and Communication, Kyoto University, all rights reserved.