monitoring · Changes

Update monitoring authored May 06, 2021 by Daniele Jahier Pagliari
monitoring.md
View page @ cbc24dfb
......@@ -25,11 +25,17 @@ TODO
You should always monitor *all* your processes carefully. When you run a new script for the first time, *always* use the monitoring tools described above to make sure that you are using a reasonable amount of resources. If the script is a long-running one, repeat this check periodically to ensure that it doesn't have memory leaks or other resource-related issues.
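As a rough illustration of the periodic check, the Python sketch below samples the resident memory of one process from `/proc/<pid>/status` (Linux only) and flags steady growth. The helper names, the sampling interval and the growth threshold are assumptions made for this example; they are not part of the servers' tooling, and the heuristic is only a hint that a closer look is needed.

```python
import re
import time


def parse_vmrss_kib(status_text):
    """Return the VmRSS value (in KiB) found in /proc/<pid>/status text, or None."""
    match = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def read_rss_kib(pid):
    """Read the current resident set size of a process (Linux only).

    Raises FileNotFoundError if the process no longer exists.
    """
    with open(f"/proc/{pid}/status") as f:
        return parse_vmrss_kib(f.read())


def keeps_growing(samples, min_ratio=1.5):
    """Heuristic (assumed threshold): memory never decreased between samples
    and grew by at least min_ratio from the first sample to the last."""
    if len(samples) < 3 or samples[0] <= 0:
        return False
    never_shrinks = all(b >= a for a, b in zip(samples, samples[1:]))
    return never_shrinks and samples[-1] / samples[0] >= min_ratio


def watch(pid, interval_s=60, n_samples=10):
    """Sample the RSS of pid every interval_s seconds; return (samples, suspicious)."""
    samples = []
    for _ in range(n_samples):
        rss = read_rss_kib(pid)
        if rss is None:  # e.g. kernel threads have no VmRSS line
            break
        samples.append(rss)
        time.sleep(interval_s)
    return samples, keeps_growing(samples)
```

For a long-running job, something like `watch(12345, interval_s=600)` (with your own process ID in place of the hypothetical `12345`) could be left running in a separate terminal session.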
##### What is a reasonable amount of resources?
Except for storage, we do not impose hard limits.
However, keep in mind that, typically, more than 10 people are actively running jobs on each server at all times (day and night, 7 days a week). So, if your processes alone take (say) 70% of all cores and RAM, you are clearly not being respectful of others. Even worse, if you take 100% of the resources, you could render the server completely unreachable, making it impossible for other users (or even for sysadmins who want to kill your processes) to connect.
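To get a feeling for how much of a server your own processes occupy, the sketch below sums CPU and memory per user from the output of `ps -eo user,pcpu,rss`. It is only an approximation under stated assumptions: the function names are made up for this example, `ps` reports `pcpu` as an average over each process's lifetime rather than an instantaneous value, and `rss` counts shared pages once per process.

```python
import os
import subprocess
from collections import defaultdict


def usage_by_user(ps_output):
    """Sum %CPU and RSS (KiB) per user from `ps -eo user,pcpu,rss` output."""
    totals = defaultdict(lambda: {"cpu_percent": 0.0, "rss_kib": 0})
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) != 3:
            continue  # ignore malformed or empty lines
        user, pcpu, rss = fields
        totals[user]["cpu_percent"] += float(pcpu)
        totals[user]["rss_kib"] += int(rss)
    return dict(totals)


def core_share(cpu_percent, n_cores):
    """Fraction of all cores in use; ps counts 100 per fully busy core."""
    return cpu_percent / (100.0 * n_cores)


def current_usage():
    """Run ps on the local machine and aggregate its output per user."""
    out = subprocess.run(
        ["ps", "-eo", "user,pcpu,rss"],
        capture_output=True, text=True, check=True,
    ).stdout
    return usage_by_user(out)


def my_core_share(user):
    """Share of all cores attributed to the given user name (0.0 if absent)."""
    usage = current_usage().get(user, {"cpu_percent": 0.0})
    return core_share(usage["cpu_percent"], os.cpu_count() or 1)
```

A value of `my_core_share("your_username")` close to the 0.7 mentioned above would be a clear sign that your jobs need to be scaled down.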
##### Why are you allowed to do damage?
We chose not to set up the servers so that this kind of damage is completely *impossible*, because doing so would require a mechanism (e.g. cgroups, private VMs, etc.) that would reduce the resources available to everyone under normal conditions (of positive cooperation).
##### What happens if you do?
Being able to behave badly does not mean that doing so has no consequences. When a server is overloaded, sysadmins automatically receive a notification with the resource usage details of all processes. Any misbehavior (especially if repeated) will be reported directly to Prof. Enrico Macii. Remember: *"errare humanum est, perseverare autem diabolicum"* (to err is human, but to persist in error is diabolical).