Mantis Bug Tracker

View Issue Details Jump to Notes ] Issue History ] Print ]
IDProjectCategoryView StatusDate SubmittedLast Update
0007653opensim[REGION] OpenSim Corepublic2015-07-24 10:262015-08-06 14:41
Reportermoses 
Assigned ToDiva 
PrioritynormalSeverityminorReproducibilityalways
StatusassignedResolutionopen 
PlatformOSOS Version
Product Versionmaster (dev code) 
Target VersionFixed in Version 
Summary0007653: [Patch] Advanced Network Statistics
DescriptionPatch file attached.

This submitted patch provides all of the network metrics from ticket 0007632 (listed below) with a few changes:

1) All of the introduced metrics have configuration variables relocated from OpenSim.ini.example to config-include/AdvancedSimulationMetrics.ini

2) OpenSim.ini.example contains a new variable (AdvancedSimulationMetrics) that points to the AdvancedSimulationMetrics.ini. If the variable is not explicitly given a path in OpenSim.ini, the network statistics will not be calculated.

3) Inside config-include/AdvancedSimulationMetrics.ini is the AdvancedMetricsEnable. This variable is the master on/off switch for all of the introduced variables. By default, this value is FALSE and does not calculate the new variables.

4) AdvancedSimulationMetrics.ini includes individual on/off switches for the other metrics. If AdvancedMetricsEnable is false, none of the network metrics will be collected. If AdvancedMetricsEnable is true, individual metrics will be calculated only if their corresponding variable is explicitly set to TRUE. By default, each metrics' on/off switch is turned off.

5) The IP-to-Avatar reporting has been updated to follow the RemoteAdminPlugin example. Now, a password is required to access the IP data. The password is a variable in AdvancedSimulationMetrics.ini.


-------------Below is the description from Ticket 0007632-----------
The MOSES Team is submitting new and modified network-based statistics gathering to OpenSim. Mailing list concerns about inconsistent commit descriptions have been corrected in the submitted patch. There was also a concern of the Logging in Users metric. The metric counts how many users are actively logging in at that instant. The measure is incremented when a user first starts to enter a region, then decremented on the first texture update of the user. An alternative to this implementation was to decrement the user count at the EconomyDataRequest. However, this method would move the decrement code to an optional system of OpenSim. This approach is hazardous and not ideal as any OpenSim server not running the optional system or any viewer that choose not to send the required message would receive incorrect data. Therefore, we follow the more general approach of decrementing at the first texture update of the user.

----------------------
Patch Summary:

With the provided code changes, you will be able to accurately determine, through the JSON SimStats, how your simulator is performing network-wise with measures in packets, bytes, queue size, and latency. Particularly with the latency measures, we have added variables to the OpenSim.ini file (specified below) to allow you to turn these features on and off (off by default). Furthermore, we have added IP-to-avatar tracking that is outputted via a new JSON output stream (specified below). Below are details of each of the modified and newly added statistics.

Network Statistics:
 
PktsIn – Now the number of packets received are averaged inside the LLUDPClient before being sent to the SimStatsReporter. The time is calculated using a stopwatch. This is done to enforce an accurate measurement since the SimStatsReporter and the UDP message system are running on separate threads. The stat is reported as the number of packets per second.

PktsOut - Now the number of packets sent are averaged inside the LLUDPClient before being sent to the SimStatsReporter. The time is calculated using a stopwatch. This is done to enforce an accurate measurement since the SimStatsReporter and the UDP message system are running on separate threads. The stat is reported as the number of packets per second.

UDPIn – This is the same as PktsIn except that it tracks the number of bytes being received instead of the number of packets.

UDPOut – This is the same as PktsOut except that it tracks the number of bytes being sent instead of the number of packets.

UDPInError – Tracks the number of packets that are sent to RecordMalformedInboundPacket. Then divides by the time in the UDPClient before sending the average number of packets that were malformed to the SimStatsReporter. The stat is reported as the number of packets per second.

ClientPing – Average ping between OpenSim and a subset of its connected users. This will show a value of -1 if the ping reply status was not successful. A non-successful status may vary (e.g. Timeout, time limit exceeded, firewall issues). Check C#'s IPStatus class for more information on the various states: https://msdn.microsoft.com/en-us/library/system.net.networkinformation.ipstatus(v=vs.110).aspx [^] [^]

AvgPing – Average ping to an external server. This will show a value of -1 if the ping reply status was not successful. A non-successful status may vary (e.g. Timeout, time limit exceeded, firewall issues). Check C#'s IPStatus class for more information on the various states: https://msdn.microsoft.com/en-us/library/system.net.networkinformation.ipstatus(v=vs.110).aspx [^] [^]
NetFT – Split into 2 statistics which are NetEvtTime and NetQSize.

NetEvtTime – This is acquired inside the LLUDPServer class inside the IncomingPacketHandler. This records the time it took each time a packet is processed then divides by the number of packets. This is a moving average over N number of packets. Where N is equal to the number of frames being stored for Total, Simulation, and Physics frame time.

NetQSize – This is acquired at the same time as NetEvtTime. This is the size of the queue that is left to be processed. This is also a moving average based on N, where N is equal to the number of frames being stored for Total, Simulation, and Physics frame time.

Client IP Addresses – This is a list of clients and their IP Addresses. In order to access the list use the address format as follows
                [Server IP address]:[Port number]/[Custom URI name]
The default using the value in the example of Agent_Stats_URI would be
                127.0.0.1:9000/jsonUserStats

-------------------------------------------------
OpenSim.ini Changes:

 
The subheadings in this section correspond with the Sections inside of the OpenSim.ini file.
 
[Startup]

 
Agent_Stats_URI – Provides the web URI that will allow the developer to access the current clients and there IP Addresses. An example is
    ; Agent Login Stats URI

    ; Enable JSON agent data by setting a URI name (case sensitive)

    ; Returns regular agent stats (Name, IPAddress, Login)

    Agent_Stats_URI = "jsonUserStats"

 
 
[Statistics]
 
NumberOfFrames – This is the number of frames or packets that will be held onto and then averaged for total frame time, simulation frame time, physics frame time, NetEvtTime, and NetQSize.

ClientPingSubset – This is an integer that defaults to 1. It is the number of clients that will be pinged to create an average ping rate. This is done as a subset of the total number of clients.

ClientPingFrequency – The frequency with which we ping the client. This value is in seconds.

ExternalServer – Address of the external server to be pinged.

ExternalPingFrequency – The frequency with which we ping the external server. This value is in seconds.

PingExternalServerEnabled – A flag that tells opensim whether it should be pinging the external server.

PingClientEnabled – A flag that tells opensim whether it should be pinging the clients.

StatsUpdateEveryMS – An integer that will control how often the SimStatsReporter heartbeat will run for. This value should be given in milliseconds.
TagsNo tags attached.
Git Revision or version number
Run ModeStandalone (1 Region) , Standalone (Multiple Regions) , Grid (1 Region per Sim) , Grid (Multiple Regions per Sim)
Physics EngineBasicPhysics
Script Engine
EnvironmentMono / Linux32, Mono / Linux64, Mono / Windows, Mono / OSX, .NET / Windows32, .NET / Windows64
Mono VersionNone
Viewer
Attached Filespatch file icon mosesMetricsPhase3_2015Aug05.patch [^] (91,193 bytes) 2015-08-05 13:55 [Show Content]

- Relationships
has duplicate 0007632closedorenh [Patch] New Network Statistics 

-  Notes
(0028981)
orenh (administrator)
2015-07-25 23:17

Thanks for the patch, I will look at it later.

Please make sure to set the bug status to "Patch included" whenever you include a patch.
(0028989)
orenh (administrator)
2015-07-26 07:25

** Overall

This patch looks like it combines many different changes. It's very difficult to understand such mega-patches; it should be divided into several different patches for different features. For example:

* Change how existing statistics, such as "UDP Packet In Rate", are calculated.
* Client ping
* External server ping
* Number of users logging in
* Other statistics
Etc.

In your SCM, did you create all of these features at once, or over several commits? I'm guessing they were written one a a time; and it would be easier to evaluate them one at a time, too.

The terminology used is inconsistent. I've seen this feature referred to (in this patch) using all of the following names:
- Advanced Network Metrics
- Advanced Simulation Metrics
- AME
- Networking statistics

Please choose one name and use it everywhere. If you're really only going to include network statistics then "Network Statistics" would seem to be the best name. If you intend to extend this to other statistics, e.g. physics, avatar behavior, etc., then it would be best to select a more generic name up-front: e.g., "Advanced Metrics".

These features are complicated enough that they should be documented in the wiki ( http://opensimulator.org/wiki/Main_Page [^] ). One possible problem: the Wiki may be restricted for editing now due to some spam. If it's not currently up then please include the documentation as a Note in this Mantis entry, and I'll copy it to the wiki.

** AdvancedSimulationMetrics.ini

Why would someone enable the overall metrics feature, but disable any of these settings: AgentAddressEnabled, NetQSizeEnabled, NetEvtTimeEnabled? I can understand why someone wouldn't want to enable 'PingClientEnabled' or 'PingExternalServerEnabled', as those options can cause some slowdown, but as far as I can see the other options don't have any downsides. Do they? If not then it would be better to simplify the settings by removing the option to enable or disable each of these settings individually.

If someone enables 'AdvancedNetworkMetricsEnabled' then the rest of the options should immediately work, with reasonable default values. Currently they're all disabled, which means that anyone who wants to enable this feature has to understand and enable all of the sub-features individually.

Some of the comments in this file are repetitive and longer than they should be, which makes it difficult to separate the signal from the noise. For example, these lines serve no useful information and should be deleted:

    ;; This is a boolean value that will default in the code to false when no
    ;value is assigned inside of this file
    ;; The default value of this variable in this file is false

(We can see that it's a boolean value because of the default value provided; and it's standard for OpenSim settings to have the default value set in code as well, so that doesn't need to be mentioned explicitly.)

Some of the comments contain mixed useful and non-useful information. For example, for the setting 'AgentAddressPassword', the following lines provides the useful information (which can't be known otherwise) that an empty string means any password will be accepted:

    ;; The default value inside of this file is the empty string as that means
    ;any password will be accepted

The line widths should be much higher than in the current file. This will reduce line breaks and make it easier to comprehend the text. The width should be at least 100 characters, and can optionally be even longer if a small addition would prevent a line break.

Change "of" to "off" here: "This will turn of..."

It's problematic to put the settings "AgentAddressPort" (default value 9001) here. The reason is that if someone wants to customize the ports, e.g. in order to run multiple OpenSim instances on the same server, then they probably won't even think to look in this file for a port. It would be nice if this setting appeared next to 'http_listener_port' in OpenSim.ini (default value 9000). On the other hand, that would break encapsulation... Perhaps, next to 'http_listener_port', just add a comment that mentions that this setting ("AgentAddressPort") exists in a separate file.

** OpenSim.ini.example

Change "Include-AdvancedMetricsEnable" to "Include-AdvancedMetrics" (or whatever the final name will be). In other words, remove the word "Enable". The determination of whether this feature is enabled or disabled is made in the INI file; the mere *inclusion* of the file doesn't mean that the feature is enabled (since it's disabled by default). Also, once we've added an INI file to OpenSim, we generally don't expect it to be missing, because anyone who updates OpenSim to a version that includes this code would automatically get the INI file as well.

I'm guessing that you chose this filename because you saw the name "osslEnable.ini" nearby. But in that name the word "enable" has a different meaning: the entire INI file is just a list of OSSL functions, and whether they're enabled or disabled.

** Other

The metric for "Number of users currently logging in" seems awfully specific. Is this something MOSES needed for some specific reason? I'm just not sure that a metric like this, which requires significant work to obtain, is generally useful.

The difference between statistics that deal with Packets and statistics that deal with Bytes is unclear. Your Mantis comment talks about statistics with these names: PktsIn , PktsOut, UDPIn, UDPOut, UDPInError. But in the source code, I see UDPInRate, UDPOutRate, UDPErrorRate.

Any statistic that deals with the number or rate of packets should include the word "Packets" in its name. Any statistic that deals with the number or rate of bytes should include the word "Bytes" in its name. A generic name such as "UDPInRate" doesn't tell you what is being measured: bytes or packets.

There's some excessive line-breaks in the CS code, too. E.g., this code should have been written on just one line:

    sb[30].StatValue = (float) networkSumQueueSize /
        m_numberFramesStored;
(0028990)
orenh (administrator)
2015-07-26 07:47

I want to add: please don't be discouraged by this long feedback. I really do want to accept these changes, and future changes from MOSES as well. The main problem is the difficulty in understanding all of these changes at once, and the concern that as a result, accepting the patch as-is would screw up some of the existing functionality.
(0028992)
BlueWall (administrator)
2015-07-26 08:59

Thanks for the patches, interesting work.

The patches apply cleanly with a only few whitespace warnings. These are just indentions carrying over on blank lines between sections of code. IMHO, not an issue.

The code builds, but introduces some failures in the unit testing. Please check the unit tests and see that they pass or make corrections as needed in the code or unit tests. These tests are run on each commit to the OpenSimulator main repository and will cause the process to hang if any fail.

Thanks again for the work. I think these will help us make good decisions when working to improve performance of the system.
(0028993)
aiaustin (developer)
2015-07-26 10:17
edited on: 2015-07-26 12:34

Can I check something.... Must the port used for advanced metrics (default 9001) be completely separate to all other ports used for the http_listener_port (default port 9000) and for each region port (as in Regions/*.ini - we use ports 9000,9001,.. and upward for those) in this OpenSim.exe instance or can it be the same? May be worth indicating any requirement on this in the include file comments.

If any extra warnings are introduced please also fix those in the changes made... we are getting down to a small number after a lot of work by the devs... 17 only in core at present for Windows builds.

(0029098)
moses (reporter)
2015-08-05 13:56

Patch includes the proposed metrics, a fix to a broken unit test, and a spelling correction in the comments.
(0029104)
Diva (administrator)
2015-08-06 06:34

We appreciate what the MOSES group is doing here.

Speaking as a potential committer of this patch, I'll have to figure out how to assess that all of this works and doesn't break things. It will take me a while, and I'll likely have lots of questions, so bear with me. I will try to talk with mheilman on the IRC, but until then, here are some questions, just by looking at the diff.

I'm looking at the additions in Scene.cs (already a massive class), and thinking that all that new code would better go in a new class. (This is not a question, it's a comment; if I would take care of this patch, I would proceed with that refactoring before sending things upstream)

Also, I don't understand the changes in OpenSimBase.cs. What is that XmlRpc channel for? I don't see a clear explanation for it in the description in this mantis. Bullet 5) above doesn't explain. The provided code looks nothing like the RemoteAdmin plugin -- it isn't even a plugin. If this is a plugin-like thing, maybe it should be an explicit plugin, like RemoteAdmin, or simply an optional region module that provides that XmlRpc handler. But first I need to understand what this is doing. Why this extra channel? Who will be using it? How is this different from the JsonStats reporting? Is is so that this information about IPs is restricted, as opposed to all other stats?
(0029112)
Diva (administrator)
2015-08-06 13:07
edited on: 2015-08-06 13:08

Copy-pasting a conversation we had in IRC:

[12:32] <diva> ok, first Q: what's the xml rpc channel for?
[12:34] <mheilman> One of the new metrics collected is the connecting IP of the users in question. We went back and forth with Melanie about security, and how she would like it to be handled
[12:34] <mheilman> The xmlrpc channel is the remote access point to query that data without dumping it to a disk
[12:35] <mheilman> As per melanie, it is also access restricted, in the same way as the remote admin plugin
[12:35] <diva> ok. Was there a reason not to use the RemoteAdmin plugin already in place?
[12:36] <diva> I mean, just add that extra method
[12:37] <mheilman> The method(s) are metrics specific, with independent control in the metrics ini section
[12:37] <mheilman> we thought it was better to keep them divided
[12:40] <diva> ok. In that case it might be better to do it either as a mono addin (like RemoteAdmin) or as a region module -- although I'm not sure region modules can add http listeners. Melanie_ are you around? The patch is modifying OpenSimBase in ways that don't seem right -- too much new code in that class for just this optional feature
[12:41] <Melanie_> RemoteAdmin is not a suitable framework
[12:41] <Melanie_> the Moses approach is good
[12:41] <diva> exactly as is? With that new code in OpenSimBase?
[12:41] <Melanie_> can you link me to the patch? I haven't seen the actual code yet, just the concept discussions
[12:42] <diva> http://opensimulator.org/mantis/file_download.php?file_id=4350&type=bug [^]
[12:42] <diva> look at the additions in OpenSimBase in particular. Those are the ones that don't look right
[12:45] <Melanie_> the implementation has architectural issues
[12:46] <Melanie_> diva, you're right that code should not be there
[12:46] <diva> yeah
[12:46] <diva> this should either be a plugin like remote admin or else a region module
[12:46] <diva> can region modules create new http listeners?
[12:46] <Melanie_> i can only surmise that the reason so much code was included in OpenSimBase and BaseOpenSimServer is to allow for opening that server and servicing requests before regions ar eloaded, or when no regions are loaded
[12:47] <Melanie_> there are two ways to go from there
[12:47] <AliciaRaven> diva, it is posible for region modules to add their own listeners, its some thing i have done before
[12:47] <Melanie_> 1, create an OpenSim Plugin, using the same interface as the RemoteAdmin one, move all the code from OpenSimBase and BaseOpenSimServer there
[12:48] <Melanie_> or 2, have one port per loaded region and make it a region module
[12:48] <Melanie_> now, here's my take
[12:49] <Melanie_> there should be a StatsServerPlugin created using the OpenSim Plugin interface, which RemoteAdin also uses
[12:49] <Melanie_> that interface is intended to extend the base OpenSim functionlaity without modifying the deep down core code
[12:49] <diva> yes, that was exactly my reaction too
[12:49] <Melanie_> This Plugin should create the listener and serve requests, querying the region module registry for data from the regions
[12:50] <diva> I think this should be a plugin like remote admin
[12:50] <Melanie_> the stuff that was inserted into Scene.cs, et. al., should be moved into a region module
[12:50] <Melanie_> unless cause can be shown why it needs to be in Scene
[12:50] <diva> yep
[12:51] <diva> ok, we're on the same page
[12:51] <Melanie_> even then, if it's only about accessing data inaccessible to a region module, it is better to create general purpose accessors for that data in Scene
[12:51] <Melanie_> but do the actual work in a module
[12:51] <Melanie_> an OpenSim Plugin can access the sene list
[12:51] <Melanie_> and from each scene, the region module in question can be queried
[12:52] <Melanie_> since Plugins are always single instances; no plugin can be loaded twice; a Plugin can use a singleton-style architecture to allow access from a region modulw without an objet reference
[12:52] <Melanie_> e.g. StatsServerPlugin.Instance.DoSomething
[12:52] <Melanie_> if that is required
[12:53] <Melanie_> but that should not really be required
[12:53] <Melanie_> the server Plugin should instead iterate the scenes and query the data from the region modules loaded there
[12:53] <Melanie_> that ensures that, when people do not want this, it doesn't even cause extra code to be eecuted
[12:54] <Melanie_> depite the complex sound of this, it's actually a pretty straightforward refactor
[12:54] <diva> yes. mheilman, I'm not sure what you prefer. I/Melanie can take this patch and rewrite it completely or you can take this feedback to the writers of this code and tell them to rearchitect it. We'll probably be more effective at doing "the right thing" but probably slower...
[12:54] <Melanie_> all code that needs to be added already exists elsewhere in opensim
[12:54] <Melanie_> so it's mainly copypasta
[12:54] <diva> do you have time Melanie?
[12:54] <diva> I'm sort of busy this week with papers
[12:55] <Melanie_> not really much, unless paid for it
[12:55] <Melanie_> but i would contract for this or any other opensim work
[12:55] <Melanie_> if that is of interest to anyone
[12:58] <mheilman> I can take a copy of this up to Doug for any kind of decision. Responses from here in the lab were more hopeful of getting the work in, and tehn having tickets filed against it, but rewriting it as a plugin may be too large for that
[12:59] <mheilman> I my self would prefer attempting to pass this along to the decision powers, along with having Melanie's suggestions placed on the mantis for the patch
[13:00] <mheilman> There is additional bureaucracy between the developers and myself since the first patch.
[13:00] <mheilman> but they monitor the mantis
[13:00] <diva> yeah, this requires a complete rearchitecture of the solution. The code is essentially right, but it's in the wrong places. It will be very good when it's done. Those server plugins are there for precisely these kinds of uses.
[13:00] <diva> Let me copy-paste this conversation in the mantis

(0029113)
orenh (administrator)
2015-08-06 13:59

In order to streamline the acceptance of this patch, I have reassigned this issue to Diva. This acknowledges that she is currently engaged with the submitters, and I don't want to muddy up the works by adding my own stream of comments.
(0029114)
Diva (administrator)
2015-08-06 14:41

Just to be clear: orenh's comments are valid, and will need to be addressed. But it seemed that, at the bottom of this, the patch had deeper architectural issues that really need to be fixed by someone before this gets sent upstream. I can smell a plugin wanting to come out of this!
The terminology problems are a symptom of these deeper architectural issues.

- Issue History
Date Modified Username Field Change
2015-07-24 10:26 moses New Issue
2015-07-24 10:26 moses File Added: mosesMetricsPhase3 (1).patch
2015-07-25 23:17 orenh Note Added: 0028981
2015-07-25 23:17 orenh Assigned To => orenh
2015-07-25 23:17 orenh Status new => patch included
2015-07-26 07:20 orenh Relationship added has duplicate 0007632
2015-07-26 07:25 orenh Note Added: 0028989
2015-07-26 07:25 orenh Status patch included => patch feedback
2015-07-26 07:47 orenh Note Added: 0028990
2015-07-26 08:59 BlueWall Note Added: 0028992
2015-07-26 10:17 aiaustin Note Added: 0028993
2015-07-26 10:18 aiaustin Note Edited: 0028993 View Revisions
2015-07-26 10:21 aiaustin Note Edited: 0028993 View Revisions
2015-07-26 10:22 aiaustin Note Edited: 0028993 View Revisions
2015-07-26 10:23 aiaustin Note Edited: 0028993 View Revisions
2015-07-26 12:34 aiaustin Note Edited: 0028993 View Revisions
2015-08-05 13:54 moses File Deleted: mosesMetricsPhase3 (1).patch
2015-08-05 13:55 moses File Added: mosesMetricsPhase3_2015Aug05.patch
2015-08-05 13:56 moses Note Added: 0029098
2015-08-05 13:56 moses Status patch feedback => patch included
2015-08-06 06:34 Diva Note Added: 0029104
2015-08-06 13:07 Diva Note Added: 0029112
2015-08-06 13:08 Diva Note Edited: 0029112 View Revisions
2015-08-06 13:56 orenh Assigned To orenh => Diva
2015-08-06 13:56 orenh Status patch included => assigned
2015-08-06 13:59 orenh Note Added: 0029113
2015-08-06 14:41 Diva Note Added: 0029114


Copyright © 2000 - 2012 MantisBT Group
Powered by Mantis Bugtracker