Mantis Bug Tracker
ID: 0008884
Project: opensim
Category: [REGION] Physics Engines
View Status: public
Date Submitted: 2021-03-31 01:05
Last Update: 2021-04-01 16:08
Reporter: Kayaker Magic
Assigned To:
Priority: high
Severity: major
Reproducibility: always
Status: new
Resolution: open
Platform: Dell T1400 Server Tower
Operating System: Ubuntu Linux
Operating System Version: 18.04
Product Version:
Target Version:
Fixed in Version:
Summary: 0008884: llRegionSayTo loses messages in YEngine if more than 65 llRegionSayTo calls are made in a single event.
Description: In Yeti 0.9.2 with XEngine this works fine.
In Yeti 0.9.2 with YEngine, if a single event sends more than 65 messages with llRegionSayTo in a loop, only the first 65 arrive and the rest are discarded.

Watching the two scripts below run, it appears that the sender script runs and exits the event before the receiver script is allowed to run. The receiver then gets only the first 65 messages.
If you put an llSleep(0.1); call after every llRegionSayTo in the sender, the receiver script is allowed to alternate with the sender and it works again. But many scripts written under XEngine worked without this sleep.
If you set testsiz to 1000 in both scripts then VERY STRANGE THINGS happen. The sender sends the first 678 or so messages, then the receiver and the sender take turns displaying chunks of messages OUT OF ORDER or repeating some messages. I wrote the code so it doesn't care if the messages arrive out of order, but the receiver displays errors about messages it hasn't received yet, then tells me it received them later. Despite jumping back and forth in time, the receiver still only gets the first 64 messages. Setting the watchdog timer in the receiver to a large value cleans up most, but not all, of the scrambled timeline, but the receiver still only gets the first 65 messages.
Steps To Reproduce: Put the Sender script below in a prim, put the Receiver script in a second prim, and click on the Receiver prim. It asks the Sender to reply with 100 small messages.

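Additional Information:
//Sender script
integer testsiz=100;    // number of messages to send; must match the Receiver
default
{
    state_entry()
    {
        llSay(0, "Click receiver to start");
        // Listen for the Receiver's "start" request on the shared channel
        llListen(-1600,"",NULL_KEY,"");
    }
    listen(integer channel, string name, key id, string message)
    {
        // Reply to whoever spoke with testsiz messages, all within this one event
        integer i;
        for(i=0;i<testsiz;i++)
        {
            llOwnerSay("sending "+(string)i);
            llRegionSayTo(id,-1600,(string)i);
        }
    }
}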


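//Receiver script
integer testsiz=100;    // number of messages expected; must match the Sender
list arr;               // arr[i] holds i once message i has arrived, else -1

default
{
    state_entry()
    {
        llSay(0, "click to start");
        llListen(-1600,"",NULL_KEY,"");
    }
    touch_start(integer num)
    {
        // Mark every slot as "not received", then ask the Sender to begin
        integer i;
        arr=[];
        for (i=0;i<testsiz;i++)
            arr = arr+[-1];
        llRegionSay(-1600,"start");
    }
    listen(integer channel, string name, key id, string message)
    {
        // Record the message by index so arrival order does not matter
        integer i=(integer)message;
        llOwnerSay("got "+message);
        arr = llListReplaceList(arr,[i],i,i);
        // Watchdog: report once no message has arrived for 5 seconds
        llSetTimerEvent(5.0);
    }
    timer()
    {
        llSetTimerEvent(0.0);
        integer errors=0;
        integer i;
        // Any slot still holding -1 was never received
        for (i=0;i<testsiz;i++)
        {
            if (i!= llList2Integer(arr,i))
            {
                errors += 1;
                llOwnerSay("Missing value "+(string)i);
            }
        }
        llOwnerSay((string)errors+" errors");
    }
}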
Tags: No tags attached.
Git Revision or version number: 4445bc1222bc799f657b474d3ebc61deebd7e9d9
Run Mode: Grid (Multiple Regions per Sim)
Physics Engine: ubODE
Script Engine: YEngine
Environment: Mono / Linux64
Mono Version: 6.x
Viewer: FS 6.4.12
Attached Files:

Relationships

Notes
(0037690)
UbitUmarov (administrator)
2021-03-31 05:15

llRegionSayTo posts listen events, and the script event queue is limited to 65 entries.
That loop will just fill the event queue without giving any execution thread a chance to react.

A short llSleep will help, especially on YEngine, because it is an execution yield: the thread goes off to run bits of other scripts before returning to continue. Possibly it will process each of those events in the meantime.

I didn't notice any out-of-order delivery in my testing.
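
A minimal sketch of the llSleep workaround described in this note, applied to the Sender script from the report. The 0.1 second delay is simply the value the reporter tried; whether it is long enough on a busy region is an assumption, not something this thread confirms.

```lsl
//Sender script with a yield after every message (sketch)
integer testsiz=100;
default
{
    state_entry()
    {
        llListen(-1600,"",NULL_KEY,"");
    }
    listen(integer channel, string name, key id, string message)
    {
        integer i;
        for(i=0;i<testsiz;i++)
        {
            llRegionSayTo(id,-1600,(string)i);
            // Yield so the receiver gets a chance to drain its event queue.
            // 0.1 s is the value from the report, not a tuned figure.
            llSleep(0.1);
        }
    }
}
```

With testsiz at 100 the loop now takes at least 10 seconds, so the receiver's 5 second watchdog only fires after the last message, as before.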
(0037691)
tampa (reporter)
2021-03-31 06:36

If you have that many things listening to a region-wide call, just use the regular RegionSay and validate senders on the receiver end. But if you want to talk directly to other prims, why not use MessageObject, which does not require opening listeners everywhere? There are throttles on these things for a reason.
(0037692)
Kayaker Magic (reporter)
2021-03-31 08:27

So how does XEngine avoid this problem? Unfortunately, because it works with loops like this, there are thousands of scripts out there that used to work and now fail when people switch to YEngine.
(0037693)
tampa (reporter)
2021-03-31 08:52
edited on: 2021-03-31 08:53

Event limits are imposed on XEngine as well, just in the form of a timeout rather than an event count alone. Avoiding large events is key to efficient scripts that don't have a large negative performance impact on the region. The timeouts and limits are there to prevent the worst cases of this, and they are rather generous in many ways: it is still very much possible to bring a region to its knees while staying within them. There are plenty of ways to construct inter-prim data exchange that work with a lot less overhead, MessageObject and dataserver for example.

Equally, just because things worked a certain way before does not mean they were written well or even properly. YEngine in some ways enforces structures and programming conventions that are more in line with keeping scripts light and efficient. If written well, scripts will work pretty much identically in either engine.
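
For readers unfamiliar with the MessageObject and dataserver pairing mentioned in this note, here is a rough sketch using the OSSL function osMessageObject. It assumes OSSL functions are enabled for the script owner on the region, and it makes no claim that this path avoids the event queue limit discussed above, since each message still arrives as an event. The sending side, reduced to a single reply:

```lsl
//Sender side (sketch): reply directly to the object that asked
default
{
    state_entry()
    {
        llListen(-1600,"",NULL_KEY,"");
    }
    listen(integer channel, string name, key id, string message)
    {
        // id is the key of the object that spoke; no listener is needed on its side
        osMessageObject(id, "hello from "+llGetObjectName());
    }
}
```

The receiving side picks the message up in a dataserver event rather than a listen event:

```lsl
//Receiver side (sketch)
default
{
    touch_start(integer num)
    {
        llRegionSay(-1600,"start");
    }
    dataserver(key sender, string data)
    {
        // For osMessageObject, the key argument is the sending object's key
        llOwnerSay("got "+data+" from "+(string)sender);
    }
}
```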

(0037694)
JeffKelley (reporter)
2021-03-31 10:49

Kayaker Magic: You are queuing 100 messages at maximum speed when the queue size is 65. There is no guarantee the queue will be consumed fast enough on the receiving end to avoid an overflow. Actually, a faster script engine may make things worse.

If you want to drive a communication link near its maximum rate, you need some kind of flow control.
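
One possible shape for such flow control, as a sketch only: the sender transmits a window of messages and waits for an "ack" before sending the next window. The WINDOW value of 32, the "ack" message and the helper sendBatch are all illustrative choices made here (none come from OpenSim or from the original scripts); 32 was picked only because it is below the 65-entry limit mentioned earlier in the thread. Sender:

```lsl
//Sender script with windowed flow control (sketch)
integer CHANNEL = -1600;
integer WINDOW = 32;    // assumed batch size; must match the Receiver
integer testsiz = 100;  // total messages; must match the Receiver
integer nextIndex;      // next message number to send
key receiver;

// Send up to WINDOW messages, then stop and wait for an ack
sendBatch()
{
    integer limit = nextIndex + WINDOW;
    if (limit > testsiz) limit = testsiz;
    while (nextIndex < limit)
    {
        llRegionSayTo(receiver, CHANNEL, (string)nextIndex);
        nextIndex++;
    }
}

default
{
    state_entry()
    {
        llListen(CHANNEL, "", NULL_KEY, "");
    }
    listen(integer channel, string name, key id, string message)
    {
        if (message == "start")
        {
            receiver = id;
            nextIndex = 0;
            sendBatch();
        }
        else if (message == "ack" && id == receiver)
        {
            if (nextIndex < testsiz) sendBatch();
        }
    }
}
```

Receiver:

```lsl
//Receiver script with windowed flow control (sketch)
integer CHANNEL = -1600;
integer WINDOW = 32;    // must match the Sender
integer testsiz = 100;  // must match the Sender
integer received;

default
{
    state_entry()
    {
        llListen(CHANNEL, "", NULL_KEY, "");
    }
    touch_start(integer num)
    {
        received = 0;
        llRegionSay(CHANNEL, "start");
    }
    listen(integer channel, string name, key id, string message)
    {
        // Ignore control messages from other objects on the channel
        if (message == "start" || message == "ack") return;
        received++;
        // Acknowledge each full window, and the final partial one
        if (received % WINDOW == 0 || received == testsiz)
            llRegionSayTo(id, CHANNEL, "ack");
        if (received == testsiz)
            llOwnerSay("received all "+(string)testsiz+" messages");
    }
}
```

This sketch has no retry: if a single message is dropped the receiver never completes a window and the exchange stalls, so a real protocol would also need a timeout on one side.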
(0037695)
kcozens (administrator)
2021-03-31 12:50

Have you checked that the sender script is able to send all 100 messages within 5 seconds? Test the receiver by having it only print out the messages as it receives them. In the sample receiver script, using llListReplaceList() to modify a list as messages come in is going to take a lot of time and will probably result in a high script run-time. Just add to the end of the list as messages are received. If the receiver needs to know how many messages were received, keep a count. The receiver shouldn't assume it got 100 messages. Check the length of the list or use the count of received messages.
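
A sketch of the simpler receiver suggested in this note: it appends each message to the end of a list and reports the list length instead of assuming 100 arrived. The variable name got is an illustrative choice; the channel and the 5 second timer are taken from the original Receiver script.

```lsl
//Receiver script that appends and counts (sketch)
list got;   // values in arrival order

default
{
    state_entry()
    {
        llListen(-1600,"",NULL_KEY,"");
    }
    touch_start(integer num)
    {
        got = [];
        llRegionSay(-1600,"start");
    }
    listen(integer channel, string name, key id, string message)
    {
        // Append only; no per-message llListReplaceList
        got += [(integer)message];
        llSetTimerEvent(5.0);
    }
    timer()
    {
        llSetTimerEvent(0.0);
        llOwnerSay("received "+(string)llGetListLength(got)+" messages");
    }
}
```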
(0037697)
Kayaker Magic (reporter)
2021-04-01 15:13

How about, instead of quietly losing messages, you display an error on the debug channel every time you toss a message? Then all the people with these 10-year-old terrible scripts will at least know that they have to find someone to rewrite them. Although they may still just blame YEngine and switch back to XEngine instead.
(0037698)
UbitUmarov (administrator)
2021-04-01 15:16

you mean 936 messages in one of your test cases?
nahh i don't think so ;)
(0037699)
tampa (reporter)
2021-04-01 15:21

That would require the "other side" somehow knowing that something was dropped. Effectively what we have is a UDP system; we'd need to request a read receipt for each call, meaning even more data being sent about. Yeah, I don't see that working all too well.

I do recall doing loads of loops to refresh dynamic textures. Because of their lengthy calls the loops would run into the timeout, so I split them into multiple loops run one after another. Not ideal; a better approach would have been to do that with a timer, even if those are heavy as well.

There are limits to LSL in regard to both throttles and performance. Striking a compromise between the two is not easy, but the aim has always been to provide performance within reasonable limits while preventing regions from crashing under overload.
(0037700)
JeffKelley (reporter)
2021-04-01 16:08
edited on: 2021-04-01 17:16

> How about instead of quietly losing messages, you display an error on the debug channel every time you toss a message?

Think of ll*Say as UDP. Unreliable. Not because it is buggy, but because it offers no guarantee that the message will be delivered, or when. If you want reliability, you have to add a protocol with retries, flow control, etc., and that is TCP. Then your call to write (llSay) would block your script until the network stack is ready to accept a new message.

Like UDP, ll*Say may drop messages without warning, and overflowing a buffer in the path is a damned good reason to drop one.

I suspect that your script was working under XEngine by accident, because the receiving process took less time than the sending one. If, for any reason, this is no longer true (and there are many such reasons in a multitasking environment), the assumption fails. Messages get lost.

You have exactly zero promise that a fragment of code will take more, or less, than a given time. So the assumption that the receiver will eat data faster than the sender can spit it out is just an impossible one.

In the absence of a flow-control mechanism, the only thing you can do is slow down the sender, *hoping* (cross your fingers) that it will be slower than the receiver.


Issue History
Date Modified      Username       Field Change
2021-03-31 01:05   Kayaker Magic  New Issue
2021-03-31 05:15   UbitUmarov     Note Added: 0037690
2021-03-31 06:36   tampa          Note Added: 0037691
2021-03-31 08:27   Kayaker Magic  Note Added: 0037692
2021-03-31 08:52   tampa          Note Added: 0037693
2021-03-31 08:53   tampa          Note Edited: 0037693
2021-03-31 10:49   JeffKelley     Note Added: 0037694
2021-03-31 12:50   kcozens        Note Added: 0037695
2021-04-01 15:13   Kayaker Magic  Note Added: 0037697
2021-04-01 15:16   UbitUmarov     Note Added: 0037698
2021-04-01 15:21   tampa          Note Added: 0037699
2021-04-01 16:08   JeffKelley     Note Added: 0037700
2021-04-01 16:09   JeffKelley     Note Edited: 0037700
2021-04-01 16:10   JeffKelley     Note Edited: 0037700
2021-04-01 17:16   JeffKelley     Note Edited: 0037700