45 Commits (bd9c0a2e658e183bb8a321cdce546f10b6d76afe)

Author SHA1 Message Date
Andrey Lihatskiy 53ed0ec70c Merge branch 'main' into DRTVWR-567
# Conflicts:
#	doc/contributions.txt
2023-05-17 23:57:11 +03:00
Andrey Kleshchev 16712d2437 SL-19493 Fix inventory log spam 2023-03-30 23:44:40 +03:00
Nat Goodspeed 769bf46a3f SL-14399: Ditch overflow queue LLViewerAssetStorage::mCoroWaitList.
mCoroWaitList was introduced to prevent an assertion failure crash:
LLCoprocedureManager never expects to fill LLCoprocedurePool::mPendingCoprocs
queue. The queue limit was arbitrarily set to 4096 some years ago, but in
practice LLViewerAssetStorage can post way more requests than that.

LLViewerAssetStorage checked whether the target LLCoprocedureManager pool's
queue looked close to full, and if so posted the pending request to its
mCoroWaitList instead. But then it had to override the base LLAssetStorage
method checkForTimeouts() to continually check whether pending tasks could be
moved from mCoroWaitList to LLCoprocedureManager.

A simpler solution is to enlarge LLCoprocedureManager::DEFAULT_QUEUE_SIZE, the
upper limit on mPendingCoprocs. Since mCoroWaitList was an unlimited queue,
making DEFAULT_QUEUE_SIZE "very large" does not increase the risk of runaway
memory consumption.
2022-12-07 09:50:02 -05:00
Andrey Kleshchev 3f31901640 Merge master (DRTVWR-515) into DRTVWR-516-maint
# Conflicts:
#	autobuild.xml
#	doc/contributions.txt
#	indra/llcommon/llcoros.cpp
#	indra/llmessage/llcoproceduremanager.cpp
#	indra/newview/llfloaterfixedenvironment.cpp
#	indra/newview/llfloaterimsessiontab.cpp
2021-04-29 21:00:25 +03:00
Andrey Kleshchev f06ebd054b SL-14807 Missed a pool init in unused constructor, additional protections 2021-02-14 20:42:30 +02:00
Andrey Kleshchev 4a3e32e732 SL-14807 Adjusted unit test 2021-02-11 02:49:13 +02:00
Andrey Kleshchev 24d4517458 SL-14807 Viewer crashes when creating an experience 2021-02-10 01:10:36 +02:00
Andrey Kleshchev 71bca1d860 SL-14399 Enqueue into 'LLViewerAssetStorage::assetRequestCoro' failed 2020-11-26 22:27:48 +02:00
Andrey Kleshchev 5172f5d6d6 SL-14037 BugSplat Crash #646590: Enqueue failed in AIS 2020-10-01 22:36:52 +03:00
Andrey Kleshchev a42045994d SL-13555 'Second Life quit unexpectedly' error message 2020-08-28 01:30:37 +03:00
Andrey Kleshchev 2f52a37e6a SL-13811 Crash on coprocedure 2020-08-28 00:33:25 +03:00
Andrey Kleshchev e4350fb9ef SL-13811 Crash on coprocedure
Coprocedure should stop on 'stop' exception
2020-08-20 23:44:45 +03:00
Andrey Kleshchev 58ba75f6dd SL-13783 Workaround for enqueueCoprocedure() crash #2 2020-08-18 16:23:59 +03:00
Andrey Kleshchev fa0cc7b6d2 Merged in SL-13783 and SL-13789 2020-08-17 20:49:56 +00:00
Andrey Kleshchev 80fe2157fe SL-13783 Workaround for enqueueCoprocedure() crash with asset storage 2020-08-17 21:52:28 +03:00
Andrey Kleshchev cca777fdf5 SL-13679 Event pump DupListenerName crash at login 2020-07-24 23:53:57 +03:00
Nat Goodspeed b7d60f650d DRTVWR-476: Fix LLCoprocedurePool::enqueueCoprocedure() shutdown crash. 2020-05-20 10:44:34 -04:00
Nicky Dasmijn 13b4bd5832 Make sure coproc gets destroyed after each iteration.
Making coproc scoped to the for loop ensures its destructor gets
called on every loop iteration. Keeping its scope outside the for loop
means the pointer stays valid until the next assignment, which happens
inside pop_wait_for when it gets assigned a new value.

Triggering the dtor inside pop_wait_for can lead to deadlock when, inside
the dtor, a coroutine tries to call enqueueCoprocedure (this happens).
enqueueCoprocedure will then try to grab the lock for try_push, but this
lock is still held by pop_wait_for.
2020-05-19 21:27:16 +02:00
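The scoping point in the message above can be illustrated with a minimal sketch. `Task` and `pop()` are hypothetical stand-ins (not the viewer's types) for a queued coprocedure and for `pop_wait_for`; the log only records where the destructor of the previous task runs relative to the pop call.

```cpp
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a queued coprocedure: records its own destruction.
struct Task {
    std::vector<std::string>& log;
    int id;
    Task(std::vector<std::string>& l, int i) : log(l), id(i) {}
    ~Task() { log.push_back("destroy " + std::to_string(id)); }
};

// Hypothetical stand-in for pop_wait_for(): assigns a new task into `out`.
// If `out` still owns the previous task, that task is destroyed during the
// assignment, i.e. inside the pop call (where a real channel may hold a lock).
void pop(std::shared_ptr<Task>& out, std::vector<std::string>& log, int id) {
    log.push_back("enter pop " + std::to_string(id));
    out = std::make_shared<Task>(log, id);
    log.push_back("leave pop " + std::to_string(id));
}

// Pointer declared outside the loop: the old task dies inside the next pop.
std::vector<std::string> run_outer_scope() {
    std::vector<std::string> log;
    {
        std::shared_ptr<Task> task;
        for (int i = 1; i <= 2; ++i) {
            pop(task, log, i);
        }
    }
    return log;
}

// Pointer declared inside the loop: the task dies at the end of each
// iteration, before the next pop is entered.
std::vector<std::string> run_loop_scope() {
    std::vector<std::string> log;
    for (int i = 1; i <= 2; ++i) {
        std::shared_ptr<Task> task;
        pop(task, log, i);
    }
    return log;
}
```

With the outer-scope variant, "destroy 1" is logged between "enter pop 2" and "leave pop 2"; with the loop-scope variant it is logged before "enter pop 2".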
Nat Goodspeed 003ba682a1 DRTVWR-476: Clean up reverting to boost::fibers::buffered_channel. 2020-05-19 14:38:14 -04:00
Nat Goodspeed 9d428662f8 DRTVWR-476: Revert "Use LLThreadSafeQueue, not boost::fibers::buffered_channel."
This reverts commit bf8aea5059.

Try boost::fibers::buffered_channel again with Boost 1.72.
2020-05-19 11:32:24 -04:00
Nat Goodspeed ce36ef8242 DRTVWR-476: Use LLThreadSafeQueue::close() to shut down coprocs.
The tactic of pushing an empty QueuedCoproc::ptr_t to signal coprocedure close
only works for LLCoprocedurePools with a single coprocedure (e.g. "Upload" and
"AIS"). Only one coprocedureInvokerCoro() coroutine will pop that empty
pointer and shut down properly -- the rest will continue waiting indefinitely.

Rather than pushing some number of empty pointers, hopefully enough to notify
all consumer coroutines, close() the queue. That will notify as many consumers
as there may be.

That means catching LLThreadSafeQueueInterrupt from popBack(), instead of
detecting empty pointer.

Also, if a queued coprocedure throws an exception, coprocedureInvokerCoro()
logs it as before -- but instead of rethrowing it, the coroutine now loops
back to wait for more work. Otherwise, the number of coroutines servicing the
queue dwindles.
2020-03-25 19:25:42 -04:00
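As a rough, single-threaded model of the difference described above, the sketch below polls a queue once on behalf of each consumer. `ClosableQueue`, `QueueClosed`, and the helper functions are hypothetical names invented for illustration; `QueueClosed` only plays the role that LLThreadSafeQueueInterrupt plays in the message, and the real queue blocks rather than polls.

```cpp
#include <deque>
#include <memory>
#include <stdexcept>
#include <utility>

// Hypothetical exception signalling "closed and drained".
struct QueueClosed : std::runtime_error {
    QueueClosed() : std::runtime_error("queue closed") {}
};

using Work = std::shared_ptr<int>;  // an empty pointer is the old-style sentinel

class ClosableQueue {
public:
    void push(Work w) { mItems.push_back(std::move(w)); }
    void close() { mClosed = true; }

    // Returns true and fills `out` if an item was available, false if the
    // consumer would have to keep waiting; throws QueueClosed once the
    // queue is closed and empty.
    bool tryPop(Work& out) {
        if (!mItems.empty()) {
            out = std::move(mItems.front());
            mItems.pop_front();
            return true;
        }
        if (mClosed) throw QueueClosed();
        return false;
    }

private:
    std::deque<Work> mItems;
    bool mClosed = false;
};

// Polls once per consumer and counts how many learned they should stop.
int consumersStopped(ClosableQueue& q, int consumers) {
    int stopped = 0;
    for (int i = 0; i < consumers; ++i) {
        try {
            Work w;
            if (q.tryPop(w) && !w) ++stopped;  // popped the empty sentinel
        } catch (const QueueClosed&) {
            ++stopped;                          // saw the closed queue
        }
    }
    return stopped;
}

int stoppedBySentinel(int consumers) {
    ClosableQueue q;
    q.push(Work{});  // a single empty pointer
    return consumersStopped(q, consumers);
}

int stoppedByClose(int consumers) {
    ClosableQueue q;
    q.close();
    return consumersStopped(q, consumers);
}
```

In this model a single sentinel stops exactly one consumer regardless of pool size, while `close()` is observed by every consumer.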
Nat Goodspeed fc2437fb5d DRTVWR-476: Introduce LLCoprocedureManager::close(). Use in tests.
The new close(void) method simply acquires the logic from
~LLCoprocedureManager() (which now calls close()). It's useful, even if only
in test programs, to be able to shut down all existing LLCoprocedurePools
without having to name them individually -- and without having to destroy the
LLCoprocedureManager singleton instance. Deleting an LLSingleton should be
done only once per process, whereas test programs want to reset the
LLCoprocedureManager after each test.
2020-03-25 19:07:22 -04:00
Nat Goodspeed bf8aea5059 DRTVWR-476: Use LLThreadSafeQueue, not boost::fibers::buffered_channel.
We've observed buffered_channel::try_push() hanging, which seems very odd. Try
our own LLThreadSafeQueue instead.
2020-03-25 19:07:22 -04:00
Nat Goodspeed b461b5dcef DRTVWR-476: Manually count items in LLCoprocedurePool's pending queue.
Reinstate LLCoprocedureManager::countPending() and count() methods. These were
removed because boost::fibers::buffered_channel has no size() method, but
since all users run within a single thread, it works to increment and
decrement a simple counter.

Add count information and max queue size to log messages.
2020-03-25 19:06:13 -04:00
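The manual counting described above might look roughly like the following sketch. `OpaqueChannel` is a hypothetical stand-in for a channel type without a size() method, and `CountedChannel` is an invented wrapper; as the message notes, a plain counter is only adequate because all users are assumed to run within a single thread.

```cpp
#include <cstddef>
#include <queue>
#include <utility>

// Hypothetical stand-in for a channel that offers push/pop but no size().
template <typename T>
class OpaqueChannel {
public:
    void push(T value) { mItems.push(std::move(value)); }
    bool try_pop(T& out) {
        if (mItems.empty()) return false;
        out = std::move(mItems.front());
        mItems.pop();
        return true;
    }

private:
    std::queue<T> mItems;
};

// Wrapper that keeps its own pending count. Not thread-safe: it assumes
// every push and pop happens on the same thread.
template <typename T>
class CountedChannel {
public:
    void push(T value) {
        mChannel.push(std::move(value));
        ++mPending;
    }
    bool try_pop(T& out) {
        if (!mChannel.try_pop(out)) return false;
        --mPending;
        return true;
    }
    std::size_t countPending() const { return mPending; }

private:
    OpaqueChannel<T> mChannel;
    std::size_t mPending = 0;
};
```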
Nat Goodspeed cc6f1d6195 DRTVWR-476: Use shared_ptr to manage lifespan of coprocedure queue.
Since the consuming coroutine LLCoprocedurePool::coprocedureInvokerCoro() has
been observed to outlive the LLCoprocedurePool instance that owns the
CoprocQueue_t, closing that queue isn't enough to keep the coroutine from
crashing at shutdown: accessing a deleted CoprocQueue_t is fatal whether or
not it's been closed.

Make LLCoprocedurePool store a shared_ptr to a heap CoprocQueue_t instance,
and pass that shared_ptr by value to consuming coroutines. That way the
CoprocQueue_t instance is guaranteed to live as long as the last interested
party.
2020-03-25 19:05:17 -04:00
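A minimal sketch of the ownership scheme described above, using simplified hypothetical types (`CoprocQueue`, `Pool`) rather than the viewer's classes: the pool owns the queue through a shared_ptr, and each consumer captures a copy of that shared_ptr by value, so the queue outlives the pool for as long as a consumer still holds it.

```cpp
#include <functional>
#include <memory>
#include <queue>

// Hypothetical, simplified queue type.
struct CoprocQueue {
    std::queue<int> items;
    bool closed = false;
};

// Hypothetical, simplified pool that owns the queue via shared_ptr.
class Pool {
public:
    Pool() : mQueue(std::make_shared<CoprocQueue>()) {}
    ~Pool() { mQueue->closed = true; }  // close the queue on destruction

    // The consumer captures the shared_ptr by value, sharing ownership.
    std::function<bool()> makeConsumer() const {
        std::shared_ptr<CoprocQueue> queue = mQueue;
        return [queue]() { return queue->closed; };
    }

    // Non-owning handle, used here only to observe the queue's lifetime.
    std::weak_ptr<CoprocQueue> watch() const { return mQueue; }

private:
    std::shared_ptr<CoprocQueue> mQueue;
};

// Destroys the pool first, then lets the consumer touch the queue.
bool consumerSeesClosedAfterPoolDies() {
    std::function<bool()> consumer;
    std::weak_ptr<CoprocQueue> watcher;
    {
        Pool pool;
        consumer = pool.makeConsumer();
        watcher = pool.watch();
    }  // pool destroyed here; the queue is closed but still allocated
    bool queueStillAlive = !watcher.expired();
    return queueStillAlive && consumer();
}

// Once the last consumer lets go, the queue is finally freed.
bool queueFreedAfterLastConsumer() {
    std::weak_ptr<CoprocQueue> watcher;
    {
        std::function<bool()> consumer;
        {
            Pool pool;
            consumer = pool.makeConsumer();
            watcher = pool.watch();
        }
    }
    return watcher.expired();
}
```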
Nat Goodspeed 26c8ccfc06 DRTVWR-476: Back out changeset 40c0c6a8407d ("final" LLApp listener) 2020-03-25 19:02:24 -04:00
Nat Goodspeed cbf146f2b3 DRTVWR-476: Pump coroutines a few more times when we start quitting.
By the time "LLApp" listeners are notified that the app is quitting, the
mainloop is no longer running. Even though those listeners do things like
close work queues and inject exceptions into pending promises, any coroutines
waiting on those resources must regain control before they can notice and shut
down properly. Add a final "LLApp" listener that resumes ready coroutines a
few more times.

Make sure every other "LLApp" listener is positioned before that new one.
2020-03-25 19:02:24 -04:00
Nat Goodspeed 1345a02b21 DRTVWR-476: Terminate long-lived coroutines to avoid shutdown crash.
Add LLCoros::TempStatus instances around known suspension points so
printActiveCoroutines() can report what each suspended coroutine is waiting
for.

Similarly, sprinkle checkStop() calls at known suspension points.

Make LLApp::setStatus() post an event to a new LLEventPump "LLApp" with a
string corresponding to the status value being set, but only until
~LLEventPumps() -- since setStatus() also gets called very late in the
application's lifetime.

Make postAndSuspendSetup() (used by postAndSuspend(), suspendUntilEventOn(),
postAndSuspendWithTimeout(), suspendUntilEventOnWithTimeout()) add a listener
on the new "LLApp" LLEventPump that pushes the new LLCoros::Stopping exception
to the coroutine waiting on the LLCoros::Promise. Make it return the new
LLBoundListener along with the previous one.

Accordingly, make postAndSuspend() and postAndSuspendWithTimeout() store the
new LLBoundListener returned by postAndSuspendSetup() in a LLTempBoundListener
(as with the previous one) so it will automatically disconnect once the wait
is over.

Make each LLCoprocedurePool instance listen on "LLApp" with a listener that
closes the queue on which new work items are dispatched. Closing the queue
causes the waiting dispatch coroutine to terminate. Store the connection in an
LLTempBoundListener on the LLCoprocedurePool so it will disconnect
automatically on destruction.

Refactor the loop in coprocedureInvokerCoro() to instantiate TempStatus around
the suspending call.

Change a couple spammy LL_INFOS() calls to LL_DEBUGS(). Give all logging calls
in that module a "CoProcMgr" tag to make it straightforward to re-enable the
LL_DEBUGS() calls as desired.
2020-03-25 19:02:24 -04:00
Nicky 96e7e92e2e General cleanup. Delete commented out code. 2020-03-25 18:44:04 -04:00
Nicky a27281591d Replace boost::fibers::unbuffered_channel with boost::fibers::buffered_channel.
Using boost::fibers::unbuffered_channel can block the main thread when calling mPendingCoprocs.push (LLCoprocedurePool::enqueueCoprocedure).
From the documentation:
- If a fiber attempts to send a value through an unbuffered channel and no fiber is waiting to receive the value, the channel will block the sending fiber.

This can happen if LLCoprocedurePool::coprocedureInvokerCoro is running a coroutine and this coroutine calls yield, resuming the viewer's main loop. If someone inside
the main loop now calls LLCoprocedurePool::enqueueCoprocedure, push will block, as no one is waiting to receive a value right now.
The wait would happen in LLCoprocedurePool::coprocedureInvokerCoro at the start of the while loop, but we have not reached that point again, because LLCoprocedurePool::coprocedureInvokerCoro
yielded before reaching pop_wait_for.
The result is a deadlock.

boost::fibers::buffered_channel will not block as long as there's space in the channel. A size of 4096 (DEFAULT_QUEUE_SIZE) should be plenty for this.
2020-03-25 18:44:04 -04:00
Nicky dc8d2779ab Do not use string/chrono literals, sadly that won't work with GCC (4.9) 2020-03-25 18:40:45 -04:00
Anchor 16453005bb [DRTVWR-476] - update cef, fix merge 2020-03-25 18:40:45 -04:00
Brad Kittenbrink 828223bf1b Implemented some code review suggested cleanups. 2020-03-25 18:39:21 -04:00
Brad Kittenbrink c26c2bc3f0 Improved aggregate init syntax for DefaultPoolSizes map. 2020-03-25 18:39:21 -04:00
Brad Kittenbrink b09aa6a2bf Improved shutdown behavior of LLCoprocedureManager 2020-03-25 18:39:21 -04:00
Brad Kittenbrink 997bdfc886 First draft of boost::fibers::unbuffered_channel based implementation of LLCoprocedureManager 2020-03-25 18:39:21 -04:00
Nat Goodspeed 66981fab0b SL-793: Use Boost.Fiber instead of the "dcoroutine" library.
Longtime fans will remember that the "dcoroutine" library is a Google Summer
of Code project by Giovanni P. Deretta. He originally called it
"Boost.Coroutine," and we originally added it to our 3p-boost autobuild
package as such. But when the official Boost.Coroutine library came along
(with a very different API), and we still needed the API of the GSoC project,
we renamed the unofficial one "dcoroutine" to allow coexistence.

The "dcoroutine" library had an internal low-level API more or less analogous
to Boost.Context. We later introduced an implementation of that internal API
based on Boost.Context, a step towards eliminating the GSoC code in favor of
official, supported Boost code.

However, recent versions of Boost.Context no longer support the API on which
we built the shim for "dcoroutine." We started down the path of reimplementing
that shim using the current Boost.Context API -- then realized that it's time
to bite the bullet and replace the "dcoroutine" API with the Boost.Fiber API,
which we've been itching to do for literally years now.

Naturally, most of the heavy lifting is in llcoros.{h,cpp} and
lleventcoro.{h,cpp} -- which is good: the LLCoros layer abstracts away most of
the differences between "dcoroutine" and Boost.Fiber.

The one feature Boost.Fiber does not provide is the ability to forcibly
terminate some other fiber. Accordingly, disable LLCoros::kill() and
LLCoprocedureManager::shutdown(). The only known shutdown() call was in
LLCoprocedurePool's destructor.

We also took the opportunity to remove postAndSuspend2() and its associated
machinery: FutureListener2, LLErrorEvent, errorException(), errorLog(),
LLCoroEventPumps. All that dual-LLEventPump stuff was introduced at a time
when the Responder pattern was king, and we assumed we'd want to listen on one
LLEventPump with the success handler and on another with the error handler. We
have never actually used that in practice. Remove associated tests, of course.

There is one other semantic difference that necessitates patching a number of
tests: with "dcoroutine," fulfilling a future IMMEDIATELY resumes the waiting
coroutine. With Boost.Fiber, fulfilling a future merely marks the fiber as
ready to resume next time the scheduler gets around to it. To observe the test
side effects, we've inserted a number of llcoro::suspend() calls -- also in
the main loop.

For a long time we retained a single unit test exercising the raw "dcoroutine"
API. Remove that.

Eliminate llcoro_get_id.{h,cpp}, which provided llcoro::get_id(), which was a
hack to emulate fiber-local variables. Since Boost.Fiber has an actual API for
that, remove the hack.

In fact, use (new alias) LLCoros::local_ptr for LLSingleton's dependency
tracking in place of llcoro::get_id().

In CMake land, replace BOOST_COROUTINE_LIBRARY with BOOST_FIBER_LIBRARY. We
don't actually use Boost.Coroutine for anything (though there exist
plausible use cases).
2020-03-25 17:32:45 -04:00
Nat Goodspeed 4d10172d8b MAINT-5011: Catch unhandled exceptions in LLCoros coroutines.
Wrap coroutine call in try/catch in top-level coroutine wrapper function
LLCoros::toplevel(). Distinguish exception classes derived from
LLContinueError (log and continue) from all others (crash with LL_ERRS).

Enhance CRASH_ON_UNHANDLED_EXCEPTION() and LOG_UNHANDLED_EXCEPTION() macros
to accept a context string to supplement the log message. This lets us replace
many places that called boost::current_exception_diagnostic_information() with
LOG_UNHANDLED_EXCEPTION() instead, since the explicit calls were mostly to
log supplemental information.

Provide supplemental information (coroutine name, function parameters) for
some of the previous LOG_UNHANDLED_EXCEPTION() calls. This information
duplicates LL_DEBUGS() information at the top of these functions, but in a
typical log file we wouldn't see the LL_DEBUGS() message.

Eliminate a few catch (std::exception e) clauses: the information we get from
boost::current_exception_diagnostic_information() in a catch (...) clause
makes it unnecessary to distinguish.

In a few cases, add a final 'throw;' to a catch (...) clause: having logged
the local context info, propagate the exception to be caught by higher-level
try/catch.

In a couple places, couldn't resist reconciling indentation within a
particular function: tabs where the rest of the function uses tabs, spaces
where the rest of the function uses spaces.

In LLLogin::Impl::loginCoro(), eliminate some confusing comments about an
array of rewritten URIs that date back to a long-deleted implementation.
2016-08-18 17:33:44 -04:00
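The top-level wrapper policy described above (log and continue for one exception family, treat everything else as fatal) can be sketched as follows. `ContinueError`, `Outcome`, and this `toplevel()` are hypothetical stand-ins, not the viewer's code; in particular, the sketch returns `Outcome::Fatal` where the message says the real wrapper crashes with LL_ERRS, so the behavior can be checked without aborting.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the LLContinueError family.
struct ContinueError : std::runtime_error {
    using std::runtime_error::runtime_error;
};

enum class Outcome { Completed, LoggedAndContinued, Fatal };

// Runs `body` under a try/catch. Exceptions derived from ContinueError are
// recorded in `log` and survived; anything else is reported as fatal.
Outcome toplevel(const std::string& name,
                 const std::function<void()>& body,
                 std::string& log) {
    try {
        body();
        return Outcome::Completed;
    } catch (const ContinueError& e) {
        log = name + ": " + e.what();
        return Outcome::LoggedAndContinued;
    } catch (...) {
        log = name + ": unhandled exception";
        return Outcome::Fatal;
    }
}
```

Passing the coroutine name as context mirrors the message's point that supplemental information in the log helps identify which coroutine failed.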
Nat Goodspeed 993f54f6e9 MAINT-5011: Try to enrich catch (...) logging throughout viewer.
Turns out we have a surprising number of catch (...) clauses in the viewer
code base. If all we currently do is

    LL_ERRS() << "unknown exception" << LL_ENDL;

then call CRASH_ON_UNHANDLED_EXCEPTION() instead. If what we do is

    LL_WARNS() << "unknown exception" << LL_ENDL;

then call LOG_UNHANDLED_EXCEPTION() instead.

Since many places need LOG_UNHANDLED_EXCEPTION() and nobody catches
LLContinueError yet, eliminate LLContinueError& parameter from
LOG_UNHANDLED_EXCEPTION(). This permits us to use the same log message as
CRASH_ON_UNHANDLED_EXCEPTION(), just with a different severity level.

Where a catch (...) clause actually provides contextual information, or makes
an error string, add boost::current_exception_diagnostic_information() to try
to figure out actual exception class and message.
2016-08-17 15:40:03 -04:00
Rider Linden 118e82e477 MAINT-6305: Serialize the AIS calls by reducing the queue size to 1, move the bake request out of the AIS queue. 2016-04-13 22:40:49 +01:00
Rider Linden 75c6549fde Set consistent terminology for yield/wait -> suspend for coroutines. 2015-09-18 11:39:22 -07:00
Rider Linden a75dca5a51 LL_ERRS_IF only seems to work on Microsoft... 2015-09-03 17:37:25 -07:00
Rider Linden 8913ed6692 Changes from code review with Nat 2015-09-03 16:59:00 -07:00
Rider Linden 92a8b6690e Use boost assign to initialize default pool sizes. 2015-09-02 13:48:46 -07:00
Rider Linden 96e343b49b MAINT-5575: Convert the Experience cache into a coro based singleton.
--HG--
branch : MAINT-5575
2015-09-01 16:13:52 -07:00