book
Article ID: CTX221364
calendar_today
Updated On:
Description
PVS servers were randomly offline and then recover automatically. The issue could happen several times a day on different PVS server. When PVS server was offline, sometimes we saw Streamprocess event 11 logged in event log, but sometime not.
Resolution
We add host record for the SQL server in the host file on PVS server to bypass NBNS query.
Problem Cause
From the CDF trace on the problem PVS server, we saw SQL connection pool exhaustion and the thread took about 3min 48sec to add additional 50 connections to the connection pool. Streamprocess will access DB every 4 sec by updating server status. The DB access threads gets connection failure (or no progress) for 4min during the connection creation, the DB -offline mode is triggered; and the DB-Ping continues while in DB-Offline mode until the DB is up and PVS server switches back to DB-online mode. So this is the designed recovery mechanism.
Trace one:
============
773359,1,2017/02/09 18:16:31:96184,1308,4928,-1,StreamDb,GuardedConnectionPool.cpp,115,Ardence::GuardedConnectionPool::GetFreeValidLiveConnection,4,CDF_INFO,"Couldn't find a free, valid connection. Will try to add more to the pool",""
775138,3,2017/02/09 18:20:52:21307,5124,4928,-1,StreamProcess,SSProtocolModule.cpp,1889,CSSProtocolModule::CheckMonitorThreads,1,CDF_ERROR,"1 threads appear to be hung, 0 threads in danger of hanging",""
775139,3,2017/02/09 18:20:52:21308,5124,4928,-1,StreamProcess,SSProtocolModule.cpp,1811,CSSProtocolModule::LogReport,1,CDF_ERROR,"1 thread(s) last known to be active at SSDBAccess.cpp:758 have been inactive from 265 to 265 seconds.",""
775140,3,2017/02/09 18:20:52:21334,5124,4928,-1,StreamProcess,SSProtocolModule.cpp,2935,CSSProtocolModule::DoTraceEvent,1,CDF_ERROR,"Terminating StreamProcess.",""
775141,3,2017/02/09 18:20:52:21335,5124,4928,-1,StreamProcess,StreamProcess.cpp,108,TerminateStreamProcess,2,CDF_INFO,"Terminating (TerminateStreamProcess) StreamProcess 7.6.3.9",""
775224,0,2017/02/09 18:20:56:71256,1308,4928,-1,StreamDb,GuardedConnectionPool.cpp,355,Ardence::GuardedConnectionPool::UnguardedAddConnectionsAndReturnFirstOne,4,CDF_INFO,"Added 50 connections to the connectionPool_, we currently have 52 entries",""
775225,0,2017/02/09 18:20:56:71256,1308,4928,-1,StreamDb,GuardedConnectionPool.cpp,125,Ardence::GuardedConnectionPool::GetFreeValidLiveConnection,8,CDF_INFO,"handing out connection = 2",""
Trace two:
============
2017/02/22 14:27:06:83063 5864 -1 StreamDb GuardedConnectionPool.cpp 115 Ardence::GuardedConnectionPool::GetFreeValidLiveConnection 4 Information Couldn't find a free, valid connection. Will try to add more to the pool
2017/02/22 14:30:54:21762 5864 -1 StreamDb GuardedConnectionPool.cpp 355 Ardence::GuardedConnectionPool::UnguardedAddConnectionsAndReturnFirstOne 4 Information Added 50 connections to the connectionPool_, we currently have 51 entries
From the network trace, we found that it took about 5 seconds to create each connection. The time was spent on the NBNS query sending from PVS server to SQL but without response (The query was blocked by the firewell between PVS servers and the SQL database), so timeout reached. NBNS query happened after 3 tcp handshake on port 1433 and before NTLM authentication took place.
Issue/Introduction
PVS servers randomly became offline and recover themselves due to the slowness to establish additional SQL connections.