backendprocesses

Differences

This shows you the differences between two versions of the page.

backendprocesses [2007/10/09 16:25]
faltin
backendprocesses [2015/09/24 14:48] (current)
morten warn of obsolesence
Line 1: Line 1:
 ====== Back-end processes in NAV ====== ====== Back-end processes in NAV ======
  
-NAV has a number of back-end processes. This document gives an overview, listing key information and  +<note warning>This page has not been updated for several years, and may be quite outdated for NAV 4.</note>
-detailed description ​for each process. We also give references to documentation found elsewhere on metaNAV. +
- +
- +
- +
-The following figure complements this document (the NAV 3.3 snmptrapd is not included in the figure): +
- +
-{{architecture1.png?​800|The ​NAV processes}} +
- +
- +
  
 +NAV has a number of back-end processes. This page attempts to give an overview of them.
  
 +{{:​backend-processes-3.10.png?​800|The NAV processes}}
  
 ===== nav list / nav status ===== ===== nav list / nav status =====
Line 24: Line 16:
    * [[backendprocesses#​collecting_statistics|cricket]] (includes makecricketConfig,​ Cricket collector and cleanrrds)    * [[backendprocesses#​collecting_statistics|cricket]] (includes makecricketConfig,​ Cricket collector and cleanrrds)
    * [[#​eventengine|eventengine]]    * [[#​eventengine|eventengine]]
-   * [[#getdevicedata|getDeviceData]] +   * [[#ipdevpoll|ipdevpoll]]
-   * [[#​iptrace|iptrace]]+
    * [[#​logengine|logengine]]    * [[#​logengine|logengine]]
    * [[#​mactrace|mactrace]]    * [[#​mactrace|mactrace]]
    * [[#​maintengine|maintengine]]    * [[#​maintengine|maintengine]]
-   * [[#​networkdiscovery_topology|networkDiscovery]] (physical and [[#​networkdiscovery_vlan|vlan discovery]]) 
    * [[#​pping|pping]]    * [[#​pping|pping]]
 +   * [[#​psuwatch|psuwatch]]
    * [[#​servicemon|servicemon]]    * [[#​servicemon|servicemon]]
    * [[#​smsd|smsd]]    * [[#​smsd|smsd]]
-   * [[#​thresholdmon|thresholdMon]] 
    * [[#​snmptrapd|snmptrapd]]    * [[#​snmptrapd|snmptrapd]]
 +   * [[#​thresholdmon|thresholdMon]]
 +   * [[#​topology|topology]]
  
  
 ====== Building the network model ====== ====== Building the network model ======
  
-===== getDeviceData ​===== +===== ipdevpoll ​=====
  
 ==== Key information ==== ==== Key information ====
  
-^ Process name         ​| ​getDeviceData ​ | +^ Process name         ​| ​ipdevpoll ​|
-^ Alias                | gDD / the snmp data collector  ​|+
 ^ Polls network ​       | Yes            | ^ Polls network ​       | Yes            |
 ^ Brief description ​   | Collects SNMP data from equipment in the netbox table and stores data regarding the equipment in a number of tables. Does not build topology. | ^ Brief description ​   | Collects SNMP data from equipment in the netbox table and stores data regarding the equipment in a number of tables. Does not build topology. |
-^ Depends upon         | Seed data must be filled in the netbox table, either by the Edit Database tool or by the autodiscovery contrib | +^ Depends upon         | Seed data must be filled in the netbox table, using the [[seedessentials|Seed Database tool]] |
-^ Updates tables       | netbox, netboxsnmpoid, netboxinfo, device, module, gwport, gwportprefix, prefix, vlan, swport, swportallowedvlan, netbox_vtpvlan | +^ Updates tables       | netbox, netboxsnmpoid, netboxinfo, device, module, gwportprefix, prefix, vlan, interface, swportallowedvlan |
-^ Run mode             | Daemon process. Thread based. | +^ Run mode             | Daemon process |
-^ Default scheduling   | Initial data collection for new netboxes is done every 5 minutes. Update polls on existing netboxes are done every 6 hrs. Collection of certain OIDs for the netbox may deviate from this interval; e.g. the moduleMon OID is polled every hour. | +^ Default scheduling   | Polling is organized into jobs, which are defined and scheduled in ''ipdevpoll.conf''. |
-^ Config file | getDeviceData.conf | +^ Config file | ipdevpoll.conf |
-^ Log files | getDeviceData.log and getDeviceData/getDeviceData-stderr.log | +^ Log files | ipdevpoll.log |
-^ Programming language | Java | +^ Programming language | Python |
-^ Lines of code        | Approx 8200 | +^ Further doc          | |
-^ Further doc          | [[http://​metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report chapter 5]]|+
  
  
Line 63: Line 52:
 ==== Details ==== ==== Details ====
  
-  * Initial OID classification \\ When gDD detects a new box that has a valid snmp read community (regardless of category), it will start initial OID classification. This is done by testing the netbox against all OIDs in the snmpoid table and in turn populating the netboxsnmpoid table. Testing is done based on attributes in the snmpoid table, see reference to further doc for details. Frequency will be set based on the snmpoid.defaultfreq. +  * jobs and plugins \\ All ipdevpoll's work is done by plugins. Plugins are organized into jobs, and jobs are scheduled for each active IP device individually.
- +  * inventory job \\ Polls for inventory information every 6 hours (by default). Inventory information includes interfaces, serial numbers, modules, VLANs and prefixes.
-  * Plugin-based architecture \\ gDD has a plugin based architecture. Plugins fall into two types; device plugins and data plugins:  +  * profiler job \\ Runs every 5 minutes, profiling devices if deemed necessary. NAV has an internal list of SNMP OIDs that are tested for compatibility with each device. This is used to create a sort of profile that says what the device supports; the profile is typically used to produce a Cricket configuration that will collect statistics from proprietary OIDs.
-     * Device plugins collect data with SNMP. Each device plugin is geared towards a particular type of equipment, supporting a particular subset of OIDs. See further doc for details.   +  * logging job \\ Runs every 30 minutes and collects log-like information from devices. At the time being, only the arp plugin runs, collecting ARP caches from routers. ARP data is logged to a table, and aids in topology detection and client machine tracking.
-     * Data plugins update NAVdb with data fed from the device plugins. A particular data plugin is responsible for a particular table (or set of tables) in the database. See further doc for details. +
- +
-  * Module monitor \\ The module monitor is a data plugin within gDD. It has the dedicated function of detecting outage of modules in operating netboxes. When a module is detected down, a moduleDown event is posted on the event queue (eventq). +
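
The job model described for ipdevpoll (plugins grouped into jobs, each job scheduled per IP device) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ipdevpoll's actual scheduler or the format of ''ipdevpoll.conf'': the ''Job'' class, the ''due_jobs'' function and every plugin name except ''arp'' are hypothetical. The intervals are the defaults quoted in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Job:
    name: str
    interval: int   # seconds between runs for one device
    plugins: tuple  # names of the plugins this job runs


# Default intervals as quoted in the text. Plugin names other than
# "arp" are hypothetical placeholders, not real ipdevpoll plugins.
JOBS = (
    Job("inventory", 6 * 3600, ("interfaces", "modules", "prefixes")),
    Job("profiler", 5 * 60, ("oidprofiler",)),
    Job("logging", 30 * 60, ("arp",)),
)


def due_jobs(devices, last_run, now):
    """Return (device, job name) pairs that should run at time `now`.

    `last_run` maps (device, job name) to the timestamp of the previous
    run; a missing entry means the job has never run for that device.
    """
    due = []
    for device in devices:
        for job in JOBS:
            previous = last_run.get((device, job.name))
            if previous is None or now - previous >= job.interval:
                due.append((device, job.name))
    return due
```

Keeping the last-run timestamp per (device, job) pair is what makes the scheduling individual for each device: a newly seeded device gets all jobs immediately, while existing devices keep their own rhythm.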
- +
- +
-===== iptrace ===== +
- +
- +
- +
-==== Key information ==== +
- +
-^ Process name         | iptrace | +
-^ Alias                | IP-to-mac collector / arplogger| +
-^ Polls network ​       | Yes | +
-^ Brief description ​   | Collects arp data from routers and stores this information in the arp table| +
-^ Depends upon         | The routers (GW / GSW) must be in the netbox table. To assign prefixes to arp entries, gDD must have done router data collection. | +
-^ Updates tables ​      | arp | +
-^ Run mode             | cron | +
-^ Default scheduling ​  ​| ​every 30 minutes ​(0,30 * * * *). No threads | +
-^ Config file          | pping.conf | +
-^ Log file             | pping.log +
-^ Programming language | Perl| +
-^ Lines of code        | Approx 130 lines| +
-^ Further doc          | [[http://​metanav.uninett.no/​static/​reports//​NAVMe.pdf|NAVMe report ch 4.5.8]] (Norwegian) | +
- +
- +
-==== Details ==== +
- +
-  * iptrace understands proxy arp and will not store arp entries that are "​false"​. +
-  * The command line tool [[commandlinetools#​navclean.py|navclean.py]] offers ​means of deleting old arp (and cam) entries.+
  
  
Line 108: Line 67:
 ^ Polls network ​       | Yes | ^ Polls network ​       | Yes |
 ^ Brief description ​   | Collects mac addresses behind switch table data for all switches (cat GSW, SW, EDGE). The process also checks for spanning tree blocked ports. | ^ Brief description ​   | Collects mac addresses behind switch table data for all switches (cat GSW, SW, EDGE). The process also checks for spanning tree blocked ports. |
-^ Depends upon         ​| ​gDD must have completed ​the swport tables for the switches. |+^ Depends upon         ​| ​[[#​ipdevpoll|ipdevpoll]] ​must have created ​the swport tables for the switches. |
 ^ Updates tables       | cam (mac addresses), netboxinfo (CDP neighbors), swp_netbox (the candidate list for the physical topology builder), swportblocked (switch ports that are blocked by spanning tree for a given vlan). | ^ Updates tables       | cam (mac addresses), netboxinfo (CDP neighbors), swp_netbox (the candidate list for the physical topology builder), swportblocked (switch ports that are blocked by spanning tree for a given vlan). |
 ^ Run mode             | cron | ^ Run mode             | cron |
Line 115: Line 74:
 ^ Log file             | getBoksMacs.log | ^ Log file             | getBoksMacs.log |
 ^ Programming language | Java | ^ Programming language | Java |
-^ Lines of code        | Approx 1400 | +^ Further doc          | [[http://nav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 2.1]] (Norwegian),​ [[http://nav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 5.4.5 and ch 5.5.3]] |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 2.1]] (Norwegian),​ [[http://metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 5.4.5 and ch 5.5.3]] |+
  
  
Line 149: Line 107:
  
 One notable improvement is the addition of the interface field in the swport table. It is used for matching the CDP remote interface, and makes this matching much more reliable. Also, both the cam and the swp_netbox tables now use netboxid and ifindex to uniquely identify a swport port instead of the old netboxid, module, port-triple. This has significantly simplified swport port matching, and especially since the old module field of swport was a shortened version of what is today the interface field, reliability has increased as well. One notable improvement is the addition of the interface field in the swport table. It is used for matching the CDP remote interface, and makes this matching much more reliable. Also, both the cam and the swp_netbox tables now use netboxid and ifindex to uniquely identify a swport port instead of the old netboxid, module, port-triple. This has significantly simplified swport port matching, and especially since the old module field of swport was a shortened version of what is today the interface field, reliability has increased as well.
-- 
- 
-===== networkDiscovery_topology ===== 
- 
- 
- 
  
 +===== topology =====
 ==== Key information ==== ==== Key information ====
  
-^ Process name         | networkDiscovery.sh topology | +^ Process name         | navtopology |
-^ Alias                | Physical Topology Builder | +^ Alias                | Physical and VLAN Topology Builder |
 ^ Polls network        | No | ^ Polls network        | No |
-^ Brief description    | Builds the physical topology of the network; i.e. which netbox is connected to which netbox. | +^ Brief description    | Builds NAV's model of the physical network topology as well as the VLAN sub-topologies |
-^ Depends upon         | mactrace fills data in swp_netbox representing the candidate list of physical neighborship. This is the data that the physical topology builder uses. | +^ Depends upon         | mactrace fills data in ''swp_netbox'', representing the list of physical neighbor candidates. This is the data that the physical topology builder uses. |
 ^ Updates tables ​      | Sets the to_netboxid and to_swportid fields in the swport and gwport tables. | ^ Updates tables ​      | Sets the to_netboxid and to_swportid fields in the swport and gwport tables. |
 ^ Run mode             | cron | ^ Run mode             | cron |
 ^ Default scheduling ​  | every hour (35 * * * *) | ^ Default scheduling ​  | every hour (35 * * * *) |
 ^ Config file          | None |  ^ Config file          | None | 
-^ Log file             | networkDiscovery/networkDiscovery-topology.html and networkDiscovery/networkDiscovery-stderr.log | +^ Log file             | navtopology.log |
-^ Programming language | Java | +^ Programming language | Python |
-^ Lines of code        | Approx 1500 (shared with vlan topology builder) | +
-^ Further doc          | [[http://​metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 5.5.4]] | +
  
 ==== Details ==== ==== Details ====
  
-FIXME This is cut and paste from the tigaNAV report. Consider a rewrite.+=== Physical topology ===
  
 +The topology discovery system builds NAV's view of the network topology based
 +on cues from information collected previously via SNMP.  ​
  
 +The information cues come from routers'​ IPv4 ARP caches and IPv6 Neighbor
 +Discovery caches, interface physical (MAC) addresses, switch forwarding tables
 +and CDP (Cisco Discovery Protocol). ​ The mactrace process has already
 +pre-parsed these cues and created a list of neighbor candidates for each port
 +in the network.
  
-The network topology discovery system automatically discovers the physical topology ​of the network monitored by NAV based on the data in the swp_netbox table collected by the cam logger. No major updates have been necessary except for adjustment to the new structure of the NAVdb; the basic algorithm remains the same. While the implementation of said algorithm is somewhat complicated as to gracefully handle missing data, the following is a simplified description:​ +The physical topology ​detection ​algorithm is responsible for reducing ​the list 
- +of neighbor candidates ​of each port to just one single device.
-   * We start with a candidate list for each swport port. These are the switches located behind a switch port and the goal of the algorithm is to pick the one to which it is connected directly. Some of the candidate lists, those of the switches one level up from the edge, will contain only one candidate. We can thus pick this as the switch directly connected and proceed to remove said switches from all other lists. After this removal there will be more candidate lists with only one candidate, and we can apply the same procedure again. +
-   * If we have the complete information about the network we could now simply iterate until all candidate lists were empty; however, to deal with missing information we sometimes have to make an educated guess of which is the directly connected switch. The network topology discover system makes the guess by looking at how far each candidate is from the router and how many switches are connected below them, and then try to pick the one which most closely matches the current switch. +
- +
-In practice the use of CDP makes this process very reliable for the devices supporting it, and this makes it easier to correctly determine the remaining topology even in the case of missing information. +
- +
-===== networkDiscovery_vlan ===== +
- +
- +
-==== Key information ==== +
-^ Process name         | networkDiscovery.sh vlan| +
-^ Alias                | Vlan Topology Builder | +
-^ Polls network ​       | No  | +
-^ Brief description ​   | Builds the per vlan topology on the swithed network with interconnected trunks. The algorithm is a top-down depth-first traversel starting at the primary router port for the vlan. | +
-^ Depends upon         | The physical topology need to be in place, this process therefore supersedes the physical topology builder.| +
-^ Updates tables ​      | swportvlan | +
-^ Run mode             | cron | +
-^ Default scheduling ​  | every hour (38 * * * *) | +
-^ Config file          | None |  +
-^ Log file             | networkDiscovery/​networkDiscovery-vlan.html og networkDiscovery/​networkDiscovery-stderr.log ​  |  +
-^ Programming language | Java | +
-^ Lines of code        | See the physical topology builder above | +
-^ Further doc          | [[http://​metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 5.5.5]] | +
- +
- +
-==== Details ====+
  
-FIXME This is cut and paste from the tigaNAV report. Consider a rewrite. +In practice the use of CDP makes this process very reliable for the devices
 +supporting it, and this makes it easier to correctly determine the remaining
 +topology even in the case of missing information. CDP is, however, not
 +trusted more than switch forwarding tables, as CDP packets may pass unaltered
 +through switches that don't support CDP, causing CDP data to be inaccurate.
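
The reduction of candidate lists can be illustrated with a minimal Python sketch of the simplified procedure described in the older revision of this page: pick the neighbor wherever a port has exactly one candidate, remove that neighbor from all other lists, and repeat. It assumes complete information and candidate lists that only contain the switches located behind each port; the educated guessing used for missing data and the weighting of CDP against forwarding tables are not covered. The function name and data layout are hypothetical and do not reflect navtopology's code.

```python
def resolve_neighbors(candidates):
    """Reduce each port's candidate set to one directly connected switch.

    `candidates` maps a port, e.g. ("sw1", 1), to the set of switches
    seen behind that port. Ports that cannot be resolved are left out
    of the result.
    """
    # Work on copies so the caller's candidate lists stay untouched.
    remaining = {port: set(found) for port, found in candidates.items()}
    resolved = {}
    progress = True
    while progress:
        progress = False
        for port, found in remaining.items():
            if port in resolved or len(found) != 1:
                continue
            (neighbor,) = found
            resolved[port] = neighbor
            # The neighbor is directly connected here, so it cannot be
            # the direct neighbor of any other port.
            for other_port, other_found in remaining.items():
                if other_port != port:
                    other_found.discard(neighbor)
            progress = True
    return resolved
```

For a chain router, A, B, C, the port on B has the single candidate C, which in turn leaves B as the only candidate on A's port, and so on up to the router.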
  
-After the physical ​topology ​of the network has been mapped by the network topology discover system it still remains to explore the logical topology, or the VLANs. Since modern switches support trunking, which can transport several independent VLANs over a single physical link, the logical topology can be non-trivial and indeed, in practice it usually is.+=== VLAN topology ​===
  
-The vlan discovery system uses a simple top-down depth-first graph traversal algorithm to discover which VLANs are actually running on the different trunks and in which direction. Direction is here defined relative to the router port, which is the top of the tree, currently owning the lowest gateway IP or the virtual IP in the case of HSRP. In addition, since NAV v3 now fully supports the reuse of VLAN numbers, the vlan discovery system will also make the connection from VLAN number to actual vlan as defined in the vlan table for all non-trunk ports it encounters. +After the physical topology model of the network has been built, the logical
 +topology of the VLANs still remains. Since modern switches support 802.1Q
 +trunking, which can transport several independent VLANs over a single physical
 +link, the logical topology can be non-trivial and indeed, in practice it
 +usually is.
 
-A special case are //closed// VLANs which do not have a gateway IP; the vlan discovery system will still traverse these VLANs without setting any direction and also creating a new VLAN record in the vlan table. The NAV administrator can fill in descriptive information afterward if desired. +The vlan discovery system uses a simple top-down depth-first graph traversal
 +algorithm to discover which VLANs are actually running on the different trunks
 +and in which direction. Direction is here defined relative to the router port,
 +which is the top of the tree, currently owning the lowest gateway IP or the
 +virtual IP in the case of HSRP. Re-use of VLAN numbers in physically disjoint
 +parts of the network is supported.
  
-The implementation of this subsystem is again complicated by factors such as the need for checking at both ends of a trunk if the VLAN is allowed to traverse it, the fact that VLAN numbers on each end of non-trunk links need not match (the number closer to the top of the tree should then be given precedence and the lower VLAN numbers rewritten to match), that both trunks and non-trunks can be blocked (again at either end) by the spanning tree protocol and of course that it needs to be highly efficient and scalable in the case of large networks with thousands of switches and tens of thousands of switch ports.+The VLAN topology detector does not currently support mapping unrouted VLANs.
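
A top-down depth-first traversal of this kind can be sketched as follows. The sketch assumes the physical topology is already known, starts at the netbox owning the router port, and follows a link only when the VLAN is allowed at both ends, as the older revision of this page notes is required. The data layout and the direction labels ''down'' and ''up'' are hypothetical; spanning tree blocking and VLAN number rewriting are left out.

```python
def traverse_vlan(neighbors, allowed, root, vlan):
    """Depth-first walk of one VLAN, starting at the router netbox.

    `neighbors` maps a netbox to its physically connected netboxes.
    `allowed` maps (netbox, peer) to the set of VLANs allowed on the
    port of `netbox` that faces `peer`.
    Returns {(netbox, peer): "down" or "up"} for every port on the VLAN.
    """
    directions = {}
    visited = set()

    def visit(netbox):
        visited.add(netbox)
        for peer in neighbors.get(netbox, ()):
            if peer in visited:
                continue
            # The VLAN must be allowed at both ends of the link.
            here = allowed.get((netbox, peer), set())
            there = allowed.get((peer, netbox), set())
            if vlan in here and vlan in there:
                directions[(netbox, peer)] = "down"  # away from the router
                directions[(peer, netbox)] = "up"    # towards the router
                visit(peer)

    visit(root)
    return directions
```

Because the walk starts at the router, "down" always means away from the top of the tree, which matches the definition of direction given above.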
  
 ====== Monitoring the network ====== ====== Monitoring the network ======
  
 ===== pping ===== ===== pping =====
 +
  
  
Line 234: Line 180:
 ^ Log file             | pping.log ​  ​| ​ ^ Log file             | pping.log ​  ​| ​
 ^ Programming language | Python | ^ Programming language | Python |
-^ Lines of code        | Approx 4200, shared with servicemon | +^ Further doc          | See below, based on and translated from [[http://nav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.4]] (Norwegian) | 
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.4]] (Norwegian) |+ 
 + 
 + 
 + 
  
  
 ==== Details ==== ==== Details ====
    
-FIXME This is Norwegian text - cut and paste from the NAVMore doc. Rewrite. +pping is a daemon with its own (configurable) scheduling. pping works in parallel, which makes each ping sweep very
 +efficient. By default, a ping sweep runs every 20 seconds and the maximum allowed response time for a host is 5 seconds. A host is declared down on the event queue after four consecutive "no responses" (also configurable). This means
 +that it takes between 80 and 99 seconds from the time a host goes down until pping declares it down.
 +
 +Please note that the [[#eventengine|event engine]] has
 +a grace period of one minute (configurable) before a "box down warning" is posted on the alert queue, and another three minutes before the box is declared down (also configurable). In summary, expect 5-6 minutes before a host is declared down (80-99 seconds in pping, plus 60 seconds, plus 180 seconds).
 + 
 +The configuration file ''​pping.conf''​ lets you adjust the following:​ 
 +^parameter ^description ^default | 
 +| user | the user that runs the service | navcron | 
 +| packet size |size of the icmp packet | 64 byte | 
 +| check interval | how often you want to run a ping sweep | 20 seconds | 
 +| timeout |seconds to wait for reply after last ping request is sent | 5 seconds | 
 +| nrping |number of requests without answer before marking the device as unavailable | 4 | 
 +| delay | ms between each ping request | 2 ms |
  
-pping is a daemon with its own scheduling. In short, pping retrieves at a configurable frequency a +In addition you can configure the debug level, the location of the log file and the location of the pid file.
-list of IP-addresses to ping (i.e. a list of all IP devices in NAV). pping returns a list +
-of ping response times, where the value "None" means no answer within the configured timeout +
-(defaults to 5 seconds). +
 
-The configuration file ''pping.conf'' is where you can adjust: +Note: In order to uniquely identify the icmp echo response packets, pping needs to tailor-make the packets with its own signature. This slows the overall throughput a bit, but pping can still manage 90-100 hosts per second, which should be sufficient for most needs.
-  * ping frequency +
-  * packet size +
-  * timeout +
-  * and more+
  
-The ping sweep algorithm is explained below. 
  
-To allow for replies arriving from a different interface than the one
-the packet was sent to, we have to build a tailor-made icmp packet for each
-host. This makes sending the packets take somewhat longer than desired, but
-we still reach a rate of 90-100 hosts/sec.
  
 === Algorithm - one ping sweep === === Algorithm - one ping sweep ===
  
 <​code>​ <​code>​
-The script runs in three threads: +pping has three threads:
-  1. Thread 1 generates and sends out packets +  1. Thread 1 generates and sends out the icmp packets.
-  2. Thread 2 receives, checks and stores the packets +  2. Thread 2 receives echo replies, checks the signature and stores the result to RRD.
-  3. The main thread just waits for the others and spins a slash. +  3. The main thread does the main scheduling and reports to the event queue.
  
-Thread 1 works like this: +Thread 1 works this way:
-  For each host: +  FOR every host DO
-  1. Build a packet with a ping string containing: (destinationIP, timestamp, program identifier) +    1. Generate an icmp echo packet with: (destination IP, timestamp, signature)
-  2. Send the packet +    2. Send the icmp echo.
-  3. Add to "Waiting for response" +    3. Add host to the "Waiting for response" queue.
-  4. Sleep for 10 milliseconds (which means thread 1 runs for at least 7.37 seconds with 737 +    4. Sleep for the configured ''delay'' ms (default 2 ms). This delay will spread out the response times, which in
-     hosts). The sleep is important to avoid all replies arriving at once (which would cause +       turn reduces the receive thread queue and will in effect make the measured response time more accurate.
-     queueing on the receive thread (thread 2) and hence inaccurate response times). +
  
-Thread 2 works like this: +Thread 2 works this way:
-  As long as thread 1 is still running or as long as we have someone in the reply queue: +  As long as thread 1 is operating or we have hosts in the "Waiting for response" queue, with a
-  1. Check whether data has arrived on the socket (timeout 5 sec) +  timeout of 5 seconds (configurable):
-  2. Fetch the data +    1. Check if we have received packets.
-  3. Unpack the icmp packet +    2. Get the data (the icmp reply packet).
-  4. Check that the packet is for our pid (thread-safe, "pid" is stored by the main thread at startup) +    3. Verify that the packet is to our pid.
-  5. Split the icmp packet into (destinationIP, timestamp, program identifier) +    4. Split the packet into (destination IP, timestamp, signature).
-     If this fails, dump the packet (not ours) +       If IP is wrong or signature is wrong, discard.
-  6. Check that the identifier matches ours +    5. If we recognize the IP address on the "Waiting for response" queue, update response time for the host and
-  7. If the IP address is one we have in the reply queue, update 'pingtid' for the host with time.time() - senttime +       remove the host from the "Waiting for response" queue.
-  8. Remove from the 'waiting for response' list. +
  
-If we get here we have timed out or we are done, so the hosts we have left have not +When thread 2 finishes, the sweep is over. If hosts remain on the "Waiting for response" queue, we set
-replied in time. We set pingtid to None for these and report to the ring buffer (for coarse filtering before +the response time to "None" and increment the "number of consecutive no-replies" counter for the host.
-the event queue). +
-</code> +
 
-The pinger reports down and up messages to the event engine (via the event queue). +When thread 3 detects that a host has too many no-replies, a down event is posted on the event queue.
-The event engine will see down messages in connection with other down messages +</code>
-and use topology information to determine shadow relationships. The event engine will +
-also have robustness criteria to decide whether the down state is a transient. +
-See ch 3.6 for more. +
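
The matching of echo replies against the "Waiting for response" queue can be sketched in Python. The byte layout below (4-byte IPv4 address, 8-byte timestamp, then the signature) and the signature value are assumptions made for illustration; they are not pping's actual packet format, and the sketch only handles the payload, not the ICMP header or the pid check.

```python
import socket
import struct

SIGNATURE = b"pping-demo"  # hypothetical signature value


def build_payload(dest_ip, sent_at, signature=SIGNATURE):
    """Pack (destination IP, timestamp, signature) into bytes."""
    return socket.inet_aton(dest_ip) + struct.pack("!d", sent_at) + signature


def parse_payload(payload, signature=SIGNATURE):
    """Return (destination IP, timestamp), or None if not our packet."""
    if len(payload) < 12 or payload[12:] != signature:
        return None
    dest_ip = socket.inet_ntoa(payload[:4])
    (sent_at,) = struct.unpack("!d", payload[4:12])
    return dest_ip, sent_at


def handle_reply(waiting, payload, received_at):
    """Match a reply against the "Waiting for response" queue.

    `waiting` maps destination IP to send time. Returns
    (ip, response time) and removes the host from the queue, or
    returns None when the packet is discarded.
    """
    parsed = parse_payload(payload)
    if parsed is None:
        return None  # wrong signature: discard
    dest_ip, sent_at = parsed
    if dest_ip not in waiting:
        return None  # unknown or already answered host: discard
    del waiting[dest_ip]
    return dest_ip, received_at - sent_at
```

Carrying the destination IP inside the payload is what lets the receiver credit the reply to the right host even when the answer comes back from another interface.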
  
-To increase flexibility we also want the option of coarse filtering on +Note that the response times are recorded to RRD, which gives us response time and packet loss data as an extra bonus.
-the pinger's side. In pinger.conf you can specify how many consecutive
-ping measurements must be made before a down message is sent to the event queue.
-If this value is set to 4, for example, only 4 consecutive
-"None" replies qualify for a down message. An up message still requires only
-a single non-None reply.
  
-The pinger naturally gathers a lot of response time and packet loss data. In
-the project we have looked at a way to retain this data. This is done by
-delivering the data to RRD. We have two sets of measurements, one for response time and one for
-packet loss. The accuracy of the measurements is a function of the ping frequency.
-We see that the needs diverge somewhat between pinging to establish whether a
-device is down and pinging to get good response time and packet loss data. We
-believe UNINETT's MPING solution is better suited for good
-statistical values (with Poisson-distributed pinging etc.). Nevertheless, we believe that
-the pping RRD values can give good enough data for operations staff to get useful
-indications of slow devices, heavy packet loss and other phenomena that are recorded
-over time.
-We have yet to structure all the RRD measurements towards Cricket. We
-see this in connection with a possible new layer on top of RRD that
-could potentially supplement Cricket. This is a possible activity in 2003 (see
-chapter 4.1.4).
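
The down/up decision made per host after each sweep can be sketched as a small counter. This is an illustrative Python sketch, not pping's code: the class and method names are hypothetical, and the threshold of 4 is the ''nrping'' default quoted in the configuration table above. As described above, a host is reported down only after that many consecutive "None" results, while a single reply is enough to report it up again.

```python
from collections import defaultdict

NRPING = 4  # default number of unanswered sweeps before a host is down


class DownTracker:
    def __init__(self, nrping=NRPING):
        self.nrping = nrping
        self.no_reply = defaultdict(int)  # consecutive no-replies per host
        self.down = set()                 # hosts currently reported down

    def record(self, host, response_time):
        """Record one sweep result; return "down", "up" or None."""
        if response_time is None:
            self.no_reply[host] += 1
            if self.no_reply[host] >= self.nrping and host not in self.down:
                self.down.add(host)
                return "down"
            return None
        # Any reply resets the counter; one reply is enough for "up".
        self.no_reply[host] = 0
        if host in self.down:
            self.down.discard(host)
            return "up"
        return None
```

The return value stands in for the event that would be posted on the event queue; hosts already reported down do not generate repeated down events.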
  
 ===== servicemon ===== ===== servicemon =====
Line 338: Line 263:
 ^ Log file             | servicemon.log ​  ​| ​ ^ Log file             | servicemon.log ​  ​| ​
 ^ Programming language | Python | ^ Programming language | Python |
-^ Lines of code        | See pping above, shared code base | +^ Further doc          | See the [[servicemon]] page and/​or ​[[http://nav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.5]] (Norwegian) |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.5]] (Norwegian) |+
  
 ==== Details ==== ==== Details ====
Line 361: Line 285:
 ^ Log file             | thresholdMon.log ​  ​| ​ ^ Log file             | thresholdMon.log ​  ​| ​
 ^ Programming language | Python | ^ Programming language | Python |
-^ Lines of code        | Approx 400 | 
 ^ Further doc          | See [[ThresholdMonitor]] | ^ Further doc          | See [[ThresholdMonitor]] |
  
Line 368: Line 291:
  
   * See [[ThresholdMonitor]]   * See [[ThresholdMonitor]]
- 
-===== moduleMon ===== 
- 
- 
-==== Key information ==== 
-^ Process name         | getDeviceData data plugin moduleMon | 
-^ Alias                | The module monitor | 
-^ Polls network ​       | Yes | 
-^ Brief description ​   | A plugin to gDD. A dedicated OID is polled. If this is a HP switch, a specific HP OID is used (oidkey hpStackStatsMemberOperStatus),​ similarly for 3Com (oidkey 3cIfMauType). For other equipment the genereric moduleMon OID is used. For 3com and HP the OID actually tells us if a module is down or not. For the generic test we (in lack of something better) check if an arbitrary ifindex on the module in question responds. If the module has no ports, no check is done.  | 
-^ Depends upon         | The switch or router to be processed by gDD with apropriate data in module and gwport/​swport. | 
-^ Updates tables ​      | posts moduleMon events on the eventq. Sets in addition the boolean module.up value. | 
-^ Run mode             | daemon, a part of gDD. | 
-^ Default scheduling ​  | Depends on the defaultfreq of the moduleMon OID (equivalently for the HP and 3com OIDs) Defaults to 1 hour. | 
-^ Config file          | see gDD |  
-^ Log file             | see gDD   ​| ​ 
-^ Programming language | Java | 
-^ Lines of code        | Part of gDD, see gDD. | 
-^ Further doc          | Not much. | 
  
  
Line 405: Line 310:
 ^ Log file             | eventEngine.log ​  ​| ​ ^ Log file             | eventEngine.log ​  ​| ​
 ^ Programming language | Java | ^ Programming language | Java |
-^ Lines of code        | Approx 3000 lines | +^ Further doc          | [[http://nav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.6]] (Norwegian). Updates in [[http://nav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 4.3.1]]. |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.6]] (Norwegian). Updates in [[http://metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 4.3.1]]. |+
  
 ==== Details ==== ==== Details ====
Line 413: Line 317:
  
 ===== maintengine ===== ===== maintengine =====
 +
 +
  
  
Line 419: Line 325:
 ^ Alias                | The maintenance engine | ^ Alias                | The maintenance engine |
 ^ Polls network ​       | No | ^ Polls network ​       | No |
-^ Brief description ​   | Checks the defines ​maintenance schedules. If start or end of a maintenance period occurs at this run time, the relevant maintenanceEvents are posted on the eventq, one for each netbox/​module ​and/or service in question. | +^ Brief description ​   | Checks the defined ​maintenance schedules. If start or end of a maintenance period occurs at this run time, the relevant maintenanceEvents are posted on the eventq, one for each netbox and/or service in question. | 
-^ Depends upon         | NAV users must set up maintenance schedule which in turn is stored in the maintenance tables (emotdmaintenance, emotd_related). | +^ Depends upon         | NAV users must set up a maintenance schedule, which in turn is stored in the maintenance tables (maint_task, maint_component). |
-^ Updates tables ​      | Posts maintenance events on the eventq. Also updates the maintenance.state. |+^ Updates tables ​      | Posts maintenance events on the eventq. Also updates the maint_task.state. |
 ^ Run mode             | cron | ^ Run mode             | cron |
 ^ Default scheduling ​  | Every 5 minutes ( */5 * * * * )| ^ Default scheduling ​  | Every 5 minutes ( */5 * * * * )|
Line 427: Line 333:
 ^ Log file             | maintengine.log ​  ​| ​ ^ Log file             | maintengine.log ​  ​| ​
 ^ Programming language | Python | ^ Programming language | Python |
-^ Lines of code        | Approx 300 | +^ Further doc          | Old doc: [[http://nav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 8]]. The maintenance system was rewritten for NAV 3.1. See [[devel:​tasklist2006#​t3rewrite_the_message_and_maintenance_tool|here]] for more. |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​tigaNAV.pdf|tigaNAV report ch 8]].|+
  
 ==== Details ==== ==== Details ====
Line 448: Line 353:
 ^ Config file          | alertengine.cfg |  ^ Config file          | alertengine.cfg | 
 ^ Log file             | alertengine.log and alertengine.err.log   |  ^ Log file             | alertengine.log and alertengine.err.log   | 
-^ Programming language | perl | +^ Programming language | Python |
-^ Lines of code        | Approx 1900 | +^ Further doc          | [[http://nav.uninett.no/static/reports/NAVMore.pdf|NAVMore report ch 3.7 and 3.8]] (Norwegian). |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 3.7 and 3.8]] (Norwegian). |+
  
 ==== Details ==== ==== Details ====
Line 457: Line 361:
  
 ===== smsd ===== ===== smsd =====
 +
 +
  
  
Line 463: Line 369:
 ^ Alias                | The SMS daemon | ^ Alias                | The SMS daemon |
 ^ Polls network ​       | No | ^ Polls network ​       | No |
-^ Brief description    | Checks the sms queue for new messages, formats the messages into one SMS and dispatches it via one or more dispatchers with a general interface. Support for multiple dispatchers is handled by a dispatcher handler layer. | +^ Brief description    | Checks the navprofiles.smsq table for new messages, formats the messages into one SMS and dispatches it via one or more dispatchers with a general interface. Support for multiple dispatchers is handled by a dispatcher handler layer. | 
-^ Depends upon         | alertEngine fills the smsq |+^ Depends upon         | alertEngine fills the navprofiles.smsq table |
 ^ Updates tables ​      | Updates the sent and timesent values of navprofiles.smsq | ^ Updates tables ​      | Updates the sent and timesent values of navprofiles.smsq |
 ^ Run mode             | Daemon process | ^ Run mode             | Daemon process |
 ^ Default scheduling ​  | Polls the sms queue every x minutes | ^ Default scheduling ​  | Polls the sms queue every x minutes |
 ^ Config file          | smsd.conf |  ^ Config file          | smsd.conf | 
-^ Log file             | smsd.log ​  ​|  +^ Log file             | smsd.log |  
-^ Programming language | Python ​ in NAV 3.2 (perl in 3.1) | +^ Programming language | Python (Perl in 3.1) | 
-^ Lines of code        | In NAV 3.2, approx 1200 | +^ Further doc          | subsystem/smsd/README in the NAV sources describes the available dispatchers and more | 
-^ Further doc          | |+ 
 + 
 + 
 + 
 +==== Details ==== 
 + 
 + 
 +=== Usage === 
 + 
 +As described when given the ''-''''-help'' argument: 
 + 
 +  Usage: smsd [-h] [-c] [-d sec] [-t phone no.]
 +   
 +    -h, --help ​           Show this help text 
 +    -c, --cancel ​         Cancel (mark as ignored) all unsent messages 
 +    -d, --delay ​          Set delay (in seconds) between queue checks 
 +    -t, --test ​           Send a test message to <phone no.> 
 + 
 +Especially note the ''​-''''​-test''​ option, which is useful for debugging when experiencing problems with smsd. 
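
The option set above can be mirrored with Python's ''argparse'' module. This is only a minimal sketch for illustration, not the actual smsd source: the function name ''build_parser'' and the phone number ''12345678'' are made up here, and how smsd really parses its arguments is not documented on this page.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented smsd options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="smsd")
    # -h/--help is added automatically by argparse.
    parser.add_argument("-c", "--cancel", action="store_true",
                        help="Cancel (mark as ignored) all unsent messages")
    parser.add_argument("-d", "--delay", type=int, metavar="sec",
                        help="Set delay (in seconds) between queue checks")
    parser.add_argument("-t", "--test", metavar="phone no.",
                        help="Send a test message to <phone no.>")
    return parser


# Hypothetical invocation: ask for a test message to a made-up number.
args = build_parser().parse_args(["--test", "12345678"])
```

With this sketch, ''args.test'' holds the phone number, while ''args.cancel'' and ''args.delay'' stay at their unset values.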
 + 
 + 
 +=== Configuration === 
 + 
 +The configuration file smsd.conf lets you configure the following: 
 + 
 +^ parameter  ^ description                                     ^ default   ^ 
 +| username   | System user the process should try to run as    | navcron   | 
 +| delay      | Delay in seconds between queue runs             | 30        | 
 +| autocancel | Automatically cancel all messages older than '​autocancel',​ 0 to disable. Format like the PostgreSQL interval type, e.g. '1 day 12 hours'​. | 0 | 
 +| loglevel ​  | Filter level for log messages. Valid options are DEBUG, INFO, WARNING, ERROR, CRITICAL | INFO | 
 +| mailwarnlevel | Filter level for log messages sent by mail.  | ERROR     | 
 +| mailserver | Mail server to send log messages via.           | localhost | 
 +| dispatcherretry | Time, in seconds, before a dispatcher is retried after a failure | 300 | 
 +| dispatcherN | Dispatchers in prioritized order. Cheapest first, safest last. N should be 1,2,3,... | dispatcher1 defaults to GammuDispatcher | 
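
How the defaults in the table combine with values from the file can be sketched in Python. The sketch assumes an INI-style file with a ''[main]'' section; that layout, the function name ''load_settings'' and the sample values are assumptions made for illustration, so check the shipped smsd.conf for the real syntax.

```python
import configparser

# Defaults taken from the table above (kept as strings, as in a config file).
DEFAULTS = {
    "username": "navcron",
    "delay": "30",
    "autocancel": "0",
    "loglevel": "INFO",
    "mailwarnlevel": "ERROR",
    "mailserver": "localhost",
    "dispatcherretry": "300",
}


def load_settings(text):
    """Merge values from an assumed [main] section over the documented defaults."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(DEFAULTS)
    if parser.has_section("main"):
        settings.update(parser["main"])
    # Both intervals are documented as seconds, so convert them to integers.
    settings["delay"] = int(settings["delay"])
    settings["dispatcherretry"] = int(settings["dispatcherretry"])
    return settings


# Hypothetical file content overriding two of the defaults.
sample = "[main]\ndelay = 60\nloglevel = DEBUG\n"
settings = load_settings(sample)
```

Anything not set in the file, such as ''username'' or ''dispatcherretry'', keeps its documented default.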
 + 
 +In addition, some dispatchers need extra configuration as described in comments in the config file.
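
The interplay between the prioritized ''dispatcherN'' list and ''dispatcherretry'' can be sketched as follows. This is a hypothetical illustration of the behaviour described in the table, not the actual dispatcher handler in NAV: the class name ''DispatcherHandler'', its methods and the callable-based dispatcher interface are all assumptions.

```python
import time


class DispatcherHandler:
    """Try dispatchers in priority order, skipping recently failed ones (sketch)."""

    def __init__(self, dispatchers, retry_seconds=300, clock=time.monotonic):
        # dispatchers: list of (name, send_callable) in prioritized order.
        self.dispatchers = list(dispatchers)
        self.retry_seconds = retry_seconds
        self.clock = clock
        self.failed_at = {}  # dispatcher name -> time of last failure

    def send(self, phone, text):
        """Send via the first usable dispatcher and return its name."""
        now = self.clock()
        for name, send in self.dispatchers:
            failed = self.failed_at.get(name)
            if failed is not None and now - failed < self.retry_seconds:
                continue  # still inside the retry window, skip for now
            try:
                send(phone, text)
            except Exception:
                self.failed_at[name] = now  # remember the failure, try the next one
                continue
            self.failed_at.pop(name, None)
            return name
        raise RuntimeError("no dispatcher could send the message")
```

A dispatcher that fails is left alone until the retry time has passed, so the message falls through to the next dispatcher in the list in the meantime.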
  
  
-===== The snmptrapd =====+===== snmptrapd =====
  
  
Line 493: Line 434:
 ^ Log file             | snmptrapd.log and snmptraps.log ​ |  ^ Log file             | snmptrapd.log and snmptraps.log ​ | 
 ^ Programming language | Python ​  | ^ Programming language | Python ​  |
-^ Lines of code        | Approx 200 + traphandlers | 
 ^ Further doc          | - | ^ Further doc          | - |
  
Line 510: Line 450:
 ^ Polls network ​       | No | ^ Polls network ​       | No |
 ^ Brief description ​   | | ^ Brief description ​   | |
-^ Depends upon         | That gDD has filled the gwport, swport tables (and more...) |+^ Depends upon         | That ipdevpoll ​has filled the gwport, swport tables (and more...) |
 ^ Updates tables ​      | The RRD database (rrd_file and rrd_datasource) | ^ Updates tables ​      | The RRD database (rrd_file and rrd_datasource) |
 ^ Run mode             | cron | ^ Run mode             | cron |
Line 516: Line 456:
 ^ Config file          | None |  ^ Config file          | None | 
 ^ Log file             | cricket-changelog ​  ​| ​ ^ Log file             | cricket-changelog ​  ​| ​
-^ Programming language | perl | +^ Programming language | Python |
-^ Lines of code        | Approx 1600 |+
 ^ Further doc          | [[howtoconfigurecricket|How to configure Cricket addons in NAV v3]] | ^ Further doc          | [[howtoconfigurecricket|How to configure Cricket addons in NAV v3]] |
  
Line 538: Line 477:
 ^ Log file             | cricket/giga.log and cricket/normal.log   |  ^ Log file             | cricket/giga.log and cricket/normal.log   | 
 ^ Programming language | not relevant | ^ Programming language | not relevant |
-^ Lines of code        | not relevant | 
 ^ Further doc          | not relevant | ^ Further doc          | not relevant |
  
Line 557: Line 495:
 ^ Log file             | ?   ​| ​ ^ Log file             | ?   ​| ​
 ^ Programming language | Perl | ^ Programming language | Perl |
-^ Lines of code        | Approx 200 | 
 ^ Further doc          | - | ^ Further doc          | - |
  
Line 583: Line 520:
 ^ Log file             | None   ​| ​ ^ Log file             | None   ​| ​
 ^ Programming language | Python | ^ Programming language | Python |
-^ Lines of code        | Approx 350 | +^ Further doc          | [[http://nav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 2.4]] (Norwegian). |
-^ Further doc          | [[http://metanav.uninett.no/​static/​reports/​NAVMore.pdf|NAVMore report ch 2.4]] (Norwegian). |+
  
 ==== Details ==== ==== Details ====
Line 605: Line 541:
 ^ Default scheduling ​  | | ^ Default scheduling ​  | |
 ^ Programming language | | ^ Programming language | |
-^ Lines of code        | | 
 ^ Further doc          | [[Arnold|Arnold]] | ^ Further doc          | [[Arnold|Arnold]] |
  
backendprocesses.1191939915.txt.gz · Last modified: 2007/10/09 16:25 by faltin