OSPF TROUBLESHOOTING
OSPF runs directly on top of IP and uses protocol number 89. It doesn't use a transport protocol such as TCP for reliability; the protocol has its own mechanism for reliable delivery.
Debugs in OSPF normally are not very CPU-intensive unless the problem
is impacting the entire OSPF network. For example, if OSPF neighbors are not
coming up, turning on debug ip ospf adj is not CPU-intensive unless 300
neighbors are having problems at the same time.
Troubleshooting OSPF neighbor relationships
Troubleshooting OSPF route advertisement
Troubleshooting OSPF route installation
Troubleshooting redistribution problems in OSPF
Troubleshooting route summarization in OSPF
Troubleshooting CPUHOG problems
Troubleshooting dial-on-demand routing (DDR) issues in OSPF
Troubleshooting SPF calculation and route flapping
Common OSPF error messages
1) Troubleshooting OSPF neighbor relationships
OSPF neighbor relationship problems can be any of these types:
The OSPF neighbor list is empty.
An OSPF neighbor is stuck in ATTEMPT.
An OSPF neighbor is stuck in INIT.
An OSPF neighbor is stuck in 2-WAY.
An OSPF neighbor is stuck in EXSTART/EXCHANGE.
An OSPF neighbor is stuck in LOADING.
1. Problem: The OSPF neighbor list is empty
Causes:
· OSPF is not enabled on the interface.
· Layer 1/2 is down.
· The interface is defined as passive under OSPF.
- When an interface is defined as passive under router ospf, OSPF Hellos are suppressed: OSPF neither sends nor receives Hellos on that interface, so no adjacency is formed.
- The passive-interface command is entered so that the router does not take part in OSPF on that segment. Use it when you don't want to form any neighbor relationship on an interface but you do want to advertise that interface.
- In OSPF, a passive interface means "do not send or receive OSPF Hellos on this interface." So, making an interface passive under OSPF with the intention of preventing the router from sending routes on that interface while still receiving routes is wrong.
· An access list is blocking OSPF Hellos on both sides.
- OSPF sends its Hellos to the multicast address 224.0.0.5. This address should be permitted.
· A subnet number/mask has been mismatched over a broadcast link.
· The Hello/dead interval has been mismatched.
· The authentication type (plain text versus MD5) has been mismatched.
· An authentication key has been mismatched.
· An area ID has been mismatched.
· Stub/transit/NSSA area options have been mismatched.
· An OSPF adjacency exists with secondary IP addressing.
Commands to check:
· show ip ospf neighbor: displays the OSPF neighbor status.
· show ip ospf interface: verifies whether the OSPF interfaces are up or down. If an interface is defined as passive, the output shows "No Hellos (Passive interface)".
· debug ip ospf adj: look for mismatch messages in the output.
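As a minimal sketch of the passive-interface and access-list causes above, the following IOS fragment advertises a LAN without forming neighbors on it and permits OSPF in an inbound access list on the transit interface. The process ID, interface names, addresses, and access list number are illustrative assumptions, not values from a real network.

```
router ospf 1
 network 10.10.10.0 0.0.0.255 area 1
 network 192.168.1.0 0.0.0.255 area 1
 ! Ethernet1 (192.168.1.0/24) is still advertised, but no Hellos are
 ! sent or accepted on it, so no neighbor can form there
 passive-interface Ethernet1
!
! OSPF is IP protocol 89; this entry permits Hellos to 224.0.0.5 as well
! as the unicast OSPF packets exchanged later in adjacency formation
access-list 101 permit ospf any any
! ... remaining entries of the existing filter go here ...
!
interface Ethernet0
 ip address 10.10.10.1 255.255.255.0
 ip access-group 101 in
```

If a neighbor is expected on Ethernet1, the passive-interface line is the one to remove.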
2. An OSPF neighbor is stuck in ATTEMPT.
This problem is valid only for NBMA networks in which neighbor statements are defined. Stuck in ATTEMPT means that a router is trying to contact a neighbor by sending its Hello but hasn't received any response.
Causes:
· Misconfigured neighbor statement
· Unicast connectivity is broken on NBMA. One possible cause of the broken connectivity is an access list blocking the unicast.
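For reference, a neighbor statement on an NBMA interface might look like the sketch below. The interface, addresses, and area are hypothetical; the point to check is that the neighbor statement names the neighbor's interface IP address on the NBMA subnet.

```
interface Serial0
 ip address 10.1.1.1 255.255.255.0
 encapsulation frame-relay
 ! non-broadcast is already the default OSPF network type on a
 ! Frame Relay multipoint interface; shown here for clarity
 ip ospf network non-broadcast
!
router ospf 1
 network 10.1.1.0 0.0.0.255 area 0
 ! Hellos are unicast to this address, so it must be reachable
 neighbor 10.1.1.2
```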
3. An OSPF neighbor is stuck in INIT.
When a router receives an OSPF Hello from a neighbor, it includes that neighbor's router ID in the Hello packets it sends back. If a router receives Hellos that do not list its own router ID, it keeps the sender in the INIT state.
Causes:
· An access list on one side is blocking OSPF Hellos.
· Authentication is enabled on only one side (virtual link example).
· Hellos are getting lost on one side at Layer 2.
4. An OSPF neighbor is stuck in 2-WAY.
Cause: Priority 0 is configured on all routers
It is normal on broadcast media to see neighbors in the 2-WAY state, because not every router becomes adjacent on broadcast media. Every router enters the FULL state only with the DR and the BDR. In this example, there are only two routers on Ethernet, and both are configured with priority 0. Priority 0 means that the router will not take part in the DR/BDR election process. This configuration is useful when there are "low-end" routers on the segment that should not become DRs. By default, the priority is set to 1. The router with the highest priority on a segment wins the DR election. If all priorities are left at the default, the router with the highest router ID becomes the DR.
If all the routers on an Ethernet segment are configured with priority 0, no router on the segment will be in FULL state with any other router. This creates problems. At least one router on the segment must have a priority that is not set to 0.
Solution:
To fix this problem, remove the priority 0 command on at least one router so that the router becomes a DR and forms a FULL adjacency.
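The fix above could be applied as in this short sketch. The interface name is an assumption; use the interface attached to the affected segment.

```
interface Ethernet0
 ! Restore the default priority so this router can be elected DR;
 ! "no ip ospf priority" has the same effect
 ip ospf priority 1
```

After the change, show ip ospf neighbor on the other routers should list this router as FULL/DR.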
5. An OSPF neighbor is stuck in EXSTART/EXCHANGE.
In these states, the routers elect a master and a slave and agree on the initial sequence number, and then describe their whole databases to each other in database description packets. If a neighbor is stuck in EXSTART/EXCHANGE for a long time, it is an indication of a problem.
The most common possible causes of this problem are as follows:
· Mismatched interface MTU
- To confirm, check the output of debug ip ospf adj, which shows a message such as: OSPF: Nbr 131.108.1.2 has larger interface MTU
· Duplicate router IDs on neighbors
· Inability to ping across with more than a certain MTU size
· Broken unicast connectivity because of the following:
- Access list blocking the unicast
- NAT translating the unicast
If NAT is misconfigured, it will start translating the unicast packets coming toward the router, which breaks unicast connectivity. In this example, R1 is configured with NAT. The outside interface of R1 is Serial 0.2, which connects to R2.
When R2 sends a unicast packet to R1, R1 tries to translate that packet, and R2 never receives the reply. The main thing to watch for is the access list used by NAT. If the access list permits everything, this problem will occur. To solve it, change access list 1 to permit only those IP addresses that require translation. The access list differs from network to network; the point is that its permit statement should not cover the neighbor's IP address. In this example, only the inside network 10.0.0.0/8 is permitted.
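A sketch of the corrected NAT configuration on R1 follows. Access list 1, Serial0.2, and 10.0.0.0/8 come from the example above; the overload form of the NAT statement is an assumption about how NAT was set up.

```
interface Serial0.2
 ip nat outside
!
! Translate only the sources matched by access list 1
ip nat inside source list 1 interface Serial0.2 overload
!
! Too broad (would also match R2's address): access-list 1 permit any
access-list 1 permit 10.0.0.0 0.255.255.255
```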
6. An OSPF neighbor is stuck in LOADING.
When a neighbor is stuck in the LOADING state, the local router has sent a link-state request packet to the neighbor requesting an outdated or missing LSA and is waiting for an update from its neighbor. If the neighbor doesn't reply, or the neighbor's reply never reaches the local router, the router will be stuck in the LOADING state.
The most common possible causes of this problem are as follows:
· Mismatched MTU
· Corrupted link-state request packet
- When a link-state request packet is corrupted, the neighbor discards the packet and the local router never receives a response from the neighbor. This causes the OSPF neighbor to be stuck in the LOADING state.
Link-state request packets usually become corrupted for the following reasons:
I. A device between the neighbors, such as a switch, is corrupting the packet.
II. The sending router's packet is invalid. In this case, either the sending router's interface is bad or the error is caused by a software bug.
III. The receiving router is calculating the wrong checksum. In this case, either the receiving router's interface is bad or the error is caused by a software bug. This is the least likely cause of this error message.
Solution:
Most of the time, this problem is fixed by replacing hardware. The fault could be a simple bad port on the switch or a bad interface card on the sending or receiving router.
2) Troubleshooting OSPF route advertisement
OSPF is a link-state protocol. When it forms neighbor relationships, it exchanges the entire link-state database with its neighbor(s).
The most common reasons for OSPF to not share the database information about a specific link are as follows:
- The OSPF neighbor is not advertising routes.
- The OSPF neighbor (ABR) is not advertising the summary route.
- The OSPF neighbor is not advertising external routes.
- The OSPF neighbor is not advertising the default route.
1. OSPF Neighbor Is Not Advertising Routes
When a neighbor doesn't advertise a route, that route will not show up in the local router's routing table. This means that the neighbor has not included the route in its database; otherwise, the local router would have received it.
The most common possible causes of this problem are as follows:
· OSPF is not enabled on the interface that is supposed to be advertised.
· The advertising interface is down.
· The secondary interface is in a different area than the primary interface.
2. OSPF Neighbor (ABR) Not Advertising the Summary Route
The ABR generates the summary LSA for one area and sends it to another area. When the ABR fails to generate the summary LSA, the areas become isolated from each other.
The most common possible causes of this problem are as follows:
· An area is configured as a totally stubby area.
· An ABR is not connected to area 0.
· A discontiguous area 0 exists.
3. OSPF Neighbor Is Not Advertising External Routes
Whenever routes are redistributed into OSPF, the redistributing router generates an external LSA (Type 5) that is flooded throughout the OSPF network. External LSAs are not leaked into stub, totally stubby, and NSSA areas.
The most common possible causes of this problem are as follows:
· The area is configured as a stub or NSSA.
· The NSSA ABR is not translating Type 7 into Type 5 LSA.
4. OSPF Neighbor Not Advertising Default Routes
The most common possible causes for an OSPF router not to advertise the default route are as follows:
· The default-information originate command is missing.
· The default route is missing from the neighbor's routing table.
· A neighbor is trying to originate a default into a stub area.
· The NSSA ABR/ASBR is not originating the Type 7 default.
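The causes above map onto a few configuration lines, sketched here with a hypothetical process ID and area number. Which form applies depends on whether the router should advertise the default only when it has one itself.

```
router ospf 1
 ! Advertises 0.0.0.0/0 only if a default route is present in this
 ! router's own routing table
 default-information originate
 !
 ! Alternative form that advertises the default unconditionally:
 ! default-information originate always
!
! On an NSSA ABR, the Type 7 default is originated per area instead
router ospf 1
 area 1 nssa default-information-originate
```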
3) Troubleshooting OSPF Route Installation
Sometimes OSPF routers have fully synchronized their databases with those of their neighbors but are not installing routes in the routing table.
After a route is in the database, there can be several reasons that it is not installed in the routing table.
The most common cases of OSPF failing to install routes in the routing table are as follows:
· OSPF is not installing any routes in the routing table.
· OSPF is not installing external routes in the routing table.
1. OSPF is not installing any routes in the routing table.
It is a common problem in OSPF to find routes in the database but not in the routing table. When OSPF finds any kind of discrepancy in the database, it does not install any routes in the routing table.
The most common possible causes of this problem are as follows:
· The network type is mismatched.
· IP addresses are flipped in dual serial-connected routers, or a subnet/mask mismatch has occurred.
· One side is a numbered and the other side is an unnumbered point-to-point link.
· A distribute list is blocking the routes' installation.
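Two of these causes can be read straight from the configuration. The fragment below is a sketch with assumed interface names, access list number, and prefixes; it shows where the network type is set and how an inbound distribute list keeps a route out of the routing table while the LSA stays in the database.

```
! Both ends of the link should show the same OSPF network type
interface Serial0
 ip ospf network point-to-point
!
router ospf 1
 ! Routes denied by access list 10 are not installed in the routing table
 distribute-list 10 in
!
access-list 10 deny 172.16.5.0 0.0.0.255
access-list 10 permit any
```

The network type in use on each side appears in the output of show ip ospf interface.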
4) Troubleshooting Redistribution Problems in OSPF
When a router in OSPF does the redistribution, it becomes an ASBR. The routes that are redistributed into OSPF could be directly connected routes, static routes, or dynamically learned routes from another routing protocol or another OSPF process.
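A minimal redistribution sketch on the ASBR is shown below, assuming process ID 1 and RIP as the other protocol. The subnets keyword is included on the assumption that subnetted networks, and not only classful ones, should be redistributed.

```
router ospf 1
 ! Each redistribute line makes this router an ASBR
 redistribute connected subnets
 redistribute static subnets
 ! Metric value is an illustrative choice
 redistribute rip metric 20 subnets
```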
5) Troubleshooting Route Summarization in OSPF
If there are contiguous ranges of addresses, then instead of advertising every network you can group the contiguous networks, summarize them into one or a few blocks, and advertise those blocks. This feature helps reduce the size of the routing table, which decreases the convergence time and increases OSPF performance. OSPF does not summarize automatically, so summarization needs to be configured manually on the router.
OSPF can use two types of summarization:
· Interarea summarization, which is done on the ABR
· External summarization, which is done on the ASBR
Two common problems related to summarization in OSPF are as follows:
· A router is not summarizing interarea routes.
Cause: The area range command is not configured on the ABR.
Ensure that the area range command is configured on the correct router. Area range summarization can be done only on the ABR. In summarization, instead of originating a separate LSA for each network, the ABR originates summary LSAs that cover those ranges of addresses.
When configuring the area range command, make sure that the summarization mask is in the form of a prefix mask rather than a wildcard mask.
· A router is not summarizing external routes.
Cause: The summary-address command is not configured on the ASBR.
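Both commands are sketched below with hypothetical area numbers and address blocks. Note the prefix-style mask on area range.

```
! On the ABR: summarize area 1's networks 172.16.0.0/24 through
! 172.16.3.0/24 into one block
router ospf 1
 area 1 range 172.16.0.0 255.255.252.0
 ! Wrong form (wildcard mask): area 1 range 172.16.0.0 0.0.3.255
!
! On the ASBR: summarize redistributed (external) routes
router ospf 1
 summary-address 192.168.0.0 255.255.252.0
```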
6) Troubleshooting CPUHOG Problems
The CPUHOG messages usually appear in two significant stages:
· Neighbor formation process
· LSA refresh process
Problem: CPUHOG messages during adjacency formation. Cause: The router is not running packet-pacing code.
Problem: CPUHOG messages during the LSA refresh period. Cause: The router is not running LSA group-pacing code.
7) Troubleshooting SPF Calculation and Route Flapping
Whenever there is a change in topology, OSPF runs the SPF algorithm to compute the shortest path first tree again. Unstable links within the OSPF network can cause constant SPF calculation. This section discusses the problem of SPF running constantly in the network for the following reasons:
· Interface flap within the network
· Neighbor flap within the network
· Duplicate router ID
1. SPF Running Constantly. Cause: Interface Flap Within the Network
Whenever there is a link flap in an area, OSPF runs SPF. So, if a network has unstable links, SPF can run constantly. SPF itself is not a problem, because OSPF is just adjusting to the change in the database by calculating SPF. The real problem occurs if there are small routers in the network, where a constant SPF run might cause a CPU spike. A link flap is shown in the figure. Because R1 also is included in area 0, any link flap in area 0 causes all routers in area 0 to run SPF.
To determine how often SPF is running, use the show ip ospf command and check the line "SPF algorithm executed x times" in the output.
To find out which particular LSA is flapping, turn on debug ip ospf monitor. This debug shows exactly which LSA is flapping.
R1# debug ip ospf monitor
OSPF: Schedule SPF in area 0.0.0.0
Change in LS ID 192.168.1.129, LSA type R,
OSPF: schedule SPF: spf_time 1620348064ms wait_interval 10s
The next step is to go to the router whose router LSA is flapping and check its log for any interface flap.
Two solutions exist in this case:
· Fix the link flap.
· Redefine the area boundaries.
2. SPF Running Constantly. Cause: Neighbor Flap Within the Network
When a neighbor goes down, it causes a change in topology, so SPF runs.
There is a way to track neighbor changes in OSPF: configure ospf log-adjacency-changes under router ospf to track all the neighbor changes.
router ospf 1
ospf log-adjacency-changes
When this command is configured, it saves all the neighbor state changes in the router's syslog.
3. SPF Running Constantly. Cause: Duplicate Router ID
When two routers have identical router IDs, confusion results in the OSPF topology database, and the route keeps getting added and deleted. The most common symptom of this problem is that the LS Age field always has a small value.
This problem usually is caused by cutting and pasting one router's configuration into another router, which results in two routers with identical router IDs.
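One way to remove the duplicate is to set the router ID explicitly on one of the two routers, as in this sketch. The ID value is hypothetical; any value unique in the OSPF domain works.

```
router ospf 1
 ! Must differ from every other router ID in the OSPF domain
 router-id 10.255.0.2
 ! The new ID takes effect when the OSPF process restarts, for
 ! example after the exec command: clear ip ospf process
```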
Common OSPF Error Messages
1) "OSPF: Could not allocate router id"
This message appears in two situations:
· No up/up interface with a valid IP address
· Not enough up interfaces with a valid IP address for multiple OSPF processes
OSPF requires a valid IP address on an up/up interface so that it can allocate a router ID for the OSPF process. If a router fails to allocate a router ID, OSPF will not function. This problem can be corrected by using loopback addresses.
The loopback interface solution works for both situations. Just configure a loopback interface for one process. If you are trying to run more than one process, you might need more than one loopback interface.
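A loopback-based fix might look like the following sketch. The addresses are illustrative assumptions; a /32 mask is a common choice for loopbacks.

```
interface Loopback0
 ip address 192.168.255.1 255.255.255.255
!
! A second loopback, in case a second OSPF process needs its own router ID
interface Loopback1
 ip address 192.168.255.2 255.255.255.255
```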
2) "%OSPF-4-BADLSATYPE"
"%OSPF-4-BADLSATYPE: Invalid lsa: Bad LSA type" Type 6 Error Message
This is normal if the neighboring router is sending multicast OSPF (MOSPF) packets. For more information on MOSPF, refer to RFC 1584. Cisco routers do not support MOSPF, so they simply ignore these LSAs. To get rid of these messages, configure the following:
router ospf 1
ignore lsa mospf
If the type is something other than 6, it's probably a bug or a memory corruption error.
3) "%OSPF-4-ERRRCV"
This message means that OSPF received an invalid packet.
Three common types of this message can occur:
a) Mismatched area ID
b) Bad checksum
c) OSPF not enabled on the receiving interface
a) Mismatched Area ID
This message looks like this:
%OSPF-4-ERRRCV: Received invalid packet: mismatch area ID, from backbone area must be virtual-link but not found from 170.170.3.3, Ethernet0
This means that the neighbor's interface connecting to this interface is in area 0 but this interface is not in area 0. In this situation, the router will not form an OSPF adjacency with the neighbor that the packet comes from. This also happens if one side's virtual link is misconfigured. To avoid these messages, make sure that both sides have the same area ID by checking the network statement under OSPF in the router configuration. For example, if the link 10.10.10.0/24 between two routers should be in area 1, make sure that the network statement on both routers includes this particular link in area 1.
The network command would look like this:
router ospf 1
network 10.10.10.0 0.0.0.255 area 1
If a virtual link is configured, double-check the configuration for the virtual link.
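When checking a virtual link, both ABRs need a matching statement that names the transit area and the other ABR's router ID (not its interface address). The sketch below uses hypothetical router IDs and assumes area 1 is the transit area.

```
! On ABR A (router ID 1.1.1.1)
router ospf 1
 area 1 virtual-link 2.2.2.2
!
! On ABR B (router ID 2.2.2.2)
router ospf 1
 area 1 virtual-link 1.1.1.1
```

The state of the link can then be checked with show ip ospf virtual-links.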
b) Bad Checksum
The message looks like this:
%OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 144.100.21.141, TokenRing0/0
This means that OSPF encountered an error in a packet that it received: the OSPF checksum does not match the contents of the OSPF packet received by this router.
This problem has three causes:
1. A device between the neighbors, such as a switch, is corrupting the packet.
2. The sending router's packet is invalid. In this case, either the sending router's interface is bad or a software bug is causing the error.
3. The receiving router is calculating the wrong checksum. In this case, either the receiving router's interface is bad or a software bug is causing the error. This is the least likely cause of this error message.
This problem can be difficult to troubleshoot, but you can start with the following solution, which is effective in 90 percent of cases. It's important that you follow the steps in order:
Step 1. Change the cable between the routers. For the example given in this section, these would be the router that is sending the bad packet (144.100.21.141) and the router that is complaining about these bad packets.
Step 2. If Step 1 doesn't fix the problem, use a different port on the switch between the routers.
Step 3. If Step 2 doesn't fix the problem, connect the routers directly using a cross-over cable. If you receive no further messages, the switch most likely is corrupting the packet.
If none of these steps solves the problem, contact the Cisco Technical Assistance Center (TAC) and work with an engineer to look for a bug in Cisco IOS Software or to obtain a possible Return Material Authorization (RMA) for partial or full parts replacement.
c) OSPF Not Enabled on the Receiving Interface
The message looks like this:
%OSPF-4-ERRRCV: Received invalid packet: OSPF not enabled on interface from 141.108.16.4, Serial0.100
The router generating this message received a packet from 141.108.16.4 on Serial0.100, but OSPF is not enabled on the Serial0.100 interface. This message is generated only once for a non-OSPF interface.
View/Debug Commands
show ip ospf interface
show ip ospf database
show ip ospf database network (lsa type 2)
show ip ospf database router (lsa type 1)
show ip ospf database summary (lsa type 3)
show ip ospf database asbr-summary (lsa type 4)
show ip ospf database external (lsa type 5)
show ip ospf database nssa-external (lsa type 7)
show ip ospf virtual-links
show ip ospf border-routers
show ip ospf statistics
debug ip ospf hello
debug ip ospf adj