ISP Network Routing and Switching Technologies and System Testing: OSPF troubleshooting

OSPF TROUBLESHOOTING

OSPF runs directly on top of IP and uses protocol number 89.

OSPF does not rely on a transport protocol, such as TCP, for reliability. The protocol has its own reliable delivery mechanism, in which link-state updates are explicitly acknowledged.

Debugs in OSPF normally are not very CPU-intensive unless the problem is impacting the entire OSPF network. For example, if OSPF neighbors are not coming up, turning on debug ip ospf adj is not CPU-intensive unless 300 neighbors are having problems at the same time.

Troubleshooting OSPF neighbor relationships

Troubleshooting OSPF route advertisement

Troubleshooting OSPF route installation

Troubleshooting redistribution problems in OSPF

Troubleshooting route summarization in OSPF

Troubleshooting CPUHOG problems

Troubleshooting dial-on-demand routing (DDR) issues in OSPF

Troubleshooting SPF calculation and route flapping

Common OSPF error messages

1) Troubleshooting OSPF neighbor relationships

- OSPF neighbor relationship problems can be of any of these types:

The OSPF neighbor list is empty.

An OSPF neighbor is stuck in ATTEMPT.

An OSPF neighbor is stuck in INIT.

An OSPF neighbor is stuck in 2-WAY.

An OSPF neighbor is stuck in EXSTART/EXCHANGE.

An OSPF neighbor is stuck in LOADING.

1. Problem: The OSPF neighbor list is empty

· OSPF is not enabled on the interface.

· Layer 1/2 is down.

· The interface is defined as passive under OSPF.

- When an interface is defined as passive under router OSPF, it suppresses OSPF Hellos. This means that OSPF does not send or receive any Hellos on such interfaces. Therefore, no adjacency is formed.

- passive-interface: the command is entered so that the router cannot take part in any OSPF process on that segment. This is the case when you don't want to form any neighbor relationship on an interface but you do want to advertise that interface.

- In OSPF, a passive interface means "do not send or receive OSPF Hellos on this interface." So, making an interface passive under OSPF with the intention of preventing the router from sending any routes on that interface but receiving all the routes is wrong.

· An access list is blocking OSPF Hellos on both sides.

- OSPF sends its Hello on a multicast address of 224.0.0.5. This address should be permitted.

· A subnet number/mask has been mismatched over a broadcast link.

· The Hello/dead interval has been mismatched.

· The authentication type (plain text versus MD5) has been mismatched.

· An authentication key has been mismatched.

· An area ID has been mismatched.

· Stub/transit/NSSA area options have been mismatched.

· An OSPF adjacency exists with secondary IP addressing.

✓ show ip ospf neighbor - displays the status of each OSPF neighbor.

✓ show ip ospf interface - verifies whether the OSPF interfaces are up or down. For an interface defined as passive, the output contains: No Hellos (Passive interface)

✓ debug ip ospf adj - look for mismatch messages in the output.
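The mismatch causes listed above can be checked mechanically before turning on any debug. The following is a minimal sketch, assuming the relevant settings of both routers have already been collected by hand; the HelloParams class and its field names are hypothetical and are not part of any Cisco tool. Area IDs are compared as plain strings, so "0" and "0.0.0.0" would be reported as a mismatch.

```python
from dataclasses import dataclass
from ipaddress import IPv4Interface
from typing import List


@dataclass(frozen=True)
class HelloParams:
    """Hypothetical container for the per-interface settings compared here."""
    address: str         # interface address in CIDR form, e.g. "10.1.1.1/24"
    area: str            # area ID, compared as a plain string
    hello_interval: int  # seconds
    dead_interval: int   # seconds
    auth_type: str       # "none", "plain", or "md5"
    auth_key: str
    area_type: str       # "normal", "stub", or "nssa"
    passive: bool = False


def adjacency_blockers(local: HelloParams, remote: HelloParams) -> List[str]:
    """Return the configuration reasons that would keep two routers on a
    broadcast link from listing each other as neighbors."""
    problems = []
    if local.passive or remote.passive:
        problems.append("passive interface")
    if IPv4Interface(local.address).network != IPv4Interface(remote.address).network:
        problems.append("subnet/mask mismatch")
    if (local.hello_interval, local.dead_interval) != (
        remote.hello_interval, remote.dead_interval
    ):
        problems.append("hello/dead interval mismatch")
    if local.auth_type != remote.auth_type:
        problems.append("authentication type mismatch")
    elif local.auth_key != remote.auth_key:
        problems.append("authentication key mismatch")
    if local.area != remote.area:
        problems.append("area ID mismatch")
    if local.area_type != remote.area_type:
        problems.append("stub/NSSA option mismatch")
    return problems
```

An empty result only rules out these configuration mismatches; Layer 1/2 problems and access lists still have to be checked on the routers themselves.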

2. An OSPF neighbor is stuck in ATTEMPT.

This problem is valid only for NBMA networks in which neighbor statements are defined. Stuck in ATTEMPT means that a router is trying to contact a neighbor by sending its Hello but hasn't received any response.

Causes:

· Misconfigured neighbor statement

· Unicast connectivity is broken on the NBMA network. One possible cause is an access list that is blocking the unicast packets.

3. An OSPF neighbor is stuck in INIT.

When a router receives an OSPF Hello from a neighbor, it includes that neighbor's router ID in the Hello packets it sends back. If a router does not see its own router ID in the Hellos it receives from a neighbor, it keeps that neighbor in INIT.

Causes:

· An access list on one side is blocking OSPF Hellos.

· Authentication is enabled on only one side (virtual link example).

· Hellos are getting lost on one side at Layer 2.
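The INIT condition comes down to one membership test on the neighbor list carried in a received Hello. A minimal sketch of that test follows; the function name and the use of None for "no Hello received" are assumptions made for illustration, and the real neighbor state machine has more states and events than this.

```python
from typing import Iterable, Optional


def state_after_hello(my_router_id: str,
                      neighbors_in_hello: Optional[Iterable[str]]) -> str:
    """Classify a neighbor from the router IDs listed in its latest Hello.

    neighbors_in_hello is None when no Hello has been received at all.
    """
    if neighbors_in_hello is None:
        return "DOWN"
    if my_router_id in set(neighbors_in_hello):
        return "2-WAY"   # the neighbor has seen our Hellos
    return "INIT"        # we hear the neighbor, but it does not list us
```

A neighbor that stays in INIT therefore points at a one-way problem: Hellos arrive from the neighbor, but the local router's Hellos are being dropped or rejected on the way to it.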

4. An OSPF neighbor is stuck in 2-WAY.

Cause: Priority 0 Is Configured on All Routers

It is normal on broadcast media to see neighbors in the 2-WAY state, because not every router becomes adjacent on broadcast media. Every router enters the FULL state only with the DR and the BDR. Consider an example with only two routers on an Ethernet segment, both configured with priority 0. Priority 0 means that the router will not take part in the DR/BDR election process. This configuration is useful when there are "low-end" routers on the segment that should never become DRs. By default, the priority is set to 1. The router with the highest priority on a segment wins the DR election. If all priorities are left at the default, the router with the highest router ID becomes the DR.

If all the routers on an Ethernet segment are configured with priority 0, no routers on the segment will be in FULL state with any other router. This creates problems. At least one router on the segment must have a priority that is not set to 0.

Solution:

To fix this problem, remove the priority 0 command on at least one router so that the router becomes a DR and forms a FULL adjacency.
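The election rules described above can be sketched in a few lines. This is a simplified illustration only: it assumes all routers come up at the same time, it ignores the BDR, and it ignores the fact that a running DR is not preempted by a later router with a higher priority. The elect_dr function is hypothetical.

```python
from ipaddress import IPv4Address
from typing import Dict, Optional


def elect_dr(routers: Dict[str, int]) -> Optional[str]:
    """Pick a DR from a mapping of router ID -> interface priority.

    Routers with priority 0 are ineligible. The highest priority wins,
    and the highest router ID breaks a tie. Returns None if no router
    is eligible, which is the stuck-in-2-WAY case.
    """
    eligible = [
        (priority, IPv4Address(router_id))
        for router_id, priority in routers.items()
        if priority > 0
    ]
    if not eligible:
        return None
    return str(max(eligible)[1])
```

With every priority set to 0 the function returns None, which corresponds to a segment where no router reaches FULL with any other.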

5. An OSPF neighbor is stuck in EXSTART/EXCHANGE.

In EXSTART, the routers elect a master and a slave and agree on the initial sequence number. In EXCHANGE, they describe their entire link-state databases to each other. If a neighbor is stuck in EXSTART/EXCHANGE for a long time, it is an indication of a problem.

The most common possible causes of this problem are as follows:

· Mismatched interface MTU

Solution: check the output of debug ip ospf adj, which shows a line such as the following:

OSPF: Nbr 131.108.1.2 has larger interface MTU

· Duplicate router IDs on neighbors

· Inability to ping across with more than certain MTU size

· Broken unicast connectivity because of the following:

- Access list blocking the unicast

- NAT translating the unicast

If NAT is misconfigured, it starts translating the unicast packets coming toward the router, which breaks unicast connectivity. In this example, R1 is configured with NAT. The outside interface of R1 is Serial 0.2, which connects to R2.

When R2 sends a unicast packet to R1, R1 tries to translate that packet, and R2 never receives the ping reply. The main thing to watch for is the access list used by NAT. If the access list permits everything, this problem occurs. To solve it, change access list 1 to permit only those IP addresses that require translation. The access list differs from network to network; the point is that its permit statements should not cover the neighbor's IP address. In this example, only the inside network 10.0.0.0/8 is permitted.
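Whether a NAT access list covers the neighbor's address can be checked offline. The sketch below assumes the permit statements have been rewritten by hand as CIDR prefixes; the function name and the sample addresses in the usage note are illustrative and are not taken from a real configuration.

```python
from ipaddress import ip_address, ip_network
from typing import Iterable, List


def permits_covering_neighbor(permitted: Iterable[str],
                              neighbor_ip: str) -> List[str]:
    """Return the permitted prefixes that contain the OSPF neighbor's address.

    Any hit means NAT may translate the neighbor's unicast packets.
    """
    address = ip_address(neighbor_ip)
    return [prefix for prefix in permitted if address in ip_network(prefix)]
```

For a neighbor at 192.168.1.2, a "permit any" list, written here as 0.0.0.0/0, is reported as covering it, while a list that permits only 10.0.0.0/8 returns nothing.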

6. An OSPF neighbor is stuck in LOADING.

When a neighbor is stuck in the LOADING state, the local router has sent a link-state request packet to the neighbor requesting an outdated or missing LSA and is waiting for an update from its neighbor. If the neighbor doesn't reply, or the neighbor's reply never reaches the local router, the router will be stuck in the LOADING state.

· The most common possible causes of this problem are as follows:

- Mismatched MTU

- Corrupted link-state request packet

- When a link-state request packet is corrupted, the neighbor discards the packet and the local router never receives the response from the neighbor. This causes the OSPF neighbor to be stuck in the LOADING state.

Link-state request packets usually become corrupted because of the following reasons:

I. A device between the neighbors, such as a switch, is corrupting the packet.

II. The sending router's packet is invalid. In this case, either the sending router's interface is bad or the error is caused by a software bug.

III. The receiving router is calculating the wrong checksum. In this case, either the receiving router's interface is bad or the error is caused by a software bug. This is the least likely cause of this error message.

Solution

Most of the time, this problem is fixed by replacing hardware. This could be a simple bad port on the switch or a bad interface card on the sending/receiving router.

2) Troubleshooting OSPF route advertisement

OSPF is a link-state protocol. When it forms neighbor relationships, it exchanges the entire link-state database with its neighbor(s).

The most common reasons for OSPF to not share the database information about a specific link are as follows:

- The OSPF neighbor is not advertising routes.

- The OSPF neighbor (ABR) is not advertising the summary route.

- The OSPF neighbor is not advertising external routes.

- The OSPF neighbor is not advertising the default route.

1. OSPF Neighbor Is Not Advertising Routes

When a neighbor doesn't advertise a route, that route will not show up in the local router's routing table. This means that the neighbor has not included the route in its database; otherwise, the local router would have received it.

The most common possible causes of this problem are as follows:

· OSPF is not enabled on the interface that is supposed to be advertised.

· The advertising interface is down.

· The secondary interface is in a different area than the primary interface.

2. OSPF Neighbor (ABR) Not Advertising the Summary Route

The ABR generates the summary LSA for one area and sends it to another area. When the ABR fails to generate the summary LSA, the areas become isolated from each other.

The most common possible causes of this problem are as follows:

· An area is configured as a totally stubby area.

· An ABR is not connected to area 0.

· A discontiguous area 0 exists.

3. OSPF Neighbor Is Not Advertising External Routes

Whenever there is a redistribution in OSPF, it generates an external LSA (Type 5) that is flooded throughout the OSPF network. External LSAs are not leaked into stub, totally stubby, and NSSA areas.

The most common possible causes of this problem are as follows:

· The area is configured as a stub or NSSA.

· The NSSA ABR is not translating Type 7 into Type 5 LSA.
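As a quick reference for the area-type rules above, the sketch below maps each area type to the LSA type that carries external routes inside it, and to the show command from the View/Debug list that displays that LSA type. The dictionary and function names are hypothetical.

```python
from typing import Optional

# LSA type that carries external routes inside each area type.
# None means that external LSAs do not enter the area at all.
EXTERNAL_LSA_TYPE = {
    "normal": 5,
    "stub": None,
    "totally-stubby": None,
    "nssa": 7,
}

# Show commands that display each external LSA type.
SHOW_COMMAND = {
    5: "show ip ospf database external",
    7: "show ip ospf database nssa-external",
}


def where_to_look(area_type: str) -> Optional[str]:
    """Return the show command for external routes in this area type,
    or None if the area type carries no external LSAs."""
    return SHOW_COMMAND.get(EXTERNAL_LSA_TYPE[area_type])
```

If the function returns None for the area in question, a missing external route is expected behavior for that area type, not a fault.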

4. OSPF Neighbor Not Advertising Default Routes

The most common possible causes for an OSPF router not to advertise the default route are as follows:

· The default-information originate command is missing.

· The default route is missing from the neighbor's routing table.

· A neighbor is trying to originate a default into a stub area.

· The NSSA ABR/ASBR is not originating the Type 7 default.

3) Troubleshooting OSPF Route Installation

Sometimes OSPF routers have fully synchronized their databases with those of their neighbors but are not installing routes in the routing table.

Even after a route is in the database, there can be several reasons that it is not installed in the routing table.

The most common reasons for OSPF failing to install routes in the routing table are as follows:

· OSPF is not installing any routes in the routing table.

· OSPF is not installing external routes in the routing table.

1. OSPF is not installing any routes in the routing table.

It is a common problem in OSPF to find routes in the database but not in the routing table.

When OSPF finds any kind of discrepancy in the database, it does not install any routes in the routing table.

· The most common possible causes of this problem are as follows:

· The network type is mismatched.

· IP addresses are flipped in dual serial-connected routers or a subnet/mask mismatch has occurred.

· One side is a numbered and the other side is an unnumbered point-to-point link.

· A distribute list is blocking the routes' installation.

4) Troubleshooting Redistribution Problems in OSPF

When a router in OSPF does the redistribution, it becomes an ASBR. The routes that are redistributed into OSPF could be directly connected routes, static routes, or dynamically learned routes from another routing protocol or another OSPF process.

5) Troubleshooting Route Summarization in OSPF

The idea is that if there are contiguous ranges of addresses, instead of advertising every network, you can group the contiguous networks into one or a few summary blocks and advertise only those blocks. This feature helps reduce the size of the routing table, which decreases the convergence time and improves OSPF performance. In OSPF, summarization must be configured manually on the router.
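To see which blocks a set of contiguous networks collapses into, Python's standard ipaddress module can do the arithmetic. This is a minimal sketch for planning purposes only; the summarize function is hypothetical and it prints each block as an address followed by a prefix mask.

```python
from ipaddress import collapse_addresses, ip_network
from typing import Iterable, List


def summarize(prefixes: Iterable[str]) -> List[str]:
    """Collapse contiguous networks into the fewest covering blocks.

    Each block is returned as "<network> <prefix mask>".
    """
    networks = [ip_network(prefix) for prefix in prefixes]
    return [
        f"{block.network_address} {block.netmask}"
        for block in collapse_addresses(networks)
    ]
```

For example, 172.16.0.0/24 through 172.16.3.0/24 collapse into the single block 172.16.0.0 255.255.252.0, while networks that do not share a common aligned block are returned unchanged.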

OSPF can use two types of summarization:

· Interarea summarization that can be done on the ABR

· External summarization that can be done on the ASBR

Two common problems related to summarization in OSPF are as follows:

· A router is not summarizing interarea routes.

Cause: area range Command Is Not Configured on ABR

Ensure that the area range command is configured on the correct router. Area range summarization can be done only on the ABR. With summarization, instead of originating a separate LSA for each network, the ABR originates summary LSAs that cover those ranges of addresses.

When configuring the area range command, make sure that the summarization mask is in the form of a prefix mask (for example, 255.255.252.0) rather than a wildcard mask (for example, 0.0.3.255).

· A router is not summarizing external routes

Cause: summary-address Command Is Not Configured on ASBR
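The prefix-mask requirement for the area range command is easy to get wrong because OSPF network statements use wildcard masks. The sketch below tells the two forms apart and converts one into the other; both function names are hypothetical.

```python
from ipaddress import IPv4Address

ALL_ONES = 0xFFFFFFFF


def is_prefix_mask(mask: str) -> bool:
    """True if the mask is a run of 1-bits followed only by 0-bits."""
    inverted = int(IPv4Address(mask)) ^ ALL_ONES
    # The inverted mask must be a block of low-order 1-bits (or zero).
    return inverted & (inverted + 1) == 0


def wildcard_to_prefix_mask(wildcard: str) -> str:
    """Invert every bit of a wildcard mask to get the prefix mask."""
    return str(IPv4Address(int(IPv4Address(wildcard)) ^ ALL_ONES))
```

Here is_prefix_mask("0.0.3.255") is False, and wildcard_to_prefix_mask("0.0.3.255") gives 255.255.252.0, which is the form the summarization command expects.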

6) Troubleshooting CPUHOG Problems

The CPUHOG messages usually appear in two significant stages:

· Neighbor formation process

· LSA refresh process

Problem: CPUHOG Messages During Adjacency Formation—Cause: Router Is Not Running Packet-Pacing Code

Problem: CPUHOG Messages During LSA Refresh Period—Cause: Router Is Not Running LSA Group-Pacing Code

7) Troubleshooting SPF Calculation and Route Flapping

Whenever there is a change in topology, OSPF runs the SPF algorithm to compute the shortest path first tree again. Unstable links existing within the OSPF network could cause constant SPF calculation. This section discusses the problem of SPF running constantly in the network for the following reasons:

· Interface flap within the network

· Neighbor flap within the network

· Duplicate router ID

1. SPF Running Constantly—Cause: Interface Flap Within the Network

Whenever there is a link flap in an area, OSPF runs SPF. So, if a network has unstable links, SPF can run constantly. SPF itself is not a problem, because OSPF is simply adjusting to the change in the database by recalculating. The real problem occurs when there are small routers in the network, where constant SPF runs might cause CPU spikes. A link flap is shown in the figure. Because R1 also is included in area 0, any link flap in area 0 causes all routers in area 0 to run SPF.

To determine how often SPF is running, use the show ip ospf command and check the "SPF algorithm executed x times" line in the output.

To find out which particular LSA is flapping, turn on debug ip ospf monitor. This debug shows exactly which LSA is flapping.

R1# debug ip ospf monitor

OSPF: Schedule SPF in area 0.0.0.0

Change in LS ID 192.168.1.129, LSA type R,

OSPF: schedule SPF: spf_time 1620348064ms wait_interval 10s

The next step is to go to the router whose router LSA is flapping and check its log for any interface flap.
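When the debug runs for a while, counting the "Change in LS ID" lines per LSA makes the flapping one stand out. The following is a minimal sketch, assuming the debug output has been saved as text and that its lines follow the format shown above; the regular expression and function name are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Matches lines such as: "Change in LS ID 192.168.1.129, LSA type R,"
LSA_CHANGE = re.compile(r"Change in LS ID ([\d.]+), LSA type (\w+)")


def count_lsa_changes(lines: Iterable[str]) -> Counter:
    """Count how often each (LS ID, LSA type) pair triggered an SPF run."""
    changes = Counter()
    for line in lines:
        match = LSA_CHANGE.search(line)
        if match:
            changes[(match.group(1), match.group(2))] += 1
    return changes
```

Calling most_common(1) on the result gives the LS ID and LSA type that changed most often, which identifies the router to inspect first.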

Two solutions exist in this case:

· Fix the link flap.

· Redefine the area boundaries.


2. SPF Running Constantly—Cause: Neighbor Flap Within the Network

When a neighbor goes down, it causes a change in topology, so SPF runs.

There is a way to track the neighbor changes in OSPF. Configure ospf log-adjacency-changes under router ospf to track all the neighbor changes.

router ospf 1

ospf log-adjacency-changes

When this command is configured, the router saves all neighbor state changes in its syslog.

3. SPF Running Constantly—Cause: Duplicate Router ID

When two routers have identical router IDs, confusion results in the OSPF topology database, and the route keeps getting added and deleted. The most common symptom of this problem is that the LS Age field always has a small value.

This problem usually is caused by cutting and pasting one router's configuration into another router, which results in two routers with identical router IDs.
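Duplicate router IDs introduced by copied configurations can be caught with a simple inventory check. The sketch below assumes a hand-built mapping of hostname to router ID; the function name and the hostnames in the usage note are illustrative.

```python
from collections import defaultdict
from typing import Dict, List


def duplicate_router_ids(inventory: Dict[str, str]) -> Dict[str, List[str]]:
    """Return each router ID that is used by more than one hostname."""
    hosts_by_id = defaultdict(list)
    for hostname, router_id in inventory.items():
        hosts_by_id[router_id].append(hostname)
    return {
        router_id: sorted(hosts)
        for router_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }
```

For an inventory of {"R1": "1.1.1.1", "R2": "2.2.2.2", "R3": "1.1.1.1"}, the result is {"1.1.1.1": ["R1", "R3"]}.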

Common OSPF Error Messages

1) "OSPF: Could not allocate router id"

This message appears in two situations:

· No up/up interface with a valid IP address

· Not enough up interfaces with a valid IP address for multiple OSPF processes

OSPF requires a valid IP address that is up/up so that it can allocate a router ID for the OSPF process. The IP address must be assigned on an up/up interface. If a router fails to allocate router IDs, OSPF will not function. This problem can be corrected by using loopback addresses.

The loopback interface solution works for both situations. Just configure a loopback interface for one process. If you are trying to run more than one process, you might need more than one loopback interface.
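The reason a loopback fixes this message can be seen from how a router ID is usually chosen. The sketch below assumes the commonly documented Cisco IOS order when no router ID is configured manually: the highest IP address on an up loopback interface, otherwise the highest IP address on any other up/up interface. It handles a single OSPF process only, and the function name and tuple layout are hypothetical.

```python
from ipaddress import IPv4Address
from typing import Iterable, Optional, Tuple

# (interface name, IP address, True if the interface is up/up)
Interface = Tuple[str, str, bool]


def pick_router_id(interfaces: Iterable[Interface]) -> Optional[str]:
    """Return the router ID a single OSPF process would likely pick,
    or None when no up/up interface has an IP address."""
    up = [(name, IPv4Address(ip)) for name, ip, is_up in interfaces if is_up]
    loopbacks = [ip for name, ip in up if name.lower().startswith("loopback")]
    candidates = loopbacks or [ip for _, ip in up]
    return str(max(candidates)) if candidates else None
```

A None result corresponds to the "Could not allocate router id" situation. A loopback is always up/up once it has an address, so adding one guarantees a candidate.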

2) "%OSPF-4-BADLSATYPE"

Error message: "%OSPF-4-BADLSATYPE: Invalid lsa: Bad LSA type" Type 6

This is normal if the neighboring router is sending multicast OSPF (MOSPF) packets. For more information on MOSPF, refer to RFC 1584. Cisco routers do not support MOSPF, so they simply ignore these LSAs. To get rid of these messages, simply type the following:

router ospf 1

ignore lsa mospf

If the type is something other than 6, it's probably a bug or a memory corruption error.

3) "%OSPF-4-ERRRCV"

This message means that OSPF received an invalid packet.

Three common types of this message can occur:

a) Mismatched area ID

b) Bad checksum

c) OSPF not enabled on the receiving interface

a) Mismatched Area ID

This message looks like this:

%OSPF-4-ERRRCV: Received invalid packet: mismatch area ID, from backbone area must be virtual-link but not found from 170.170.3.3, Ethernet0

This means that the neighbor's interface on this link is in area 0, but the local interface is not. In this situation, the router will not form an OSPF adjacency with the neighbor that this packet comes from. This also happens if one side's virtual link is misconfigured. To avoid these messages, make sure that both sides have the same area ID by checking the network statement under OSPF in the router configuration. For example, if the link 10.10.10.0/24 between two routers should be in area 1, make sure that the network statement on both routers includes this particular link in area 1.

The network command would look like this:

router ospf 1

network 10.10.10.0 0.0.0.255 area 1

If a virtual link is configured, double-check the configuration for virtual link.
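To confirm that both ends of a link fall into the same area, the wildcard match performed by the network statement can be reproduced offline. The sketch below assumes the statements have been copied by hand as (network, wildcard, area) tuples and that the first matching statement decides the area; the function name is hypothetical.

```python
from ipaddress import IPv4Address
from typing import Iterable, Optional, Tuple

# (network, wildcard mask, area), as in "network 10.10.10.0 0.0.0.255 area 1"
Statement = Tuple[str, str, str]


def area_for(interface_ip: str, statements: Iterable[Statement]) -> Optional[str]:
    """Return the area of the first network statement matching the address."""
    address = int(IPv4Address(interface_ip))
    for network, wildcard, area in statements:
        # Bits set in the wildcard are "don't care"; compare the rest.
        care = ~int(IPv4Address(wildcard)) & 0xFFFFFFFF
        if address & care == int(IPv4Address(network)) & care:
            return area
    return None
```

Running area_for for each router's interface address against its own statements, and comparing the two results, shows the mismatch before the error message does. A None result means OSPF is not enabled on that interface at all.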

b) Bad Checksum

The message looks like this:

%OSPF-4-ERRRCV: Received invalid packet: Bad Checksum from 144.100.21.141, TokenRing0/0

This means that OSPF encountered an error in a packet that was received. This is because the OSPF checksum does not match the OSPF packet that was received by this router.

This problem has three causes:

1. A device between the neighbors, such as a switch, is corrupting the packet.

2. The sending router's packet is invalid. In this case, either the sending router's interface is bad or a software bug is causing the error.

3. The receiving router is calculating the wrong checksum. In this case, either the receiving router's interface is bad or a software bug is causing the error. This is the least likely cause of this error message.
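To check a captured packet by hand, the checksum can be recomputed. RFC 2328 defines it as the standard IP checksum over the whole OSPF packet, excluding the 64-bit authentication field. The following is a minimal sketch, assuming the packet starts at the 24-byte OSPF header and uses null or simple-password authentication; the function names are hypothetical.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of the one's-complement sum."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return ~total & 0xFFFF


def ospf_checksum(packet: bytes) -> int:
    """Checksum of an OSPF packet with the checksum field (bytes 12-13)
    zeroed and the authentication field (bytes 16-23) left out."""
    covered = packet[:12] + b"\x00\x00" + packet[14:16] + packet[24:]
    return internet_checksum(covered)


def checksum_ok(packet: bytes) -> bool:
    """Compare the checksum stored in the header with a fresh calculation."""
    stored = struct.unpack("!H", packet[12:14])[0]
    return stored == ospf_checksum(packet)
```

If a capture taken on the sending router passes this check and a capture on the receiving router fails it, the corruption is happening somewhere in between, which points to the first cause.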

This problem can be difficult to troubleshoot, but you can start with the following solution, which is effective in 90 percent of cases. It's important that you follow the steps in order:

Step 1. Change the cable between the routers. For the example given in this section, these would be the router that is sending the bad packet (144.100.21.141) and the router that is complaining about these bad packets.

Step 2. If Step 1 doesn't fix the problem, use a different port on the switch between the routers.

Step 3. If Step 2 doesn't fix the problem, connect the routers directly using a cross-over cable. If you receive no further messages, the switch most likely is corrupting the packet.

If none of these steps solves the problem, contact the Cisco Technical Assistance Center (TAC) and work with an engineer to look for a bug in Cisco IOS Software or to obtain a possible Return Material Authorization (RMA) for partial or full parts replacement.

c) OSPF Not Enabled on the Receiving Interface

The message looks like this:

%OSPF-4-ERRRCV: Received invalid packet: OSPF not enabled on interface from 141.108.16.4, Serial0.100

The router generating this message received a packet from 141.108.16.4 on Serial0.100, but OSPF is not enabled on the Serial0.100 interface. This message is generated only once for a non-OSPF interface.
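When a log contains many of these messages, sorting them into the three types above shows which source and interface to look at first. The sketch below assumes each message is on a single line in the format of the three examples in this section; the labels and function name are hypothetical.

```python
import re
from typing import Optional, Tuple

# Source address and receiving interface at the end of the message.
SOURCE = re.compile(r"from (\d+\.\d+\.\d+\.\d+), (\S+)\s*$")

# Key phrase in the message -> short label for the cause.
CAUSES = (
    ("mismatch area ID", "mismatched area ID"),
    ("Bad Checksum", "bad checksum"),
    ("OSPF not enabled on interface", "OSPF not enabled"),
)


def classify_errrcv(message: str) -> Optional[Tuple[str, Optional[str], Optional[str]]]:
    """Return (cause, source address, interface) for an ERRRCV message,
    or None if the line is not an ERRRCV message."""
    if "%OSPF-4-ERRRCV" not in message:
        return None
    cause = next((label for key, label in CAUSES if key in message), "other")
    match = SOURCE.search(message)
    if match is None:
        return (cause, None, None)
    return (cause, match.group(1), match.group(2))
```

Grouping the results by source address quickly separates a single bad link, where one source reports checksum errors, from a configuration problem that affects a whole area.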

View/Debug Commands

show ip ospf interface
show ip ospf database
show ip ospf database network (lsa type 2)
show ip ospf database router (lsa type 1)
show ip ospf database summary (lsa type 3)
show ip ospf database asbr-summary (lsa type 4)
show ip ospf database external (lsa type 5)
show ip ospf database nssa-external (lsa type 7)
show ip ospf virtual-links
show ip ospf border-routers
show ip ospf statistics
debug ip ospf hello
debug ip ospf adj


Saturday, October 24, 2015
