Broadband-Hamnet™ Forum :: UBNT Firmware
 Subject :Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-10- 16:43:34 
AF5RS
Member
Joined: 2014-06-23- 22:21:23
Posts: 9
Location: Highland Village, TX EM13LC

We're finding that the latest firmware (1.2.1 (-v2)) for the Rocket M2 gives inconsistent results on a link that was previously known to be good. After a warm boot the link may or may not reappear, but after a cold boot (PoE power down and back up) it most likely does. It also seems to be time related: the longer the link is up, the more likely it is to become unstable. A warm boot doesn't seem to help, but then a cold boot and presto! Back to normal. What's happening here? A kernel leak? It seems to take about a day of uptime before a cold boot is required. We're seeing this on multiple nodes with the -v2 firmware. Also, I can't seem to reconcile the LQ with the S/N. Anybody else seeing this?

Bob AF5RS


IP Logged
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-10- 17:28:48 
K6AH
Member
Joined: 2012-03-05- 10:47:45
Posts: 181
Location: San Diego, CA
There appears to be a bug in release BBHN 1.1.2 which causes OLSR to crash periodically. It seems to be limited to larger, UBNT-based networks: it has not been reported on Linksys, nor have we been able to recreate it on UBNT networks of only a few nodes. We are looking into the issue and will likely release a patch in the coming days that includes a watchdog reset of OLSR as an interim workaround, until the problem is better understood and a fix can be put in place. We are sorry for the inconvenience it may cause deployed networks, but wanted to get the word out to anyone planning such a deployment in the near term. Andre, K6AH
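To illustrate what a watchdog reset of OLSR could look like, here is a minimal sketch. It is not the BBHN patch. It assumes the daemon process is named olsrd, that pidof is available on the node, and that the service is restarted through /etc/init.d/olsrd (the usual OpenWrt convention). All function names are made up for illustration. Note that a process check like this only catches a crashed daemon, not one that is running but hung.

```python
import subprocess
import time


def olsrd_running() -> bool:
    """Return True if a process named olsrd exists (assumes pidof is installed)."""
    result = subprocess.run(["pidof", "olsrd"], stdout=subprocess.DEVNULL)
    return result.returncode == 0


def restart_olsrd() -> None:
    """Restart the daemon via the assumed OpenWrt-style init script."""
    subprocess.run(["/etc/init.d/olsrd", "restart"], check=False)


def watchdog_tick(is_alive, restart) -> bool:
    """Run one check. Restart the daemon if it is not alive.

    Returns True if a restart was issued on this tick.
    """
    if is_alive():
        return False
    restart()
    return True


def run_watchdog(interval_s=60, is_alive=olsrd_running,
                 restart=restart_olsrd, max_ticks=None) -> int:
    """Check periodically; return the number of restarts issued.

    max_ticks=None loops forever, which is what a deployed node would use.
    """
    ticks = 0
    restarts = 0
    while max_ticks is None or ticks < max_ticks:
        if watchdog_tick(is_alive, restart):
            restarts += 1
        ticks += 1
        if max_ticks is None or ticks < max_ticks:
            time.sleep(interval_s)
    return restarts
```

The check and restart actions are passed in as callables so the loop can be exercised on a desktop without touching a real node.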
IP Logged
Member of:
Beta Test Team
San Diego Mesh Working Group
Running 3.0.1
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-10- 17:59:40 
AF5RS
Member
Joined: 2014-06-23- 22:21:23
Posts: 9
Location: Highland Village, TX EM13LC

Andre, if it helps the effort, our network is not large; however, the links are weak at this time due to experimentation and poor paths. I suspect it has to do with links coming in and out often, which is most likely more probable in larger networks. However, it's also happening on small networks with intermittent links, i.e. 2 to 3 nodes.

Bob AF5RS



IP Logged
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-10- 18:03:44 
K6AH
Member
Joined: 2012-03-05- 10:47:45
Posts: 181
Location: San Diego, CA
Thanks Bob. All clues help. Andre, K6AH
IP Logged
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-10- 20:43:01 
k5wxr
Member
Joined: 2014-07-13- 12:56:54
Posts: 3
Location: North Texas
Hello Andre, Here is another clue or two... We have recreated this situation several times just between any two Rocket M2s and between a Rocket and WRT54GL only 50 feet apart with LQ 90 to 100%. Won't link up until power unplug-replug, then may fail again in less than 12 hours. Even seems to forget previous neighbors even though system clock and up time indicate no power interruption... Hope this doesn't sound like a complaint. Just wanting to help with more clues. 73, - Lee K5WXR
IP Logged
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-15- 11:48:23 
AE4ML
Member
Joined: 2014-06-01- 15:17:42
Posts: 47
Location: Spotsylvania VA USA
 




I ran some tests today on my mesh. The first test was with 4 nodes up. All nodes are running the latest version of code and all are Ubiquiti NanoStations and NanoBridges.

I was able to ping from one end of the mesh to the other just fine. Then I pinged the mesh gateway: it was successful for about 30 seconds, and then I would get a message that the network was unreachable. My internal network is 192.168.20.0/24, the mesh gateway address is 192.168.20.1, and the default gateway on the internal network is 192.168.20.254.

From across the mesh network I could ping everything on the mesh, including 192.168.20.1. However, at times 192.168.20.254 was not reachable, and I would also get a message stating the network wasn't reachable and not in the routing table. A telnet session into the mesh gateway revealed a 0.0.0.0 route to the gateway of 192.168.20.254. When I telnetted into the station I was connected to, there was no 192.168.20.0 or 0.0.0.0 in the routing table on that mesh node.

A continuous ping showed that 192.168.20.254 was there, then wasn't, and then the network was no longer in the routing table. It would reappear about 1 to 2 minutes later, but only for about 10 to 20 seconds.
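One way to put numbers on an up/down pattern like this is to collapse timestamped ping results into runs of "reachable" and "unreachable". The sketch below assumes you already have (timestamp, reachable) samples from a continuous ping; collecting them is not shown, and the function name and sample values are hypothetical.

```python
from typing import List, Tuple


def reachability_runs(
    samples: List[Tuple[float, bool]]
) -> List[Tuple[bool, float, float]]:
    """Collapse (timestamp, reachable) samples into (state, start, end) runs.

    Consecutive samples with the same state are merged into one run, so the
    length of each run (end - start) approximates how long the target
    stayed up or down.
    """
    runs: List[Tuple[bool, float, float]] = []
    for ts, ok in samples:
        if runs and runs[-1][0] == ok:
            state, start, _ = runs[-1]
            runs[-1] = (state, start, ts)   # extend the current run
        else:
            runs.append((ok, ts, ts))       # state changed: start a new run
    return runs


# Hypothetical samples, one per second: up for 2 s, down for 2 s, up again.
example = [(0.0, True), (1.0, True), (2.0, False), (3.0, False), (4.0, True)]
runs = reachability_runs(example)
```

If the down runs turn out to have a consistent length, that points toward a periodic check or timer rather than random RF loss.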

I took a trip to the remote end at lunch time and looked at the router and gateway to verify everything looked good. The permit ICMP any any in the router ACL seems to take care of keeping the 0.0.0.0 route in place. However, I really think that an ICMP check of the next-hop router off the mesh gateway would be ideal for people who want to isolate the mesh from the internet and still connect to the local LAN.

I took the two extra nodes out of the equation and ran a point-to-point test. I had the same results: up and down. I had more success with two nodes than with four. I could maintain a telnet session with a router at the far end, and when the mesh went down and came back up I still had the same telnet session.


IP Logged
Michael Lussier
AE4ML
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-15- 12:38:54 
KG6JEI
Member
Joined: 2013-12-02- 19:52:05
Posts: 516
Location

If pings to the internet fail, the 0.0.0.0/0 route will be removed.

Also, the route has to exist if you can ping it. You likely are not looking at ALL the routing tables. Linux computers have multiple tables.

IP Logged
Note: Most posts submitted from iPhone
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-15- 13:59:34 
AE4ML
Member
Joined: 2014-06-01- 15:17:42
Posts: 47
Location: Spotsylvania VA USA
 

I agree there has to be access to the internet for ICMP to create the default route on the mesh. That is something I hope can be changed to the next-hop router IP on the local network. I may not want the mesh network to have internet access, only access to the local network and/or limited ports to the internet.

The behavior is interesting. From the furthest mesh node I can ping the IP of the WAN interface but not the router 192.168.20.254, which is part of the same network. Yes, on the router I have an IP route back to the mesh network. The pings to 20.254 come up and go down on what appears to be a regular schedule. My internet connection is not that bad.

I prefer EIGRP myself. OLSR is an interesting concept; however, it appears that it doesn't keep a table of backup routes to help mitigate packet loss. The time to reconverge on new routes and update the table appears to have a problem that grows exponentially as you add more nodes to the network.

We can talk tables. I don't want to get into a conflict here. I'm only trying to figure out what is going on, and hopefully find some clues to help fix the problem.

IP Logged
Michael Lussier
AE4ML
 Subject :Re:Intermittant Link Resolution - Cold boot vs. Warm boot.. 2014-08-15- 15:53:33 
KG6JEI
Member
Joined: 2013-12-02- 19:52:05
Posts: 516
Location

This sounds like it's not related to the OP's question, if you have a good stable link through the mesh but not outside the mesh.

The fact that you can ping one IP address on the remote node seems to say you have a good mesh connection and the issue is local on that side (unless you have a traceroute that shows otherwise).

Relying on being the only 0.0.0.0/0 route in order to publish services is a VERY bad idea. As soon as any other user comes on the mesh and publishes a GW, you can no longer be sure users will be able to reach your network. You should add any services you want to publish into your actual network (either NAT mode with redirects OR as hosts on a direct subnet).

After that I can understand the filtering; just make sure everyone knows what filters are in place, and be ready to justify them to locals should one of them put up an open gateway. Remember, your node will be the preferred gateway for those closer, and you can disrupt others' traffic. I would personally suggest looking into transparent proxying and other methods that provide active feedback, so users know what is happening rather than just being blocked.

As for a "next hop" router check, that is not likely to be put into the code in official builds. The whole point of the check is to be sure a user really does have a route to the internet as a whole. This is what the 0.0.0.0/0 route means: "I can get to the internet."
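To make the logic of that check concrete, here is a rough sketch of the decision, not the firmware's actual code. The thread only establishes that failed pings to the internet cause the 0.0.0.0/0 route to be removed. The function names are made up, and the probe targets are placeholder addresses from the documentation ranges.

```python
from typing import Callable, Iterable


def internet_reachable(targets: Iterable[str],
                       probe: Callable[[str], bool]) -> bool:
    """Return True if at least one probe target answers.

    `probe` is any callable that pings a host and returns True on success.
    """
    return any(probe(t) for t in targets)


def next_default_route_action(advertised: bool, reachable: bool) -> str:
    """Decide what to do with the 0.0.0.0/0 route after one check."""
    if reachable and not advertised:
        return "add"
    if not reachable and advertised:
        return "remove"
    return "keep"


# Placeholder targets (documentation address ranges, not real check hosts).
TARGETS = ["192.0.2.1", "198.51.100.1"]

# Simulated probe where every ping fails, e.g. because ICMP is blocked.
all_fail = lambda host: False
action = next_default_route_action(
    advertised=True,
    reachable=internet_reachable(TARGETS, all_fail),
)
```

In this model, pointing the probe at the LAN's next-hop router instead would keep the route advertised even with no internet behind it, which is exactly what the check is meant to prevent.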

As a WAN link, your router shouldn't need a route back, because the mesh node should be in NAT mode, so I'm not really sure that is needed (unless you're trying this in NAT mode, in which case it makes sense, as we don't NAT the source from MESH to LAN).

Routing tables in use on a mesh node at this time (see ip rule for more details on how each is configured for packet passing):

255 - Local
254 - Main
253 - Default
029 - Mesh Local Network
030 - Table of Mesh Nodes
031 - Wan GW's
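To look at all of these tables rather than just main, something like the following sketch could be run on a Linux node. It uses the standard iproute2 form ip route show table <id>. The table numbers come from the list above; the dictionary and function names are made up for illustration.

```python
import subprocess

# Table IDs and descriptions as listed above.
MESH_TABLES = {
    255: "local",
    254: "main",
    253: "default",
    29: "mesh local network",
    30: "table of mesh nodes",
    31: "WAN gateways",
}


def route_show_command(table_id: int) -> list:
    """Build the iproute2 command that lists one routing table."""
    return ["ip", "route", "show", "table", str(table_id)]


def dump_routes(tables=MESH_TABLES, run=None) -> dict:
    """Return {table_id: command output} for every table.

    `run` takes a command list and returns its stdout as a string. By
    default it shells out to ip, so the default only works where iproute2
    is installed.
    """
    if run is None:
        run = lambda cmd: subprocess.run(
            cmd, capture_output=True, text=True
        ).stdout
    return {tid: run(route_show_command(tid)) for tid in sorted(tables)}
```

A default route that is missing from main may still show up in one of the other tables, which would be consistent with being able to ping an address that the main table has no route for.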

OLSR keeps track of all routes to the internet. Only one route can be active at any time. Routing tables do not keep track of packet loss, so this concern isn't really a big deal: the mesh node will change the route for you, based on ETX, when the route is recalculated.
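As a rough illustration of ETX-based selection: the usual OLSR link-quality definition is ETX = 1 / (LQ x NLQ) per link, with a path's cost being the sum over its links. The sketch below applies that to two hypothetical gateway paths. The function names and LQ/NLQ values are invented, and the real daemon's calculation has more detail than this.

```python
import math
from typing import Dict, List, Tuple

Link = Tuple[float, float]  # (LQ, NLQ), each between 0.0 and 1.0


def link_etx(lq: float, nlq: float) -> float:
    """Expected transmission count for one link: 1 / (LQ * NLQ)."""
    if lq <= 0 or nlq <= 0:
        return math.inf  # a link that never delivers is unusable
    return 1.0 / (lq * nlq)


def path_etx(links: List[Link]) -> float:
    """Total cost of a path is the sum of its per-link ETX values."""
    return sum(link_etx(lq, nlq) for lq, nlq in links)


def best_gateway(paths: Dict[str, List[Link]]) -> str:
    """Pick the gateway whose path has the lowest total ETX."""
    return min(paths, key=lambda name: path_etx(paths[name]))


# Hypothetical paths: two clean hops versus one lossy hop.
paths = {
    "gw_a": [(1.0, 1.0), (1.0, 1.0)],  # cost 1 + 1 = 2
    "gw_b": [(0.5, 0.5)],              # cost 1 / 0.25 = 4
}
chosen = best_gateway(paths)
```

Here the two-hop path wins because each of its links is clean, while the single-hop path is expected to need four transmissions per delivered packet.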

Convergence: this may not be as bad as you think. Yes, it can take time to propagate, but the main thing is that as long as the nodes closest to the GW figure it out, they can re-route the packet mid-route.

Yes, loss can still occur while the network converges, depending on the layout. But it may not be as bad as having to go all the way across to the other side of the network. In some layouts only a single node would need to learn about the change; in other layouts a maximum of 50% would need to find out about it (in a long string path of A-B-C-D-E-etc., C could be the stall point).

 

IP Logged