- Add an ORCONN_BW event to Tor to emit read/write info and also queue sizes - See tordiffs/orconn-bw.diff but it probably should be a separate event, not hacked onto ORCONN - Use nodemon.py to rank nodes based on total bytes, queue sizes, and the ratio of these two - Does it agree with results from metatroller's bandwidth stats? - More NodeRestrictions/PathRestrictions in TorCtl/PathSupport.py - BwWeightedGenerator - NodeRestrictions: - Uptime/LongLivedPorts (Does/should hibernation count?) - Published/Updated - GeoIP (http://www.maxmind.com/app/python) - NodeCountry - PathRestrictions: - Family - GeoIP (http://www.maxmind.com/app/python) - OceanPhobicRestrictor (avoids Pacific Ocean or two atlantic crossings) or ContinentRestrictor (avoids doing more than N continent crossings) - OceanPhilicRestrictor or ContinentJumperRestrictor - Can be used as counterpoint to see how bad for performance it is - EchelonPhobicRestrictor - Does not cross international boundaries for client->Entry or Exit->destination hops - Perform statistical analysis on paths - How often does Tor choose foolish paths normally? - (4 atlantic/pacific crossings) - Use speedracer to determine how much slower these paths relly are - What is the distribution for Pr(ClientLocation|MiddleNode,ExitNode) and Pr(EntryNode|MiddleNode,ExitNode) for these various path choices? - Mathematical analysis probably required because this is a large joint distribution (not GSoC) - Empirical observation possible if you limit to the top 10% of the nodes (which carry something like 90% of bandwidth anyways). - Make few million paths without actually building real circuits and tally them up in a 3D table - See PathSupport.py unit tests for some examples on this - See also: http://swiki.cc.gatech.edu:8080/ugResearch/uploads/7/ImprovingTor.pdf - You can also perform predecessor observation of this strategy empirically. But it is likely the GeoIP stuff is easier to implement and just as effective. - Create a PathWatcher that StatsHandler can extend from so people can gather stats from regular Tor usage - Use GeoIP to make a map of tor servers color coded by their reliability - Or augment an existing Tor map project with this data - Add circuit prebuilding and port history learning for keeping an optimal pool of circuits available for use - Build circuits in parallel to speed up scanning - Rewrite soat.pl in python - Improve SSL cert handling/verification. openssl client is broken. - The way we store certs is lame. No need to store so many copies for diff IPs if they are all the same. - Also verify STARTTLS is not molested on smtp, pop and imap ports - Means need to make sure openssl lib supports STARTTLS - Report failing nodes via SETCONF AuthDirBadExit to potentially alternate control port than used by metatroller - dynamic content scanning - tag structure fingerprinting - Optionally use same origin policy for dynamic content checks - Anything in same origin should not change? - filter out dynamic tags with multiple fetches outside of Tor? - Or just target specific tags and verify their content doesn't change - css, script, and object tags and tags that can contain script (there are a LOT of these, but we'd only need to check their attributes) - Perhaps "double check" to see if a document has changed outside of tor after a failure through tor - GeoIP-based exit node grouping to reduce geo-location false positives? - make sure all http headers match a real browser - DNS rebind attack scan - http://christ1an.blogspot.com/2007/07/dns-pinning-explained.html - Basically we want to make sure that no exit nodes resolve arbitrary domains to internal IP addresses - http://www.faqs.org/rfcs/rfc1918.html - This could be done with periodic calls to "getinfo address-mappings/cache" during scanning, or by changing metatroller to inspect STREAM NEWRESOLVE/REMAP events - Improve checking of changes to documents outside of Tor - Make a multilingual keyword list of commonly censored terms to google for using this scanner - Check Exit policy for sketchyness. Mark BadExit if they allow: - pop but not pops - imap not but imaps - telnet but not ssh - smtp but not smtps - http but not https - This also means we have to verify encrypted ports actually work and all exits will honor connections through them (in addition to checkign certs) - Support multiple scanners in metatroller - Improve interaction between soat+metatroller so soat knows which exit was responsible for a given ip/url - SYN+Reverse DNS resolve scan - This can detect exit sniffers that reverse resolve IPs. However, it is high-effort (requires someone to run reverse DNS for us), and requires keeping their IP range secret. - Design Reputation System - Emit some kind of penalty multiplier based on circuit/stream failure rate and the ratio of directory "observed" bandwidth vs avg stream bandwidth - Add keyword to directory for clients to use instead of observed bandwidth for routing decisions - Make sure scanners don't listen to this keyword to avoid "Creeping Death" - Queue lengths from the node monitor can also figure into this penalty multiplier - Figure out interface to report this and also BadExit determinations - Probably involves voting among many scanners - Justify this is worthwhile, sane, and at least as resistant as the current Tor network to attack - Does a reputation system make it easier for an adversary w/ X% of the network to influence it? - Preliminary: http://archives.seul.org/or/dev/Nov-2006/msg00004.html - Sybil attacks - What about clients that ignore the reputations? Can their behavior game the system, or are they just behaving suboptimally? - First impressions: meh; suboptimal - Does changings in ratings leak any information about clients? - Does it influence their paths in predictable ways in a greater degree than bandwidth ranking already does? - What about detecting the scan and giving better service? Time of day, source IP, exit IP? - Stopgap for bootstrapping - push traffic through the 0.1.1.x with 0 dirport and earlier servers that claim less than 20KB traffic