Control Port Filter Proxy Python (cpfpy) / anon-ws-disable-stacked-tor

I have started rewriting CPFP in Python, based on the Tails script at https://git-tails.immerda.ch/tails/plain/config/chroot_local-includes/usr/local/sbin/tor-controlport-filter.

Still at an early stage, but it looks promising. I am posting early because, in the course of my tests, I have encountered many errors using a slightly modified version of “tor_bootstrap_check.py”. I have added some additional exception handling (commit “new exceptions handling”, troubadoour/anon-shared-helper-scripts@268fa66 on GitHub).

Might be handy for debugging Tor protocol.

I will come back soon with more details on the progress.

Great!

  • As the ultimate test of the command relaying proxy <-> ControlPort, please try to run arm through the control port (while having no filter rules in place, obviously).
arm -i 127.0.0.1:9052

If that doesn’t work, the implementation is somehow buggy / not fully understood.

  • Please keep the .d style config folder /etc/controlportfilt.d. (A different config layout may be required.) I really do like the concept of stackable configs. Unless this is really, really hard in Python.

  • As a bonus, wildcard support for white listed commands would be nice. Such as…
    – SETCONF HiddenServiceDir /some/dir
    – SETCONF HiddenServicePort 3543

So the “/some/dir” part can be variable while the “SETCONF HiddenServiceDir” part needs to be on the white list.
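A minimal sketch of how such wildcard entries could be matched in Python, using the standard `fnmatch` module. The `WHITELIST` contents and the `is_allowed` helper are hypothetical names for illustration, not part of the existing filter, and the sketch assumes the request has already been reduced to a single line.

```python
from fnmatch import fnmatchcase

# Hypothetical white list: exact commands plus entries ending in a wildcard.
WHITELIST = [
    "SIGNAL NEWNYM",
    "GETINFO net/listeners/socks",
    "SETCONF HiddenServiceDir *",
    "SETCONF HiddenServicePort *",
]

def is_allowed(request, whitelist=WHITELIST):
    """Return True if the request line matches any white list entry."""
    line = request.strip()  # drop the trailing CRLF before matching
    return any(fnmatchcase(line, pattern) for pattern in whitelist)
```

Note that `*` matches anything, including spaces, so `SETCONF HiddenServiceDir *` would also accept extra arguments after the directory. A real filter would probably want tighter patterns for the variable part.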

I don’t have the right to tell you how to spend your time, troubadour, but instead of writing it in Python, I found that porting the code to mksh would increase security for users that enable CPFP, because it is the only interpreter mentioned as ever having gone through a security audit.

https://github.com/Whonix/Whonix/issues/301

I don’t agree with mksh yet, but I wouldn’t mind a port to mksh either way. Since I learned troubadour is more of a Python than a bash/mksh coder, I am expecting (as in “I guess so”, not as in “make it so”) a Python port. Anyhow, whoever does the work gets to decide about the implementation details.

  • Please keep in mind that both sources of data input are not to be trusted,
    – input from client applications wanting to use ControlPort (filter proxy),
    – as well as replies from Tor, which had better not be trusted either. It’s not that Tor is inherently untrusted or that “signal newnym” could result in a complex answer. But if later (with custom configs) CPFP gets asked to return some info about the Tor network, it might happen that Tor replies with something that contains malicious strings crafted by malicious Tor relays.

  • Please keep in mind that both sources of data input might, either maliciously or due to a bug, send excessively long strings. Or flood attacks. Rather than parsing them (CPU intensive) and logging them (HDD intensive), those should be dropped if detected.
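As a rough illustration of dropping oversized or odd input before any parsing or logging happens, here is a small Python sketch. The name `accept_line` and the limit of 128 bytes are assumptions for the example; it only covers the length and character checks, not flood protection.

```python
MAX_LINE_LENGTH = 128  # assumed limit, to be tuned

def accept_line(raw, limit=MAX_LINE_LENGTH):
    """Return the decoded line, or None if it should be dropped unparsed."""
    # Cheap length check first, so no work is spent on oversized input.
    if len(raw) > limit:
        return None
    try:
        text = raw.decode("ascii")
    except UnicodeDecodeError:
        return None
    text = text.rstrip("\r\n")
    # Reject control characters hidden inside the line.
    if not text.isprintable():
        return None
    return text
```

The same check could be applied in both directions, to client requests and to replies coming back from Tor.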

  • How to name that package you’re working on? pcpfp / PCPFP? (Python Control Port Filter Proxy) We could develop it as separate package and at some point, when it’s ready, switch the default installed package from control-port-filter (bash) to pcpfp.

@hulahoop. mksh looks good because it’s security audited and used in Android, but I would be at a loss and pretty much useless trying to write even a simple program (I do mean program, not script) in any flavor of shell scripting. So, for me, Python it is. For mksh, unless we find another contributor, Patrick is the only resource.

On the other hand, I think that Python is audited ‘de facto’ by the tens of thousands of programs running today, from small home apps to huge projects. And its flexibility cannot be matched by a shell script. I am going to release the initial version of PyCPFP (or whatever we call it) soon, and we’ll be able to discuss at length the best ways to sanitize the inputs (requests from clients and answers from the server). I also plan to add some CPFP_MODE option, such as STRICT, HIDDEN_SERVICES_ALLOWED or TRANSPARENT, to be discussed. All this can be done [relatively] easily in Python, but I guess it would take a lot longer and be much more difficult in [ba][mk]sh.

Please understand that I am not advocating Python, just trying to be realistic.

Are also .d configuration folders and extensible variables possible?

Yes. How, I don’t know yet. There are a couple of options, it seems.

Troubadour, I found a hardened Python implementation project for high-security scenarios. The documentation is pretty deep and deals with function safety and other related topics. pysec is a project by the OWASP non-profit organization, which is dedicated to the safety of web apps.

http://www.pythonsecurity.org/

Please consider writing Mandatory Access profiles for your filter when done and possibly the other one if it materializes. Something along the lines described:

It is possible for example to restrict something like perl or python with extended access controls. On OpenBSD if a user or an attacker has access to perl or python, then they can run whichever scripts they like. With extended access controls, it is possible to restrict only certain scripts to have access to an interpreter (and additionally make those scripts immutable), and prevent the interpreter from running at all unless called by those specific scripts. There is no equivalent fine grained granularity on OpenBSD.
Reference: https://allthatiswrong.wordpress.com/2010/01/20/the-insecurity-of-openbsd/

I am going to push the first version of the Python CPFP, but it might be unavailable for a while. My repositories are blocked with an error 404. I can log in though.

One of our mostly harmless robots seems to think you are not a human.
Because of that, it’s hidden your profile from the public. If you really are human, please contact support to have your profile reinstated.
We promise we won’t require DNA proof of your humanity.

Most likely banning Tor.

I have contacted support. Have you ever seen this message?

Yes. Support quickly sorted that out.

I have pushed troubadoour/control-port-filter-python to GitHub, but it is still hidden from the world.

A couple of issues:
“make deb-icup” complains about it being the first packaging of a new upstream software package, and about the man page missing (because I modify /usr/bin/controlportfilt). However, the package is built and can be installed with “sudo dpkg -i control-port-filter-python_0.0-1_all.deb”. I did not take the time to look deeper in debian packaging. I am using it now.

For the moment, the CPFP configuration, including the white list, is hard coded in the script “cpfp.py”. We could get the configuration from the environment, but after reading https://www.whonix.org/forum/index.php/topic,568.0.html and other threads, even if Python is not bash and differentiates data and code, it might be a good idea to try something else.

What about the following?

  • create a configuration file.
  • create a hash file (sha1) of the configuration file (installed along the configuration file).
  • each time CPFP gets a request, it runs the hash function of the configuration file and compares it to the hash file content.
  • we make a simple GUI CPFP configuration editor. Any user change creates a new hash file.

There would be no speed penalty. I have tested Python ‘hashlib’ on big files.
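The check described in the steps above could look roughly like this with `hashlib`. The function names and the idea of storing the hex digest as plain text in a second file are assumptions made for the sketch.

```python
import hashlib

def sha1_of_file(path, chunk_size=65536):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def config_is_unchanged(config_path, hash_path):
    """Compare the current digest of the config with the stored one."""
    with open(hash_path, "r", encoding="ascii") as handle:
        stored = handle.read().strip()
    return sha1_of_file(config_path) == stored
```

A hash stored next to the file detects accidental or partial changes, but not someone who is able to rewrite both files.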

[quote]"make deb-icup" complains about it being the first packaging of a new upstream software package, and about the man page missing (because I modify /usr/bin/controlportfilt). However, the package is built and can be installed with "sudo dpkg -i control-port-filter-python_0.0-1_all.deb". I did not take the time to look deeper in debian packaging. I am using it now.[/quote]
I'll fix the Debian stuff as soon as it is visible to the public.

[quote]For the moment, the CPFP configuration, including the white list, is hard coded in the script "cpfp.py". We could get the configuration from the environment, but after reading https://www.whonix.org/forum/index.php/topic,568.0.html and other threads, even if Python is not bash and diffrentiates data and code, it might be a good idea to try something else.[/quote]
Security would not be an objection. It's not like environment variables are inherently evil. But solely relying on environment variables would be pretty non-standard and hard to document.

[quote]What about the following?

  • create a configuration file.
  • create a hash file (sha1) of the configuration file (installed along the configuration file).
  • each time CPFP gets a request, it runs the hash function of the configuration file and compares it to the hash file content.[/quote]
It would be best to go the standard way. The daemon loads its config once, when started. To obtain new settings, either restart the daemon or send it the SIGHUP signal (service reload).
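For reference, the load-once-then-reload-on-SIGHUP pattern can be sketched in Python with the standard `signal` module. `CONFIG`, `load_config` and `install_reload_handler` are hypothetical names, and the loader is only a stand-in.

```python
import signal

# Hypothetical in-memory configuration shared with the request handler.
CONFIG = {}

def load_config():
    """Stand-in loader; a real one would parse the files on disk."""
    return {"whitelist": ["SIGNAL NEWNYM"]}

def handle_sighup(signum, frame):
    """Replace the active configuration when the daemon is told to reload."""
    new_config = load_config()
    CONFIG.clear()
    CONFIG.update(new_config)

def install_reload_handler():
    """Register the handler; SIGHUP only exists on Unix-like systems."""
    signal.signal(signal.SIGHUP, handle_sighup)

if __name__ == "__main__":
    CONFIG.update(load_config())   # load once at startup
    install_reload_handler()       # `kill -HUP <pid>` reloads afterwards
```

With this in place, a service reload would only need to send SIGHUP to the daemon's PID.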
[quote]we make a simple GUI CPFP configuration editor.[/quote]
A configuration GUI should be an optional bonus, not the default way. Also, please keep CLI users in mind.

[quote]Any user change creates a new hash file.[/quote]
That would only work in ~/. And in this case in /root somewhere. (Reminds me to change to obey the XDG data dir standard some day.) Otherwise, a file in /etc/ that changes would cause an interactive dpkg conflict resolution dialog each time there is an upgrade.

.d config folders are great for usability, I think. When distros push an upgrade, users won’t get an interactive dpkg conflict resolution dialog. And users can extend the config however they wish. Derivatives can add their configs on top without patching. Or is there any issue with parsing them in Python?

[quote]What about the following?

  • create a configuration file.
  • create a hash file (sha1) of the configuration file (installed along the configuration file).
  • each time CPFP gets a request, it runs the hash function of the configuration file and compares it to the hash file content.
  • we make a simple GUI CPFP configuration editor. Any user change creates a new hash file.[/quote]

This sounds very similar to something I proposed some time ago, but I can’t seem to find it now. So you’re comparing string hashes instead of operating on the input itself, right?

The latest idea I’ve come up with for CPFP is https://github.com/Whonix/Whonix/issues/344 where iptables will be used as a first line of defence and CPFP as a backstop. I thought I’d tell you so you can integrate this somehow in your work or suggest something. The more eyes on a problem the better.

troubadour, I have just finished writing a proposal that simplifies the design of the controlport filter you are writing. The first part is the part relevant to you. Please read it and the guide referenced in the two uppermost links to get the picture of the idea I’m suggesting. I would like to hear your comments.

https://github.com/Whonix/Whonix/issues/348

The idea comes from something I have read recently, and it was probably one of your propositions. Sorry, I could not find it again either.

I am comparing a file hash. When loaded, CPFP would read its configuration from a file, but first compute a hash of the file and compare it to a stored hash of the same file, and do that on every command. When the user modifies the configuration file, the hash file would be updated (the daemon would have to be restarted).

[quote]The latest idea I've come up with for CPFP is https://github.com/Whonix/Whonix/issues/344 where iptables will be used as a first line of defence and CPFP as a backstop. I thought I'd tell you so you can integrate this somehow in your work or suggest something. The more eyes on a problem the better.[/quote]

The more defenses, the better, I guess. But for the time being, I am afraid it’s out of my league. I will concentrate on CPFP itself.

Not necessarily on every command, only when starting or restarting.

I am reposting this to avoid it getting buried, and I would like to know what you think. This is the latest proposal, and it directly concerns the controlport filter rather than the iptables layer. It’s about using tcpserver from ucspi-tcp to do the filtering. The rules can be scripted in any language you prefer (Python).

[quote=“Patrick, post:13, topic:533”][quote]What about the following?

  • create a configuration file.
  • create a hash file (sha1) of the configuration file (installed along the configuration file).
  • each time CPFP gets a request, it runs the hash function of the configuration file and compares it to the hash file content.[/quote]
    It would be best to go the standard way. The daemon load its config once started. To obtain new settings, either restart the daemon or send signal sighup (service reload).[/quote]

Yes, comparing the hash on every request is useless. Modified steps:

  • create a configuration file.
  • create a hash file (sha1) of the configuration file (installed along the configuration file).
  • when CPFP is started, it runs the hash function of the configuration file and compares it to the hash file content.

[quote]Configuration GUI should be an optional bonus. Not the default way. Also please keep CLI users in mind.[/quote]

If we want to compute the hash after the changes and save the string in a file, I do not see another way. There could be a warning message on top of it (generic_gui_message), in the style “Modify this file only if you know what you are doing. Do you really want to proceed?”.

[quote]Any user change creates a new hash file.[/quote] That's only work in ~/. And in this case in /root somewhere. (Reminds me to change to obey xdg data dir standard some day.) Otherwise a file in /etc/ that changes would cause an interactive dpkg conflict resolution dialog each time there is an upgrade.

I overlooked that (a daemon should not have root privileges). The files could be in “$HOME/.config/cftp/”.

[quote].d config folders are great for usability, I think. When disto's push an upgrade, user won't get an interactive dpkg conflict resolution dialog. And users can extend the config however they wish. Derivatives can add their configs on top without patching. Or is there any issue in python parsing them?[/quote]

I do agree on the usability of the .d folders. Yes, the problem is parsing.

What you are doing in bash

if [ -d /etc/controlportfilt.d ]; then
   for i in /etc/controlportfilt.d/*; do
      if [ -f "$i" ]; then
         ## ~~~~
         ## Check the syntax first, then load the file.
         bash -n "$i"
         source "$i"
      fi
   done
fi

is very neat (although I do not fully understand the magic). It would be a whole different game to parse the controlportfilt.d directory in Python and make out what the real configuration is.
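One possible way around the parsing problem, sketched under the assumption that the snippets in the .d folder are INI-style files rather than sourced bash: `configparser` can read a list of files in order, with later files overriding earlier ones. `load_config_dir` is a hypothetical helper name, and reading every regular file in sorted order is an assumption about the desired layout.

```python
import configparser
from pathlib import Path

def load_config_dir(directory="/etc/controlportfilt.d"):
    """Merge all files of a .d folder; later files override earlier ones."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep option names case-sensitive
    files = sorted(p for p in Path(directory).glob("*") if p.is_file())
    # read() silently skips unreadable files and applies the rest in order.
    parser.read([str(p) for p in files], encoding="utf-8")
    return parser
```

One limitation: options with the same name replace each other, so list-like sections with numbered keys would need distinct keys across files, or a different merge rule.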

The configuration file I am working with:

[CONFIGURATION]
CONTROL_PORT_FILTER_LIMIT_GETINFO_NET_LISTENERS_SOCKS = True
CONTROL_PORT_FILTER_LIMIT_STRING_LENGTH = True
CONTROL_PORT_FILTER_EXCESSIVE_STRING_LENGTH = 128

[WHITE_LIST]
1 = GETINFO net/listeners/socks
2 = SIGNAL NEWNYM
3 = GETINFO status/bootstrap-phase
4 = GETINFO status/circuit-established

It is parsed in a few lines with Python configparser.
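For illustration, parsing that file with `configparser` could look roughly like this. The configuration is embedded as a string to keep the example self-contained, and the variable names are made up for the sketch; the actual code in cpfp.py may differ.

```python
import configparser

CONFIG_TEXT = """\
[CONFIGURATION]
CONTROL_PORT_FILTER_LIMIT_GETINFO_NET_LISTENERS_SOCKS = True
CONTROL_PORT_FILTER_LIMIT_STRING_LENGTH = True
CONTROL_PORT_FILTER_EXCESSIVE_STRING_LENGTH = 128

[WHITE_LIST]
1 = GETINFO net/listeners/socks
2 = SIGNAL NEWNYM
3 = GETINFO status/bootstrap-phase
4 = GETINFO status/circuit-established
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

settings = parser["CONFIGURATION"]
# Option lookups are case-insensitive by default.
limit_length = settings.getboolean("CONTROL_PORT_FILTER_LIMIT_STRING_LENGTH")
max_length = settings.getint("CONTROL_PORT_FILTER_EXCESSIVE_STRING_LENGTH")

# The numbered keys only order the entries; the values are the commands.
whitelist = list(parser["WHITE_LIST"].values())
```

For a file on disk, `parser.read(path)` would replace the `read_string` call.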

By the way, my repositories are visible now: troubadoour/control-port-filter-python on GitHub.

Added some minor (mostly packaging) fixes on top:

More comments later.