Setting up SaltStack is a fairly easy task. There is plenty of documentation here. This is not an install tutorial, this is an explanation and trouble shooting of what is going on with SaltStack Master and Minion communication. Mostly when using the CLI to send commands from the Master to the Minions.
After you have installed your Salt Master and your Salt Minions software the first thing to do after starting your Master is open your Minion's config file in /etc/salt/minion and fill out the line "master: " to tell the Minion where his Master is. Then start/restart your Salt Minion. Do this for all your Minions.
Go back to the Master and accept all of of the Minions keys. See here on how to do this. If you don't see a certain Minions key here are some things you should check.
Now lets say you have all of basic network and and key issues worked out and would like to send some jobs to your Minions. You can do this via the Salt CLI. Something like salt \* cmd.run 'echo HI'. This is considered a job by Salt. The Minions get this request and run the command and return the job information to the Master. The CLI talks to the Master who is listening for the return messages as they are coming in on the ZMQ bus. The CLI then reports back that status and output of the job.
That is a basic view of this process. But, sometimes Minions don't return job information. Then you ask yourself what the heck happened. You know the Minion is running fine. Eventually you find out you don't really understand Minion Master job communication at all.
By default when the job information gets returned to the Master and is stored on disk in the job cache. We will assume this is the case below.
The Salt CLI is just an small bit of code that interfaces with the API SaltStack has written that allows anyone to send commands to the Minions programmatically. The CLI is not connected directly to the Minions when the job request is made. When the CLI makes a job request, is handed to the Master to fulfill.
There are 2 key timeout periods you need be aware of before we go into a explanation of how a job request is handled. They are "timeout" and "gather_job_timeout".
When the CLI command is issued, the Master gathers a list of Minions with valid keys so it knows which Minions are on the system. It validates and filters the targeting information from the given target list and sets that as its list (targets) of Minions for the job. Now the Master has a list of who should return information when queried. The Master takes the requested command, target list, job id, and a few pieces of info, and broadcasts a message on the ZeroMQ bus to all of Minions. When all Minions get the message, they look at the target list and decide if they should execute the job or not. If the Minion sees he is in the target list he executes the job. If a Minion sees he is not part of the target list, he just ignores the message. The Minion that decided to run the command creates an local job id for the job and then performs the work.
While the Minions are working their jobs the CLI is waiting for the first initial timeout period (-t or timeout:) to start. When that hits, the CLI sends sends the first "find_job" query. This kicks off the gather_job_timeout timer. The Minions receive the the first find_job request with the original job_id. If they are still running the job, the Minion responds to "find job" request with a status of "still working" or "Job Finished". If a Minion does not respond to the request within the gather_job_timeout time period (10 secs), the CLI marks the Minion as "non responsive" for the polling interval. All Minions will keep being queried on the gather_job_timeout interval. If the Minions do not reply within this timeout, or all report that they are no longer running the job in question, the CLI command will return. If one of more minions replies that they are still running the job, the initial timeout is triggered again and the cycle repeats.
The CLI will show the output from the Minions as they finish their jobs. For the Minions that did not respond, but are connected to the Master, you will see the message "Minion did not return". If a Minion does not even look like it has a TCP connection with the Master, you will see "Minion did not return. [Not connected]".
By this time the Master should have marked the job as finished. The jobs info should now be available in the job cache. The above explanation is a high level explanation of how Master and Minions communicate. There are more details to this process than the above info, but this should give you a basic idea of how it works.
Sometimes you know there are Minions up and working, but you get "Minion did not return" or you did not see any info from the Minion at all before the CLI timed out. It is frustrating, as you can send the same Minion that just failed a job and it finishes it with no problem. There can be many reasons for this. Try/check the following things.