## documentation

## preface
Besides that,
The basic working principle when using

## file formats
There are different file formats that

*raw file format*-
Each line contains a character string and a decimal value, separated by whitespace characters. The decimal separator can be either a point or a comma. This is the native file format for *noise-free-cnv*.

    SNP_A-2131660/1/1145994 0.153
    SNP_A-1967418/1/2224111 0.2446
    SNP_A-1969580/1/2319424 0.0582
    SNP_A-4263484/1/2543484 0.3893
    SNP_A-1978185/1/2926730 -0.0206
    SNP_A-4264431/1/2941694 0.5014
    SNP_A-1980898/1/3084986 0.5544
    SNP_A-1983139/1/3155127 -0.2811
    SNP_A-4265735/1/3292731 0.4831
    SNP_A-1995832/1/3695086 0.2174
    SNP_A-1995893/1/3710825 -0.0473
    SNP_A-1997896/1/3756100 0.3541
    SNP_A-1997922/1/3756146 0.0921
    SNP_A-2000230/1/4240737 0.169
    SNP_A-2000332/1/4243294 -0.0256
    SNP_A-4268173/1/4276892 0.1099
    SNP_A-2002663/1/4371593 0.1591
    SNP_A-2004169/1/4459761 -0.3124
    SNP_A-2004249/1/4461025 0.5297
    SNP_A-4268681/1/4461905 0.4747
    SNP_A-2004332/1/4464544 -0.0218

    0 0.000
    10 0.174
    20 0.342
    30 0.500
    40 0.643
    50 0.766
    60 0.866
    70 0.940
    80 0.985
    90 1.0
    100 0.985
    110 0.940
    120 0.866
    130 0.766
    140 0.643
    150 0.500
    160 0.342
    170 0.174
    180 0.000
    190 -0.174
    200 -0.342
    210 -0.500
    220 -0.643
    230 -0.766
    240 -0.866
    250 -0.940
    260 -0.985
    270 -1.0
    280 -0.985
    290 -0.940
    300 -0.866
    310 -0.766
    320 -0.643
    330 -0.500
    340 -0.342
    350 -0.174

*PennCNV file format*-
The file format used by PennCNV. It has to contain at least the columns "Name", "Chr", "Position", ".Log R Ratio" and ".B Allele Freq" (the last two carry a prefix, as in "test.Log R Ratio" in the example below) and is loaded as two sequences, one containing the LRR values and one containing the BAF values. The names of the data points are composed of the "Name", "Chr" and "Position" columns. Accordingly, two sequences are needed to save a PennCNV file. Saving works reliably only if the sequences originate from a PennCNV file, because *noise-free-cnv* has to extract the chromosome and position from the data points' names.

    Name           Chr  Position  test.Log R Ratio  test.B Allele Freq
    SNP_A-2131660  1    1145994   0.1530            1.0000
    SNP_A-1967418  1    2224111   0.2446            1.0000
    SNP_A-1969580  1    2319424   0.0582            0.9662
    SNP_A-4263484  1    2543484   0.3893            0.9696
    SNP_A-1978185  1    2926730   -0.0206           0.0000
    SNP_A-4264431  1    2941694   0.5014            0.0341
    SNP_A-1980898  1    3084986   0.5544            1.0000
    SNP_A-1983139  1    3155127   -0.2811           0.2422
    SNP_A-4265735  1    3292731   0.4831            0.0315
    SNP_A-1995832  1    3695086   0.2174            0.4495
    SNP_A-1995893  1    3710825   -0.0473           0.5547
    SNP_A-1997896  1    3756100   0.3541            0.4624
    SNP_A-1997922  1    3756146   0.0921            0.0000
    SNP_A-2000230  1    4240737   0.1690            0.2961
    SNP_A-2000332  1    4243294   -0.0256           0.9852
    SNP_A-4268173  1    4276892   0.1099            0.0000
    SNP_A-2002663  1    4371593   0.1591            0.0694
    SNP_A-2004169  1    4459761   -0.3124           0.9914
    SNP_A-2004249  1    4461025   0.5297            0.9839
    SNP_A-4268681  1    4461905   0.4747            0.0000
    SNP_A-2004332  1    4464544   -0.0218           0.0000
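As an illustration of the raw file format described above, the following Python sketch reads name/value lines and accepts either decimal separator. It is not taken from *noise-free-cnv* itself: the function names are hypothetical, skipping blank lines is an assumption, and joining the PennCNV columns with "/" is inferred only from the names shown in the raw example (such as "SNP_A-2131660/1/1145994"). The comma in the second sample line was introduced here to exercise the alternative separator.

```python
def parse_raw_line(line):
    """Split one raw-format line into (name, value)."""
    name, number = line.split()
    # Accept both "0.153" and "0,153" as decimal notation.
    return name, float(number.replace(",", "."))


def parse_raw(text):
    """Parse raw-format text; blank lines are skipped (an assumption)."""
    return [parse_raw_line(line) for line in text.splitlines() if line.strip()]


def penncnv_point_name(name, chrom, position):
    """Compose a data point name from the PennCNV columns.

    Assumption: the three columns are joined with "/", as the names in the
    raw example suggest.
    """
    return f"{name}/{chrom}/{position}"


sample = """SNP_A-2131660/1/1145994 0.153
SNP_A-1967418/1/2224111 0,2446
"""
points = parse_raw(sample)
```

Keeping the chromosome and position inside the name is what would let a sequence be split back into PennCNV columns when saving.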
## operations

## single sequence

*add/sub/mul/div*-
Applies a basic mathematical operation to every data point of the sequence, using the value in the entry field.

*pow/root*-
Applies the pow/root function to every data point of the sequence, using the value in the entry field. Imaginary components are discarded.

*blur*-
Applies a low-pass filter to the sequence. Specifically, it performs a Weierstrass transform with the value in the entry field as the standard deviation.

*trunc/cut*-
All values with an absolute value greater than the value in the entry field are truncated/omitted. All other values are left unchanged.

*exp/log/erf*-
The exponential/natural logarithm/Gauss error function is applied to every data point.

*rank*-
For a perfect normal distribution with a standard deviation of sqrt(2), this is the same as *erf*.

*abs*-
Every data point is assigned its absolute value.

*avg*-
The average value of the whole sequence is assigned to every data point.

*strip XY*-
Removes data points of the X and Y chromosomes.
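To make the *blur* and *trunc/cut* entries above more concrete, here is a minimal Python sketch. It is an approximation under stated assumptions, not the program's actual code: the blur is a discrete Gaussian convolution with the standard deviation measured in data points, the kernel is cut off at four standard deviations, and the weights are renormalised at the sequence ends. Reading "truncated" as clamping to the limit is also an assumption. All function names are hypothetical.

```python
import math


def blur(values, sigma):
    """Smooth `values` with a Gaussian kernel of standard deviation `sigma`."""
    if sigma <= 0:
        return list(values)
    # Assumption: the kernel is cut off at 4 standard deviations.
    radius = max(1, math.ceil(4 * sigma))
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    n = len(values)
    out = []
    for i in range(n):
        acc = 0.0
        weight = 0.0
        for k in range(-radius, radius + 1):
            j = i + k
            if 0 <= j < n:
                w = kernel[k + radius]
                acc += w * values[j]
                weight += w
        # Dividing by the weights actually used keeps the edges unbiased.
        out.append(acc / weight)
    return out


def trunc(values, limit):
    """Clamp values whose absolute value exceeds `limit` to +limit or -limit."""
    return [max(-limit, min(limit, v)) for v in values]


def cut(values, limit):
    """Omit values whose absolute value exceeds `limit`."""
    return [v for v in values if abs(v) <= limit]


print(trunc([0.5, -2.0, 3.0], 1.0))  # [0.5, -1.0, 1.0]
print(cut([0.5, -2.0, 3.0], 1.0))    # [0.5]
```

A larger standard deviation widens the kernel and removes more high-frequency variation, which is the low-pass behaviour the *blur* entry describes.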
## two sequences

*add/sub/mul/div*-
Two sequences are added/subtracted/multiplied/divided point by point. The input sequences have to be in the same order. If there are data points missing in any of the sequences, *noise-free-cnv* will skip those (and only those).

*sort*-
The data points of the second sequence are sorted so that the identifiers are in the same order as in the first sequence. Data points with identifiers that are not present in both sequences are omitted.
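The *sort* behaviour and a point-by-point operation can be sketched in Python as follows. This is an illustration only: sequences are modelled as lists of `(identifier, value)` pairs, identifiers are assumed to be unique, and the hypothetical helper `combine` uses a dictionary lookup instead of walking two ordered sequences, which is a simplification of whatever the program does internally.

```python
import operator


def sort_like(first, second):
    """Reorder `second` to follow the identifier order of `first`.

    Points whose identifier is not present in both sequences are omitted.
    """
    lookup = dict(second)
    return [(name, lookup[name]) for name, _ in first if name in lookup]


def combine(first, second, op):
    """Apply `op` point by point to the data points shared by both sequences."""
    first_values = dict(first)
    return [
        (name, op(first_values[name], value))
        for name, value in sort_like(first, second)
    ]


a = [("s1", 1.0), ("s2", 2.0), ("s3", 3.0)]
b = [("s3", 30.0), ("s1", 10.0), ("s4", 40.0)]

print(sort_like(a, b))              # [('s1', 10.0), ('s3', 30.0)]
print(combine(a, b, operator.add))  # [('s1', 11.0), ('s3', 33.0)]
```

Here "s2" and "s4" each occur in only one sequence, so neither appears in the results.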
## multiple sequences
For these operations, all input sequences have to be in the same
order. If there are data points missing in some of the sequences,
*add/mul/arithmetic/geometric*-
For every data point, the sum/product/arithmetic mean/geometric mean across all input sequences is computed.

*median*-
For every data point, the median across all input sequences is computed. If there is an even number of sequences, the arithmetic mean of the upper and lower median is used.

*deviation*-
For every data point, the standard deviation from zero across all input sequences is computed.

*align*-
For every input sequence, one output sequence is generated. It contains, unchanged, all data points of the corresponding input sequence that are present in every input sequence.
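The *align*, *median* and *deviation* entries can be illustrated with a short Python sketch. The helper names are hypothetical, and "standard deviation from zero" is interpreted here as the root mean square of the values (dividing by the number of sequences), which is an assumption about the exact definition. `statistics.median` from the standard library averages the two middle values for an even count, matching the *median* description.

```python
import math
import statistics


def align(sequences):
    """Keep, in each sequence, only the points present in every sequence."""
    common = set.intersection(*({name for name, _ in seq} for seq in sequences))
    return [[(n, v) for n, v in seq if n in common] for seq in sequences]


def per_point(sequences, func):
    """Apply `func` to the values found at each position.

    The sequences must already be aligned and in the same order.
    """
    names = [name for name, _ in sequences[0]]
    columns = zip(*([v for _, v in seq] for seq in sequences))
    return [(name, func(column)) for name, column in zip(names, columns)]


def deviation_from_zero(values):
    """Root mean square of `values` (assumed meaning of 'deviation from zero')."""
    return math.sqrt(sum(v * v for v in values) / len(values))


s1 = [("a", 1.0), ("b", 2.0), ("c", 7.0)]
s2 = [("a", 3.0), ("b", 4.0)]
s3 = [("a", 5.0), ("b", 0.0), ("d", 9.0)]
s4 = [("a", 7.0), ("b", 4.0)]

aligned = align([s1, s2, s3, s4])
medians = per_point(aligned, statistics.median)
deviations = per_point(aligned, deviation_from_zero)
print(medians)  # [('a', 4.0), ('b', 3.0)]
```

Running *align* first removes "c" and "d", which appear in only one sequence, so the per-point functions see exactly one value per sequence at each position.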