noise-free-cnv

documentation

preface
file formats
- raw file format
- PennCNV file format
operations
- single sequence
  - add/sub/mul/div
  - pow/root
  - blur
  - trunc/cut
  - exp/log/erf
  - rank
  - abs
  - avg
  - sorting
  - strip XY
- two sequences
  - add/sub/mul/div
  - sort
- multiple sequences

preface

noise-free-cnv is a program for analyzing and manipulating DNA microarray data (for example from Affymetrix or Illumina platforms).

Beyond that, noise-free-cnv can work with any sequence of numerical values whose order is meaningful and whose data points are, ideally, identified by names. When several sequences with corresponding data points (values with the same name) are available, they can be compared to find similarities and differences.

The basic workflow in noise-free-cnv is to load sequences of data from files, manipulate them, and then either examine the result in noise-free-cnv to find irregularities or save it for further use with other programs.

file formats

noise-free-cnv can handle several file formats. All of them are text formats that contain one data point per line. Invalid lines are ignored.

raw file format

Each line contains a character string and a decimal value, separated by whitespace characters. The decimal separator can either be a point or a comma.

This is the native file format for noise-free-cnv.

SNP_A-2131660/1/1145994  0.153
SNP_A-1967418/1/2224111  0.2446
SNP_A-1969580/1/2319424  0.0582
SNP_A-4263484/1/2543484  0.3893
SNP_A-1978185/1/2926730  -0.0206
SNP_A-4264431/1/2941694  0.5014
SNP_A-1980898/1/3084986  0.5544
SNP_A-1983139/1/3155127  -0.2811
SNP_A-4265735/1/3292731  0.4831
SNP_A-1995832/1/3695086  0.2174
SNP_A-1995893/1/3710825  -0.0473
SNP_A-1997896/1/3756100  0.3541
SNP_A-1997922/1/3756146  0.0921
SNP_A-2000230/1/4240737  0.169
SNP_A-2000332/1/4243294  -0.0256
SNP_A-4268173/1/4276892  0.1099
SNP_A-2002663/1/4371593  0.1591
SNP_A-2004169/1/4459761  -0.3124
SNP_A-2004249/1/4461025  0.5297
SNP_A-4268681/1/4461905  0.4747
SNP_A-2004332/1/4464544  -0.0218


0  0.000
10  0.174
20  0.342
30  0.500
40  0.643
50  0.766
60  0.866
70  0.940
80  0.985
90  1.0
100  0.985
110  0.940
120  0.866
130  0.766
140  0.643
150  0.500
160  0.342
170  0.174
180  0.000
190  -0.174
200  -0.342
210  -0.500
220  -0.643
230  -0.766
240  -0.866
250  -0.940
260  -0.985
270  -1.0
280  -0.985
290  -0.940
300  -0.866
310  -0.766
320  -0.643
330  -0.500
340  -0.342
350  -0.174

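The raw format described above can be read with a few lines of Python. This is only a minimal sketch, not the parser used by noise-free-cnv: the function name `parse_raw` is hypothetical, and treating every line that does not consist of exactly two whitespace-separated fields as invalid is an assumption.

```python
from typing import List, Tuple


def parse_raw(text: str) -> List[Tuple[str, float]]:
    """Parse raw-format text into (name, value) pairs, skipping invalid lines."""
    points = []
    for line in text.splitlines():
        fields = line.split()  # split on any run of whitespace
        if len(fields) != 2:
            continue  # assumption: anything but "name value" is invalid
        name, raw_value = fields
        try:
            # The decimal separator may be a point or a comma.
            value = float(raw_value.replace(",", "."))
        except ValueError:
            continue  # value is not a decimal number: ignore the line
        points.append((name, value))
    return points
```

Both `0.174` and `0,174` are read as the same value, and a line whose second field is not a number is dropped without raising an error.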

PennCNV file format

This is the file format used by PennCNV. A file has to contain at least the columns "Name", "Chr", "Position", ".Log R Ratio" and ".B Allele Freq" (the last two carry a prefix, "test" in the example below). It is loaded as two sequences, one containing the LRR values and one containing the BAF values. The names of the data points are composed of the "Name", "Chr" and "Position" columns.

Accordingly, two sequences are needed to save a PennCNV file. Saving only works well if the sequences originate from a PennCNV file, so that noise-free-cnv is able to extract the chromosome and position from the data points' names.

Name	Chr	Position	test.Log R Ratio	test.B Allele Freq
SNP_A-2131660	1	1145994	0.1530	1.0000
SNP_A-1967418	1	2224111	0.2446	1.0000
SNP_A-1969580	1	2319424	0.0582	0.9662
SNP_A-4263484	1	2543484	0.3893	0.9696
SNP_A-1978185	1	2926730	-0.0206	0.0000
SNP_A-4264431	1	2941694	0.5014	0.0341
SNP_A-1980898	1	3084986	0.5544	1.0000
SNP_A-1983139	1	3155127	-0.2811	0.2422
SNP_A-4265735	1	3292731	0.4831	0.0315
SNP_A-1995832	1	3695086	0.2174	0.4495
SNP_A-1995893	1	3710825	-0.0473	0.5547
SNP_A-1997896	1	3756100	0.3541	0.4624
SNP_A-1997922	1	3756146	0.0921	0.0000
SNP_A-2000230	1	4240737	0.1690	0.2961
SNP_A-2000332	1	4243294	-0.0256	0.9852
SNP_A-4268173	1	4276892	0.1099	0.0000
SNP_A-2002663	1	4371593	0.1591	0.0694
SNP_A-2004169	1	4459761	-0.3124	0.9914
SNP_A-2004249	1	4461025	0.5297	0.9839
SNP_A-4268681	1	4461905	0.4747	0.0000
SNP_A-2004332	1	4464544	-0.0218	0.0000

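Loading such a file as two sequences could look roughly like the following Python sketch. The function name `parse_penncnv` is hypothetical. Two assumptions are made: the columns are tab-separated, and the name parts are joined with "/", which is inferred from the raw-format example above where the same SNPs appear as `SNP_A-2131660/1/1145994`.

```python
from typing import List, Tuple

Sequence = List[Tuple[str, float]]


def parse_penncnv(text: str) -> Tuple[Sequence, Sequence]:
    """Return (LRR sequence, BAF sequence) from PennCNV-format text."""
    lines = text.splitlines()
    if not lines:
        return [], []
    # Locate the required columns in the header line.
    col = {}
    for index, title in enumerate(lines[0].split("\t")):
        if title in ("Name", "Chr", "Position"):
            col[title] = index
        elif title.endswith(".Log R Ratio"):
            col["LRR"] = index
        elif title.endswith(".B Allele Freq"):
            col["BAF"] = index
    missing = [k for k in ("Name", "Chr", "Position", "LRR", "BAF") if k not in col]
    if missing:
        raise ValueError("missing columns: " + ", ".join(missing))

    lrr, baf = [], []
    for line in lines[1:]:
        fields = line.split("\t")
        try:
            # Assumed naming scheme: Name/Chr/Position
            name = "/".join(fields[col[k]] for k in ("Name", "Chr", "Position"))
            lrr_value = float(fields[col["LRR"]])
            baf_value = float(fields[col["BAF"]])
        except (IndexError, ValueError):
            continue  # invalid lines are ignored
        lrr.append((name, lrr_value))
        baf.append((name, baf_value))
    return lrr, baf
```

Because the chromosome and position are kept inside each name, they can later be recovered by splitting the name at "/", which is what saving back to this format relies on.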

operations

single sequence

add/sub/mul/div: Applies a basic mathematical operation to every data point of the sequence, using the value in the entry field.
pow/root: Applies the pow/root function to every data point of the sequence, using the value in the entry field. Imaginary components are discarded.
blur: Applies a low-pass filter to the sequence. In detail, it performs a Weierstrass transform with the value in the entry field as standard deviation.
trunc/cut: All values with an absolute value greater than the value in the entry field are truncated/omitted. All others are left unchanged.
exp/log/erf: The exponential/natural logarithm/Gauss error function is applied to every data point.
rank: For a perfect normal distribution with a standard deviation of sqrt(2), this is the same as erf.
abs: Every data point is assigned its absolute value.
avg: The average value of the whole sequence is assigned to every data point.
strip XY: Removes data points of the X and Y chromosomes.
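
As an illustration of blur and trunc/cut, here is a minimal Python sketch. It is not the implementation used by noise-free-cnv. The function names are hypothetical, and the following points are assumptions: the standard deviation is measured in data points, the Gaussian kernel is cut off at four standard deviations, the weights are renormalized at the ends of the sequence, and "truncated" means clamped to the limit with the sign preserved.

```python
import math
from typing import List


def blur(values: List[float], sigma: float) -> List[float]:
    """Smooth a sequence with a Gaussian kernel of standard deviation sigma."""
    if sigma <= 0:
        return list(values)
    radius = int(math.ceil(4 * sigma))  # assumption: kernel cut off at 4 sigma
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    n = len(values)
    result = []
    for i in range(n):
        total = 0.0
        weight = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < n:  # assumption: ignore positions outside the sequence
                total += w * values[j]
                weight += w
        result.append(total / weight)
    return result


def trunc(values: List[float], limit: float) -> List[float]:
    """Clamp every value into the range [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def cut(values: List[float], limit: float) -> List[float]:
    """Drop every value whose absolute value exceeds limit."""
    return [v for v in values if abs(v) <= limit]
```

With these definitions, `trunc` keeps the length of the sequence while `cut` shortens it, and `blur` leaves a constant sequence unchanged while spreading an isolated spike over its neighbours.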

two sequences

add/sub/mul/div: Two sequences are added/subtracted/multiplied/divided point by point. The input sequences have to be in the same order. If data points are missing in either sequence, noise-free-cnv will skip those (and only those).
sort: The data points of the second sequence are sorted so that the identifiers are in the same order as in the first sequence. Data points with identifiers that are not present in both sequences are omitted.
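
The two operations above can be sketched in Python as follows. The names `sort_like` and `combine` are hypothetical, and the sketch assumes that identifiers are unique within a sequence. It matches points by name, which is one way to get the described skipping of missing points, not necessarily the way noise-free-cnv does it.

```python
import operator
from typing import Callable, List, Tuple

Sequence = List[Tuple[str, float]]


def sort_like(first: Sequence, second: Sequence) -> Sequence:
    """Reorder `second` to follow the identifier order of `first`.

    Points whose identifier is not present in both sequences are omitted.
    """
    lookup = dict(second)  # assumption: identifiers are unique
    return [(name, lookup[name]) for name, _ in first if name in lookup]


def combine(first: Sequence, second: Sequence,
            op: Callable[[float, float], float]) -> Sequence:
    """Apply `op` point by point, skipping points missing in either sequence."""
    lookup = dict(second)
    return [(name, op(value, lookup[name]))
            for name, value in first if name in lookup]


first = [("a", 1.0), ("b", 2.0), ("c", 3.0)]
second = [("c", 30.0), ("a", 10.0), ("d", 40.0)]
reordered = sort_like(first, second)
total = combine(first, reordered, operator.add)
```

Here `reordered` holds the points "a" and "c" in the order of `first`, and `total` holds their sums. Passing `operator.truediv` would raise `ZeroDivisionError` for a zero divisor; how noise-free-cnv treats that case is not stated here.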

multiple sequences

For these operations, all input sequences have to be in the same order. If there are data points missing in some of the sequences, noise-free-cnv will skip those (and only those).

add/mul/arithmetic/geometric: For every data point, the sum/product/arithmetic mean/geometric mean throughout all input sequences is computed.
median: For every data point, the median throughout all input sequences is computed. If there is an even number of sequences, the arithmetic mean of the upper and lower median is used.
deviation: For every data point, the standard deviation from zero throughout all input sequences is computed.
align: For every input sequence, one output sequence is generated. It contains, unchanged, all points of the corresponding input sequence that are present in every input sequence.
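
A rough Python sketch of align, median and deviation follows. All function names are hypothetical. It assumes at least one input sequence and unique identifiers, and it interprets "standard deviation from zero" as the square root of the mean of the squared values (dividing by the number of sequences), which is an assumption about the exact formula.

```python
import math
from typing import Callable, List, Tuple

Sequence = List[Tuple[str, float]]


def align(sequences: List[Sequence]) -> List[Sequence]:
    """Keep, in each sequence, only the points present in every sequence."""
    common = {name for name, _ in sequences[0]}
    for seq in sequences[1:]:
        common &= {name for name, _ in seq}
    return [[(n, v) for n, v in seq if n in common] for seq in sequences]


def per_point(sequences: List[Sequence],
              reduce: Callable[[List[float]], float]) -> Sequence:
    """Apply `reduce` to the values of each data point across all sequences."""
    aligned = align(sequences)
    names = [n for n, _ in aligned[0]]
    columns = zip(*[[v for _, v in seq] for seq in aligned])
    return [(n, reduce(list(col))) for n, col in zip(names, columns)]


def median(values: List[float]) -> float:
    """Median; for an even count, the mean of the two middle values."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def deviation_from_zero(values: List[float]) -> float:
    """Assumed formula: sqrt of the mean squared value (deviation around 0)."""
    return math.sqrt(sum(v * v for v in values) / len(values))
```

For example, `per_point([a, b, c], median)` yields one output point for every identifier shared by `a`, `b` and `c`, in the order of `a`.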