SphinxBase  0.6
cont_ad.h File Reference

Continuous A/D listening and silence filtering module.

#include <sphinxbase/sphinxbase_export.h>
#include <sphinxbase/prim_type.h>
#include <sphinxbase/ad.h>
#include <stdio.h>

Data Structures

struct  spseg_s
 
struct  cont_ad_t
 Continuous listening module or object.
 

Macros

#define CONT_AD_STATE_SIL   0
 
#define CONT_AD_STATE_SPEECH   1
 

Typedefs

typedef struct spseg_s spseg_t
 

Functions

SPHINXBASE_EXPORT cont_ad_t * cont_ad_init (ad_rec_t *ad, int32(*adfunc)(ad_rec_t *ad, int16 *buf, int32 max))
 Initialize a continuous listening/silence filtering object.

SPHINXBASE_EXPORT cont_ad_t * cont_ad_init_rawmode (ad_rec_t *ad, int32(*adfunc)(ad_rec_t *ad, int16 *buf, int32 max))
 Initializes a continuous listening object which simply passes data through (!)

SPHINXBASE_EXPORT int32 cont_ad_read (cont_ad_t *r, int16 *buf, int32 max)
 Read raw audio data into the silence filter.

SPHINXBASE_EXPORT int32 cont_ad_buffer_space (cont_ad_t *r)
 Get the maximum number of samples which can be passed into cont_ad_read().

SPHINXBASE_EXPORT int32 cont_ad_calib (cont_ad_t *cont)
 Calibrate the silence filter.

SPHINXBASE_EXPORT int32 cont_ad_calib_loop (cont_ad_t *r, int16 *buf, int32 max)
 Calibrate the silence filter without an audio device.

SPHINXBASE_EXPORT int32 cont_ad_calib_size (cont_ad_t *r)
 Get the number of samples required to calibrate the silence filter.

SPHINXBASE_EXPORT int32 cont_ad_set_thresh (cont_ad_t *cont, int32 sil, int32 sp)
 Set silence and speech threshold parameters.

SPHINXBASE_EXPORT int32 cont_ad_set_params (cont_ad_t *r, int32 delta_sil, int32 delta_speech, int32 min_noise, int32 max_noise, int32 winsize, int32 speech_onset, int32 sil_onset, int32 leader, int32 trailer, float32 adapt_rate)
 Set the changeable parameters.

SPHINXBASE_EXPORT int32 cont_ad_get_params (cont_ad_t *r, int32 *delta_sil, int32 *delta_speech, int32 *min_noise, int32 *max_noise, int32 *winsize, int32 *speech_onset, int32 *sil_onset, int32 *leader, int32 *trailer, float32 *adapt_rate)
 Get the changeable parameters (PWP 1/14/98).

SPHINXBASE_EXPORT int32 cont_ad_reset (cont_ad_t *cont)
 Reset, discarding any accumulated speech segments.

SPHINXBASE_EXPORT int32 cont_ad_close (cont_ad_t *cont)
 Close the continuous listening object.

SPHINXBASE_EXPORT void cont_ad_powhist_dump (FILE *fp, cont_ad_t *cont)
 Dump the power histogram.

SPHINXBASE_EXPORT int32 cont_ad_detach (cont_ad_t *c)
 Detach the given continuous listening module from the associated audio device.

SPHINXBASE_EXPORT int32 cont_ad_attach (cont_ad_t *c, ad_rec_t *a, int32(*func)(ad_rec_t *, int16 *, int32))
 Attach the continuous listening module to the given audio device/function.

SPHINXBASE_EXPORT int32 cont_ad_set_rawfp (cont_ad_t *c, FILE *fp)
 Set a file for dumping raw audio input.

SPHINXBASE_EXPORT int32 cont_ad_set_logfp (cont_ad_t *c, FILE *fp)
 Set the file to which cont_ad logs its progress.

SPHINXBASE_EXPORT int32 cont_set_thresh (cont_ad_t *r, int32 silence, int32 speech)
 Set the silence and speech thresholds.
 

Detailed Description

Continuous A/D listening and silence filtering module.

This module is intended to be interposed as a filter between any raw A/D source and the application, with the main purpose of removing regions of silence from the raw input speech. It is initialized with a raw A/D source function (during the cont_ad_init call). The application is responsible for setting up the A/D source and for turning recording on and off as it desires. Filtered A/D data can be read by the application using the cont_ad_read function.

In other words, the application calls cont_ad_read instead of the raw A/D source function (e.g., ad_read in libad) to obtain filtered A/D data with silence regions removed. This module itself does not enforce any other structural changes to the application.

The cont_ad_read function also updates an "absolute" timestamp (see cont_ad_t.read_ts) at the end of each invocation. The timestamp indicates the total number of samples of A/D data read until this point, including data discarded as silence frames. The application is responsible for using this timestamp to make any policy decisions, such as where utterance boundaries fall.
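
As a rough sketch of how an application might use the timestamp, the helpers below convert a sample count such as cont_ad_t.read_ts to milliseconds and decide whether enough time has passed without speech to end an utterance. The helper names and the gap-based policy are illustrative assumptions and are not part of cont_ad.h.

```c
#include <stdint.h>

/* Hypothetical helper: convert a sample count (for example the value of
 * cont_ad_t.read_ts) to milliseconds. samples_per_sec must be positive. */
static int64_t
samples_to_ms(int32_t n_samples, int32_t samples_per_sec)
{
    return ((int64_t)n_samples * 1000) / samples_per_sec;
}

/* Hypothetical policy: the utterance is considered finished once the
 * absolute timestamp has advanced by more than max_gap_ms since the
 * timestamp saved when speech data was last returned. */
static int
utterance_ended(int32_t last_speech_ts, int32_t read_ts,
                int32_t samples_per_sec, int32_t max_gap_ms)
{
    return samples_to_ms(read_ts - last_speech_ts, samples_per_sec) > max_gap_ms;
}
```

For example, with 16 kHz audio and a 1000 ms gap, utterance_ended becomes true roughly one second after the last cont_ad_read call that returned data, provided the application saved read_ts at that call.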

Definition in file cont_ad.h.

Function Documentation

SPHINXBASE_EXPORT int32 cont_ad_attach ( cont_ad_t *  c,
ad_rec_t *  a,
int32(*)(ad_rec_t *, int16 *, int32)  func
)

Attach the continuous listening module to the given audio device/function.

(Like cont_ad_init, but without the calibration.)

Returns
0 if successful, -1 otherwise.

Definition at line 1293 of file cont_ad_base.c.

References cont_ad_t::ad, and cont_ad_t::eof.

SPHINXBASE_EXPORT int32 cont_ad_calib ( cont_ad_t *  cont)

Calibrate the silence filter.

Calibrate to determine an initial silence threshold. This function can be called any number of times, and should be called at least once immediately after cont_ad_init. The silence threshold is also updated internally from time to time, so later calls are only needed if there is a definite change in the recording environment. The application is responsible for making sure that the raw audio source is turned on before calibration.

Returns
0 if successful, <0 otherwise.
Parameters
cont  In: Object pointer returned by cont_ad_init

Definition at line 1022 of file cont_ad_base.c.

References cont_ad_t::ad, cont_ad_t::adbuf, cont_ad_t::headfrm, cont_ad_t::n_calib_frame, cont_ad_t::n_frm, cont_ad_t::pow_hist, cont_ad_t::spf, and cont_ad_t::thresh_update.

SPHINXBASE_EXPORT int32 cont_ad_calib_loop ( cont_ad_t *  r,
int16 *  buf,
int32  max
)

Calibrate the silence filter without an audio device.

If the application did not pass an audio device to the silence filter at initialisation, this routine can be used to calibrate the filter. buf (of length max samples) should contain audio data for calibration. This data is assumed to be completely consumed. More than one call may be necessary to fully calibrate.

Returns
0 if successful, <0 on failure, >0 if calibration is not complete.
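
The return convention suggests a loop that keeps passing in blocks until calibration completes. The following is a minimal sketch of such a loop. To stay self-contained it takes the calibration step as a function pointer with a void* filter argument; that typedef, calibrate_from_blocks and demo_calib_step are all assumptions made for illustration and do not exist in SphinxBase.

```c
#include <stdint.h>

/* Stand-in for cont_ad_calib_loop(): any function that follows the
 * documented convention (0 = done, <0 = failure, >0 = needs more data).
 * The void* filter argument is an assumption made so that this sketch
 * compiles without the SphinxBase headers. */
typedef int32_t (*calib_step_fn)(void *filter, int16_t *buf, int32_t max);

/* Feed consecutive blocks of `block` samples from `data` to `step` until
 * calibration completes. Returns 0 on success, -1 on failure or if the
 * data runs out before calibration is complete. */
static int32_t
calibrate_from_blocks(calib_step_fn step, void *filter,
                      int16_t *data, int32_t n_samples, int32_t block)
{
    int32_t off;

    if (block <= 0)
        return -1;
    for (off = 0; off + block <= n_samples; off += block) {
        int32_t rv = step(filter, data + off, block);
        if (rv == 0)
            return 0;   /* calibration complete */
        if (rv < 0)
            return -1;  /* calibration failed */
        /* rv > 0: not complete yet, pass in the next block */
    }
    return -1;
}

/* Toy step function for demonstration only: it pretends that calibration
 * needs *filter more blocks and ignores the audio itself. */
static int32_t
demo_calib_step(void *filter, int16_t *buf, int32_t max)
{
    int32_t *blocks_needed = (int32_t *)filter;

    (void)buf;
    (void)max;
    if (*blocks_needed <= 0)
        return -1;
    *blocks_needed -= 1;
    return (*blocks_needed > 0) ? 1 : 0;
}
```

In a real application, a thin wrapper that casts filter to cont_ad_t * and forwards to cont_ad_calib_loop() would take the place of demo_calib_step.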

Definition at line 1064 of file cont_ad_base.c.

References cont_ad_t::adbuf, cont_ad_t::headfrm, cont_ad_t::n_calib_frame, cont_ad_t::n_frm, cont_ad_t::pow_hist, cont_ad_t::spf, and cont_ad_t::thresh_update.

SPHINXBASE_EXPORT int32 cont_ad_calib_size ( cont_ad_t *  r)

Get the number of samples required to calibrate the silence filter.

Since, as mentioned above, the calibration data is assumed to be fully consumed, it may be desirable to "hold onto" this data in case it contains useful speech. This function returns the number of samples required to calibrate the silence filter, which is useful in allocating a buffer to store this data.

Returns
Number of samples required for successful calibration.

Definition at line 1058 of file cont_ad_base.c.

References cont_ad_t::spf.

SPHINXBASE_EXPORT int32 cont_ad_detach ( cont_ad_t *  c)

Detach the given continuous listening module from the associated audio device.

Returns
0 if successful, -1 otherwise.

Definition at line 1281 of file cont_ad_base.c.

References cont_ad_t::ad.

SPHINXBASE_EXPORT int32 cont_ad_get_params ( cont_ad_t *  r,
int32 *  delta_sil,
int32 *  delta_speech,
int32 *  min_noise,
int32 *  max_noise,
int32 *  winsize,
int32 *  speech_onset,
int32 *  sil_onset,
int32 *  leader,
int32 *  trailer,
float32 *  adapt_rate 
)

Get the changeable parameters (PWP 1/14/98).

delta_sil, delta_speech, min_noise, and max_noise are in dB; winsize, speech_onset, sil_onset, leader, and trailer are in frames of 16 ms length (256 samples @ 16kHz sampling).

Definition at line 1199 of file cont_ad_base.c.

References cont_ad_t::adapt_rate, cont_ad_t::delta_sil, cont_ad_t::delta_speech, cont_ad_t::leader, cont_ad_t::max_noise, cont_ad_t::min_noise, cont_ad_t::sil_onset, cont_ad_t::speech_onset, cont_ad_t::trailer, and cont_ad_t::winsize.

SPHINXBASE_EXPORT cont_ad_t* cont_ad_init ( ad_rec_t *  ad,
int32(*)(ad_rec_t *ad, int16 *buf, int32 max)  adfunc 
)

Initialize a continuous listening/silence filtering object.

One time initialization of a continuous listening/silence filtering object/module. This can work in either "stream mode", where it reads data from an audio device represented by ad_rec_t, or in "block mode", where it filters out silence regions from blocks of data passed into it.

Parameters
ad  In: The A/D source object to be filtered (an audio device to read from), or NULL to operate in block mode.
adfunc  In: The source function invoked to obtain raw A/D data from ad, or NULL to operate in block mode. This is usually ad_read(). See ad.h for the required prototype definition.
Returns
A pointer to a READ-ONLY structure used in other calls to the object. If any error occurs, the return value is NULL.

SPHINXBASE_EXPORT cont_ad_t* cont_ad_init_rawmode ( ad_rec_t *  ad,
int32(*)(ad_rec_t *ad, int16 *buf, int32 max)  adfunc 
)

Initializes a continuous listening object which simply passes data through (!)

Like cont_ad_init, but put the module in raw mode; i.e., all data is passed through, unfiltered. (By special request.)

SPHINXBASE_EXPORT void cont_ad_powhist_dump ( FILE *  fp,
cont_ad_t *  cont
)

Dump the power histogram.

For debugging...

Definition at line 231 of file cont_ad_base.c.

References cont_ad_t::pow_hist, cont_ad_t::spf, cont_ad_t::sps, and cont_ad_t::tot_frm.

SPHINXBASE_EXPORT int32 cont_ad_read ( cont_ad_t *  r,
int16 *  buf,
int32  max 
)

Read raw audio data into the silence filter.

The main read routine for reading speech/silence segmented audio data. Audio data is copied into the caller provided buffer, much like a file read routine.

In "block mode", i.e. if NULL was passed as a read function to cont_ad_init, the data in buf is taken as input, and any non-silence data is written back to buf on exit. In this case, you must take care that max does not overflow the internal buffer of the silence filter. The available number of samples can be obtained by calling cont_ad_buffer_space(). Any excess data will be discarded.

In normal mode, only speech segments are copied; silence segments are dropped. In rawmode (cont_ad module initialized using cont_ad_init_rawmode()), all data are passed through to the caller. But, in either case, any single call to cont_ad_read will never return data that crosses a speech/silence segment boundary.

The following variables are updated for use by the caller (see cont_ad_t): cont_ad_t.state, cont_ad_t.read_ts, cont_ad_t.seglen, cont_ad_t.siglvl.

Returns
Number of samples actually read, possibly 0; <0 if EOF on A/D source.

Parameters
r  In: Object pointer returned by cont_ad_init
buf  In/Out: In block mode, contains input data. On return, buf contains A/D data returned by this function, if any.
max  In: Maximum number of samples to be filled into buf. NOTE: max must be at least 256; otherwise the function returns -1.
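
In block mode the two constraints on max (at least 256 samples, and no more than the space reported by cont_ad_buffer_space()) can be checked before each call. The helper below is a small sketch of that check; its name and the -1 convention for "cannot read right now" are assumptions for illustration only.

```c
#include <stdint.h>

/* Minimum read size stated in the parameter description above. */
#define MIN_READ_SAMPLES 256

/* Hypothetical helper: given the number of samples the application would
 * like to pass in and the space reported by cont_ad_buffer_space(),
 * return a value of max that respects both limits, or -1 if fewer than
 * MIN_READ_SAMPLES can be passed right now. */
static int32_t
safe_block_size(int32_t wanted, int32_t buffer_space)
{
    int32_t n = (wanted < buffer_space) ? wanted : buffer_space;

    return (n >= MIN_READ_SAMPLES) ? n : -1;
}
```

An application would pass the result as max only when it is not -1, and keep any samples beyond that size for the next call so that no excess data is discarded.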

Definition at line 863 of file cont_ad_base.c.

References E_ERROR, cont_ad_t::eof, cont_ad_t::headfrm, cont_ad_t::leader, cont_ad_t::logfp, cont_ad_t::n_frm, cont_ad_t::n_other, cont_ad_t::n_sample, cont_ad_t::rawmode, cont_ad_t::read_ts, cont_ad_t::seglen, cont_ad_t::siglvl, cont_ad_t::spf, cont_ad_t::spseg_head, cont_ad_t::spseg_tail, cont_ad_t::state, cont_ad_t::tail_state, cont_ad_t::tot_frm, cont_ad_t::win_startfrm, cont_ad_t::win_validfrm, and cont_ad_t::winsize.

SPHINXBASE_EXPORT int32 cont_ad_reset ( cont_ad_t *  cont)

Reset, discarding any accumulated speech segments.

Returns
0 if successful, <0 otherwise.

Definition at line 1236 of file cont_ad_base.c.

References cont_ad_t::headfrm, cont_ad_t::n_frm, cont_ad_t::n_other, cont_ad_t::n_sample, cont_ad_t::spseg_head, cont_ad_t::spseg_tail, cont_ad_t::tail_state, cont_ad_t::win_startfrm, and cont_ad_t::win_validfrm.

Referenced by cont_ad_close().

SPHINXBASE_EXPORT int32 cont_ad_set_logfp ( cont_ad_t *  c,
FILE *  fp 
)

Set the file to which cont_ad logs its progress.

Mainly for debugging. If fp is NULL, logging is turned off.

Returns
0 if successful, -1 otherwise.

Definition at line 1360 of file cont_ad_base.c.

References cont_ad_t::logfp.

SPHINXBASE_EXPORT int32 cont_ad_set_params ( cont_ad_t *  r,
int32  delta_sil,
int32  delta_speech,
int32  min_noise,
int32  max_noise,
int32  winsize,
int32  speech_onset,
int32  sil_onset,
int32  leader,
int32  trailer,
float32  adapt_rate 
)

Set the changeable parameters.

delta_sil, delta_speech, min_noise, and max_noise are in dB; winsize, speech_onset, sil_onset, leader, and trailer are in frames of 16 ms length (256 samples @ 16kHz sampling).
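
Because the frame-based parameters are expressed in 16 ms frames, it can help to convert between frames, samples, and milliseconds when choosing values. The helpers below sketch that arithmetic for the 256-sample, 16 kHz case stated above; the macro and function names are hypothetical and not part of the library.

```c
#include <stdint.h>

/* Frame geometry stated in the text above: 256 samples per frame,
 * which is 16 ms at a 16 kHz sampling rate. */
#define SAMPLES_PER_FRAME 256
#define SAMPLES_PER_SEC   16000

/* Hypothetical helper: number of samples covered by n_frames frames. */
static int32_t
frames_to_samples(int32_t n_frames)
{
    return n_frames * SAMPLES_PER_FRAME;
}

/* Hypothetical helper: duration of n_frames frames in milliseconds. */
static int32_t
frames_to_ms(int32_t n_frames)
{
    return (int32_t)(((int64_t)n_frames * SAMPLES_PER_FRAME * 1000)
                     / SAMPLES_PER_SEC);
}

/* Hypothetical helper: smallest number of frames that covers at least
 * ms milliseconds (rounds up to a whole frame). */
static int32_t
ms_to_frames(int32_t ms)
{
    int64_t samples = ((int64_t)ms * SAMPLES_PER_SEC) / 1000;

    return (int32_t)((samples + SAMPLES_PER_FRAME - 1) / SAMPLES_PER_FRAME);
}
```

For instance, a value given in milliseconds could be passed through ms_to_frames before being used as the leader or trailer argument.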

Definition at line 1126 of file cont_ad_base.c.

References cont_ad_t::adapt_rate, cont_ad_t::delta_sil, cont_ad_t::delta_speech, E_ERROR, cont_ad_t::leader, cont_ad_t::max_noise, cont_ad_t::min_noise, cont_ad_t::sil_onset, cont_ad_t::speech_onset, cont_ad_t::trailer, cont_ad_t::win_validfrm, and cont_ad_t::winsize.

SPHINXBASE_EXPORT int32 cont_ad_set_rawfp ( cont_ad_t *  c,
FILE *  fp 
)

Set a file for dumping raw audio input.

The application can ask cont_ad to dump the raw audio input that cont_ad processes to a file. Use this function to give the FILE* to the cont_ad object. If invoked with fp == NULL, dumping is turned off. The application is responsible for opening and closing the file. If fp is non-NULL, cont_ad assumes the file pointer is valid and opened for writing.

Returns
0 if successful, -1 otherwise.

Definition at line 1346 of file cont_ad_base.c.

References cont_ad_t::rawfp.

SPHINXBASE_EXPORT int32 cont_ad_set_thresh ( cont_ad_t *  cont,
int32  sil,
int32  sp 
)

Set silence and speech threshold parameters.

The silence threshold is the max power level, RELATIVE to the peak background noise level, in any silence frame. Similarly, the speech threshold is the min power level, RELATIVE to the peak background noise level, in any speech frame. In general, silence threshold <= speech threshold. Increasing the thresholds (say, from the default value of 2 to 3 or 4) reduces the sensitivity to background noise, but may also increase the chances of clipping actual speech.

Returns
0 if successful, <0 otherwise.
Parameters
cont  In: Object ptr from cont_ad_init
sil  In: silence threshold (default 2)
sp  In: speech threshold (default 2)
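
To make the two definitions concrete, the sketch below classifies a single frame from its power and a background noise level, with sil and sp applied as offsets relative to that noise level. It only illustrates the definitions given above; the names, the three-way result, and the tie-breaking rule are assumptions, and this is not the code used in cont_ad_base.c.

```c
#include <stdint.h>

enum frame_class { FRAME_SILENCE, FRAME_UNDECIDED, FRAME_SPEECH };

/* Hypothetical helper: classify a frame given its power, the peak
 * background noise level, and the relative thresholds sil <= sp.
 * All four values are assumed to be on the same power scale.
 * If a frame satisfies both conditions (possible when sil == sp),
 * it is reported as speech because that test runs first. */
static enum frame_class
classify_frame(int32_t frame_pow, int32_t noise_level, int32_t sil, int32_t sp)
{
    if (frame_pow >= noise_level + sp)
        return FRAME_SPEECH;     /* at or above the speech threshold */
    if (frame_pow <= noise_level + sil)
        return FRAME_SILENCE;    /* at or below the silence threshold */
    return FRAME_UNDECIDED;      /* between the two thresholds */
}
```

Raising sil and sp moves both cut-offs further above the noise level, which matches the trade-off described above: fewer noise frames are taken for speech, but quiet speech is more likely to be dropped.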

Definition at line 1100 of file cont_ad_base.c.

References cont_ad_t::delta_sil, and cont_ad_t::delta_speech.

SPHINXBASE_EXPORT int32 cont_set_thresh ( cont_ad_t *  r,
int32  silence,
int32  speech 
)

Set the silence and speech thresholds.

For this to remain permanently in effect, the auto_thresh field of the continuous listening module should be set to FALSE or 0. Otherwise the thresholds may be modified by the noise-level adaptation.

Definition at line 1308 of file cont_ad_base.c.

References cont_ad_t::frm_pow, cont_ad_t::n_other, cont_ad_t::tail_state, cont_ad_t::thresh_sil, cont_ad_t::thresh_speech, cont_ad_t::win_startfrm, and cont_ad_t::win_validfrm.