
- Clean text function in R: driver
- Clean text function in R: code
Clean text function in R: code

# Get raw data ready to be 'summarized' by textreg C++ function
# Based on Robin's version based on Luke's earlier code
# 12-9-2012

#' Clean text and get it ready for textreg.
#'
#' Changes multiline documents to single line.
#' Non-alpha characters converted to spaces.
#'
#' @param bigcorp A tm Corpus object.
#' tm
#' @examples
#' library( tm )
#' txt = c( "thhis s! and bonkus 4:33pm and Jan 3, 2015.
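The two documented cleaning steps (collapse multiline documents to a single line, convert non-alpha characters to spaces) can be illustrated with a minimal base-R sketch. It assumes plain character input rather than a tm Corpus, and the name `clean_text_sketch` is hypothetical. textreg's own `clean.text()` may do more than these two steps, so treat this as an approximation, not the package's implementation.

```r
# Hypothetical helper, NOT textreg's clean.text(): a base-R approximation of
# the two documented steps, applied to a plain character vector.
clean_text_sketch <- function(docs) {
  # Step 1: change multiline documents to a single line by replacing
  # carriage returns and newlines with spaces.
  docs <- gsub("[\r\n]+", " ", docs)
  # Step 2: convert every non-alphabetic character to a space.
  docs <- gsub("[^[:alpha:]]", " ", docs)
  # Tidy up: collapse runs of spaces and trim leading/trailing whitespace.
  docs <- gsub(" +", " ", docs)
  trimws(docs)
}

txt <- c("thhis s! and bonkus 4:33pm\nand Jan 3, 2015.", "line one\nline two")
clean_text_sketch(txt)
# returns c("thhis s and bonkus pm and Jan", "line one line two")
```

Note that digits and punctuation vanish entirely, so times and dates such as "4:33pm" and "Jan 3, 2015" are reduced to their letters only.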
- tm_gregexpr: Call gregexpr on the content of a tm Corpus.
- textreg-package: Sparse regression package for text that allows for multiple.
- textreg: Sparse regression of labeling vector onto all phrases in a.
- testCorpora: Some small, fake test corpora.
- .files: Save corpus to text (and RData) file.
- agments: Sample fragments of text to contextualize a phrase.
- : Pretty print results of textreg regression.
- : Pretty print results of phrase sampling object.
- : Predict labeling with the selected phrases.
- : Plot the sequence of features as they are introduced with the.
- phrases: Get the phrases from the textreg.result object.
- phrase.matrix: Make matrix of where phrases appear in corpus.
- : Calculate similarity matrix for set of phrases.
- make_search_phrases: Convert phrases to appropriate search string.
- : Make a table of where phrases appear in a corpus.
- : Generate visualization of phrase overlap.
- : Generate matrix describing gradient descent path of textreg.
- make.CV.chart: Plot K-fold cross validation curves.
- : Count number of times documents have a given phrase.
- : Make phrase appearance matrix from textreg result.
- : Graphic showing multiple word lists side-by-side.
- is.textreg.result: Is object a textreg.result object?
- is.fragment.sample: Is object a fragment.sample object?
- agments: Grab all fragments in a corpus with given phrase.
- : Conduct permutation test on labeling to get null distribution.
- find.CV.C: K-fold cross-validation to determine optimal tuning parameter.
- clean.text: Clean text and get it ready for textreg.
- dirtyBathtub: Sample of raw-text OSHA accident summaries.

clean.text documentation (RDocumentation, version 0.1.5)
Usage: clean.text(bigcorp)
Arguments: bigcorp, a tm Corpus object.
Description: Changes multiline documents to single line.

Clean text function in R: driver
- cpp_textreg: Driver function for the C++ function.
- cpp_rpus: Driver function for the C++ function.
- convert.tm.to.character: Convert tm corpus to vector of strings.
- cluster.phrases: Cluster phrases based on similarity of appearance.
- calc.loss: Calculate total loss of model (squared hinge loss).
- rpus: Build a corpus that can be used in the textreg call.
- bathtub: Sample of cleaned OSHA accident summaries.
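Several entries in the index above (tm_gregexpr and the phrase-counting helper) revolve around running `gregexpr()` over document text. The sketch below shows that idea in base R on a plain character vector; the function name `count_phrase_sketch` and the sample documents are hypothetical, and this is not textreg's code. It assumes the phrase should be matched literally, hence `fixed = TRUE`.

```r
# Hypothetical helper, NOT part of textreg: count how many times a phrase
# occurs in each document of a character vector, using gregexpr().
count_phrase_sketch <- function(docs, phrase) {
  # fixed = TRUE matches the phrase literally instead of as a regex.
  hits <- gregexpr(phrase, docs, fixed = TRUE)
  # gregexpr() returns -1 for a document with no match; otherwise it returns
  # one start position per non-overlapping match.
  vapply(hits, function(m) sum(m > 0), integer(1))
}

docs <- c("fell from ladder; ladder broke", "no incident", "ladder")
counts <- count_phrase_sketch(docs, "ladder")  # returns c(2L, 0L, 1L)
sum(counts > 0)                                # 2 documents contain the phrase
```

Returning per-document counts keeps both questions answerable: the counts themselves, and, via `counts > 0`, which documents contain the phrase at all.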