xtract - NCBI Entrez Direct XML conversion and transformation tool

Description

xtract converts an XML document into a table of data values according to user-specified rules.
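
To make the XML-to-table idea concrete, here is a minimal Python sketch. It is not xtract and does not reproduce its behavior in detail; the sample document, the rows_from_xml helper, and its parameters are all assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input; real Entrez XML is far richer.
SAMPLE = """<Set>
  <Rec><Id>1</Id><Name>alpha</Name></Rec>
  <Rec><Id>2</Id><Name>beta</Name></Rec>
</Set>"""


def rows_from_xml(xml_text, record_tag, fields):
    """Yield one tab-delimited row per record, one column per requested field."""
    root = ET.fromstring(xml_text)
    for rec in root.iter(record_tag):
        # findtext returns the text of the first matching child, or the default.
        yield "\t".join(rec.findtext(f, default="") for f in fields)


table = list(rows_from_xml(SAMPLE, "Rec", ["Id", "Name"]))
print("\n".join(table))
```

The record tag plays the role of the rule that says where a row starts, and the field names play the role of the rules that say which values fill the columns.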

Name

       xtract - NCBI Entrez Direct XML conversion and transformation tool

Notes

       String constraints use case-insensitive comparisons.

       Numeric constraints and selection arguments use integer values.

       -num and -len selections are synonyms for Object Count (#) and Item Length (%).

       -words, -pairs, and -indices convert to lower case.

Options

   Processing Flags
       -strict
              Remove HTML and MathML tags.

       -mixed Allow mixed content XML.

       -self  Allow detection of empty self-closing tags.

       -accent
              Delete Unicode accents and diacritical marks.

       -ascii Convert Unicode to numeric HTML character entities.

       -compress
              Compress runs of spaces.

       -stops Retain stop words in selected phrases.

   Data Source
       -input filename
              Read XML from file instead of standard input.

       -transform filename
              File of substitutions for -translate.

       -aliases filename
              Mappings file for -classify operation.

   Exploration Argument Hierarchy
       -pattern expr
       -group expr
       -block expr
       -subset expr
              Name of record within set.  Use of different argument names allows command-line control of nested
              looping.
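
The nested looping described above can be pictured as an outer loop over records with an inner loop over objects inside each record. The following Python sketch only illustrates that idea; the sample XML and the nested_rows helper are hypothetical, and xtract's own traversal rules are more elaborate.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample input with a repeated object nested inside each record.
SAMPLE = """<Set>
  <Article><PMID>1</PMID>
    <Author><LastName>Smith</LastName></Author>
    <Author><LastName>Jones</LastName></Author>
  </Article>
  <Article><PMID>2</PMID>
    <Author><LastName>Lee</LastName></Author>
  </Article>
</Set>"""


def nested_rows(xml_text, pattern, block, outer_field, inner_field):
    """Build one row per outer record, appending one value per nested object."""
    rows = []
    for rec in ET.fromstring(xml_text).iter(pattern):  # outer loop: one pass per record
        row = [rec.findtext(outer_field, default="")]
        for sub in rec.iter(block):  # inner loop: one pass per nested object
            row.append(sub.findtext(inner_field, default=""))
        rows.append("\t".join(row))
    return rows


print("\n".join(nested_rows(SAMPLE, "Article", "Author", "PMID", "LastName")))
```

Each additional level of nesting would add another inner loop, which is why distinct argument names are needed for each level.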

   Path Navigation
       -path path
              Explore by list of adjacent object names.

   Exploration Constructs
       Object         DateRevised
       Parent/Child   Book/AuthorList
       Path           MedlineCitation/Article/Journal/JournalIssue/PubDate
       Heterogeneous  "PubmedArticleSet/*"
       Exhaustive     "History/**"
       Nested         "*/Taxon"

   Conditional Execution
       -if expr [constraint]
              Element (or @attribute) must exist and satisfy any specified constraint.

       -unless expr [constraint]
              Skip if element matches.

       -and condition
              Preceding and following tests must both pass.

       -or condition
              Any passing test suffices.

       -else  Execute if conditional test failed.

       -position pos
              first/last/outer/inner/even/odd/all.

   String Constraints
       -equals str
              String must match exactly.

       -contains str
              Substring must be present.

       -includes str
              Substring must match at word boundaries.

       -is-within str
              String must be present.

       -starts-with str
              Substring must be at beginning.

       -ends-with str
              Substring must be at end.

       -is-not str
              String must not match.

       -is-before str
              First string < second string.

       -is-after str
              First string > second string.

       -matches str
              Matches without commas or semicolons.

       -resembles str
              Requires all words, but in any order.
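
The difference between a plain substring test and a word-boundary test is easy to miss. The Python sketch below approximates a few of these constraints as case-insensitive predicates, following the Notes section. The function names are hypothetical, and using the regular-expression \b anchor for "word boundaries" is an assumption; xtract may tokenize differently.

```python
import re


def equals(value, s):
    """Whole-string match, ignoring case."""
    return value.lower() == s.lower()


def contains(value, s):
    """Substring may appear anywhere, even inside a longer word."""
    return s.lower() in value.lower()


def includes(value, s):
    """Substring must be delimited by word boundaries (approximated with \\b)."""
    return re.search(r"\b" + re.escape(s) + r"\b", value, re.IGNORECASE) is not None


def starts_with(value, s):
    return value.lower().startswith(s.lower())


def ends_with(value, s):
    return value.lower().endswith(s.lower())


print(contains("Phospholipase A2", "lipase"))   # substring inside a word
print(includes("Phospholipase A2", "lipase"))   # not a separate word
```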

   Object Constraints
       -is-equal-to expr
              Object values must match.

       -differs-from expr
              Object values must differ.

   Numeric Constraints
       -gt N  Greater than.

       -ge N  Greater than or equal to.

       -lt N  Less than.

       -le N  Less than or equal to.

       -eq N  Equal to.

       -ne N  Not equal to.

   Format Customization
       -ret str
              Override line break between patterns.

       -tab str
              Replace tab character between fields.

       -sep str
              Separator between group members.

       -pfx str
              Prefix to print before group.

       -sfx str
              Suffix to print after group.

       -rst   Reset -sep through -elg.

       -clr   Clear queued tab separator.

       -pfc str
              Preface combines -clr and -pfx.

       -deq str
              Delete and replace queued tab separator.

       -def str
              Default placeholder for missing fields.

       -lbl str
              Insert arbitrary text.
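
As a rough mental model, the separators above act at different levels: one string joins the members of a group, another joins the fields of a row. The Python sketch below illustrates that layering only. The format_record helper is hypothetical, the tab-character defaults are assumptions of the sketch, and details such as queued separators (-clr, -deq) are not modeled.

```python
def format_record(groups, tab="\t", sep="\t", pfx="", sfx="", default=None):
    """Join group members with `sep`, then join the resulting fields with `tab`.

    `groups` is a list of lists of strings; an empty list stands for a missing field.
    """
    fields = []
    for members in groups:
        if members:
            fields.append(pfx + sep.join(members) + sfx)
        elif default is not None:
            # Placeholder keeps the column present when a field is missing.
            fields.append(default)
    return tab.join(fields)


row = format_record([["1"], ["RJ", "Smith"], []], sep=" ", default="-")
print(row)
```

In this model, a placeholder for missing fields matters mainly for keeping columns aligned across rows.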

   XML Generation
       -set tag
              XML tag for entire set.

       -rec tag
              XML tag for each record.

       -wrp tag
              Wrap elements in XML object.

       -enc tag
              Encase instance in XML object.

       -plg str
              Prologue to print before instance.

       -elg str
              Epilogue to print after instance.

       -pkg tag
              Package subset in XML object.

       -fwd str
              Foreword to print before subset.

       -awd str
              Afterword to print after subset.

   Tag and Attribute Construction
       -tag tag
              Start with <tag.

       -att key value
              Attribute key and value.

       -cls   Close with >.

       -slf   Self-close with />.

       -end tag
              End contents with </tag>.

   Element Selection
       -element element
              Print all items that match tag name.

       -first element
              Only print value of first item.

       -last element
              Only print value of last item.

       -backward element
              Print values in reverse order.

       -NAME  Record value in named variable.

       --STATS
              Accumulate values into variable.

   -element Constructs
       Tag            Caption
       Group          Initials,LastName
       Parent/Child   MedlineCitation/PMID
       Recursive      "**/Gene-commentary_accession"
       Unrestricted   PubDate/*
       Attribute      DescriptorName@MajorTopicYN
       Range          MedlineDate[1:4]
       Substring      "Title[phospholipase|rattlesnake]"
       Object Count   "#Author"
       Item Length    "%Title"
       Element Depth  "^PMID"
       Variable       "&NAME"

   Special -element Operations
       Parent Index   "+"
       Object Name    "?"
       Object Value   "~"
       XML Subtree    "*"
       Children       "$"
       Attributes     "@"
       ASN.1 Record   "."
       JSON Record    "%"

   Numeric Processing
       -num element
              Count.

       -len element
              Length.

       -sum element
              Sum.

       -acc element
              Accumulator.

       -min element
              Minimum.

       -max element
              Maximum.

       -inc element
              Increment.

       -dec element
              Decrement.

       -sub element
              Difference.

       -avg element
              Average.

       -dev element
              Deviation.

       -med element
              Median.

       -mul element
              Product.

       -div element
              Quotient.

       -mod element
              Remainder.

       -bin element
              Binary.

       -oct element
              Octal.

       -hex element
              Hexadecimal.

       -bit element
              Bit count.

       -pad element
              Zero-pad to eight digits.
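
For readers unsure what the base-conversion and padding operations produce, the Python sketch below shows one plausible reading for non-negative integers. The function names are hypothetical, and details such as the letter case of hexadecimal digits are assumptions, not statements about xtract's output.

```python
def to_bin(n):
    """Binary digits of a non-negative integer, without a prefix."""
    return format(n, "b")


def to_oct(n):
    """Octal digits of a non-negative integer, without a prefix."""
    return format(n, "o")


def to_hex(n):
    """Hexadecimal digits (lowercase here by assumption), without a prefix."""
    return format(n, "x")


def bit_count(n):
    """Number of 1 bits in the binary representation."""
    return bin(n).count("1")


def pad8(n):
    """Left-pad with zeros to a width of eight digits."""
    return str(n).zfill(8)


for name, fn in [("bin", to_bin), ("oct", to_oct), ("hex", to_hex), ("bit", bit_count), ("pad", pad8)]:
    print(name, fn(42))
```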

   Character Processing
       -encode element
              XML-encode <, >, &, ", and ' characters.

       -upper element
              Convert text to uppercase.

       -lower element
              Convert text to lowercase.

       -chain element
              Change spaces to underscores.

       -title element
              Capitalize initial letters of words.

       -mirror element
              Reverse order of letters.

       -alnum element
              Non-alphanumeric characters to space.

   String Processing
       -basic element
              Convert superscripts and subscripts.

       -plain element
              Remove embedded mixed-content markup tags.

       -simple element
              Normalize accented letters; spell Greek letters.

       -author element
              Multi-step author cleanup.

       -prose element
              Text conversion to ASCII.

   Text Processing
       -terms element
              Partition text at spaces.

       -words element
              Split at punctuation marks.

       -pairs element
              Adjacent informative words.

       -order element
              Rearrange words in sorted order.

       -reverse element
              Reverse words in string.

       -letters element
              Separate individual letters.

       -clauses element
              Break at phrase separators.

   Citation Functions
       -year element
              Extract first 4-digit year from string.

       -month element
              Match first month name and return a corresponding integer.

       -date element
              YYYY/MM/DD from -unit "PubDate" -date "*"

       -page element
              Get digits (and letters) of first page number.

       -auth element
              Change GenBank authors to Medline form.

       -initials element
              Parse initials from forename or given name.

       -jour element
              Clean up journal name punctuation.

       -trim element
              Remove extra spaces and leading zeros.

       -wct element
              Count number of -words in a string.

       -doi element
              Add https://doi.org/ prefix, URL encode.
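
Two of these functions are simple enough to sketch. The Python below shows one way to pull the first standalone 4-digit number out of a string and to build a DOI link. Both helpers are hypothetical: which digit runs count as a year, and which characters get percent-encoded, are assumptions rather than a description of xtract's rules.

```python
import re
from urllib.parse import quote


def first_year(text):
    """Return the first run of exactly four digits, or an empty string."""
    m = re.search(r"(?<!\d)\d{4}(?!\d)", text)
    return m.group(0) if m else ""


def doi_url(doi):
    """Prefix a DOI with the resolver URL, percent-encoding unsafe characters."""
    # Keeping "/" unencoded is an assumption of this sketch.
    return "https://doi.org/" + quote(doi, safe="/")


print(first_year("2015 Nov-Dec"))
print(doi_url("10.1000/xyz 123"))
```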

   Value Transformation
       -translate element
              Substitute values with -transform table.

       -classify element
              Substring word or phrase matches to -aliases table.

   Regular Expression
       -replace
              Substitute text using regular expressions.
              -reg target    Target expression.
              -exp pattern   Replacement pattern.

   Sequence Processing
       -revcomp
              Reverse complement nucleotide sequence.

       -nucleic
              Subrange determines forward or revcomp.

       -fasta Split sequence into blocks of 70 uppercase letters.

       -ncbi2na
              Expand ncbi2na to IUPAC.  (May need to truncate result to actual sequence length.)

       -ncbi4na
              Expand ncbi4na to IUPAC.  (May need to truncate result to actual sequence length.)

       -molwt Calculate molecular weight of peptide.
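
For readers new to sequence handling, the Python sketch below shows what reverse complementing and splitting into 70-letter blocks mean. The helper names are hypothetical, and the complement table covers only A, C, G, T, and U; handling of IUPAC ambiguity codes is deliberately left out, so this is not a stand-in for xtract's implementation.

```python
# Complement table for unambiguous bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGTUacgtu", "TGCAAtgcaa")


def revcomp(seq):
    """Complement each base, then reverse the whole sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fasta_blocks(seq, width=70):
    """Uppercase the sequence and cut it into lines of at most `width` letters."""
    upper = seq.upper()
    return [upper[i:i + width] for i in range(0, len(upper), width)]


print(revcomp("ATGC"))
print([len(block) for block in fasta_blocks("acgt" * 40)])
```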

   Sequence Coordinates
       -0-based element
              Zero-based.

       -1-based element
              One-based.

       -ucsc-based element
              Half-open.
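
The three conventions differ only in where counting starts and whether the end position is included. The Python sketch below converts a one-based, inclusive interval into the other two forms. It illustrates the conventions in general; the helper names are hypothetical, and the sketch makes no claim about which input convention xtract assumes for a given element.

```python
def one_based_to_zero_based(start, stop):
    """One-based inclusive -> zero-based inclusive: shift both ends down by one."""
    return start - 1, stop - 1


def one_based_to_half_open(start, stop):
    """One-based inclusive -> zero-based half-open: shift only the start."""
    return start - 1, stop


def half_open_length(start, stop):
    """In half-open form the length is simply stop minus start."""
    return stop - start


# The first ten bases of a sequence in each convention.
print(one_based_to_zero_based(1, 10))
print(one_based_to_half_open(1, 10))
```

A practical benefit of the half-open form is that adjacent intervals share a boundary value and lengths need no off-by-one correction.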

   Command Generator
       -insd arg ...
              Generate INSDSeq extraction commands.  Print them if invoked standalone; run them if invoked as
              part of a pipeline.  Requires one or more arguments, which may appear in the following order:

              Descriptor(s)  INSDSeq_sequence/INSDSeq_definition/INSDSeq_division/... [...]

              Completeness   complete/partial

              Feature(s)     CDS/mRNA/...[,...]

              Qualifier(s)   INSDFeature_key/"#INSDInterval"/gene/product/feat_location/sub_sequence/... [...]

   Frequency Table
       -histogram
              Collects data for sort-uniq-count(1) on entire set of records.
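
A frequency table of this kind amounts to counting how often each distinct value occurs across all records. The Python sketch below shows the idea with a hypothetical histogram helper; the output layout (count, tab, value, ordered by value) is an assumption and may not match what sort-uniq-count(1) prints.

```python
from collections import Counter


def histogram(values):
    """Count occurrences of each distinct value and list them in sorted order."""
    counts = Counter(values)
    return [f"{n}\t{v}" for v, n in sorted(counts.items())]


print("\n".join(histogram(["human", "mouse", "human", "rat", "human"])))
```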

   Entrez Indexing
       -e2index [extras]
              Create Entrez index XML.  extras (true or false; false by default) indicates whether to index
              extra fields.

       -indices element
              Index normalized words.

       -article element
              Title positional index.

       -abstract element
              Abstract positional index.

       -paragraph element
              Index text paragraphs.

       -stemmed element
              Apply Porter2 algorithm.

   Output Organization
       -head str
              Print before everything else.

       -tail str
              Print after everything else.

       -hd str
              Print before each record.

       -tl str
              Print after each record.

   Record Selection
       -select condition
              Select record subset by conditions.

       -in filename
              File of identifiers to use for selection.

   Record Rearrangement
       -sort[-fwd] element
              Element to use as sort key.

       -sort-rev element
              Sort records in reverse order.

   Reformatting
       -format fmt
              copy     Fast block copy (still applies processing flags).
              compact  Compress runs of spaces.
              flush    Suppress line indentation.
              indent   Indent according to nesting depth.
              expand   Place each attribute on a separate line.

   Validation
       -verify
              Report XML data integrity problems.

   Summary
       -outline
              Display outline of XML structure.

       -synopsis
              Display individual XML paths.

       -contour [delimiter]
              Display XML paths to leaf nodes (delimited by / by default).

   Full Exploration Command Precedence
       -pattern
       -path
       -division
       -group
       -branch
       -block
       -section
       -subset
       -unit

   Documentation
       -help  Print usage information and some example argument combinations.

       -examples
              Complete usage examples, involving additional Entrez Direct tools.

       -unix  Illustrate common Unix command arguments.

       -version
              Print version number.

See Also

       archive-pmc(1), archive-pubmed(1), custom-index(1), disambiguate-nucleotides(1), download-ncbi-data(1),
       ds2pme(1), esample(1), fetch-pmc(1), fetch-pubmed(1), find-in-gene(1), fuse-segments(1), gene2range(1),
       hgvs2spdi(1), index-extras(1), index-pubmed(1), pma2pme(1), rchive(1), snp2hgvs(1), snp2tbl(1),
       sort-uniq-count(1), spdi2tbl(1), tbl2prod(1), transmute(1), uniq-table(1), xml2fsa(1), xml2tbl(1),
       xy-plot(1).

NCBI                                               2023-03-31                                          XTRACT(1)

Synopsis

xtract [-help] [-strict] [-mixed] [-self] [-accent] [-ascii] [-compress] [-stops] [-input filename]
       [-transform filename] [-aliases filename] [-pattern expr] [-group expr] [-block expr] [-subset expr]
       [-path path] [-if expr [constraint]] [-unless expr [constraint]] [-and condition] [-or condition] [-else]
       [-position pos] [-equals str] [-contains str] [-includes str] [-is-within str] [-starts-with str]
       [-ends-with str] [-is-not str] [-is-before str] [-is-after str] [-matches str] [-resembles str]
       [-is-equal-to expr] [-differs-from expr] [-gt N] [-ge N] [-lt N] [-le N] [-eq N] [-ne N] [-ret str]
       [-tab str] [-sep str] [-pfx str] [-sfx str] [-rst] [-clr] [-pfc str] [-deq str] [-def str] [-lbl str]
       [-set tag] [-rec tag] [-wrp tag] [-enc tag] [-plg str] [-elg str] [-pkg tag] [-fwd str] [-awd str]
       [-tag tag] [-att key value] [-cls] [-slf] [-end tag] [-element element] [-first element] [-last element]
       [-backward element] [-NAME] [--STATS] [-num element] [-len element] [-sum element] [-acc element]
       [-min element] [-max element] [-inc element] [-dec element] [-sub element] [-avg element] [-dev element]
       [-med element] [-mul element] [-div element] [-mod element] [-bin element] [-oct element] [-hex element]
       [-bit element] [-pad element] [-encode element] [-upper element] [-lower element] [-chain element]
       [-title element] [-mirror element] [-alnum element] [-basic element] [-plain element] [-simple element]
       [-author element] [-prose element] [-terms element] [-words element] [-pairs element] [-order element]
       [-reverse element] [-letters element] [-clauses element] [-year element] [-month element] [-date element]
       [-page element] [-auth element] [-initials element] [-jour element] [-trim element] [-wct element]
       [-doi element] [-translate element] [-classify element] [-replace -reg target -exp replacement]
       [-revcomp] [-nucleic] [-fasta] [-ncbi2na] [-ncbi4na] [-molwt] [-0-based element] [-1-based element]
       [-ucsc-based element] [-insd arg ...] [-histogram] [-e2index [extras]] [-indices element]
       [-article element] [-abstract element] [-paragraph element] [-stemmed element] [-head str] [-tail str]
       [-hd str] [-tl str] [-select condition] [-in filename] [-sort[-fwd] element] [-sort-rev element]
       [-format fmt [-unicode style]] [-verify] [-outline] [-synopsis] [-contour [delimiter]] [-examples]
       [-unix] [-version]
