Statistical Mapping Utility for China's Administrative Divisions in Stata

This utility provides datasets and example Stata code (using grmap and spmap) for drawing maps of China's provincial administrative divisions. The maps include cartographic aids such as a scale bar and a north arrow, as well as reference lines such as the Qinling-Huaihe line and the Hu Huanyong line. Both categorical and continuous variables can be mapped across provinces, and point data (optionally sized in proportion to a variable) and charts such as pie charts can be overlaid on the base map.

Author

agan2021

GNU General Public License v3.0

Quick Info

GitHub Stars: 0
NPM Weekly Downloads: 0
Tools: 1
Last Updated: 2026-02-19

Tags

charts, stata, maps, maps stata, provincial maps, charts maps

Introduction

This package extends Stata's mapping capabilities for research use. Researchers often need accurate maps of empirical data; the package supplies provincial boundary data together with several important national reference lines, so that spatial patterns across China's provinces can be visualized for geographic and social science research.

Core Enhancements Summary

Version 4.0 of the map dataset and its plotting code brings the following improvements over earlier versions:

  1. The north arrow orientation has been standardized to vertical alignment.
  2. The dataset now incorporates boundaries corresponding to undelineated territories.
  3. Coastal outlines have been integrated into the base map layers.
  4. The Qinling-Huaihe demarcation line is now included.
  5. The Hu Huanyong (Population) line has been added for reference.
  6. An option for generating English-language map labels is now available.
  7. The positions of the scale bar and north arrow can now be adjusted.

The following examples demonstrate these features.

Demonstrations of Cartographic Rendering

2019 Provincial Administrative Divisions of China

This example draws a choropleth map from a categorical variable. The attribute dataset chinaprov40_db.dta stores each region's type (类型) as a string, so encode first converts it into a labeled numeric variable that grmap can classify and color:

* Rendering maps based on discrete variables
use chinaprov40_db.dta, clear 
encode 类型, gen(type)
codebook type

grmap type using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(0 1 2 3 4 5) ///
  fcolor("254 212 57" "253 116 70" "138 145 151" "213 228 162" "210 175 129") ///
  leg(order(2 "不统计" 3 "特别行政区" 4 "直辖市" 5 "省" 6 "自治区" 11 "秦岭-淮河线" 14 "胡焕庸线")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord.dta) by(group) size(vvthin *1 *0.5 *1.2 *0.5 *0.5 *1.2) pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        "24 188 156" /// Qinling-Huaihe line color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        "227 26 28" /// Hu Huanyong line color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  ti("使用 Stata 绘制 2019 年中国省级行政区划") ///
  subti("绘制:微信公众号 RStata") ///
  caption("版本:使用 Stata 绘制中国省级地图数据包 4.0", size(*0.8))
gr export pic1.png, replace width(1200)

The line() option takes seven sizes and seven colors, one for each value of group in chinaprov40_line_coord.dta, in this order: provincial boundaries (1), national boundaries (2), coastlines (3), the Qinling-Huaihe line (4), the inset frame (5), the scale bar and north arrow (6), and the Hu Huanyong line (7). Later examples use select() to drop the groups they do not need.
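To check which element each group code refers to, the codes can be given value labels and tabulated. This is a small sketch: the label name linegroup is hypothetical, and it only labels the copy in memory without saving it.

```stata
* Attach readable labels to the group codes of the line coordinate file
* (codes 1-7 in the order given above); nothing is saved to disk
use chinaprov40_line_coord.dta, clear
label define linegroup 1 "Provincial boundary" 2 "National boundary" ///
    3 "Coastline" 4 "Qinling-Huaihe line" 5 "Inset frame" ///
    6 "Scale bar and north arrow" 7 "Hu Huanyong line", replace
label values group linegroup
tabulate group
```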

Color palette selection guidance is available at: https://tidyfriday.cn/colors

English Language Map Output

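Label text and positions come from chinaprov40_label.dta, which stores each label's coordinates (X, Y) along with a Chinese name (cname) and an English name (ename). To label the map in English, pass ename to label() instead of cname; the example below also shrinks the labels to size(*0.6) and uses English legend and title text: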

grmap type using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(0 1 2 3 4 5) ///
  fcolor("254 212 57" "253 116 70" "138 145 151" "213 228 162" "210 175 129") ///
  leg(order(2 "Not within the scope of statistics" 3 "Special administrative region" 4 "Municipality directly under" "the Central Government" 5 "Province" 6 "Autonomous Region" 11 "Qinling Huaihe River Line" 14 "Hu Huanyong line")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord.dta) by(group) size(vvthin *1 *0.5 *1.2 *0.5 *0.5 *1.2) pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        "24 188 156" /// Qinling-Huaihe line color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        "227 26 28" /// Hu Huanyong line color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(ename) length(20) size(*0.6)) ///
  ti("Using Stata to draw China's provincial map in 2019") ///
  subti("WeChat Official Account: RStata") ///
  caption("Version: 4.0", size(*0.8))
gr export pic2.png, replace width(1200)
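Because the two versions differ only in the label variable and label size, one script can produce either. A minimal sketch, assuming a hypothetical local macro lang that is set to "en" or "cn"; legend, titles, and line layers are omitted for brevity:

```stata
* Hypothetical switch: "en" labels with ename, anything else with cname
local lang "en"
local labvar  = cond("`lang'" == "en", "ename", "cname")
local labsize = cond("`lang'" == "en", "*0.6", "*0.8")

use chinaprov40_db.dta, clear
encode 类型, gen(type)
grmap type using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(0 1 2 3 4 5) ///
  fcolor("254 212 57" "253 116 70" "138 145 151" "213 228 162" "210 175 129") ///
  label(data(chinaprov40_label) x(X) y(Y) label(`labvar') length(20) size(`labsize'))
```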

2020 Gross Regional Product (GRP) Mapping

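A continuous variable is mapped the same way as a discrete one once its values are binned into classes with clmethod(custom) and clbreaks(). The following example maps 2020 gross regional product (GRP, in 100 million yuan) by province. Province names are matched on their first two characters, and regions without data are set to -1 so that they fall into a separate gray class: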

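import delimited using "2020年中国各省市地区生产总值.csv", clear encoding(utf8)
* Match provinces on their first two characters; usubstr() counts
* characters, whereas substr(省份, 1, 6) counts UTF-8 bytes
gen prov = usubstr(省份, 1, 2)
save 2020年中国各省市地区生产总值, replace

use chinaprov40_db.dta, clear
gen prov = usubstr(省, 1, 2)
merge 1:1 prov using 2020年中国各省市地区生产总值
* Drop CSV rows that match no map region
drop if _merge == 2
* Regions without data get -1, which falls in the gray "无数据" class
replace 地区生产总值 = -1 if missing(地区生产总值)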
grmap 地区生产总值 using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(-1 0 20000 40000 60000 80000 120000) ///
  fcolor(gray "224 242 241" "178 223 219" "128 203 196" "77 182 172" "38 166 154") ///
  leg(order(2 "无数据" 3 "< 2 万亿元" 4 "2~4 万亿元" 5 "4~6 万亿元" 6 "6~8 万亿元" 7 "> 8 万亿元")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord.dta) ///
    /// Exclude Qinling-Huaihe (4) and Hu Huanyong (7) lines
    select(keep if inlist(group, 1, 2, 3, 5, 6)) ///
    by(group) size(vvthin *1 *0.5 *0.5 *0.5) ///
    pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  ti("2020 年中国各省市地区生产总值") ///
  subti("绘制:微信公众号 RStata") ///
  caption("数据来源:各地统计局", size(*0.8))
gr export pic3.png, replace width(1200)
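As a variant, the classes can be computed from the data instead of being typed by hand, and missing regions can be colored directly instead of being recoded to -1. This sketch assumes that grmap accepts spmap's clmethod(quantile), clnumber(), ndfcolor(), and ndlabel() options; check the help file of your installed version before relying on it.

```stata
* Variant: five quantile classes; unmatched regions stay missing
* and are drawn with the no-data color instead of a -1 sentinel
use chinaprov40_db.dta, clear
gen prov = usubstr(省, 1, 2)
merge 1:1 prov using 2020年中国各省市地区生产总值, keep(master match) nogenerate
grmap 地区生产总值 using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(quantile) clnumber(5) ///
  fcolor("224 242 241" "178 223 219" "128 203 196" "77 182 172" "38 166 154") ///
  ndfcolor(gray) ndlabel("无数据")
```

Quantile classes give each color roughly the same number of provinces, which suits skewed variables such as GRP, while hand-chosen breaks like those above are easier to read in the legend.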

2013 Industrial Distribution and Distance from Qinling-Huaihe Line

Besides choropleth maps, researchers often need to plot points on the map. The next example plots a sample of industrial enterprises from 2013, grouped by whether they lie north or south of the Qinling-Huaihe line and sized in proportion to their distance from it.

Because the base map is stored in projected coordinates, points given as longitude and latitude must be transformed into the same projection before they can be plotted on it. I developed a web application for this transformation: https://czxb.shinyapps.io/crs-trans/.

Alternatively, users familiar with R can run the provided 坐标转换.R (coordinate conversion) script.

* Coordinate system transformation
* Method one: https://czxb.shinyapps.io/crs-trans/
* Note: the uploaded CSV must contain numeric lon and lat variables and at most about 100,000 observations; please avoid simultaneous uploads.
use gq2013sample, clear 
keep 经度 纬度
ren 经度 lon
ren 纬度 lat
export delimited using "待转换.csv", replace 

* Method two: Use the attached R script for conversion

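* Processing transformed data
* Rows are matched by position, so the converted file must keep
* the row order of 待转换.csv
import delimited using "转换后的数据.csv", clear
gen id = _n
save 转换后的数据, replace

use gq2013sample, clear
gen id = _n
merge 1:1 id using 转换后的数据
* Drop the merge indicator, the row id, and the original 经度/纬度 columns
drop _merge id *度
* 北方或南方 (north or south) becomes a numeric grouping variable
encode 北方或南方, gen(north)
save pointdata, replace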

use chinaprov40_db.dta, clear
spmap using chinaprov40_coord.dta, id(ID) ///
  ocolor("black" ...) osize(vvthin ...) ///
    line(data(chinaprov40_line_coord.dta) ///
    /// Exclude Hu Huanyong line (7)
    select(keep if inlist(group, 1, 2, 3, 4, 5, 6)) ///
    by(group) size(vvthin *1 *0.5 *1.5 *0.5 *0.5) ///
    pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        "0 85 170" /// Qinling-Huaihe line color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
    point(data(pointdata) by(north) ///
      fcolor("227 26 28%30" "24 188 156%30") ///
        x(x) y(y) ///
        proportional(与秦岭淮河线的距离) ///
        size(*0.1) legenda(on)) ///
    leg(order(7 "秦岭-淮河线" 10 "北方工企业" 11 "南方工企业")) ///
    ti("2013 年中国工业企业与秦岭-淮河线的距离", color(black)) /// 
    subti("绘制:微信公众号 RStata") ///
    graphr(margin(medium)) ///
    caption("数据来源:2013 年中国工业企业数据库,使用高德地图地理编码接口解析经纬度", size(*0.8))
gr export pic4.png, replace width(1200)
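If the transformation step goes wrong (for example, the wrong projection or swapped lon/lat columns), the points end up far from the map rather than raising an error. A rough sanity check, sketched under the assumption that chinaprov40_coord.dta uses the usual _X/_Y variable names, as the line file does, compares the projected points with the base map's bounding box:

```stata
* Bounding box of the base map (summarize skips the missing
* coordinates that separate polygons in spmap-style files)
use chinaprov40_coord.dta, clear
summarize _X, meanonly
local xmin = r(min)
local xmax = r(max)
summarize _Y, meanonly
local ymin = r(min)
local ymax = r(max)

* Count projected points outside the box; points with missing
* coordinates are counted as outside as well
use pointdata, clear
count if !inrange(x, `xmin', `xmax') | !inrange(y, `ymin', `ymax')
display r(N) " of " _N " points lie outside the base map extent"
```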

2019 GRP and Industrial Structure

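Maps can also be overlaid with charts drawn by the diagram() option. Here each province gets a pie chart of the primary-, secondary-, and tertiary-industry shares of its 2019 GDP. The pies are placed at the provinces' label coordinates (X, Y), so no new positions need to be digitized.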
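The diagram() calls read their coordinates and shares from piedata.dta, which the listing does not create. One way to build it is sketched below; it is not the author's method. It assumes that the first two characters of cname in chinaprov40_label.dta identify a province the same way as the first two characters of 省份 in 各省历年GDP.dta, and that these keys are unique in both files (merge 1:1 stops with an error otherwise). The variable key is hypothetical.

```stata
* Label coordinates keyed by the first two characters of cname
use chinaprov40_label, clear
gen key = usubstr(cname, 1, 2)
keep key X Y
tempfile labels
save `labels'

* 2019 industry shares, keyed the same way; unmatched rows
* (e.g. the national total or the N/1000km labels) are dropped
use 各省历年GDP, clear
keep if 年份 == 2019
gen key = usubstr(省份, 1, 2)
merge 1:1 key using `labels', keep(match) nogenerate
keep X Y 第一产业占GDP比重_百分比 第二产业占GDP比重_百分比 第三产业占GDP比重_百分比
save piedata, replace
```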

use 各省历年GDP, clear 
drop if 省份 == "中国"

* Convert GRP from 100 million yuan (亿元) to 100 billion yuan (千亿元)
replace 地区生产总值_亿元 = 地区生产总值_亿元 / 1000
merge m:m 省代码 using chinaprov40_db.dta
replace 地区生产总值_亿元 = -1 if missing(年份)
grmap 地区生产总值_亿元 if 年份 == 2019 | missing(年份) ///
  using chinaprov40_coord.dta, id(ID) ///
  clmethod(custom) clbreaks(-1 0 40 60 80 100 120) /// 
  fcolor("gray" "237 248 233" "199 233 192" "161 217 155" "116 196 118" "49 163 84") ///
  ocolor("gray" ...) ///
  ti("2019 年中国各省地区生产总值 & 产业结构", size(*1.1)) ///
  subtitle("数据来源:CSMAR经济金融数据库") ///
  graphr(margin(medium)) ///
  osize(vvthin ...) ///
  legend(size(*1.1) ///
    order(2 "无数据" 3 "< 40千亿" ///
      4 "40~60千亿" 5 "60~80千亿" ///
      6 "80~100千亿" 7 "> 100千亿" ///
      14 "第一产业" 15 "第二产业" 16 "第三产业")) ///
  caption("绘制:微信公众号 RStata", size(*0.8)) ///
  line(data(chinaprov40_line_coord.dta) ///
    /// Exclude Qinling-Huaihe (4) and Hu Huanyong (7) lines
    select(keep if inlist(group, 1, 2, 3, 5, 6)) ///
    by(group) size(vvthin *1 *0.5 *0.5 *0.5) ///
    pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  diagram(data(piedata) x(X) y(Y) v(第一产业占GDP比重_百分比 第二产业占GDP比重_百分比 第三产业占GDP比重_百分比) ///
    type(pie) legenda(on) os(vvthin) ///
      size(1.5) fc("102 194 165" "252 141 98" "229 196 148") ///
      oc("102 194 165" "252 141 98" "229 196 148"))

gr export "pic5.png", replace width(1200)

Alternatively, the pie charts can be replaced with framed-rectangle charts (type(frect)), which display a single variable, here the primary-industry share of GDP:

grmap 地区生产总值_亿元 if 年份 == 2019 | missing(年份) ///
  using chinaprov40_coord.dta, id(ID) ///
  clmethod(custom) clbreaks(-1 0 40 60 80 100 120) /// 
  fcolor("gray" "237 248 233" "199 233 192" "161 217 155" "116 196 118" "49 163 84") ///
  ocolor("gray" ...) ///
  ti("2019 年中国各省地区生产总值 & 第一产业比重", size(*1.1)) ///
  subtitle("数据来源:CSMAR经济金融数据库") ///
  graphr(margin(medium)) ///
  osize(vvthin ...) ///
  legend(size(*1.1) ///
    order(2 "无数据" 3 "< 40千亿" ///
      4 "40~60千亿" 5 "60~80千亿" ///
      6 "80~100千亿" 7 "> 100千亿" ///
      15 "第一产业比重")) ///
  caption("绘制:微信公众号 RStata", size(*0.8)) ///
  line(data(chinaprov40_line_coord.dta) ///
    /// Exclude Qinling-Huaihe (4) and Hu Huanyong (7) lines
    select(keep if inlist(group, 1, 2, 3, 5, 6)) ///
    by(group) size(vvthin *1 *0.5 *0.5 *0.5) ///
    pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  diagram(data(piedata) x(X) y(Y) v(第一产业占GDP比重_百分比) ///
    type(frect) legenda(on) os(vvthin) ///
      size(1.5) fc("252 141 98") ///
      oc("252 141 98") refsize(none))
gr export "pic6.png", replace width(1200)

Provincial Population Density

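Finally, the Hu Huanyong line, which separates China's densely populated southeast from its sparsely populated northwest, is overlaid on a map of average provincial population density in 2015: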

use 中国人口空间分布省级面板数据集.dta, clear 
ren 省份 省
merge m:1 省 using chinaprov40_db.dta
keep if 年份 == 2015 | missing(年份)
replace 均值 = -1 if missing(均值)
grmap 均值 using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(-1 0 100 1000 2000 3000 4000) ///
  fcolor(gray "224 242 241" "178 223 219" "128 203 196" "77 182 172" "38 166 154") ///
  leg(order(2 "无数据" 3 "< 100 人/平方公里" 4 "100~1000 人/平方公里" 5 "1000~2000 人/平方公里" 6 "2000~3000 人/平方公里" 7 "> 3000 人/平方公里" 14 "胡焕庸线")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord.dta) ///
    /// Exclude Qinling-Huaihe line (4)
    select(keep if inlist(group, 1, 2, 3, 5, 6, 7)) ///
    by(group) size(vvthin *1 *0.5 *0.5 *0.5 *1.2) ///
    pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        "227 26 28" /// Hu Huanyong line color
        )) ///
  polygon(data(polygon) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  ti("2015 年中国各省平均人口密度") ///
  subti("绘制:微信公众号 RStata") ///
  caption("数据来源:中国科学院资源环境科学与数据中心", size(*0.8))
gr export pic7.png, replace width(1200)
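With custom classes, the breaks are expected to cover the full range of the mapped variable; a value above the last break is not classified as intended. A quick check, sketched here for the population-density data still in memory, keeps the breaks in one local macro (a hypothetical name) that can also be passed to clbreaks(`breaks'):

```stata
* Check that the custom breaks span the data before calling grmap
local breaks "-1 0 100 1000 2000 3000 4000"
local nb  : word count `breaks'
local top : word `nb' of `breaks'
summarize 均值, meanonly
display "largest value: " r(max) "; top break: `top'"
assert r(max) <= `top'
```

If the assertion fails, raise the last break (the legend label "> 3000 人/平方公里" stays valid).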

Adjusting North Arrow and Scale Bar Placement

Both the north arrow and the scale bar are made of three parts: lines (in chinaprov40_line_coord.dta), a filled polygon (in polygon.dta), and a text label (in chinaprov40_label.dta). Moving either element therefore means shifting its coordinates by the same amount in all three files. Drawing the map once with the freestyle option, which turns off the default graph formatting so that the coordinate axes are displayed, helps determine how far each element has to move.
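A minimal diagnostic plot, sketched with spmap as in the point-map example; it assumes freestyle behaves as described above, and uses default line sizes and colors since only the axis values matter here:

```stata
* Draw outlines, reference lines, polygons, and labels without the
* default formatting so the x/y axes show the map coordinates
use chinaprov40_db.dta, clear
spmap using chinaprov40_coord.dta, id(ID) freestyle ///
  line(data(chinaprov40_line_coord.dta) by(group)) ///
  polygon(data(polygon) fcolor(black)) ///
  label(data(chinaprov40_label) x(X) y(Y) label(cname) length(20))
```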

* Shifting the north arrow to the upper right
use chinaprov40_line_db.dta, clear
* Inspect this file (e.g., with browse) to look up element IDs:
* the north arrow's line IDs are 40 and 41
use chinaprov40_line_coord.dta, clear
replace _X = _X + 3000000 if inlist(_ID, 40, 41)
replace _Y = _Y + 4000000 if inlist(_ID, 40, 41)
save chinaprov40_line_coord2.dta, replace 

use polygon, clear
replace _X = _X + 3000000 if _ID == 38
replace _Y = _Y + 4000000 if _ID == 38
save polygon2, replace

use chinaprov40_label, clear
replace X = X + 3000000 if cname == "N"
replace Y = Y + 4000000 if cname == "N"
save chinaprov40_label2, replace 

use chinaprov40_db.dta, clear 
encode 类型, gen(type)
grmap type using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(0 1 2 3 4 5) ///
  fcolor("254 212 57" "253 116 70" "138 145 151" "213 228 162" "210 175 129") ///
  leg(order(2 "不统计" 3 "特别行政区" 4 "直辖市" 5 "省" 6 "自治区" 11 "秦岭-淮河线" 14 "胡焕庸线")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord2.dta) by(group) size(vvthin *1 *0.5 *1.2 *0.5 *0.5 *1.2) pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        "24 188 156" /// Qinling-Huaihe line color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        "227 26 28" /// Hu Huanyong line color
        )) ///
  polygon(data(polygon2) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label2) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  ti("使用 Stata 绘制 2019 年中国省级行政区划") ///
  subti("绘制:微信公众号 RStata") ///
  caption("版本:使用 Stata 绘制中国省级地图数据包 4.0", size(*0.8))
gr export pic8.png, replace width(1200)

* Adjusting the scale bar position (shifting slightly upward)
use chinaprov40_line_db.dta, clear
* Inspect this file (e.g., with browse): the scale bar's line IDs are 42 and 43
use chinaprov40_line_coord2.dta, clear
replace _Y = _Y + 200000 if inlist(_ID, 42, 43)
save chinaprov40_line_coord3.dta, replace 

use polygon2, clear
replace _Y = _Y + 200000 if _ID == 39
save polygon3, replace

use chinaprov40_label2, clear
replace Y = Y + 200000 if cname == "1000km"
save chinaprov40_label3, replace 

use chinaprov40_db.dta, clear 
encode 类型, gen(type)
grmap type using chinaprov40_coord.dta, ///
  id(ID) osize(vvthin ...) ocolor(white ...) ///
  clmethod(custom) clbreaks(0 1 2 3 4 5) ///
  fcolor("254 212 57" "253 116 70" "138 145 151" "213 228 162" "210 175 129") ///
  leg(order(2 "不统计" 3 "特别行政区" 4 "直辖市" 5 "省" 6 "自治区" 11 "秦岭-淮河线" 14 "胡焕庸线")) ///
  graphr(margin(medium)) ///
  line(data(chinaprov40_line_coord3.dta) by(group) size(vvthin *1 *0.5 *1.2 *0.5 *0.5 *1.2) pattern(solid ...) ///
    color(white /// Provincial boundary color
        black /// National boundary color
        "0 85 170" /// Coastline color
        "24 188 156" /// Qinling-Huaihe line color
        black /// Inset frame grid color
        black /// Scale bar and north arrow color
        "227 26 28" /// Hu Huanyong line color
        )) ///
  polygon(data(polygon3) fcolor(black) ///
    osize(vvthin)) ///
  label(data(chinaprov40_label3) x(X) y(Y) label(cname) length(20) size(*0.8)) ///
  ti("使用 Stata 绘制 2019 年中国省级行政区划") ///
  subti("绘制:微信公众号 RStata") ///
  caption("版本:使用 Stata 绘制中国省级地图数据包 4.0", size(*0.8))
gr export pic9.png, replace width(1200)
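Both relocation steps repeat the same three edits. A hypothetical helper program, shift_element (not part of the package), bundles them. It assumes the file names, IDs, and label texts used above; from() and to() are file-name suffixes, so the two calls reproduce the *2 and *3 files. Pass labtext() without quotes.

```stata
capture program drop shift_element
program define shift_element
    * lineids: _ID values in the line coordinate file
    * polyid:  _ID value in the polygon file
    * labtext: cname value of the element's label
    syntax , lineids(numlist) polyid(integer) labtext(string) ///
        dx(real) dy(real) to(string) [from(string)]
    local ids : subinstr local lineids " " ",", all

    use chinaprov40_line_coord`from', clear
    replace _X = _X + `dx' if inlist(_ID, `ids')
    replace _Y = _Y + `dy' if inlist(_ID, `ids')
    save chinaprov40_line_coord`to', replace

    use polygon`from', clear
    replace _X = _X + `dx' if _ID == `polyid'
    replace _Y = _Y + `dy' if _ID == `polyid'
    save polygon`to', replace

    use chinaprov40_label`from', clear
    replace X = X + `dx' if cname == "`labtext'"
    replace Y = Y + `dy' if cname == "`labtext'"
    save chinaprov40_label`to', replace
end

* North arrow: originals -> *2 files; scale bar: *2 -> *3 files
shift_element, lineids(40 41) polyid(38) labtext(N) dx(3000000) dy(4000000) to(2)
shift_element, lineids(42 43) polyid(39) labtext(1000km) dx(0) dy(200000) from(2) to(3)
```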

Extra Details

Besides provincial outlines, the package's coordinate files define additional spatial elements that matter for regional studies, such as the Qinling-Huaihe line, which is commonly used to divide northern and southern China by climate and ecology. Because the base map is projected, external point data given as longitude and latitude must be transformed into the same projection before plotting, as in the industrial-enterprise example; otherwise the points will not line up with the boundaries.

Conclusion

This Stata mapping utility turns statistical data into maps of China's provincial divisions for empirical research. Because the scale bar, north arrow, reference lines, labels, and colors can all be customized, the same base data can support a wide range of studies of regional Chinese geography and socioeconomic phenomena.
